mindformers.models.multi_modal.ModalContentTransformTemplate

class mindformers.models.multi_modal.ModalContentTransformTemplate(output_columns=None, tokenizer=None, mode='predict', vstack_columns=None, modal_content_padding_size=1, max_length=2048, **kwargs)

Base class for modal content transform templates. It should be implemented by each specific model. A child class can override the methods build_conversation_input_text, update_result_before_output, batch, and post_process to meet the model's requirements.

Parameters:

output_columns (List[str], optional) – Specify which columns will be output. Default: None.
tokenizer (Tokenizer, optional) – An instantiated model tokenizer. Default: None.
mode (str, optional) – Running mode, either 'predict' or 'train'. Default: 'predict'.
vstack_columns (List[str], optional) – Specify which columns will be stacked vertically (vstack) when batching data. Default: None.
modal_content_padding_size (int, optional) – Used in training mode by Template subclasses. It usually represents the maximum number of modal contents (such as images) supported within a training sample. When a training sample contains fewer modal contents than this value, the modal contents are padded up to it. Default: 1.
max_length (int, optional) – Used in training mode by Template subclasses. It usually represents the maximum length to which a training sample can be padded after tokenization and content masking are complete. Default: 2048.
kwargs (dict, optional) – Additional keyword arguments, reserved for future extension.
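
To make the effect of modal_content_padding_size more concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper pad_modal_contents and the None placeholder are hypothetical, and a real Template subclass decides how padding entries are represented.

```python
def pad_modal_contents(contents, padding_size, placeholder=None):
    """Expand a list of modal contents to exactly `padding_size` entries.

    Hypothetical helper for illustration only; the placeholder value and the
    handling of oversized samples are assumptions, not library behavior.
    """
    if len(contents) > padding_size:
        raise ValueError(
            f"sample has {len(contents)} modal contents, "
            f"but at most {padding_size} are supported"
        )
    # Append placeholders until the list reaches the requested size.
    return list(contents) + [placeholder] * (padding_size - len(contents))


padded = pad_modal_contents(["/path/to/image"], padding_size=3)
print(padded)  # ['/path/to/image', None, None]
```

With the default value of 1, a sample that holds a single image would be returned unchanged by this sketch.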

Examples

>>> from mindformers.models.multi_modal import ModalContentTransformTemplate
>>> ModalContentTransformTemplate().supported_modal
[]
>>> # Note:
>>> #     The property of 'supported_modal' should be inherited by subclasses,
>>> #     and subclasses implement the corresponding modal builders.
>>> #     The current base class does not support any modal builders, so it returns '[]'.

batch(data_list, token_padding_length, **kwargs)

Batch the data of the columns listed in output_columns.

Parameters:

data_list (list) – A list containing multiple data items.
token_padding_length (int) – Used to pad the length of "tokens" to ensure that all text data has the same length.
kwargs (dict, optional) – Additional keyword arguments, reserved for future extension.

Returns:

A dict that stores the batched data.
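
The role of token_padding_length can be sketched as follows. This is a simplified, pure-Python illustration rather than the actual batch method: the function batch_tokens and the pad_token_id default of 0 are assumptions, and only the "tokens" column is handled.

```python
def batch_tokens(data_list, token_padding_length, pad_token_id=0):
    """Pad each item's "tokens" to a common length and collect them in a dict.

    Illustrative only: pad_token_id=0 is an assumed padding value.
    """
    batched = {"tokens": []}
    for item in data_list:
        tokens = list(item["tokens"])
        if len(tokens) > token_padding_length:
            raise ValueError("token sequence longer than token_padding_length")
        # Right-pad so that every row has exactly token_padding_length entries.
        tokens += [pad_token_id] * (token_padding_length - len(tokens))
        batched["tokens"].append(tokens)
    return batched


batch = batch_tokens(
    [{"tokens": [5, 6, 7]}, {"tokens": [8]}],
    token_padding_length=4,
)
print(batch)  # {'tokens': [[5, 6, 7, 0], [8, 0, 0, 0]]}
```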

abstract build_conversation_input_text(raw_inputs, result_recorder)

Used in predict mode to convert the input text into conversation form. Usually overridden by subclasses.

Parameters:

raw_inputs (str) – input data.
result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

str. The processed text in conversation form.
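
As a rough illustration of what a subclass override might do, the sketch below wraps the raw text in a simple prompt. Everything here is hypothetical: DataRecordStandIn is a minimal stand-in for DataRecord (only the existence of a put method comes from the description above, and its two-argument form is assumed), and the "user:/assistant:" prompt layout is invented for the example.

```python
class DataRecordStandIn:
    """Minimal stand-in for DataRecord; put(key, value) is an assumed signature."""

    def __init__(self):
        self.items = {}

    def put(self, key, value):
        self.items[key] = value


def build_conversation_input_text(raw_inputs, result_recorder):
    """Wrap raw text in a hypothetical conversation prompt."""
    # Record the untouched input so later steps could retrieve it.
    result_recorder.put("raw_text", raw_inputs)
    return f"user: {raw_inputs}\nassistant:"


recorder = DataRecordStandIn()
prompt = build_conversation_input_text("describe image in English", recorder)
print(prompt)
```

The actual conversation format depends on the model, which is why the base class leaves this method abstract.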

build_labels(text_id_list, result_recorder, **kwargs)

Used in training mode. Subclasses override this method to construct the labels needed for training from text data.

Parameters:

text_id_list (list) – A list containing text data identifiers or indices.
result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.
kwargs (dict, optional) – Additional keyword arguments, reserved for future extension.

build_modal_context(input_ids, result_recorder, **kwargs)

Process the input_ids according to the requirements of the modal builder and return the processed input_ids.

Parameters:

input_ids (list) – Input data.
result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.
kwargs (dict, optional) – Additional keyword arguments, reserved for future extension.

Returns:

The processed input_ids.

static get_need_update_output_items(result)

Retrieve the output items that need to be updated.

Parameters:

result (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

A dict. Defaults to an empty dict.

post_process(output_ids, **kwargs)

Decode the model's output_ids into text strings.

Parameters:

output_ids (list) – A list containing the model's output_ids.
kwargs (dict, optional) – Additional keyword arguments, reserved for future extension.

Returns:

A list containing all decoded text strings.
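
The decoding step can be pictured with a toy tokenizer. This is only a sketch: ToyTokenizer and its three-entry vocabulary are invented for the example, and the free function post_process takes the tokenizer as an explicit argument, whereas the real method uses the tokenizer held by the template instance.

```python
class ToyTokenizer:
    """Invented tokenizer with a tiny vocabulary, for illustration only."""

    vocab = {0: "<pad>", 1: "hello", 2: "world"}

    def decode(self, ids, skip_special_tokens=True):
        words = [self.vocab[i] for i in ids]
        if skip_special_tokens:
            words = [w for w in words if w != "<pad>"]
        return " ".join(words)


def post_process(output_ids, tokenizer, **kwargs):
    """Decode each sequence of output ids into a text string."""
    return [tokenizer.decode(ids, **kwargs) for ids in output_ids]


texts = post_process([[1, 2, 0], [2, 1]], ToyTokenizer())
print(texts)  # ['hello world', 'world hello']
```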

process_predict_query(query_ele_list, result_recorder)

In predict mode, traverse the query elements, find the modal builder that corresponds to each element, and process the element with it.

Parameters:

query_ele_list (List[dict]) – A list of elements for predicting a request. For example: [{"image":"/path/to/image"}, {"text":"describe image in English"}].
result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

The text results processed by each modal builder.
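
The traversal described above can be sketched as a simple dispatch loop. This is an assumption-laden illustration, not the library's code: the builders mapping, its callables, and the "<image>" placeholder are hypothetical, and the result_recorder argument is omitted for brevity.

```python
def process_predict_query(query_ele_list, builders):
    """Dispatch each query element to the first builder that supports its key.

    `builders` maps a modal type (e.g. "image") to a callable; both the
    mapping and the callables are hypothetical stand-ins for modal builders.
    """
    text_parts = []
    for element in query_ele_list:
        for modal_type, builder in builders.items():
            if modal_type in element:
                text_parts.append(builder(element[modal_type]))
                break
        else:
            # No builder matched any key of this element.
            raise ValueError(f"no modal builder supports element: {element}")
    return text_parts


builders = {
    "image": lambda path: "<image>",  # hypothetical placeholder for an image
    "text": lambda text: text,        # text passes through unchanged
}
query = [{"image": "/path/to/image"}, {"text": "describe image in English"}]
parts = process_predict_query(query, builders)
print(parts)  # ['<image>', 'describe image in English']
```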

process_train_item(conversation_list, result_recorder)

In train mode, traverse the conversation data, find the modal builder that corresponds to each modal content, and process the content with it.

Parameters:

conversation_list (List[List]) – A list of elements for dialogue data. For example: [["user", "<img>/path/to/image<img>describe image in English:"], ["assistant", "the image describe …."]]
result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

The text results processed by each modal builder.
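
For training data, one plausible way to picture the processing is to pull image paths out of each message and leave a placeholder behind. The sketch below is hypothetical: the regular expression follows the tag form exactly as written in the example above, the "<image>" placeholder and the assistant reply are invented, and the result_recorder argument is omitted.

```python
import re

# Tag form copied from the documented example: <img>/path/to/image<img>
IMG_PATTERN = re.compile(r"<img>(.*?)<img>")


def process_train_item(conversation_list, image_placeholder="<image>"):
    """Replace image tags with a placeholder and collect the image paths."""
    image_paths = []
    processed = []
    for role, message in conversation_list:
        image_paths.extend(IMG_PATTERN.findall(message))
        processed.append([role, IMG_PATTERN.sub(image_placeholder, message)])
    return processed, image_paths


conversation = [
    ["user", "<img>/path/to/image<img>describe image in English:"],
    ["assistant", "a short description"],  # invented reply for the example
]
processed, image_paths = process_train_item(conversation)
print(processed)
print(image_paths)  # ['/path/to/image']
```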

property supported_modal

Return the types of modal builder supported by an instance.

Returns:

A list containing the types of modal builder supported by an instance.