mindformers.models.multi_modal.ModalContentTransformTemplate

View Source On AtomGit
class mindformers.models.multi_modal.ModalContentTransformTemplate(output_columns=None, tokenizer=None, mode='predict', vstack_columns=None, modal_content_padding_size=1, max_length=2048, **kwargs)[source]

Base class of modal content transform template. It should be implemented by the specific model. The child class can override the methods build_conversion_input_text, update_result_before_output, batch, post_process to achieve the model's expectations.

Parameters:
  • output_columns (List[str], optional) – Specify which columns will be output. Default: None.

  • tokenizer (Tokenizer, optional) – Build a good model tokenizer. Default: None.

  • mode (str, optional) – running mode, predict or train. Default: predict.

  • vstack_columns (List[str], optional) – Specify which columns will be vstack when batching data. Default: None.

  • modal_content_padding_size (int, optional) – Used in training mode for inherited Template subclasses, it usually represents the maximum number of supported modal contents (such as images) within a training sample. When the number of modal contents in a training sample is less than this value, the modal contents will be expanded to that value. Default: 1.

  • max_length (int, optional) – Used in training mode, for inherited Template subclasses, it usually represents the maximum length that a training sample can fill in after the content mask is completed after segmentation. Default: 2048.

  • kwargs (dict, optional) – A variable number of keyword parameters reserved for the keyword parameters to be expanded.

Examples

>>> from mindformers.models.multi_modal import ModalContentTransformTemplate
>>> ModalContentTransformTemplate().supported_modal
[]
>>> # Note:
>>> #     The property of 'supported_modal' should be inherited by subclasses,
>>> #     and subclasses implement the corresponding modal builders.
>>> #     The current base class does not support any modal builders, so it returns '[]'.
batch(data_list, token_padding_length, **kwargs)[source]

Batch the column data in the output_names.

Parameters:
  • data_list (list) – A list containing multiple data items.

  • token_padding_length (int) – Used to pad the length of "tokens" to ensure that all text data has the same length.

  • kwargs (dict, optional) – A variable number of keyword parameters reserved for the keyword parameters to be expanded.

Returns:

A dict. Used to store the batched data.

abstract build_conversation_input_text(raw_inputs, result_recorder)[source]

Used in predict mode, processing the input textual data into a conversation form. Usually inherited and used by quilt class.

Parameters:
  • raw_inputs (str) – input data.

  • result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

Str type. A processed text with conversation form.

build_labels(text_id_list, result_recorder, **kwargs)[source]

Used in training mode, for subclasses to inherit, to construct the labels needed for training from text data.

Parameters:
  • text_id_list (list) – A list containing text data identifiers or indices.

  • result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

  • kwargs (dict, optional) – A variable number of keyword parameters reserved for the keyword parameters to be expanded.

build_modal_context(input_ids, result_recorder, **kwargs)[source]

According to the requirements of the modal builder, process the input_ids and finally return the processed input_ids.

Parameters:
  • input_ids (list) – input data.

  • result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

  • kwargs (dict, optional) – A variable number of keyword parameters reserved for the keyword parameters to be expanded.

Returns:

The processed input_ids.

static get_need_update_output_items(result)[source]

Retrieve the output items that need to be updated.

Parameters:

result (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

A Dict. Defaults to an empty dict.

post_process(output_ids, **kwargs)[source]

Decode the model's output_ids into text strings.

Parameters:
  • output_ids (list) – A list containing the model's output_ids.

  • kwargs (dict, optional) – A variable number of keyword parameters reserved for the keyword parameters to be expanded.

Returns:

A list containing all decoded text strings.

process_predict_query(query_ele_list, result_recorder)[source]

In predict mode, find the corresponding modal builder by traversing and process it.

Parameters:
  • query_ele_list (List[dict]) – A list of elements for predicting a request. For example: [{"image":"/path/to/image"}, {"text":"describe image in English"}].

  • result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

The text results processed by each modal builder.

process_train_item(conversation_list, result_recorder)[source]

In train mode, find the corresponding modal builder by traversing and process it.

Parameters:
  • conversation_list (List[List]) – A list of elements for dialogue data. For example: [["user", "<img>/path/to/image<img>describe image in English:"], ["assistant", "the image describe …."]]

  • result_recorder (DataRecord) – The result data recorder is used to save data that needs to be recorded during the inference process. Values are stored by calling the put method of the DataRecord.

Returns:

The text results processed by each modal builder.

property supported_modal

Used to return the templates supported of modal builder type by an instance.

Returns:

List type, containing the types of modal builder supported by an instance.