mmagic.models.archs.tokenizer

This is a wrapper for the tokenizer.

Module Contents

Classes

TokenizerWrapper

Tokenizer wrapper for CLIPTokenizer. Currently, only CLIPTokenizer is supported.

class mmagic.models.archs.tokenizer.TokenizerWrapper(from_pretrained: Optional[Union[str, os.PathLike]] = None, from_config: Optional[Union[str, os.PathLike]] = None, *args, **kwargs)

Tokenizer wrapper for CLIPTokenizer. Currently, only CLIPTokenizer is supported. This wrapper is modified from https://github.com/huggingface/diffusers/blob/e51f19aee82c8dd874b715a09dbc521d88835d68/src/diffusers/loaders.py#L358.

Parameters
  • from_pretrained (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.

  • from_config (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.

  • *args, **kwargs – If from_pretrained is passed, *args and **kwargs are passed to the from_pretrained function. Otherwise, they are used to initialize the model via self._module_cls(*args, **kwargs).
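
The argument routing described above can be illustrated with a minimal, self-contained sketch. ToyTokenizer and ToyWrapper are hypothetical stand-ins invented for this example; they are not part of mmagic, and from_config is left out for brevity. Only the rule "forward to from_pretrained if given, otherwise call self._module_cls directly" is taken from the parameter list.

```python
from typing import Any, Optional


class ToyTokenizer:
    """Hypothetical stand-in for the wrapped tokenizer class."""

    def __init__(self, vocab_size: int = 8):
        self.vocab_size = vocab_size
        self.source = "init"

    @classmethod
    def from_pretrained(cls, name_or_path: str, **kwargs: Any) -> "ToyTokenizer":
        # A real tokenizer would load vocabulary files here.
        tok = cls(**kwargs)
        tok.source = f"pretrained:{name_or_path}"
        return tok


class ToyWrapper:
    """Hypothetical wrapper that mimics the documented argument routing."""

    _module_cls = ToyTokenizer

    def __init__(self, from_pretrained: Optional[str] = None,
                 *args: Any, **kwargs: Any):
        if from_pretrained is not None:
            # Extra arguments are forwarded to from_pretrained.
            self.wrapped = self._module_cls.from_pretrained(
                from_pretrained, *args, **kwargs)
        else:
            # Extra arguments initialize the wrapped class directly.
            self.wrapped = self._module_cls(*args, **kwargs)


loaded = ToyWrapper(from_pretrained="some/model-id", vocab_size=16)
built = ToyWrapper(vocab_size=4)
print(loaded.wrapped.source, loaded.wrapped.vocab_size)
print(built.wrapped.source, built.wrapped.vocab_size)
```

In both cases the extra keyword argument reaches the wrapped object; only the construction path differs.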

__getattr__(name: str) → Any

try_adding_tokens(tokens: Union[str, List[str]], *args, **kwargs)

Attempt to add tokens to the tokenizer.

Parameters

tokens (Union[str, List[str]]) – The tokens to be added.

get_token_info(token: str) → dict

Get the information of a token, including its start and end index in the current tokenizer.

Parameters

token (str) – The token to be queried.

Returns

The information of the token, including its start and end index in the current tokenizer.

Return type

dict

add_placeholder_token(placeholder_token: str, *args, num_vec_per_token: int = 1, **kwargs)

Add placeholder tokens to the tokenizer.

Parameters
  • placeholder_token (str) – The placeholder token to be added.

  • num_vec_per_token (int, optional) – The number of vectors of the added placeholder token. Defaults to 1.

  • *args, **kwargs – The arguments for self.wrapped.add_tokens.
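
As a rough illustration of how a placeholder with several vectors could be registered and then queried with get_token_info, consider the toy vocabulary below. Everything in it is an assumption made for the example: the ToyVocab class, the "<name>_i" naming of sub-tokens, the "name", "start" and "end" keys, and the half-open [start, end) index range. The actual wrapper may differ in all of these.

```python
from typing import Dict, List


class ToyVocab:
    """Hypothetical vocabulary; not the mmagic implementation."""

    def __init__(self, base_tokens: List[str]):
        self.token_to_id: Dict[str, int] = {
            tok: idx for idx, tok in enumerate(base_tokens)}
        # Maps each placeholder to the sub-tokens that stand in for it.
        self.placeholder_map: Dict[str, List[str]] = {}

    def add_placeholder_token(self, placeholder_token: str,
                              num_vec_per_token: int = 1) -> None:
        if num_vec_per_token < 1:
            raise ValueError("num_vec_per_token must be at least 1")
        if placeholder_token in self.placeholder_map:
            raise ValueError(f"{placeholder_token!r} was already added")
        # Assumed naming scheme: one sub-token per vector.
        sub_tokens = [f"{placeholder_token}_{i}"
                      for i in range(num_vec_per_token)]
        for sub in sub_tokens:
            if sub in self.token_to_id:
                raise ValueError(f"{sub!r} already exists in the vocabulary")
            # New sub-tokens receive consecutive ids at the end.
            self.token_to_id[sub] = len(self.token_to_id)
        self.placeholder_map[placeholder_token] = sub_tokens

    def get_token_info(self, token: str) -> dict:
        sub_tokens = self.placeholder_map[token]
        start = self.token_to_id[sub_tokens[0]]
        # Assumed convention: end is exclusive.
        end = self.token_to_id[sub_tokens[-1]] + 1
        return {"name": token, "start": start, "end": end}


vocab = ToyVocab(["a", "photo", "of"])
vocab.add_placeholder_token("<cat>", num_vec_per_token=3)
info = vocab.get_token_info("<cat>")
print(info)
```

Because the sub-tokens are appended contiguously, a single start/end pair is enough to locate all vectors that belong to one placeholder.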

replace_placeholder_tokens_in_text(text: Union[str, List[str]], vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0) → Union[str, List[str]]

Replace the keywords in text with placeholder tokens. This function will be called in self.__call__ and self.encode.

Parameters
  • text (Union[str, List[str]]) – The text to be processed.

  • vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.

  • prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.

Returns

The processed text.

Return type

Union[str, List[str]]
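
The effect of vector_shuffle and prop_tokens_to_load can be sketched with a standalone function. This is an illustration under stated assumptions, not the library's code: expand_placeholders, the placeholder_map argument, the rng argument, the "<cat>_i" sub-token names, and the rule of always keeping at least one sub-token are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Union


def expand_placeholders(
    text: Union[str, List[str]],
    placeholder_map: Dict[str, List[str]],
    vector_shuffle: bool = False,
    prop_tokens_to_load: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Union[str, List[str]]:
    """Replace each placeholder keyword with its sub-tokens (toy version)."""
    if rng is None:
        rng = random.Random()
    if isinstance(text, list):
        return [
            expand_placeholders(t, placeholder_map, vector_shuffle,
                                prop_tokens_to_load, rng)
            for t in text
        ]
    for placeholder, sub_tokens in placeholder_map.items():
        if placeholder not in text:
            continue
        # Keep only the requested proportion, but never fewer than one
        # sub-token (an assumption of this sketch).
        keep = max(1, int(len(sub_tokens) * prop_tokens_to_load))
        chosen = list(sub_tokens[:keep])
        if vector_shuffle:
            rng.shuffle(chosen)
        text = text.replace(placeholder, " ".join(chosen))
    return text


placeholder_map = {"<cat>": ["<cat>_0", "<cat>_1", "<cat>_2", "<cat>_3"]}
full = expand_placeholders("a photo of <cat>", placeholder_map)
half = expand_placeholders("a photo of <cat>", placeholder_map,
                           prop_tokens_to_load=0.5)
print(full)
print(half)
```

With prop_tokens_to_load=1.0 all four sub-tokens appear in the text; with 0.5 only the first two are kept. Shuffling changes the order of the kept sub-tokens but not which ones are kept.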

replace_text_with_placeholder_tokens(text: Union[str, List[str]]) → Union[str, List[str]]

Replace the placeholder tokens in text with the original keywords. This function will be called in self.decode.

Parameters

text (Union[str, List[str]]) – The text to be processed.

Returns

The processed text.

Return type

Union[str, List[str]]
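
The reverse mapping used when decoding can be sketched in the same spirit. The function name collapse_placeholders, the placeholder_map argument, and the "<cat>_i" sub-token names are hypothetical. This toy version only recognizes the full, unshuffled run of sub-tokens, which may be narrower than what the real method handles.

```python
from typing import Dict, List, Union


def collapse_placeholders(
    text: Union[str, List[str]],
    placeholder_map: Dict[str, List[str]],
) -> Union[str, List[str]]:
    """Map a run of sub-tokens back to its original keyword (toy version)."""
    if isinstance(text, list):
        return [collapse_placeholders(t, placeholder_map) for t in text]
    for placeholder, sub_tokens in placeholder_map.items():
        # Only the complete, in-order run of sub-tokens is matched.
        expanded = " ".join(sub_tokens)
        text = text.replace(expanded, placeholder)
    return text


placeholder_map = {"<cat>": ["<cat>_0", "<cat>_1"]}
restored = collapse_placeholders("a photo of <cat>_0 <cat>_1", placeholder_map)
print(restored)
```

Text that contains no sub-tokens passes through unchanged, so the function is safe to apply to every decoded string.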

__call__(text: Union[str, List[str]], *args, vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0, **kwargs)

The call function of the wrapper.

Parameters
  • text (Union[str, List[str]]) – The text to be tokenized.

  • vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.

  • prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.

  • *args, **kwargs – The arguments for self.wrapped.__call__.

encode(text: Union[str, List[str]], *args, **kwargs)

Encode the passed text into token indices.

Parameters
  • text (Union[str, List[str]]) – The text to be encoded.

  • *args, **kwargs – The arguments for self.wrapped.__call__.

decode(token_ids, return_raw: bool = False, *args, **kwargs) → Union[str, List[str]]

Decode the token indices to text.

Parameters
  • token_ids – The token indices to be decoded.

  • return_raw (bool, optional) – Whether to keep the placeholder token in the text. Defaults to False.

  • *args, **kwargs – The arguments for self.wrapped.decode.

Returns

The decoded text.

Return type

Union[str, List[str]]

__repr__()

The representation of the wrapper.
