`mmagic.models.archs.tokenizer`¶

This is a wrapper for the tokenizer.

Module Contents¶

Classes¶

TokenizerWrapper

Tokenizer wrapper for CLIPTokenizer. Only CLIPTokenizer is supported currently.

class mmagic.models.archs.tokenizer.TokenizerWrapper(from_pretrained: Optional[Union[str, os.PathLike]] = None, from_config: Optional[Union[str, os.PathLike]] = None, *args, **kwargs)[source]¶

Tokenizer wrapper for CLIPTokenizer. Only CLIPTokenizer is supported currently. This wrapper is modified from https://github.com/huggingface/diffusers/blob/e51f19aee82c8dd874b715a09dbc521d88835d68/src/diffusers/loaders.py#L358.

Parameters

from_pretrained (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.
from_config (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.
*args – If from_pretrained is passed, *args and **kwargs are passed to the from_pretrained function. Otherwise, *args and **kwargs are used to initialize the model via self._module_cls(*args, **kwargs).
**kwargs – See *args.

__getattr__(name: str) → Any[source]¶
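This page gives no description for `__getattr__`. In wrapper classes of this kind, the hook is commonly used to forward unknown attribute lookups to the wrapped object (referred to as `self.wrapped` elsewhere on this page). The following is a minimal sketch under that assumption, not the library's implementation; `ForwardingWrapper` and `ToyTokenizer` are hypothetical names used only for illustration.

```python
from typing import Any, List


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer."""

    model_max_length = 77

    def tokenize(self, text: str) -> List[str]:
        # Lowercase and split on whitespace; real tokenizers do far more.
        return text.lower().split()


class ForwardingWrapper:
    """Hypothetical wrapper that delegates unknown attributes to `wrapped`."""

    def __init__(self, wrapped: Any) -> None:
        self.wrapped = wrapped

    def __getattr__(self, name: str) -> Any:
        # __getattr__ is only called when normal attribute lookup fails.
        # Guard against infinite recursion if `wrapped` itself is missing.
        if name == 'wrapped':
            raise AttributeError(name)
        return getattr(self.wrapped, name)


wrapper = ForwardingWrapper(ToyTokenizer())
max_len = wrapper.model_max_length   # resolved on the wrapped object
pieces = wrapper.tokenize('A photo')  # method call is forwarded as well
```

With this pattern, callers can treat the wrapper as if it were the underlying tokenizer while the wrapper adds its own methods on top.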

try_adding_tokens(tokens: Union[str, List[str]], *args, **kwargs)[source]¶

Attempt to add tokens to the tokenizer.

Parameters: tokens (Union[str, List[str]]) – The tokens to be added.

get_token_info(token: str) → dict[source]¶

Get the information of a token, including its start and end index in the current tokenizer.

Parameters

token (str) – The token to be queried.

Returns

The information of the token, including its start and end index in the current tokenizer.

Return type

dict

add_placeholder_token(placeholder_token: str, *args, num_vec_per_token: int = 1, **kwargs)[source]¶

Add placeholder tokens to the tokenizer.

Parameters

placeholder_token (str) – The placeholder token to be added.
num_vec_per_token (int, optional) – The number of vectors of the added placeholder token.
*args – The arguments for self.wrapped.add_tokens.
**kwargs – The arguments for self.wrapped.add_tokens.
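To illustrate what `num_vec_per_token` means, the sketch below expands one placeholder into several sub-tokens and records the mapping. It is a pure-Python illustration, not the wrapper's code: the helper `expand_placeholder`, the `token_map` dictionary, and the `_0`, `_1`, ... suffix scheme are assumptions made for this example.

```python
from typing import Dict, List


def expand_placeholder(placeholder_token: str,
                       num_vec_per_token: int = 1) -> List[str]:
    """Return the sub-tokens that would stand in for one placeholder.

    Assumption: a single-vector placeholder maps to itself, and a
    multi-vector placeholder maps to indexed sub-tokens.
    """
    if num_vec_per_token < 1:
        raise ValueError('num_vec_per_token must be at least 1')
    if num_vec_per_token == 1:
        return [placeholder_token]
    return [f'{placeholder_token}_{i}' for i in range(num_vec_per_token)]


# Hypothetical registry from placeholder to its sub-tokens.
token_map: Dict[str, List[str]] = {}
token_map['<cat>'] = expand_placeholder('<cat>', num_vec_per_token=3)
token_map['<dog>'] = expand_placeholder('<dog>')
```

Each sub-token would then be registered with the underlying tokenizer so that it receives its own embedding vector.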

replace_placeholder_tokens_in_text(text: Union[str, List[str]], vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0) → Union[str, List[str]][source]¶

Replace the keywords in text with placeholder tokens. This function will be called in self.__call__ and self.encode.

Parameters

text (Union[str, List[str]]) – The text to be processed.
vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.
prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.

Returns

The processed text.

Return type

Union[str, List[str]]
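The effect of `vector_shuffle` and `prop_tokens_to_load` can be sketched in plain Python. This is an illustration under stated assumptions rather than the wrapper's implementation: `replace_placeholders` and `token_map` are hypothetical names, and the rule for how many sub-tokens to keep (at least one, otherwise the integer part of the proportion) is an assumption.

```python
import random
from typing import Dict, List, Union


def replace_placeholders(text: Union[str, List[str]],
                         token_map: Dict[str, List[str]],
                         vector_shuffle: bool = False,
                         prop_tokens_to_load: float = 1.0
                         ) -> Union[str, List[str]]:
    """Expand each placeholder keyword in `text` into its sub-tokens."""
    if isinstance(text, list):
        # Process a batch of prompts one by one.
        return [
            replace_placeholders(t, token_map, vector_shuffle,
                                 prop_tokens_to_load) for t in text
        ]
    for placeholder, sub_tokens in token_map.items():
        if placeholder not in text:
            continue
        # Keep only a leading fraction of the sub-tokens (assumed rule).
        keep = max(1, int(len(sub_tokens) * prop_tokens_to_load))
        chosen = list(sub_tokens[:keep])
        if vector_shuffle:
            random.shuffle(chosen)  # randomize the order of the vectors
        text = text.replace(placeholder, ' '.join(chosen))
    return text


token_map = {'<cat>': ['<cat>_0', '<cat>_1', '<cat>_2']}
full = replace_placeholders('a photo of <cat>', token_map)
half = replace_placeholders('a photo of <cat>', token_map,
                            prop_tokens_to_load=0.5)
batch = replace_placeholders(['<cat>', 'no keyword'], token_map)
shuffled = replace_placeholders('<cat>', token_map, vector_shuffle=True)
```

Here `full` contains all three sub-tokens, `half` keeps only the first one, and `shuffled` contains the same sub-tokens in a random order.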

replace_text_with_placeholder_tokens(text: Union[str, List[str]]) → Union[str, List[str]][source]¶

Replace the placeholder tokens in text with the original keywords. This function will be called in self.decode.

Parameters: text (Union[str, List[str]]) – The text to be processed.
Returns: The processed text.
Return type: Union[str, List[str]]
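The reverse direction, as used when decoding, can be sketched the same way. This is a hedged illustration, not the wrapper's code: `restore_placeholders` and `token_map` are hypothetical, and the sketch assumes the sub-tokens appear in the text in their original order, joined by single spaces.

```python
from typing import Dict, List, Union


def restore_placeholders(text: Union[str, List[str]],
                         token_map: Dict[str, List[str]]
                         ) -> Union[str, List[str]]:
    """Collapse each run of sub-tokens back into its placeholder keyword."""
    if isinstance(text, list):
        return [restore_placeholders(t, token_map) for t in text]
    for placeholder, sub_tokens in token_map.items():
        merged = ' '.join(sub_tokens)
        # Only an exact, in-order run of all sub-tokens is collapsed.
        text = text.replace(merged, placeholder)
    return text


token_map = {'<cat>': ['<cat>_0', '<cat>_1', '<cat>_2']}
restored = restore_placeholders('a photo of <cat>_0 <cat>_1 <cat>_2',
                                token_map)
untouched = restore_placeholders('a photo of <cat>_1 <cat>_0', token_map)
```

Under these assumptions, text produced with shuffled or partially loaded vectors would not be collapsed, as `untouched` shows.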

__call__(text: Union[str, List[str]], *args, vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0, **kwargs)[source]¶

The call function of the wrapper.

Parameters

text (Union[str, List[str]]) – The text to be tokenized.
vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.
prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.
*args – The arguments for self.wrapped.__call__.
**kwargs – The arguments for self.wrapped.__call__.

encode(text: Union[str, List[str]], *args, **kwargs)[source]¶

Encode the passed text to token index.

Parameters

text (Union[str, List[str]]) – The text to be encoded.
*args – The arguments for self.wrapped.__call__.
**kwargs – The arguments for self.wrapped.__call__.

decode(token_ids, return_raw: bool = False, *args, **kwargs) → Union[str, List[str]][source]¶

Decode the token index to text.

Parameters

token_ids – The token index to be decoded.
return_raw – Whether to keep the placeholder token in the text. Defaults to False.
*args – The arguments for self.wrapped.decode.
**kwargs – The arguments for self.wrapped.decode.

Returns

The decoded text.

Return type

Union[str, List[str]]

__repr__()[source]¶

The representation of the wrapper.
