mmagic.models.archs.tokenizer
This module provides a wrapper for the tokenizer.
Module Contents
Classes
TokenizerWrapper – Tokenizer wrapper for CLIPTokenizer. Only CLIPTokenizer is currently supported.
- class mmagic.models.archs.tokenizer.TokenizerWrapper(from_pretrained: Optional[Union[str, os.PathLike]] = None, from_config: Optional[Union[str, os.PathLike]] = None, *args, **kwargs)
Tokenizer wrapper for CLIPTokenizer. Only CLIPTokenizer is currently supported. This wrapper is modified from https://github.com/huggingface/diffusers/blob/e51f19aee82c8dd874b715a09dbc521d88835d68/src/diffusers/loaders.py#L358.
- Parameters
from_pretrained (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.
from_config (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.
*args, **kwargs – If from_pretrained is passed, *args and **kwargs are forwarded to the from_pretrained function. Otherwise, they are used to initialize the model via self._module_cls(*args, **kwargs).
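The argument forwarding described above can be pictured with a minimal sketch. It does not use mmagic or transformers: FakeTokenizer and build_wrapped are hypothetical stand-ins for self._module_cls and the wrapper's constructor logic, and from_config is left out because this page does not say how it is consumed.

```python
from typing import Any, Optional


class FakeTokenizer:
    """Hypothetical stand-in for the wrapped tokenizer class (self._module_cls)."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.source = "init"
        self.args, self.kwargs = args, kwargs

    @classmethod
    def from_pretrained(cls, name_or_path: str, *args: Any,
                        **kwargs: Any) -> "FakeTokenizer":
        # A real tokenizer would load vocabulary files here.
        obj = cls(*args, **kwargs)
        obj.source = f"pretrained:{name_or_path}"
        return obj


def build_wrapped(from_pretrained: Optional[str] = None, *args: Any,
                  **kwargs: Any) -> FakeTokenizer:
    """Dispatch as documented: forward to from_pretrained if given, else init."""
    if from_pretrained is not None:
        return FakeTokenizer.from_pretrained(from_pretrained, *args, **kwargs)
    return FakeTokenizer(*args, **kwargs)
```

In this sketch, build_wrapped("some/model-id", subfolder="tokenizer") goes through from_pretrained, while build_wrapped(vocab_size=10) calls the class constructor directly.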
- try_adding_tokens(tokens: Union[str, List[str]], *args, **kwargs)
Attempt to add tokens to the tokenizer.
- Parameters
tokens (Union[str, List[str]]) – The tokens to be added.
- get_token_info(token: str) -> dict
Get the information of a token, including its start and end index in the current tokenizer.
- Parameters
token (str) – The token to be queried.
- Returns
The information of the token, including its start and end index in the current tokenizer.
- Return type
dict
- add_placeholder_token(placeholder_token: str, *args, num_vec_per_token: int = 1, **kwargs)
Add placeholder tokens to the tokenizer.
- Parameters
placeholder_token (str) – The placeholder token to be added.
num_vec_per_token (int, optional) – The number of vectors of the added placeholder token.
*args, **kwargs – The arguments for self.wrapped.add_tokens.
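As a rough illustration of what num_vec_per_token means, the sketch below expands one placeholder into several sub-tokens and records the mapping. The helper names, the token_map dictionary, and the "_0", "_1", ... suffix scheme are assumptions made for this example, not taken from this page; a real wrapper would also register each sub-token with the underlying tokenizer.

```python
from typing import Dict, List


def expand_placeholder(placeholder_token: str,
                       num_vec_per_token: int = 1) -> List[str]:
    """Return the sub-tokens that stand for one placeholder (assumed naming)."""
    if num_vec_per_token < 1:
        raise ValueError("num_vec_per_token must be at least 1")
    if num_vec_per_token == 1:
        # A single vector needs no suffix.
        return [placeholder_token]
    return [f"{placeholder_token}_{i}" for i in range(num_vec_per_token)]


def register_placeholder(token_map: Dict[str, List[str]],
                         placeholder_token: str,
                         num_vec_per_token: int = 1) -> List[str]:
    """Record placeholder -> sub-tokens in a hypothetical token_map."""
    if placeholder_token in token_map:
        raise ValueError(f"{placeholder_token!r} is already registered")
    sub_tokens = expand_placeholder(placeholder_token, num_vec_per_token)
    token_map[placeholder_token] = sub_tokens
    return sub_tokens
```

With three vectors, a placeholder such as "<cat>" would map to "<cat>_0", "<cat>_1" and "<cat>_2" under this naming assumption.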
- replace_placeholder_tokens_in_text(text: Union[str, List[str]], vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0) -> Union[str, List[str]]
Replace the keywords in text with placeholder tokens. This function will be called in self.__call__ and self.encode.
- Parameters
text (Union[str, List[str]]) – The text to be processed.
vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.
prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.
- Returns
The processed text.
- Return type
Union[str, List[str]]
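The effect of vector_shuffle and prop_tokens_to_load can be sketched in plain Python. This is an illustration under assumptions, not the library's code: replace_placeholders, token_map, and the rng argument are hypothetical, and the truncation rule (keep at least one sub-token, otherwise the integer part of the proportion) is a guess chosen so that 1.0 keeps everything.

```python
import random
from typing import Dict, List, Optional, Union


def replace_placeholders(
    text: Union[str, List[str]],
    token_map: Dict[str, List[str]],
    vector_shuffle: bool = False,
    prop_tokens_to_load: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Union[str, List[str]]:
    """Swap each keyword in text for its (possibly truncated/shuffled) sub-tokens."""
    if isinstance(text, list):
        return [
            replace_placeholders(t, token_map, vector_shuffle,
                                 prop_tokens_to_load, rng)
            for t in text
        ]
    shuffler = rng if rng is not None else random
    for keyword, sub_tokens in token_map.items():
        if keyword not in text:
            continue
        # Assumed rule: load a leading fraction of the vectors, never zero.
        keep = max(1, int(len(sub_tokens) * prop_tokens_to_load))
        chosen = list(sub_tokens[:keep])
        if vector_shuffle:
            shuffler.shuffle(chosen)
        text = text.replace(keyword, " ".join(chosen))
    return text
```

Passing a seeded random.Random as rng makes the shuffled order reproducible, which is convenient when comparing prompts across runs.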
- replace_text_with_placeholder_tokens(text: Union[str, List[str]]) -> Union[str, List[str]]
Replace the placeholder tokens in text with the original keywords. This function will be called in self.decode.
- Parameters
text (Union[str, List[str]]) – The text to be processed.
- Returns
The processed text.
- Return type
Union[str, List[str]]
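The reverse direction, as used when decoding, might look like the following sketch. The function name restore_keywords and the token_map layout are hypothetical; the sketch assumes the sub-tokens appear in the text in their original order, joined by single spaces.

```python
from typing import Dict, List, Union


def restore_keywords(
    text: Union[str, List[str]],
    token_map: Dict[str, List[str]],
) -> Union[str, List[str]]:
    """Collapse each run of sub-tokens back into its original keyword."""
    if isinstance(text, list):
        return [restore_keywords(t, token_map) for t in text]
    for keyword, sub_tokens in token_map.items():
        merged = " ".join(sub_tokens)
        if merged in text:
            text = text.replace(merged, keyword)
    return text
```

Under these assumptions, a text whose sub-tokens were shuffled or truncated would not match the joined string and would be returned unchanged.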
- __call__(text: Union[str, List[str]], *args, vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0, **kwargs)
The call function of the wrapper.
- Parameters
text (Union[str, List[str]]) – The text to be tokenized.
vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.
prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.
*args, **kwargs – The arguments for self.wrapped.__call__.
- encode(text: Union[str, List[str]], *args, **kwargs)
Encode the passed text to token index.
- Parameters
text (Union[str, List[str]]) – The text to be encoded.
*args, **kwargs – The arguments for self.wrapped.__call__.
- decode(token_ids, return_raw: bool = False, *args, **kwargs) -> Union[str, List[str]]
Decode the token index to text.
- Parameters
token_ids – The token index to be decoded.
return_raw (bool, optional) – Whether to keep the placeholder token in the text. Defaults to False.
*args, **kwargs – The arguments for self.wrapped.decode.
- Returns
The decoded text.
- Return type
Union[str, List[str]]