mmagic.models.editors.fastcomposer.fastcomposer_util
Module Contents
Classes
- FastComposerModel – FastComposerModel is based on the Stable Diffusion model and the CLIP model.
- FastComposerTextEncoder – TextEncoder for FastComposerModel.
- FastComposerCLIPImageEncoder – CLIPImageEncoder for FastComposerModel.
- FastComposerPostfuseModule – Postfuse Module for FastComposerModel.
- BalancedL1Loss – BalancedL1Loss for object localization.
- RandomZoomIn – RandomZoomIn for object transform.
- PadToSquare – If the height of the image is greater than the width, padding will be added on both sides of the image to make it a square.
- CropTopSquare – If the height of the image is greater than the width, the image will be cropped into a square starting from the top of the image.
- MLP – Multilayer Perceptron.
Functions
- get_object_transforms – Get Object transforms.
- unet_store_cross_attention_scores – Make the UNet store cross-attention scores.
- get_object_localization_loss – Obtain the average of the object localization loss over all layers.
- get_object_localization_loss_for_one_layer – Get object localization loss for one layer.
- fuse_object_embeddings – Fuse object embeddings.
- The function originally belonged to CLIPTextTransformer, but it has been …
- class mmagic.models.editors.fastcomposer.fastcomposer_util.FastComposerModel(text_encoder, image_encoder, vae, unet, cfg)[source]
Bases: torch.nn.Module
FastComposerModel is based on the Stable Diffusion model and the CLIP model.
- class mmagic.models.editors.fastcomposer.fastcomposer_util.FastComposerTextEncoder(text_model)[source]
Bases: transformers.CLIPPreTrainedModel
TextEncoder for FastComposerModel.
- static from_pretrained(model_name_or_path, **kwargs)[source]
Initialize the text encoder from a Stable Diffusion model name or path.
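A minimal usage sketch, assuming a standard Stable Diffusion checkpoint layout; the checkpoint name below is an illustrative assumption, not part of this API:

import torch
from mmagic.models.editors.fastcomposer.fastcomposer_util import \
    FastComposerTextEncoder

# Any Stable Diffusion model name or local path; this one is only an example.
text_encoder = FastComposerTextEncoder.from_pretrained(
    'runwayml/stable-diffusion-v1-5')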
- forward(input_ids, image_token_mask=None, object_embeds=None, num_objects=None, attention_mask: Optional[torch.Tensor] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None) → Union[Tuple, transformers.modeling_outputs.BaseModelOutputWithPooling][source]
Forward function.
- Parameters
input_ids (torch.Tensor) – You can directly input a torch.Tensor.
image_token_mask (torch.Tensor) – You can directly input a torch.Tensor.
object_embeds (torch.Tensor) – You can directly input a torch.Tensor.
num_objects (torch.Tensor) – You can directly input a torch.Tensor.
attention_mask (torch.Tensor) – You can directly input a torch.Tensor.
output_attentions (bool) – Defaults to None.
output_hidden_states (bool) – Defaults to None.
return_dict (bool) – Defaults to None.
- Returns
Union[Tuple, BaseModelOutputWithPooling]
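To illustrate how input_ids and image_token_mask relate, a small sketch with made-up token ids (42 stands in for the identity placeholder token whose positions receive object_embeds):

import torch

# Hypothetical token ids for a 7-token prompt.
input_ids = torch.tensor([[49406, 320, 1125, 539, 320, 42, 49407]])
image_token_mask = input_ids == 42  # True exactly where object embeddings get fused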
- class mmagic.models.editors.fastcomposer.fastcomposer_util.FastComposerCLIPImageEncoder(vision_model, visual_projection, vision_processor)[source]
Bases: transformers.CLIPPreTrainedModel
CLIPImageEncoder for FastComposerModel.
- mmagic.models.editors.fastcomposer.fastcomposer_util.get_object_transforms(cfg)[source]
Get Object transforms.
- class mmagic.models.editors.fastcomposer.fastcomposer_util.FastComposerPostfuseModule(embed_dim)[source]
Bases: torch.nn.Module
Postfuse Module for FastComposerModel.
- fuse_fn(text_embeds, object_embeds)[source]
Fuse function.
- Parameters
text_embeds (torch.Tensor) – You can directly input a torch.Tensor.
object_embeds (torch.Tensor) – You can directly input a torch.Tensor.
- Returns
A torch.Tensor will be returned.
- Return type
torch.Tensor
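A minimal sketch of calling fuse_fn; the 768-dimensional width and the matching input shapes are assumptions for illustration:

import torch
from mmagic.models.editors.fastcomposer.fastcomposer_util import \
    FastComposerPostfuseModule

postfuse = FastComposerPostfuseModule(embed_dim=768)  # assumed CLIP text width
text_embeds = torch.randn(4, 768)    # text embeddings at the image-token positions
object_embeds = torch.randn(4, 768)  # per-object image embeddings to merge in
fused = postfuse.fuse_fn(text_embeds, object_embeds)  # fused embeddings, (4, 768)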
- forward(text_embeds, object_embeds, image_token_mask, num_objects) → torch.Tensor[source]
Forward function.
- Parameters
text_embeds (torch.Tensor) – You can directly input a torch.Tensor.
object_embeds (torch.Tensor) – You can directly input a torch.Tensor.
image_token_mask (torch.Tensor) – You can directly input a torch.Tensor.
num_objects (torch.Tensor) – You can directly input a torch.Tensor.
- Returns
A torch.Tensor will be returned.
- Return type
torch.Tensor
- mmagic.models.editors.fastcomposer.fastcomposer_util.unet_store_cross_attention_scores(unet, attention_scores, layers=5)[source]
Make the UNet store cross-attention scores.
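A hedged usage sketch: the passed-in dict is filled in place on subsequent UNet forward passes (the checkpoint name is an assumed example):

from diffusers import UNet2DConditionModel
from mmagic.models.editors.fastcomposer.fastcomposer_util import \
    unet_store_cross_attention_scores

unet = UNet2DConditionModel.from_pretrained(
    'runwayml/stable-diffusion-v1-5', subfolder='unet')
cross_attention_scores = {}
unet = unet_store_cross_attention_scores(unet, cross_attention_scores, layers=5)
# After a denoising forward pass, cross_attention_scores holds the stored
# cross-attention maps consumed by the localization losses below.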
- class mmagic.models.editors.fastcomposer.fastcomposer_util.BalancedL1Loss(threshold=1.0, normalize=False)[source]
Bases: torch.nn.Module
BalancedL1Loss for object localization.
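The balanced objective pushes each subject token's cross-attention mass into that subject's segmentation map and away from the background. A minimal sketch of the idea, not the exact mmagic implementation:

import torch


def balanced_l1_sketch(attn_prob, segmap, eps=1e-5):
    # attn_prob and segmap are (num_objects, num_pixels); segmap is binary.
    background = 1.0 - segmap
    bg = (attn_prob * background).sum(-1) / (background.sum(-1) + eps)
    fg = (attn_prob * segmap).sum(-1) / (segmap.sum(-1) + eps)
    # Lower is better: attention mass on background minus mass on the object.
    return (bg - fg).mean()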
- mmagic.models.editors.fastcomposer.fastcomposer_util.get_object_localization_loss(cross_attention_scores, object_segmaps, image_token_idx, image_token_idx_mask, loss_fn)[source]
Obtain the average of the object localization loss over all layers.
- mmagic.models.editors.fastcomposer.fastcomposer_util.get_object_localization_loss_for_one_layer(cross_attention_scores, object_segmaps, object_token_idx, object_token_idx_mask, loss_fn)[source]
Get object localization loss for one layer.
- class mmagic.models.editors.fastcomposer.fastcomposer_util.RandomZoomIn(min_zoom=1.0, max_zoom=1.5)[source]
Bases: torch.nn.Module
RandomZoomIn for object transform.
- class mmagic.models.editors.fastcomposer.fastcomposer_util.PadToSquare(fill=0, padding_mode='constant')[source]
Bases: torch.nn.Module
If the height of the image is greater than the width, padding will be added on both sides of the image to make it a square.
- class mmagic.models.editors.fastcomposer.fastcomposer_util.CropTopSquare[source]
Bases: torch.nn.Module
If the height of the image is greater than the width, the image will be cropped into a square starting from the top of the image.
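A small sketch of both transforms on a CHW tensor, assuming they accept torchvision-style tensor images (an assumption; the expected output shapes follow from the docstrings above):

import torch
from mmagic.models.editors.fastcomposer.fastcomposer_util import (
    CropTopSquare, PadToSquare)

tall = torch.rand(3, 200, 120)      # height > width
padded = PadToSquare(fill=0)(tall)  # expected (3, 200, 200): both sides padded
cropped = CropTopSquare()(tall)     # expected (3, 120, 120): top square kept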
- class mmagic.models.editors.fastcomposer.fastcomposer_util.MLP(in_dim, out_dim, hidden_dim, use_residual=True)[source]
Bases: torch.nn.Module
Multilayer Perceptron.
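A minimal usage sketch; with use_residual=True the residual connection implies in_dim must equal out_dim (the 768 width is illustrative):

import torch
from mmagic.models.editors.fastcomposer.fastcomposer_util import MLP

mlp = MLP(in_dim=768, out_dim=768, hidden_dim=768, use_residual=True)
out = mlp(torch.randn(2, 77, 768))  # same trailing width: (2, 77, 768)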