Unconditional GANs

Summary

  • Number of checkpoints: 56

  • Number of configs: 57

  • Number of papers: 9

    • ALGORITHM: 9

StyleGANv3 (NeurIPS’2021)

Task: Unconditional GANs

Abstract

We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

Results and Models

Results (compressed) from StyleGAN3 config-T converted by mmagic

We perform experiments with the settings reported in the StyleGANv3 paper as well as our own experimental settings. For convenience, we also provide converted versions of the official weights.

Paper Settings

| Model | Dataset | Iter | FID50k | Download |
| --- | --- | --- | --- | --- |
| stylegan3-t | ffhq 1024x1024 | 490000 | 3.37* | ckpt \| log |
| stylegan3-t-ada | metface 1024x1024 | 130000 | 15.09 | ckpt \| log |

Experimental Settings

| Model | Dataset | Iter | FID50k | Download |
| --- | --- | --- | --- | --- |
| stylegan3-t | ffhq 256x256 | 740000 | 4.51 | ckpt \| log |
| stylegan3-r-ada | ffhq 1024x1024 | - | - | ckpt |

Converted Weights

| Model | Dataset | Comment | FID50k | EQ-T | EQ-R | Download |
| --- | --- | --- | --- | --- | --- | --- |
| stylegan3-t | ffhqu 256x256 | official weight | 4.62 | 63.01 | 13.12 | ckpt |
| stylegan3-t | afhqv2 512x512 | official weight | 4.04 | 60.15 | 13.51 | ckpt |
| stylegan3-t | ffhq 1024x1024 | official weight | 2.79 | 61.21 | 13.82 | ckpt |
| stylegan3-r | ffhqu 256x256 | official weight | 4.50 | 66.65 | 40.48 | ckpt |
| stylegan3-r | afhqv2 512x512 | official weight | 4.40 | 64.89 | 40.34 | ckpt |
| stylegan3-r | ffhq 1024x1024 | official weight | 3.07 | 64.76 | 46.62 | ckpt |

Interpolation

We provide a tool to generate a video by walking through the GAN's latent space. Run this command to get the following video.

python apps/interpolate_sample.py configs/styleganv3/stylegan3_t_afhqv2_512_b4x8_official.py https://download.openmmlab.com/mmediting/stylegan3/stylegan3_t_afhqv2_512_b4x8_cvt_official.pkl --export-video --samples-path work_dirs/demos/ --endpoint 6 --interval 60 --space z --seed 2022 --sample-cfg truncation=0.8

https://user-images.githubusercontent.com/22982797/151506918-83da9ee3-0d63-4c5b-ad53-a41562b92075.mp4
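The walk can be adjusted through the flags shown above (--endpoint, --interval, --seed, and the truncation value in --sample-cfg). For example, a longer walk with more endpoints and no truncation might look like the following; the values here are purely illustrative.

python apps/interpolate_sample.py configs/styleganv3/stylegan3_t_afhqv2_512_b4x8_official.py https://download.openmmlab.com/mmediting/stylegan3/stylegan3_t_afhqv2_512_b4x8_cvt_official.pkl --export-video --samples-path work_dirs/demos/ --endpoint 10 --interval 30 --space z --seed 1234 --sample-cfg truncation=1.0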

Equivariance Visualization and Evaluation

We also provide a tool to visualize the equivariance properties of StyleGAN3. Run these commands to get the results below.

python tools/utils/equivariance_viz.py configs/styleganv3/stylegan3_r_ffhqu_256_b4x8_official.py https://download.openmmlab.com/mmediting/stylegan3/stylegan3_r_ffhqu_256_b4x8_cvt_official.pkl --translate_max 0.5 --transform rotate --seed 5432

python tools/utils/equivariance_viz.py configs/styleganv3/stylegan3_r_ffhqu_256_b4x8_official.py https://download.openmmlab.com/mmediting/stylegan3/stylegan3_r_ffhqu_256_b4x8_cvt_official.pkl --translate_max 0.25 --transform x_t --seed 5432

python tools/utils/equivariance_viz.py configs/styleganv3/stylegan3_r_ffhqu_256_b4x8_official.py https://download.openmmlab.com/mmediting/stylegan3/stylegan3_r_ffhqu_256_b4x8_cvt_official.pkl --translate_max 0.25 --transform y_t --seed 5432

https://user-images.githubusercontent.com/22982797/151504902-f3cbfef5-9014-4607-bbe1-deaf48ec6d55.mp4

https://user-images.githubusercontent.com/22982797/151504973-b96e1639-861d-434b-9d7c-411ebd4a653f.mp4

https://user-images.githubusercontent.com/22982797/151505099-cde4999e-aab1-42d4-a458-3bb069db3d32.mp4

If you want to compute the EQ metrics for StyleGAN3, add the following to your config.

metrics = dict(
    eqv=dict(
        type='Equivariance',
        num_images=50000,
        eq_cfg=dict(
            compute_eqt_int=True, compute_eqt_frac=True, compute_eqr=True)))

We also highly recommend using the slurm_test.sh script to speed up evaluation.
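A typical invocation would look like the line below; the partition name, job name, and checkpoint path are placeholders, and the argument order assumes the standard OpenMMLab slurm_test.sh convention, so please check the script in your checkout before running it.

sh tools/slurm_test.sh ${PARTITION} ${JOB_NAME} configs/styleganv3/stylegan3_r_ffhqu_256_b4x8_official.py ${CHECKPOINT}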

Citation

@inproceedings{Karras2021,
  author = {Tero Karras and Miika Aittala and Samuli Laine and Erik H\"ark\"onen and Janne Hellsten and Jaakko Lehtinen and Timo Aila},
  title = {Alias-Free Generative Adversarial Networks},
  booktitle = {Proc. NeurIPS},
  year = {2021}
}

Positional Encoding in GANs (CVPR’2021)

Task: Unconditional GANs

Abstract

SinGAN shows impressive capability in learning internal patch distribution despite its limited effective receptive field. We are interested in knowing how such a translation-invariant convolutional generator could capture the global structure with just a spatially i.i.d. input. In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators. Such positional encoding is indispensable for generating images with high fidelity. The same phenomenon is observed in other generative architectures such as DCGAN and PGGAN. We further show that zero padding leads to an unbalanced spatial bias with a vague relation between locations. To offer a better spatial inductive bias, we investigate alternative positional encodings and analyze their effects. Based on a more flexible positional encoding explicitly, we propose a new multi-scale training strategy and demonstrate its effectiveness in the state-of-the-art unconditional generator StyleGAN2. Besides, the explicit spatial inductive bias substantially improves SinGAN for more versatile image manipulation.

Results and models for MS-PIE

896x896 results generated from a 256 generator using MS-PIE
| Model | Dataset | Reference in Paper | Scales | FID50k | P&R10k | Download |
| --- | --- | --- | --- | --- | --- | --- |
| stylegan2_c2_8xb3-1100kiters_ffhq-256x256 | FFHQ | Tab.5 config-a | 256 | 5.56 | 75.92/51.24 | model |
| stylegan2_c2_8xb3-1100kiters_ffhq-512x512 | FFHQ | Tab.5 config-b | 512 | 4.91 | 75.65/54.58 | model |
| mspie-stylegan2-config-c_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-c | 256, 384, 512 | 3.35 | 73.84/55.77 | model |
| mspie-stylegan2-config-d_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-d | 256, 384, 512 | 3.50 | 73.28/56.16 | model |
| mspie-stylegan2-config-e_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-e | 256, 384, 512 | 3.15 | 74.13/56.88 | model |
| mspie-stylegan2-config-f_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-f | 256, 384, 512 | 2.93 | 73.51/57.32 | model |
| mspie-stylegan2-config-g_c1_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-g | 256, 384, 512 | 3.40 | 73.05/56.45 | model |
| mspie-stylegan2-config-h_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-h | 256, 384, 512 | 4.01 | 72.81/54.35 | model |
| mspie-stylegan2-config-i_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-i | 256, 384, 512 | 3.76 | 73.26/54.71 | model |
| mspie-stylegan2-config-j_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-j | 256, 384, 512 | 4.23 | 73.11/54.63 | model |
| mspie-stylegan2-config-k_c2_8xb3-1100kiters_ffhq-256-512 | FFHQ | Tab.5 config-k | 256, 384, 512 | 4.17 | 73.05/51.07 | model |
| mspie-stylegan2-config-f_c2_8xb3-1100kiters_ffhq-256-896 | FFHQ | higher-resolution | 256, 512, 896 | 4.10 | 72.21/50.29 | model |
| mspie-stylegan2-config-f_c1_8xb2-1600kiters_ffhq-256-1024 | FFHQ | higher-resolution | 256, 512, 1024 | 6.24 | 71.79/49.92 | model |

Note that we report the FID and P&R (Precision/Recall) metrics on the FFHQ dataset at the largest scale.

Citation

@article{xu2020positional,
  title={Positional Encoding as Spatial Inductive Bias in GANs},
  author={Xu, Rui and Wang, Xintao and Chen, Kai and Zhou, Bolei and Loy, Chen Change},
  journal={arXiv preprint arXiv:2012.05217},
  year={2020},
  url={https://openaccess.thecvf.com/content/CVPR2021/html/Xu_Positional_Encoding_As_Spatial_Inductive_Bias_in_GANs_CVPR_2021_paper.html},
}

StyleGANv2 (CVPR’2020)

Task: Unconditional GANs

Abstract

The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.

Results and Models

Results (compressed) from StyleGAN2 config-f trained by mmagic
| Model | Dataset | Comment | FID50k | Precision50k | Recall50k | Download |
| --- | --- | --- | --- | --- | --- | --- |
| stylegan2_c2_8xb4_ffhq-1024x1024 | FFHQ | official weight | 2.8134 | 62.856 | 49.400 | model |
| stylegan2_c2_8xb4_lsun-car-384x512 | LSUN_CAR | official weight | 5.4316 | 65.986 | 48.190 | model |
| stylegan2_c2_8xb4-800kiters_lsun-horse-256x256 | LSUN_HORSE | official weight | - | - | - | model |
| stylegan2_c2_8xb4-800kiters_lsun-church-256x256 | LSUN_CHURCH | official weight | - | - | - | model |
| stylegan2_c2_8xb4-800kiters_lsun-cat-256x256 | LSUN_CAT | official weight | - | - | - | model |
| stylegan2_c2_8xb4-800kiters_ffhq-256x256 | FFHQ | our training | 3.992 | 69.012 | 40.417 | model |
| stylegan2_c2_8xb4_ffhq-1024x1024 | FFHQ | our training | 2.8185 | 68.236 | 49.583 | model |
| stylegan2_c2_8xb4_lsun-car-384x512 | LSUN_CAR | our training | 2.4116 | 66.760 | 50.576 | model |

FP16 Support and Experiments

Currently, we support FP16 training for StyleGAN2; the results of mixed-precision training are shown below. (Experiments on FFHQ 1024x1024 will come soon.)

Evaluation FID for FP32 and FP16 training

As shown in the figure, we provide 3 ways to do mixed-precision training for StyleGAN2:

  • stylegan2_c2_fp16_PL-no-scaler: In this setting, we try our best to follow the official FP16 implementation in StyleGAN2-ADA. As in the official version, we only adopt FP16 for the higher-resolution feature maps (the last 4 stages in G and the first 4 stages in D). Note that we do not adopt the clamping used in the official implementation to avoid gradient overflow; instead, we use the autocast function from the torch.cuda.amp package (see the sketch after this list).

  • stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler: In this config, we adopt mixed-precision training for the whole generator but only a partial discriminator (the first 4 higher-resolution stages). Note that we do not apply the loss scaler to the path length loss or the gradient penalty loss, because training always diverges once the loss scaler is used to scale the gradients of these two losses.

  • stylegan2_c2_apex_fp16_PL-R1-no-scaler: In this setting, we adopt the APEX toolkit to implement mixed-precision training with multiple loss/gradient scalers. With APEX, you can assign different loss scalers to the generator and the discriminator respectively. Note that we still skip the gradient scaler for the path length loss and the gradient penalty loss.
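To make the first setting more concrete, below is a minimal, self-contained sketch (not the mmagic implementation) of how autocast can be restricted to the high-resolution stages of a generator while the early stages and the losses stay in FP32; the block names are hypothetical.

from torch.cuda.amp import autocast

def generator_forward(low_res_blocks, high_res_blocks, w):
    # Hypothetical synthesis forward pass: early (low-resolution) stages stay
    # in FP32, the last high-resolution stages run in FP16 under autocast.
    x = w
    for block in low_res_blocks:
        x = block(x)
    with autocast():
        for block in high_res_blocks:
            x = block(x)
    return x.float()  # cast back to FP32 before computing losses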

| Model | Comment | Dataset | FID50k | Download |
| --- | --- | --- | --- | --- |
| stylegan2_c2_8xb4-800kiters_ffhq-256x256 | baseline | FFHQ256 | 3.992 | ckpt |
| stylegan2_c2-PL_8xb4-fp16-partial-GD-no-scaler-800kiters_ffhq-256x256 | partial layers in fp16 | FFHQ256 | 4.331 | ckpt |
| stylegan2_c2-PL-R1_8xb4-fp16-globalG-partialD-no-scaler-800kiters_ffhq-256x256 | the whole G in fp16 | FFHQ256 | 4.362 | ckpt |
| stylegan2_c2-PL-R1_8xb4-apex-fp16-no-scaler-800kiters_ffhq-256x256 | the whole G&D in fp16 + two loss scalers | FFHQ256 | 4.614 | ckpt |

For the precision and recall results reported above: P&R50k_full is the metric used in StyleGANv1 and StyleGANv2, where full indicates that the whole dataset is used to extract the real distribution, e.g., all 70000 images in the FFHQ dataset. Note that adopting the VGG16 provided by Tero requires PyTorch >= 1.6.0. Be careful about using PyTorch's own VGG16 to extract features, as it leads to higher precision and recall values.
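If you want to compute P&R through the same config-based evaluation as the EQ metrics above, an entry along the following lines could be added to the config. This is only a sketch by analogy with the Equivariance example; the metric type name and field names are assumptions, so please verify them against the metric documentation of your mmagic version.

metrics = dict(
    pr=dict(
        type='PrecisionAndRecall',  # assumed metric name; verify in your mmagic version
        fake_nums=50000))           # assumed field for the number of generated samples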

Citation

@inproceedings{karras2020analyzing,
  title={Analyzing and improving the image quality of stylegan},
  author={Karras, Tero and Laine, Samuli and Aittala, Miika and Hellsten, Janne and Lehtinen, Jaakko and Aila, Timo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8110--8119},
  year={2020},
  url={https://openaccess.thecvf.com/content_CVPR_2020/html/Karras_Analyzing_and_Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.html},
}

StyleGANv1 (CVPR’2019)

Task: Unconditional GANs

Abstract

We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.

Results and Models

Results (compressed) from StyleGANv1 trained by mmagic
| Model | Dataset | FID50k | P&R50k_full | Download |
| --- | --- | --- | --- | --- |
| styleganv1_ffhq_256 | FFHQ | 6.090 | 70.228/27.050 | model |
| styleganv1_ffhq_1024 | FFHQ | 4.056 | 70.302/36.869 | model |

Citation

@inproceedings{karras2019style,
  title={A style-based generator architecture for generative adversarial networks},
  author={Karras, Tero and Laine, Samuli and Aila, Timo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4401--4410},
  year={2019},
  url={https://openaccess.thecvf.com/content_CVPR_2019/html/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.html},
}

PGGAN (ICLR’2018)

Task: Unconditional GANs

Abstract

We describe a new training methodology for generative adversarial networks. The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality, e.g., CelebA images at 1024^2. We also propose a simple way to increase the variation in generated images, and achieve a record inception score of 8.80 in unsupervised CIFAR10. Additionally, we describe several implementation details that are important for discouraging unhealthy competition between the generator and discriminator. Finally, we suggest a new metric for evaluating GAN results, both in terms of image quality and variation. As an additional contribution, we construct a higher-quality version of the CelebA dataset.

Results and models

Results (compressed) from our PGGAN trained on CelebA-HQ at 1024x1024
| Model | Dataset | MS-SSIM | SWD (per scale / avg) | Download |
| --- | --- | --- | --- | --- |
| pggan_128x128 | celeba-cropped | 0.3023 | 3.42, 4.04, 4.78, 20.38 / 8.15 | model |
| pggan_128x128 | lsun-bedroom | 0.0602 | 3.5, 2.96, 2.76, 9.65 / 4.72 | model |
| pggan_1024x1024 | celeba-hq | 0.3379 | 8.93, 3.98, 3.07, 2.64 / 4.655 | model |

Citation

PGGAN (arXiv'2017)
@article{karras2017progressive,
  title={Progressive growing of gans for improved quality, stability, and variation},
  author={Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko},
  journal={arXiv preprint arXiv:1710.10196},
  year={2017},
  url={https://arxiv.org/abs/1710.10196},
}

GGAN (ArXiv’2017)

Task: Unconditional GANs

Abstract

Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which has inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show the superior performance of geometric GAN.

Results and models

GGAN 64x64, CelebA-Cropped
| Model | Dataset | SWD | MS-SSIM | FID | Download |
| --- | --- | --- | --- | --- | --- |
| GGAN 64x64 | CelebA-Cropped | 11.18, 12.21, 39.16/20.85 | 0.3318 | 20.1797 | model \| log |
| GGAN 128x128 | CelebA-Cropped | 9.81, 11.29, 19.22, 47.79/22.03 | 0.3149 | 18.7647 | model \| log |
| GGAN 64x64 | LSUN-Bedroom | 9.1, 6.2, 12.27/9.19 | 0.0649 | 39.9261 | model \| log |

Note: In the original implementation of GGAN, G_iters is set to 10. Our framework does not currently support G_iters, so we dropped this setting and conducted several experiments with our own settings; the results shown above are those with the lowest FID score. The original settings and our settings are listed below, followed by a config sketch.

| Model | Dataset | Architecture | Optimizer | lr_G | lr_D | G_iters | D_iters |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GGAN (origin) 64x64 | CelebA-Cropped | dcgan-archi | RMSprop | 0.0002 | 0.0002 | 10 | 1 |
| GGAN (ours) 64x64 | CelebA-Cropped | dcgan-archi | Adam | 0.001 | 0.001 | 1 | 1 |
| GGAN (origin) 64x64 | LSUN-Bedroom | dcgan-archi | RMSprop | 0.0002 | 0.0002 | 10 | 1 |
| GGAN (ours) 64x64 | LSUN-Bedroom | lsgan-archi | Adam | 0.0001 | 0.0001 | 1 | 1 |
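As a rough illustration of how the "ours" CelebA-Cropped row could be expressed in an mmagic-style config, the snippet below sets Adam with lr=0.001 for both the generator and the discriminator, with one generator step per discriminator step. The betas value and the exact optim_wrapper layout are assumptions for illustration, not the shipped GGAN config.

# Hypothetical sketch of the "ours" optimizer settings (betas assumed).
optim_wrapper = dict(
    generator=dict(
        optimizer=dict(type='Adam', lr=0.001, betas=(0.5, 0.999))),
    discriminator=dict(
        optimizer=dict(type='Adam', lr=0.001, betas=(0.5, 0.999))))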

Citation

@article{lim2017geometric,
  title={Geometric gan},
  author={Lim, Jae Hyun and Ye, Jong Chul},
  journal={arXiv preprint arXiv:1705.02894},
  year={2017},
  url={https://arxiv.org/abs/1705.02894},
}

WGAN-GP (NeurIPS’2017)

Task: Unconditional GANs

Abstract

Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.

Results and models

WGAN-GP 128, CelebA-Cropped
| Model | Dataset | Details | SWD | MS-SSIM | Download |
| --- | --- | --- | --- | --- | --- |
| WGAN-GP 128 | CelebA-Cropped | GN | 5.87, 9.76, 9.43, 18.84/10.97 | 0.2601 | model |
| WGAN-GP 128 | LSUN-Bedroom | GN, GP-lambda = 50 | 11.7, 7.87, 9.82, 25.36/13.69 | 0.059 | model |

Citation

@article{gulrajani2017improved,
  title={Improved Training of Wasserstein GANs},
  author={Gulrajani, Ishaan and Ahmed, Faruk and Arjovsky, Martin and Dumoulin, Vincent and Courville, Aaron},
  journal={arXiv preprint arXiv:1704.00028},
  year={2017},
  url={https://arxiv.org/abs/1704.00028},
}

LSGAN (ICCV’2017)

Task: Unconditional GANs

Abstract

Unsupervised learning with generative adversarial networks (GANs) has proven hugely successful. Regular GANs hypothesize the discriminator as a classifier with the sigmoid cross entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome such a problem, we propose in this paper the Least Squares Generative Adversarial Networks (LSGANs) which adopt the least squares loss function for the discriminator. We show that minimizing the objective function of LSGAN yields minimizing the Pearson χ2 divergence. There are two benefits of LSGANs over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stable during the learning process. We evaluate LSGANs on five scene datasets and the experimental results show that the images generated by LSGANs are of better quality than the ones generated by regular GANs. We also conduct two comparison experiments between LSGANs and regular GANs to illustrate the stability of LSGANs.

Results and models

LSGAN 64x64, CelebA-Cropped
| Model | Dataset | SWD | MS-SSIM | FID | Download |
| --- | --- | --- | --- | --- | --- |
| LSGAN 64x64 | CelebA-Cropped | 6.16, 6.83, 37.64/16.87 | 0.3216 | 11.9258 | model \| log |
| LSGAN 64x64 | LSUN-Bedroom | 5.66, 9.0, 18.6/11.09 | 0.0671 | 30.7390 | model \| log |
| LSGAN 128x128 | CelebA-Cropped | 21.66, 9.83, 16.06, 70.76/29.58 | 0.3691 | 38.3752 | model \| log |
| LSGAN 128x128 | LSUN-Bedroom | 19.52, 9.99, 7.48, 14.3/12.82 | 0.0612 | 51.5500 | model \| log |

Citation

@inproceedings{mao2017least,
  title={Least squares generative adversarial networks},
  author={Mao, Xudong and Li, Qing and Xie, Haoran and Lau, Raymond YK and Wang, Zhen and Paul Smolley, Stephen},
  booktitle={Proceedings of the IEEE international conference on computer vision},
  pages={2794--2802},
  year={2017},
  url={https://openaccess.thecvf.com/content_iccv_2017/html/Mao_Least_Squares_Generative_ICCV_2017_paper.html},
}

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (ICLR’2016)

Task: Unconditional GANs

Abstract

In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

Results and models

DCGAN 64x64, CelebA-Cropped
| Model | Dataset | SWD | MS-SSIM | Download |
| --- | --- | --- | --- | --- |
| DCGAN 64x64 | MNIST (64x64) | 21.16, 4.4, 8.41/11.32 | 0.1395 | model \| log |
| DCGAN 64x64 | CelebA-Cropped | 8.93, 10.53, 50.32/23.26 | 0.2899 | model \| log |
| DCGAN 64x64 | LSUN-Bedroom | 42.79, 34.55, 98.46/58.6 | 0.2095 | model \| log |

Citation

@article{radford2015unsupervised,
  title={Unsupervised representation learning with deep convolutional generative adversarial networks},
  author={Radford, Alec and Metz, Luke and Chintala, Soumith},
  journal={arXiv preprint arXiv:1511.06434},
  year={2015},
  url={https://arxiv.org/abs/1511.06434},
}