Conditional GANs

Summary

  • Number of checkpoints: 18

  • Number of configs: 18

  • Number of papers: 3

    • ALGORITHM: 3

BigGAN (ICLR’2019)

Task: Conditional GANs

Abstract

Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple “truncation trick,” allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator’s input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Frechet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.65.

Introduction

BigGAN/BigGAN-Deep is a conditional generative model that can generate both high-resolution and high-quality images by scaling up the batch size and the number of model parameters.

We have finished training BigGAN on CIFAR10 (32x32) and are aligning training performance on ImageNet1k (128x128). Some sampled results are shown below for your reference.

Results from our BigGAN trained on CIFAR10
Results from our BigGAN trained on ImageNet

Evaluation of our trained BigGAN.

| Model | Dataset | FID (Iter) | IS (Iter) | Download |
| --- | --- | --- | --- | --- |
| BigGAN 32x32 | CIFAR10 | 9.78 (390000) | 8.70 (390000) | model \| log |
| BigGAN 128x128 Best FID | ImageNet1k | 8.69 (1232000) | 101.15 (1232000) | model \| log |
| BigGAN 128x128 Best IS | ImageNet1k | 13.51 (1328000) | 129.07 (1328000) | model \| log |

Note on reproducibility

The BigGAN 128x128 model is trained with V100 GPUs and CUDA 10.1, and we can hardly reproduce its results with A100 GPUs and CUDA 11.3. If you have any insight into this reproducibility issue, please feel free to contact us.

Converted weights

Since we have not finished training our models, we provide several converted pre-trained weights that have already been evaluated. The weights are converted from BigGAN-PyTorch and pytorch-pretrained-BigGAN.
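If you only want to draw samples from the converted BigGAN-Deep weights, the upstream pytorch-pretrained-BigGAN package also exposes a compact API. The sketch below assumes that package is installed and uses its documented entry points; it is independent of this codebase:

```python
import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names,
                                       truncated_noise_sample)

# Load converted BigGAN-Deep 256x256 weights from the Hugging Face hub.
model = BigGAN.from_pretrained('biggan-deep-256')

# Class-conditional inputs: truncated noise and a one-hot ImageNet label
# ('golden retriever' is just an example class name).
noise = torch.from_numpy(truncated_noise_sample(truncation=0.4, batch_size=1))
label = torch.from_numpy(one_hot_from_names(['golden retriever'], batch_size=1))

with torch.no_grad():
    images = model(noise, label, 0.4)  # (1, 3, 256, 256), values in [-1, 1]
```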

Evaluation results and download links are provided below.

| Model | Dataset | FID | IS | Download | Original Download link |
| --- | --- | --- | --- | --- | --- |
| BigGAN 128x128 | ImageNet1k | 10.1414 | 96.728 | model | link |
| BigGAN-Deep 128x128 | ImageNet1k | 5.9471 | 107.161 | model | link |
| BigGAN-Deep 256x256 | ImageNet1k | 11.3151 | 135.107 | model | link |
| BigGAN-Deep 512x512 | ImageNet1k | 16.8728 | 124.368 | model | link |

Sampling results are shown below.

Results from our BigGAN-Deep with pre-trained weights on ImageNet 128x128 with truncation factor 0.4
Results from our BigGAN-Deep with pre-trained weights on ImageNet 256x256 with truncation factor 0.4
Results from our BigGAN-Deep with pre-trained weights on ImageNet 512x512 with truncation factor 0.4

Sampling with the truncation trick, as shown above, can be performed with the command below.

```bash
python demo/conditional_demo.py CONFIG_PATH CKPT_PATH --sample-cfg truncation=0.4  # set truncation value as you want
```
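Under the hood, the truncation trick simply shrinks the latent distribution at sampling time. A minimal sketch of one common implementation (the scheme used by pytorch-pretrained-BigGAN; the sampler in this codebase may differ in detail):

```python
import numpy as np
from scipy.stats import truncnorm

def truncated_noise(batch_size, dim_z=120, truncation=0.4, seed=None):
    """Sample latents from a standard normal truncated to [-2, 2], then
    scale by `truncation`: smaller values trade variety for fidelity."""
    state = np.random.RandomState(seed)
    z = truncnorm.rvs(-2, 2, size=(batch_size, dim_z), random_state=state)
    return (truncation * z).astype(np.float32)
```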

For the converted weights, we provide model configs under configs/_base_/models, listed as follows:

  • biggan_cvt-BigGAN-PyTorch-rgb_imagenet1k-128x128.py

  • biggan-deep_cvt-hugging-face-rgb_imagenet1k-128x128.py

  • biggan-deep_cvt-hugging-face_rgb_imagenet1k-256x256.py

  • biggan-deep_cvt-hugging-face_rgb_imagenet1k-512x512.py
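These configs are standard OpenMMLab-style Python configs. Assuming an mmcv-based environment, you can load and inspect one as follows (a sketch; in newer versions the Config class lives in mmengine instead):

```python
from mmcv import Config

# Load the converted BigGAN-Deep 128x128 config listed above and
# print the fully resolved model definition.
cfg = Config.fromfile(
    'configs/_base_/models/biggan-deep_cvt-hugging-face-rgb_imagenet1k-128x128.py')
print(cfg.pretty_text)
```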

Interpolation

To perform image interpolation on BigGAN (or other conditional models), run

```bash
python apps/conditional_interpolate.py CONFIG_PATH CKPT_PATH --samples-path SAMPLES_PATH
```

Image interpolation results of our BigGAN-Deep

To perform image interpolation on BigGAN with fixed noise, run

```bash
python apps/conditional_interpolate.py CONFIG_PATH CKPT_PATH --samples-path SAMPLES_PATH --fix-z
```

Image interpolation results of our BigGAN-Deep with fixed noise

To perform image interpolation on BigGAN with a fixed label, run

```bash
python apps/conditional_interpolate.py CONFIG_PATH CKPT_PATH --samples-path SAMPLES_PATH --fix-y
```

Image interpolation results of our BigGAN-Deep with a fixed label
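Conceptually, the --fix-y mode interpolates only the noise while keeping the class label constant. A hypothetical sketch of that inner loop (the model call signature here is illustrative, not this codebase's exact API):

```python
import torch

def interpolate_fixed_label(model, z_start, z_end, label, steps=8):
    """Linearly interpolate between two latent codes for a fixed class
    label, generating one image per interpolation step."""
    images = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_start + t * z_end
        with torch.no_grad():
            images.append(model(z, label))  # illustrative call signature
    return torch.cat(images)
```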

Citation

@inproceedings{brock2018large,
    title={Large Scale {GAN} Training for High Fidelity Natural Image Synthesis},
    author={Andrew Brock and Jeff Donahue and Karen Simonyan},
    booktitle={International Conference on Learning Representations},
    year={2019},
    url={https://openreview.net/forum?id=B1xsqj09Fm},
}

SAGAN (ICML’2019)

Task: Conditional GANs

Abstract

In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
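A minimal PyTorch sketch of the self-attention block described above (the C/8 channel reduction for queries and keys follows the paper; the full-channel value path is a common simplification):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over spatial positions (sketch)."""

    def __init__(self, in_channels):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // 8, 1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, 1)
        self.value = nn.Conv2d(in_channels, in_channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2)                     # (b, c//8, h*w)
        k = self.key(x).flatten(2)                       # (b, c//8, h*w)
        v = self.value(x).flatten(2)                     # (b, c, h*w)
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)  # (b, h*w, h*w)
        out = v @ attn.transpose(1, 2)                   # (b, c, h*w)
        return self.gamma * out.view(b, c, h, w) + x
```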

Results and models

Results from our SAGAN trained on CIFAR10

| Model | Dataset | Inplace ReLU | disc_step | Total Batchsize (BZ_PER_GPU * NGPU) | Total Iters* | Iter | IS | FID | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAGAN-32x32-woInplaceReLU Best IS | CIFAR10 | w/o | 5 | 64x1 | 500000 | 400000 | 9.3217 | 10.5030 | model \| Log |
| SAGAN-32x32-woInplaceReLU Best FID | CIFAR10 | w/o | 5 | 64x1 | 500000 | 480000 | 9.3174 | 9.4252 | model \| Log |
| SAGAN-32x32-wInplaceReLU Best IS | CIFAR10 | w | 5 | 64x1 | 500000 | 380000 | 9.2286 | 11.7760 | model \| Log |
| SAGAN-32x32-wInplaceReLU Best FID | CIFAR10 | w | 5 | 64x1 | 500000 | 460000 | 9.2061 | 10.7781 | model \| Log |
| SAGAN-128x128-woInplaceReLU Best IS | ImageNet | w/o | 1 | 64x4 | 1000000 | 980000 | 31.5938 | 36.7712 | model \| Log |
| SAGAN-128x128-woInplaceReLU Best FID | ImageNet | w/o | 1 | 64x4 | 1000000 | 950000 | 28.4936 | 34.7838 | model \| Log |
| SAGAN-128x128-BigGAN Schedule Best IS | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | model \| Log |
| SAGAN-128x128-BigGAN Schedule Best FID | ImageNet | w/o | 1 | 32x8 | 1000000 | 826000 | 69.5350 | 12.8295 | model \| Log |

‘*’ The iteration counting rule in our implementation differs from other codebases. If you want to align with other codebases, you can use the following conversion formula:

total_iters (BigGAN / Pytorch-StudioGAN) = our_total_iters / disc_step

For example, the CIFAR10 runs above report 500000 total iterations with disc_step=5, which corresponds to 100000 iterations in the StudioGAN convention.

We also provide pre-trained models converted from Pytorch-StudioGAN. Note that in Pytorch-StudioGAN, inplace ReLU is used in both the generator and the discriminator.

| Model | Dataset | Inplace ReLU | n_disc | Total Iters | IS (Our Pipeline) | FID (Our Pipeline) | IS (StudioGAN) | FID (StudioGAN) | Download | Original Download link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAGAN-32x32 StudioGAN | CIFAR10 | w | 5 | 100000 | 9.116 | 10.2011 | 8.680 | 14.009 | model | model |
| SAGAN-128x128 StudioGAN | ImageNet | w | 1 | 1000000 | 27.367 | 40.1162 | 29.848 | 34.726 | model | model |
  • Our Pipeline denotes results evaluated with our pipeline.

  • StudioGAN denotes results released by Pytorch-StudioGAN.

For the IS metric, our implementation differs from Pytorch-StudioGAN in the following aspects:

  1. We use Tero's Inception for feature extraction.

  2. We use bicubic interpolation with the PIL backend to resize images before feeding them to the Inception network, as sketched after this list.
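For illustration, the bicubic-with-PIL resizing step can be reproduced as follows (a sketch; the actual evaluation code lives in this repository's metrics pipeline):

```python
import numpy as np
from PIL import Image

def resize_bicubic(img_uint8, size=(299, 299)):
    """Resize an HWC uint8 image with PIL's bicubic filter, as done
    before feeding samples to the Inception network."""
    return np.asarray(Image.fromarray(img_uint8).resize(size, Image.BICUBIC))
```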

For FID evaluation, we follow the pipeline of BigGAN, where the whole training set is used to extract Inception statistics, while Pytorch-StudioGAN uses 50000 randomly selected samples. We also use Tero's Inception for feature extraction.
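Given the extracted statistics (feature mean and covariance for the real and generated samples), FID itself is the Fréchet distance between two Gaussians. A standard NumPy/SciPy sketch:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * sqrtm(sigma1 @ sigma2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```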

You can download the preprocessed inception states from the following links: CIFAR10 and ImageNet1k.

You can use the following commands to extract those inception states yourself.

```bash
# For CIFAR10
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/cifar10_inception_stat.py --pklname cifar10.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train

# For ImageNet1k
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/imagenet_128x128_inception_stat.py --pklname imagenet.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train
```

Citation

@inproceedings{zhang2019self,
  title={Self-attention generative adversarial networks},
  author={Zhang, Han and Goodfellow, Ian and Metaxas, Dimitris and Odena, Augustus},
  booktitle={International conference on machine learning},
  pages={7354--7363},
  year={2019},
  organization={PMLR},
  url={https://proceedings.mlr.press/v97/zhang19d.html},
}

SNGAN (ICLR’2018)

Task: Conditional GANs

Abstract

One of the challenges in the study of generative adversarial networks is the instability of its training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on CIFAR10, STL-10, and ILSVRC2012 dataset, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) is capable of generating images of better or equal quality relative to the previous training stabilization techniques.
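PyTorch ships a built-in implementation of this technique, so a spectrally normalized discriminator can be sketched as follows (generic PyTorch usage, not this codebase's exact module definitions):

```python
import torch
import torch.nn as nn

def sn_conv(in_ch, out_ch):
    """Conv layer whose weight is divided by its largest singular value,
    estimated with one power-iteration step per forward pass."""
    return nn.utils.spectral_norm(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1))

disc = nn.Sequential(
    sn_conv(3, 64), nn.LeakyReLU(0.1),
    sn_conv(64, 128), nn.LeakyReLU(0.1),
)
out = disc(torch.randn(1, 3, 32, 32))  # (1, 128, 8, 8)
```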

Results and models

Results from our SNGAN-PROJ trained on CIFAR10 and ImageNet

| Model | Dataset | Inplace ReLU | disc_step | Total Iters* | Iter | IS | FID | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SNGAN_Proj-32x32-woInplaceReLU Best IS | CIFAR10 | w/o | 5 | 500000 | 400000 | 9.6919 | 9.8203 | ckpt \| Log |
| SNGAN_Proj-32x32-woInplaceReLU Best FID | CIFAR10 | w/o | 5 | 500000 | 490000 | 9.5659 | 8.1158 | ckpt \| Log |
| SNGAN_Proj-32x32-wInplaceReLU Best IS | CIFAR10 | w | 5 | 500000 | 490000 | 9.5564 | 8.3462 | ckpt \| Log |
| SNGAN_Proj-32x32-wInplaceReLU Best FID | CIFAR10 | w | 5 | 500000 | 490000 | 9.5564 | 8.3462 | ckpt \| Log |
| SNGAN_Proj-128x128-woInplaceReLU Best IS | ImageNet | w/o | 5 | 1000000 | 952000 | 30.0651 | 33.4682 | ckpt \| Log |
| SNGAN_Proj-128x128-woInplaceReLU Best FID | ImageNet | w/o | 5 | 1000000 | 989000 | 29.5779 | 32.6193 | ckpt \| Log |
| SNGAN_Proj-128x128-wInplaceReLU Best IS | ImageNet | w | 5 | 1000000 | 944000 | 28.1799 | 34.3383 | ckpt \| Log |
| SNGAN_Proj-128x128-wInplaceReLU Best FID | ImageNet | w | 5 | 1000000 | 988000 | 27.7948 | 33.4821 | ckpt \| Log |

‘*’ The iteration counting rule in our implementation differs from other codebases. If you want to align with other codebases, you can use the following conversion formula:

total_iters (BigGAN / Pytorch-StudioGAN) = our_total_iters / disc_step

We also provide pre-trained models converted from Pytorch-StudioGAN. Note that in Pytorch-StudioGAN, inplace ReLU is used in both the generator and the discriminator.

| Model | Dataset | Inplace ReLU | disc_step | Total Iters | IS (Our Pipeline) | FID (Our Pipeline) | IS (StudioGAN) | FID (StudioGAN) | Download | Original Download link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SNGAN_Proj-32x32 StudioGAN | CIFAR10 | w | 5 | 100000 | 9.372 | 10.2011 | 8.677 | 13.248 | model | model |
| SNGAN_Proj-128x128 StudioGAN | ImageNet | w | 2 | 1000000 | 30.218 | 29.8199 | 32.247 | 26.792 | model | model |
  • Our Pipeline denotes results evaluated with our pipeline.

  • StudioGAN denotes results released by Pytorch-StudioGAN.

For the IS metric, our implementation differs from Pytorch-StudioGAN in the following aspects:

  1. We use Tero's Inception for feature extraction.

  2. We use bicubic interpolation with the PIL backend to resize images before feeding them to the Inception network.

For FID evaluation, we follow the pipeline of BigGAN, where the whole training set is used to extract Inception statistics, while Pytorch-StudioGAN uses 50000 randomly selected samples. We also use Tero's Inception for feature extraction.

You can download the preprocessed inception states from the following links: CIFAR10 and ImageNet1k.

You can use the following commands to extract those inception states yourself.

```bash
# For CIFAR10
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/cifar10_inception_stat.py --pklname cifar10.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train

# For ImageNet1k
python tools/utils/inception_stat.py --data-cfg configs/_base_/datasets/imagenet_128x128_inception_stat.py --pklname imagenet.pkl --no-shuffle --inception-style stylegan --num-samples -1 --subset train
```

Citation

@inproceedings{miyato2018spectral,
  title={Spectral Normalization for Generative Adversarial Networks},
  author={Miyato, Takeru and Kataoka, Toshiki and Koyama, Masanori and Yoshida, Yuichi},
  booktitle={International Conference on Learning Representations},
  year={2018},
  url={https://openreview.net/forum?id=B1QRgziT-},
}