TensorFlow-DCGAN: For more stable training

moonhwan Jeong
4 min read · Mar 8, 2019

Since Ian Goodfellow’s original paper, GANs have been applied to many fields, but their instability has always caused problems. A GAN has to solve a minimax (saddle-point) problem, so this instability is intrinsic.

A funny image of a saddle point. Image from https://imgur.com/gallery/9nvin

Many researchers have attempted to solve this dilemma through various approaches. Among them, DCGAN has shown remarkable results. DCGAN proposes a stable GAN network structure: if you design your model according to the guidelines of the paper, you will see that it trains stably.

Architecture guidelines for stable Deep Convolutional GANs
• Replace any pooling layers with strided convolutions (discriminator) and
fractional-strided convolutions (generator).
• Use batchnorm in both the generator and the discriminator.
• Remove fully connected hidden layers for deeper architectures.
• Use ReLU activation in generator for all layers except for the output, which uses Tanh.
• Use LeakyReLU activation in the discriminator for all layers.
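The last two guidelines are easy to check numerically: ReLU zeroes out negative activations, while LeakyReLU keeps a small slope for them so the discriminator still gets gradients from negative inputs. A minimal NumPy sketch (the 0.2 slope matches the value used in the DCGAN paper):

```python
import numpy as np

def relu(x):
    # Generator activation: negatives become exactly zero.
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):
    # Discriminator activation: negatives keep a small slope (0.2 in the paper).
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.0])
print(relu(x))        # [ 0.   0.   0.   1. ]
print(leaky_relu(x))  # [-0.4 -0.1  0.   1. ]
```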

Tutorial

You can find the full code at https://github.com/fabulousjeong/dcgan-tensorflow

Generator

Structure of the DCGAN generator. Image from Radford et al.’s paper

First, the generator projects and reshapes the noise vector (a 100-dim array) into 4x4x1024 feature maps. We used the matmul and reshape functions to implement this. We then used a series of four fractionally-strided convolutions (conv2d_transpose) to progressively create a 64x64 image. As described above, batchnorm is placed at the end of each layer, and ReLU is used as the activation function except at the output. We used tanh for the output, so the pixel range of the output is [-1, 1].
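The project-and-reshape step and the shape progression can be sketched without running TensorFlow. This is plain NumPy with random stand-in weights; the channel counts (1024 → 512 → 256 → 128 → 3) follow the figure from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 2
z = rng.standard_normal((batch, 100))        # noise input

# "Project and reshape": a dense layer mapping z to a 4x4x1024 feature map.
W = rng.standard_normal((100, 4 * 4 * 1024)) * 0.02
h = (z @ W).reshape(batch, 4, 4, 1024)
print(h.shape)                               # (2, 4, 4, 1024)

# Each stride-2 conv2d_transpose with SAME padding doubles the spatial size.
shape = (4, 4, 1024)
for ch in (512, 256, 128, 3):
    shape = (shape[0] * 2, shape[1] * 2, ch)
print(shape)                                 # (64, 64, 3)

# tanh keeps the output pixels in [-1, 1].
out = np.tanh(rng.standard_normal((batch, 64, 64, 3)))
assert out.min() >= -1.0 and out.max() <= 1.0
```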

Discriminator

The discriminator used in this tutorial has a structure symmetrical to the generator’s. As opposed to the generator, it builds feature maps while reducing the image size, so we used conv2d with a stride of 2. As in the generator, batchnorm is placed at the end of each layer, but LeakyReLU is used as the activation function.
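A quick sanity check of that symmetry, assuming the discriminator’s channel counts are simply the generator’s in reverse (each stride-2 conv with SAME padding halves the spatial size):

```python
shape = (64, 64, 3)                      # input image
for ch in (128, 256, 512, 1024):         # channels grow as the image shrinks
    shape = (shape[0] // 2, shape[1] // 2, ch)
print(shape)  # (4, 4, 1024) -- flattened and mapped to a single real/fake logit
```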

Loss Function and Optimizer

The loss function and optimizer are the same as in the basic GAN (link). However, following Section 4 (Details of Adversarial Training) of the DCGAN paper, we set the learning rate to 0.0002 and the beta1 of the Adam optimizer to 0.5.
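The two losses can be written out in NumPy to make the objective concrete; in TF 1.x this is what tf.nn.sigmoid_cross_entropy_with_logits computes, and the optimizer would be tf.train.AdamOptimizer(0.0002, beta1=0.5). The function names below are mine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_loss(d_real_logits, d_fake_logits):
    # Discriminator: push real images toward label 1, generated images toward 0.
    return -np.mean(np.log(sigmoid(d_real_logits)) +
                    np.log(1.0 - sigmoid(d_fake_logits)))

def g_loss(d_fake_logits):
    # Generator: fool the discriminator into outputting 1 for fakes.
    return -np.mean(np.log(sigmoid(d_fake_logits)))

# A confident, correct discriminator gives a low D loss and a high G loss.
print(d_loss(np.array([5.0]), np.array([-5.0])))  # ~0.013
print(g_loss(np.array([-5.0])))                   # ~5.007
```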

Results

I downloaded an MNIST dataset from Kaggle: https://www.kaggle.com/scolianni/mnistasjpg

Left: DCGAN, Right: BasicGAN

I trained the DCGAN and basic GAN models on the MNIST dataset. DCGAN generates cleaner images than the basic GAN. Of course, this result seems reasonable, because DCGAN has a more complex structure than the basic GAN and many more parameters. But the method introduced in DCGAN can train stably even when the model is complex; that is the contribution of DCGAN.

Also, I trained both models on the CelebA dataset (link), which has about 200k portrait photos.

Left: DCGAN, Right: BasicGAN

We trained both the DCGAN and basic GAN models for 16 epochs. DCGAN not only produces clearer images but also expresses a variety of features such as glasses, makeup, and mustaches.

I saved the model at each epoch. Refer to the code below and the link.

saver = tf.train.Saver()
with tf.Session() as sess:
    for epoch in range(total_epoch):
        # train model
        saver.save(sess, './models/dcgan', global_step=epoch)

We can also load a trained model. I made “test.py” to test DCGAN’s properties.

DCGAN experiment

A random vector is fed to the generator as input. Although the generator takes random values, these values determine the shape of the output face, so we call the vector a “latent vector”. Even arithmetic operations on latent vectors are possible. Refer to the figure and equation below.

(smiley man) - (man) + (woman) = (smiley woman)  

The generator drew a smiling woman but could not draw sunglasses; unfortunately, it drew something like bruises around the eyes instead. To express sunglasses, we would need to train for more epochs.
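The arithmetic happens directly on the latent vectors before they are fed to the generator. A NumPy sketch (the vectors below are random stand-ins for real learned codes; averaging three samples per concept, as the DCGAN paper does, makes the result more stable):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for latent vectors whose generated faces show each attribute.
z_smiling_man   = rng.standard_normal((3, 100))  # 3 samples per concept
z_neutral_man   = rng.standard_normal((3, 100))
z_neutral_woman = rng.standard_normal((3, 100))

# (smiley man) - (man) + (woman) = (smiley woman), done on mean vectors.
z = (z_smiling_man.mean(axis=0)
     - z_neutral_man.mean(axis=0)
     + z_neutral_woman.mean(axis=0))
print(z.shape)  # (100,) -- feed this to the generator
```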

Not only arithmetic operations but also interpolation is possible.

Left: output1. Middle: outputs from interpolated z vectors. Right: output2.

The seven intermediate pictures were generated by interpolating the latent vectors of the left and right pictures. You can see the direction of the head slowly changing. Interestingly, the direction of the hair parting does not change.
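Linear interpolation between two latent vectors is a one-liner. A NumPy sketch that produces the seven intermediates plus the two endpoints (the endpoint vectors here are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal(100), rng.standard_normal(100)

# 9 points: the two endpoints plus 7 intermediates.
alphas = np.linspace(0.0, 1.0, 9)
zs = np.stack([(1 - a) * z1 + a * z2 for a in alphas])
print(zs.shape)  # (9, 100) -- feed each row to the generator
```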

DCGAN trains more stably than the previous method and generates some good results, and we can use it for some interesting experiments. But the results still look unnatural, like a zombie. In the next story, we will discuss BEGAN, which creates more natural images.
