InsetGAN for Full-Body Image Generation
Anna Frühstück 1, 2, Krishna Kumar Singh 2, Eli Shechtman 2,
Niloy J. Mitra 2, 3, Peter Wonka 1 and Jingwan (Cynthia) Lu 2
1 KAUST 2 Adobe Research 3 University College London
CVPR, 2022
Abstract
While GANs can produce photo-realistic images in ideal conditions for certain domains, the generation of full-body human images remains difficult due to the diversity of identities, hairstyles, clothing, and the variance in pose. Instead of modeling this complex domain with a single GAN, we propose a novel method to combine multiple pretrained GANs where one GAN generates a global canvas (e.g., human body) and a set of specialized GANs, or insets, focus on different parts (e.g., faces, hands) that can be seamlessly inserted onto the global canvas. We model the problem as jointly exploring the respective latent spaces such that the generated images can be combined, by inserting the parts from the specialized generators onto the global canvas, without introducing seams. We demonstrate the setup by combining a full-body GAN with a dedicated high-quality face GAN to produce plausible-looking humans. We evaluate our results with quantitative metrics and user studies.
Paper
@inproceedings{Fruehstueck2022InsetGAN,
title = {InsetGAN for Full-Body Image Generation},
author = {Fr{\"u}hst{\"u}ck, Anna and Singh, {Krishna Kumar} and Shechtman, Eli and Mitra, {Niloy J.} and Wonka, Peter and Lu, Jingwan},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {7723--7732}
}
Our code is available on GitHub
Paper Video
InsetGAN results
We compare several examples of StyleGAN2-generated full-body humans, concentrating on regions that often exhibit unwanted artifacts in the generated results. Using our InsetGAN method, we generate both faces and shoes with dedicated models and generate appropriate bodies for each combination. The result is a seamless transition between the outputs of the three distinct generator models.
InsetGAN Pipeline
We show a diagram of the pipeline of our InsetGAN optimization process.
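As a rough illustration of the compositing step that the pipeline relies on, the sketch below pastes an inset into a canvas at a given position and measures how well the inset's border agrees with the canvas pixels it replaces. It is a minimal sketch in plain Python on nested lists, not the authors' implementation: the function names `paste_inset` and `seam_error`, the rectangular bounding box, and the mean-absolute-difference seam measure are all assumptions made for illustration.

```python
def paste_inset(canvas, inset, top, left):
    """Return a copy of `canvas` with `inset` pasted at row `top`, column `left`."""
    out = [row[:] for row in canvas]  # copy so the original canvas is untouched
    for r, inset_row in enumerate(inset):
        out[top + r][left:left + len(inset_row)] = inset_row
    return out


def seam_error(canvas, inset, top, left):
    """Mean absolute difference between the inset's outermost pixel ring
    and the canvas pixels that ring would overwrite (assumed seam measure)."""
    h, w = len(inset), len(inset[0])
    diffs = [
        abs(inset[r][c] - canvas[top + r][left + c])
        for r in range(h)
        for c in range(w)
        if r in (0, h - 1) or c in (0, w - 1)  # border pixels only
    ]
    return sum(diffs) / len(diffs)


# Hypothetical grayscale data: a flat "body" canvas and a 3x3 "face" inset
# whose border matches the canvas while its centre differs.
canvas = [[0.5] * 6 for _ in range(6)]
face = [[0.5, 0.5, 0.5],
        [0.5, 0.9, 0.5],
        [0.5, 0.5, 0.5]]
mismatched_face = [[0.8] * 3 for _ in range(3)]

combined = paste_inset(canvas, face, top=1, left=2)
print(seam_error(canvas, face, top=1, left=2))
print(seam_error(canvas, mismatched_face, top=1, left=2))
```

Under this simple measure, a seam error of zero means that pasting the inset introduces no discontinuity at its border. The coherence terms used in the actual method may differ.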
Face+Body Combination Optimization
We can choose to optimize only one of our generator networks, the inset (left), for coherence with the canvas. However, with this strategy we can neither sufficiently adapt to desired features from the inset (blond hair) nor achieve the same global coherence as when we jointly optimize both canvas and inset (right).
Single GAN optimization
Dual GAN optimization
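The difference between the two strategies can be illustrated with a toy problem. The sketch below replaces both pretrained generators with hypothetical one-dimensional linear "generators" and minimizes an assumed loss made of a seam term (border agreement) and a feature term (keeping a desired inset property), using plain gradient descent with finite differences. None of the names, loss terms, or hyperparameters come from the paper. They only show why freeing both latent codes can reach a lower loss than optimizing the inset alone.

```python
# Toy 1-D stand-ins for the two pretrained generators (hypothetical, not StyleGAN2).
CANVAS_SIZE, INSET_SIZE, INSET_START = 8, 4, 2
TARGET_MEAN = 0.9  # assumed "desired feature" of the inset, e.g. bright hair


def canvas_gen(w):
    """Canvas 'generator': a ramp with offset w[0] and slope w[1]."""
    return [w[0] + w[1] * i / (CANVAS_SIZE - 1) for i in range(CANVAS_SIZE)]


def inset_gen(w):
    """Inset 'generator': a shorter ramp with its own offset and slope."""
    return [w[0] + w[1] * j / (INSET_SIZE - 1) for j in range(INSET_SIZE)]


def loss(w_canvas, w_inset):
    """Assumed objective: border agreement plus a desired inset brightness."""
    inset = inset_gen(w_inset)
    region = canvas_gen(w_canvas)[INSET_START:INSET_START + INSET_SIZE]
    seam = (inset[0] - region[0]) ** 2 + (inset[-1] - region[-1]) ** 2
    feature = (sum(inset) / INSET_SIZE - TARGET_MEAN) ** 2
    return seam + feature


def grad(f, w, eps=1e-4):
    """Central finite-difference gradient of f at w."""
    return [
        (f(w[:k] + [w[k] + eps] + w[k + 1:]) - f(w[:k] + [w[k] - eps] + w[k + 1:]))
        / (2 * eps)
        for k in range(len(w))
    ]


def optimize(w_canvas, w_inset, joint, steps=500, lr=0.1):
    """Gradient descent on the inset latent only, or on both latents if `joint`."""
    w_canvas, w_inset = list(w_canvas), list(w_inset)
    for _ in range(steps):
        g_inset = grad(lambda w: loss(w_canvas, w), w_inset)
        if joint:
            g_canvas = grad(lambda w: loss(w, w_inset), w_canvas)
        else:
            g_canvas = [0.0] * len(w_canvas)  # canvas latent stays frozen
        w_inset = [w - lr * g for w, g in zip(w_inset, g_inset)]
        w_canvas = [w - lr * g for w, g in zip(w_canvas, g_canvas)]
    return w_canvas, w_inset


start_canvas, start_inset = [0.2, 0.1], [0.9, 0.0]  # dark canvas, bright inset
single_result = optimize(start_canvas, start_inset, joint=False)
joint_result = optimize(start_canvas, start_inset, joint=True)
single_loss = loss(*single_result)
joint_loss = loss(*joint_result)
print(f"inset only: {single_loss:.4f}  joint: {joint_loss:.4f}")
```

In this toy setting the inset-only run must trade border agreement against the desired brightness, because the frozen canvas cannot move toward the inset. The joint run can satisfy both terms at once.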
Latent Space Walks
We show joint latent space walks through two generators, demonstrating that our method can achieve excellent overall image coherence for many different face/body combinations.
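A joint walk of this kind can be sketched as moving both latent codes in lockstep between two endpoint pairs. The example below uses plain linear interpolation on Python lists. The helper names `lerp` and `joint_walk` are hypothetical, and whether the original walks use linear interpolation or re-optimize intermediate latents is not stated here, so this is only an assumed, simplified version.

```python
def lerp(w0, w1, t):
    """Linearly interpolate between two latent vectors for t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(w0, w1)]


def joint_walk(canvas_pair, inset_pair, steps):
    """Return `steps` (w_canvas, w_inset) pairs that move both latents together.

    `canvas_pair` and `inset_pair` are (start, end) tuples of latent vectors.
    Assumes steps >= 2 so that both endpoints are included.
    """
    walk = []
    for i in range(steps):
        t = i / (steps - 1)
        walk.append((lerp(canvas_pair[0], canvas_pair[1], t),
                     lerp(inset_pair[0], inset_pair[1], t)))
    return walk


# Hypothetical low-dimensional latents; real latent codes are much larger.
walk = joint_walk(([0.0, 0.0], [1.0, 2.0]), ([1.0], [3.0]), steps=5)
for w_canvas, w_inset in walk:
    print(w_canvas, w_inset)
```

Each pair in the walk would then be passed to its respective generator, with the inset output inserted onto the canvas output.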