InsetGAN for Full-Body Image Generation

Anna Frühstück 1, 2, Krishna Kumar Singh 2, Eli Shechtman 2,
Niloy J. Mitra 2, 3, Peter Wonka 1 and Jingwan (Cynthia) Lu 2

1 KAUST   2 Adobe Research   3 University College London

CVPR, 2022

Abstract

While GANs can produce photo-realistic images in ideal conditions for certain domains, the generation of full-body human images remains difficult due to the diversity of identities, hairstyles, clothing, and the variance in pose. Instead of modeling this complex domain with a single GAN, we propose a novel method to combine multiple pretrained GANs, where one GAN generates a global canvas (e.g., human body) and a set of specialized GANs, or insets, focus on different parts (e.g., faces, hands) that can be seamlessly inserted onto the global canvas. We model the problem as jointly exploring the respective latent spaces such that the generated images can be combined, by inserting the parts from the specialized generators onto the global canvas, without introducing seams. We demonstrate the setup by combining a full-body GAN with a dedicated high-quality face GAN to produce plausible-looking humans. We evaluate our results with quantitative metrics and user studies.
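To make the idea of inserting an inset "without introducing seams" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the loss used in the paper: the function names `paste_inset` and `border_mismatch`, the rectangular crop, and the mean-absolute-difference measure are all hypothetical choices made for this example.

```python
import numpy as np

def paste_inset(canvas, inset, top, left):
    """Return a copy of `canvas` with `inset` pasted at (top, left)."""
    h, w = inset.shape[:2]
    out = canvas.copy()
    out[top:top + h, left:left + w] = inset
    return out

def border_mismatch(canvas, inset, top, left, width=2):
    """Mean absolute difference between canvas and inset on a border band.

    Hypothetical seam measure: a low value means the inset already agrees
    with the canvas along the boundary of the pasted region.
    """
    h, w = inset.shape[:2]
    region = canvas[top:top + h, left:left + w]
    mask = np.ones((h, w), dtype=bool)
    mask[width:h - width, width:w - width] = False  # keep only the border band
    return float(np.abs(region[mask] - inset[mask]).mean())

# Stand-in images: a black canvas and a white inset.
canvas = np.zeros((8, 8, 3))
inset = np.ones((4, 4, 3))
combined = paste_inset(canvas, inset, top=2, left=2)
seam = border_mismatch(canvas, inset, top=2, left=2, width=1)
```

In a setting like the one described above, a measure of this kind would be driven down by adjusting the latent codes of the generators rather than by editing pixels directly.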

Paper

arXiv page

Supplementary Materials

@inproceedings{Fruehstueck2022InsetGAN,
  title = {InsetGAN for Full-Body Image Generation},
  author = {Fr{\"u}hst{\"u}ck, Anna and Singh, {Krishna Kumar} and Shechtman, Eli and Mitra, {Niloy J.} and Wonka, Peter and Lu, Jingwan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2022},
  pages = {7723--7732}
}

Our code is available on GitHub

Paper Video

InsetGAN results

We show several examples of StyleGAN2-generated full-body humans for comparison, concentrating on regions that often exhibit unwanted artifacts in the generated results. Using our InsetGAN method, we generate both faces and shoes with dedicated models and generate appropriate bodies for each combination. The result is a seamless transition between the outputs of the three distinct generator models.

InsetGAN Pipeline

We show a diagram of the pipeline of our InsetGAN optimization process.

Face+Body Combination Optimization

We can choose to optimize only one of our generator networks, the inset (left), for coherence with the canvas. However, with this strategy we cannot sufficiently adapt to desired features from the inset (blond hair), and we do not achieve global coherence as good as when we jointly optimize both canvas and inset (right).

Single GAN optimization

Dual GAN optimization
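The difference between optimizing one latent code and optimizing both can be illustrated with a toy example. This sketch is not the InsetGAN implementation: the two "generators" are tiny affine maps with hypothetical parameters `C`, `c0`, `B`, `b0`, the only objective is a squared mismatch between the canvas crop and the inset, and plain gradient descent stands in for whatever optimizer is actually used.

```python
import numpy as np

# Toy affine "generators" (assumptions for illustration only):
#   canvas crop = C @ w_c + c0,   inset = B @ w_i + b0
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
c0 = np.array([0.0, 0.0, 1.0])
B = np.array([[1.0], [1.0], [1.0]])
b0 = np.zeros(3)

def coherence_loss(w_c, w_i):
    """Squared mismatch between the canvas crop and the inset."""
    r = (C @ w_c + c0) - (B @ w_i + b0)
    return float(r @ r)

def optimize(w_c, w_i, update_canvas, steps=2000, lr=0.05):
    """Gradient descent on the coherence loss.

    The inset latent is always updated; the canvas latent is updated
    only when `update_canvas` is True (joint optimization).
    """
    w_c, w_i = w_c.copy(), w_i.copy()
    for _ in range(steps):
        r = (C @ w_c + c0) - (B @ w_i + b0)
        w_i += lr * 2.0 * (B.T @ r)      # dL/dw_i = -2 B^T r
        if update_canvas:
            w_c -= lr * 2.0 * (C.T @ r)  # dL/dw_c = +2 C^T r
    return w_c, w_i

w_c_init = np.array([1.0, -1.0])
w_i_init = np.array([0.0])

single_loss = coherence_loss(*optimize(w_c_init, w_i_init, update_canvas=False))
joint_loss = coherence_loss(*optimize(w_c_init, w_i_init, update_canvas=True))
print(f"inset only: {single_loss:.4f}   joint: {joint_loss:.4f}")
```

In this toy setup the inset-only run is left with a residual mismatch because the fixed canvas crop lies outside what the inset map can produce, whereas the joint run can move both sides until they agree. Real generators are nonlinear and the actual objective has more terms, so this only conveys the intuition.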

Latent Space Walks

We show joint latent space walks through two generators, demonstrating that our method can achieve excellent overall image coherence for many different face/body combinations.
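A joint walk through two latent spaces can be sketched as follows. The page does not state how the walks are produced, so linear interpolation in lockstep is an assumption made here, and `latent_walk` and `joint_walk` are hypothetical helper names. In practice each pair of codes would be passed to the canvas and inset generators, possibly followed by further joint optimization.

```python
import numpy as np

def latent_walk(w_start, w_end, num_steps):
    """Linearly interpolate between two latent codes, endpoints included."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * w_start + t * w_end for t in ts]

def joint_walk(canvas_pair, inset_pair, num_steps):
    """Walk both latent spaces in lockstep.

    Returns a list of (w_canvas, w_inset) tuples, one per step.
    """
    canvas_path = latent_walk(canvas_pair[0], canvas_pair[1], num_steps)
    inset_path = latent_walk(inset_pair[0], inset_pair[1], num_steps)
    return list(zip(canvas_path, inset_path))

# Stand-in latent codes; real codes would come from the pretrained models.
path = joint_walk(
    canvas_pair=(np.zeros(4), np.ones(4)),
    inset_pair=(np.zeros(2), 2.0 * np.ones(2)),
    num_steps=5,
)
```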