Are GANs Biased? Evaluating GAN-Generated Facial Images via Crowdsourcing

Abstract

Generative models produce astonishingly high-resolution and realistic facial images. However, reliably evaluating the quality of these images remains challenging, let alone systematically investigating the potential biases of generative adversarial networks (GANs). In this paper, we argue that crowdsourcing can be used to measure biases in GANs quantitatively. We showcase an investigation that examines whether GAN-generated facial images with darker skin tones are of worse quality. We ask crowd workers to guess whether an image is real or fake, and use their responses as a proxy metric for the quality of facial images generated by state-of-the-art GANs. The results provide preliminary evidence that GANs generate lower-quality images of faces with darker skin tones than of faces with lighter skin tones. More research is needed to understand the sources, effects, and generalizability of this observed phenomenon.
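To make the proxy metric concrete, below is a minimal sketch of how a per-group "fooling rate" could be computed from crowd judgments: the fraction of generated images that workers mistake for real, broken down by skin tone. The column names, the binary skin-tone grouping, and the toy data are illustrative assumptions, not the paper's actual analysis pipeline.

```python
import pandas as pd

# Hypothetical crowd-judgment data: each row is one worker's verdict on one
# GAN-generated image. 'judged_real' is True when the worker guessed the
# fake image was real; 'skin_tone' is an assumed per-image label.
responses = pd.DataFrame({
    "image_id":    [1, 1, 2, 2, 3, 3],
    "skin_tone":   ["lighter", "lighter", "darker", "darker", "darker", "darker"],
    "judged_real": [True, True, False, True, False, False],
})

# Proxy quality metric: fraction of judgments in which workers were
# "fooled" into calling a generated face real, per skin-tone group.
fool_rate = responses.groupby("skin_tone")["judged_real"].mean()
print(fool_rate)
```

A gap between the groups' fooling rates (here, lower for the "darker" group) would be the kind of signal the crowdsourced evaluation is designed to surface; in practice it would need per-image aggregation and a significance test rather than raw means.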

Publication
NeurIPS 2022 Workshop on Human Evaluation of Generative Models