Authors:
(1) Wenxuan Wang, The Chinese University of Hong Kong, Hong Kong, China;
(2) Haonan Bai, The Chinese University of Hong Kong, Hong Kong, China;
(3) Jen-tse Huang, The Chinese University of Hong Kong, Hong Kong, China;
(4) Yuxuan Wan, The Chinese University of Hong Kong, Hong Kong, China;
(5) Youliang Yuan, The Chinese University of Hong Kong, Shenzhen, Shenzhen, China;
(6) Haoyi Qiu, University of California, Los Angeles, Los Angeles, USA;
(7) Nanyun Peng, University of California, Los Angeles, Los Angeles, USA;
(8) Michael Lyu, The Chinese University of Hong Kong, Hong Kong, China.
4.2 RQ1: Effectiveness of BiasPainter
In this RQ, we investigate whether BiasPainter can effectively trigger and measure social bias in image generation models.
Image Bias. We input the (seed image, prompt) pairs and let the image generation software products and models edit each seed image according to each prompt. We then use the (seed image, generated image) pairs to evaluate the bias in the generated images. Specifically, we use BiasPainter to calculate image bias scores and find a large number of highly biased generated images. Figure 1 shows some examples.
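To illustrate what an image bias score captures, the sketch below compares properties of a seed image with those of its edited counterpart. It is a minimal sketch under stated assumptions, not BiasPainter's implementation: the `FaceProps` record, the `image_bias` function, the signed gender convention (+1 for male to female, -1 for female to male), and the use of a 0 to 1 skin-tone lightness value as a proxy for race are all hypothetical. In practice, the properties would be extracted from each image by face-analysis models.

```python
from dataclasses import dataclass


@dataclass
class FaceProps:
    """Hypothetical face properties extracted from one image."""
    gender: str       # "male" or "female" (assumed label set)
    age: float        # estimated age in years
    skin_tone: float  # assumed lightness score in [0, 1]; higher = lighter


def image_bias(seed: FaceProps, generated: FaceProps) -> dict:
    """Signed change of each property from the seed image to the edited image."""
    # Gender: +1 for male -> female, -1 for female -> male, 0 if unchanged
    # (this sign convention is an assumption).
    if seed.gender == generated.gender:
        gender = 0
    elif generated.gender == "female":
        gender = 1
    else:
        gender = -1
    return {
        "gender": gender,
        "age": generated.age - seed.age,               # negative = looks younger
        "race": generated.skin_tone - seed.skin_tone,  # positive = lighter skin
    }


# Usage with made-up property values
print(image_bias(FaceProps("male", 40.0, 0.5), FaceProps("female", 30.0, 0.6)))
```

Keeping the scores signed, rather than taking magnitudes at this stage, preserves the direction of each shift so that later aggregation can tell male-to-female edits apart from female-to-male ones.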
Word Bias. We use BiasPainter to calculate the word bias score for each prompt based on the image bias scores. For each model and each topic, Table 3 lists the top three most biased prompt words with respect to gender, age, and race, respectively. BiasPainter thus provides insight into which biases a model has and to what extent. For example, regarding gender bias in personality words, words such as brave, loyal, patient, friendly, and sympathetic tend to turn male subjects into female ones, while words such as arrogant, selfish, clumsy, grumpy, and rude tend to turn female subjects into male ones. For professions, words such as secretary, nurse, cleaner, and receptionist tend to turn male subjects into female ones, while entrepreneur, CEO, lawyer, and president tend to turn female subjects into male ones. For activities, words such as cooking, knitting, washing, and sewing tend to turn male subjects into female ones, while fighting, thinking, and drinking tend to turn female subjects into male ones.
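The step from image bias scores to word bias scores, and the top-three ranking behind Table 3, can be sketched as follows. This assumes, hypothetically, that a word's bias score for one property is the mean of its image bias scores over all seed images, and that positive gender scores mean male to female while negative ones mean female to male. The functions `word_bias` and `top_k` are illustrative names, and the input scores are made up for the example.

```python
from statistics import mean


def word_bias(image_scores: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical aggregation: mean image bias score per prompt word."""
    return {word: mean(scores) for word, scores in image_scores.items()}


def top_k(word_scores: dict[str, float], k: int = 3) -> tuple[list[str], list[str]]:
    """Return up to k most positive and up to k most negative words."""
    ranked = sorted(word_scores, key=word_scores.get)  # ascending by score
    positive = [w for w in reversed(ranked) if word_scores[w] > 0][:k]
    negative = [w for w in ranked if word_scores[w] < 0][:k]
    return positive, negative


# Made-up gender image bias scores (+1 = male -> female, -1 = female -> male)
gender_scores = {
    "nurse": [1, 1, 0, 1],
    "secretary": [1, 0, 1, 1],
    "teacher": [0, 0, 1, 0],
    "lawyer": [-1, 0, 0, 0],
    "CEO": [-1, -1, 0, 0],
}
to_female, to_male = top_k(word_bias(gender_scores))
print("male -> female:", to_female)
print("female -> male:", to_male)
```

Filtering by sign before truncating keeps a word from appearing on the "female to male" list merely because fewer than k words lean that way.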
In addition, BiasPainter can visualize the distribution of word bias scores across all prompt words. For example, Figure 5 shows the distribution of word bias scores for the profession topic in Stable Diffusion 1.5. The model is more biased toward younger than older ages, and more biased toward lighter than darker skin tones.
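Figure 5 is a plot. As a dependency-free stand-in, the sketch below buckets word bias scores into fixed-width bins and prints a text histogram, which is enough to see whether a distribution skews to one side of zero. The bin width, the `histogram` and `print_histogram` names, and the example age scores (negative meaning the edit made the subject look younger) are assumptions for illustration, not values from the paper.

```python
import math
from collections import Counter


def histogram(scores: list[float], bin_width: float = 0.5) -> dict[float, int]:
    """Count scores per bin, keyed by each bin's lower edge."""
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


def print_histogram(scores: list[float], bin_width: float = 0.5) -> None:
    """Print one row of '#' marks per non-empty bin."""
    for low, n in histogram(scores, bin_width).items():
        print(f"[{low:+.1f}, {low + bin_width:+.1f})  {'#' * n}")


# Made-up age word bias scores (negative = edits make subjects look younger)
age_scores = [-1.2, -0.8, -0.3, 0.1, 0.2, 0.7]
print_histogram(age_scores)
```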
Model Bias. BiasPainter can also calculate model bias scores to evaluate the fairness of each image generation model. Table 4 shows the results: different models exhibit different levels of bias on different topics. For example, Stable Diffusion 2.1 is the most biased model on age, while Pix2pix shows less bias on age and gender.
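One plausible way to turn word bias scores into a per-model score, sketched below, is to average their magnitudes so that biases in opposite directions do not cancel each other out. This aggregation, the `model_bias`, `bias_table`, and `most_biased` names, and the toy scores for two hypothetical models are all assumptions; the paper's own model bias definition may differ.

```python
from statistics import mean


def model_bias(word_scores: dict[str, float]) -> float:
    """Hypothetical model bias score: mean absolute word bias score."""
    return mean(abs(s) for s in word_scores.values())


def bias_table(results: dict[str, dict[str, dict[str, float]]]) -> dict[str, dict[str, float]]:
    """Map model -> property -> model bias score."""
    return {
        model: {prop: model_bias(words) for prop, words in props.items()}
        for model, props in results.items()
    }


def most_biased(table: dict[str, dict[str, float]], prop: str) -> str:
    """Name of the model with the highest bias score on one property."""
    return max(table, key=lambda model: table[model][prop])


# Toy word bias scores for two hypothetical models
results = {
    "model_a": {"age": {"nurse": -0.5, "CEO": 0.5}, "gender": {"nurse": 0.1, "CEO": -0.1}},
    "model_b": {"age": {"nurse": 0.2, "CEO": -0.2}, "gender": {"nurse": 0.6, "CEO": -0.2}},
}
table = bias_table(results)
print(most_biased(table, "age"), most_biased(table, "gender"))
```

Using absolute values means a model that pushes some words strongly toward one gender and others strongly toward the other still receives a high score, whereas a plain signed mean could report it as unbiased.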
This paper is available on arxiv under CC0 1.0 DEED license.