This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.
Authors:
(1) Cristina España-Bonet, DFKI GmbH, Saarland Informatics Campus.
Table of Links
- Abstract and Intro
- Corpora Compilation
- Political Stance Classification
- Summary and Conclusions
- Limitations and Ethics Statement
- Acknowledgments and References
- A. Newspapers in OSCAR 22.01
- B. Topics
- C. Distribution of Topics per Newspaper
- D. Subjects for the ChatGPT and Bard Article Generation
- E. Stance Classification at Article Level
- F. Training Details
4. Summary and Conclusions
Media sources have an editorial line and an associated bias. Getting rid of political biases is difficult for humans, but being aware of them helps us get a global view of the news. Biases are sometimes obvious and may appear in the form of harmful text, but they can also be subtle and difficult to detect. These subtle, hidden biases are potentially dangerous and can lead to manipulation whenever we are not aware of them.
In this work, we systematically studied the subtle political biases behind ChatGPT and Bard, namely those that appear without assigning any persona role (Deshpande et al., 2023). We showed that ChatGPT’s orientation changes over time and differs across languages. Between February and August 2023, ChatGPT transitioned from a Left to a Neutral political orientation, with a Right-leaning period in the middle for English and Spanish. The evolution of Bard cannot be studied yet; its version as of August 2023 is consistently Left-leaning for the 4 languages under study. This bias is independent of the factual mistakes that the model generates, and users should take it into account as well. We provide models to regularly check the bias in text generations for the USA, Germany and Spain, as well as in closely related political contexts and languages using a zero-shot approach.
As a by-product of our analysis, we created a multilingual corpus of 1.2M newspaper articles with coarse annotations of political stance and topic. We show that distant supervision allows us to build meaningful models for coarse political stance classification as long as the corpus is diverse. We make this data available, together with the LM generations and our code, through Zenodo (España-Bonet, 2023) and GitHub.[12]
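To make the idea of distant supervision concrete, the following is a minimal sketch, not the code released with the paper. It assumes each article inherits the coarse stance of the newspaper that published it. The outlet names, the label set, and the helper functions `label_by_source` and `stance_distribution` are hypothetical placeholders chosen for illustration.

```python
from collections import Counter

# Hypothetical outlet-to-stance map (placeholder names and labels,
# not the annotations used in the paper).
SOURCE_STANCE = {
    "outlet_a": "left",
    "outlet_b": "right",
}


def label_by_source(articles):
    """Give every article the stance of its source outlet (distant supervision)."""
    labeled = []
    for article in articles:
        stance = SOURCE_STANCE.get(article["source"])
        if stance is None:
            continue  # drop articles from outlets with no known stance
        labeled.append({**article, "stance": stance})
    return labeled


def stance_distribution(labeled):
    """Count articles per stance label, e.g. to inspect class balance."""
    return Counter(article["stance"] for article in labeled)


articles = [
    {"source": "outlet_a", "text": "first article"},
    {"source": "outlet_b", "text": "second article"},
    {"source": "outlet_x", "text": "article from an unlabeled outlet"},
]
labeled = label_by_source(articles)
print(stance_distribution(labeled))
```

Labels obtained this way are noisy at the article level, because not every article reflects the editorial line of its outlet. This is one reason corpus diversity matters for the resulting classifier.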
[12] https://github.com/cristinae/docTransformer