Code Book for Annotation of Diverse Cross-Document Coreference: Bibliographical References

17 May 2024

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Jakob Vogel, M.A. Digital Humanities, Institute for Digital Humanities, Faculty of Philosophy, Georg August University of Göttingen.

2. Diverse cross-document coreference and media bias analysis

Media bias is a multifaceted phenomenon: news coverage that is one-sided, politically shaded, or in some other way non-neutral. It can occur in all sorts of news media, though we focus only on digital print media. One specific type of media bias is bias by word choice and labeling (Hamborg et al., 2019). Word choice describes the selection of one expression, from a variety of possible expressions, to refer to an entity. For example, to refer to the USA’s current head of state, journalists could use one of the relatively neutral alternatives “Joe Biden”, “Biden”, or “the US president”, or, in theory, choose a clearly biased expression like “the dictator” (Kurmelovs, 2023).

Labeling, on the other hand, describes the assignment of attributes to an expression, inter alia by adding adjectives. Examples of bias by labeling include “an anxious and uncertain president” or “crooked Joe Biden” (Luciano, 2023). Together, word choice and labeling form a so-called frame (Hamborg et al., 2019). In news articles, frames are used in a variety of ways, either for the sake of linguistic diversity or to make certain, potentially biased statements about an entity. To test an article for such statements, all of an entity’s frames need to be extracted and evaluated together. Hence, before an article can be properly analyzed with regard to whether and how it uses biased frames of (certain) entities, we first face the task of identifying such frames. Identifying all expressions that refer to the same entity is a matter of coreference resolution. In short, successful coreference resolution is a prerequisite for any further inquiry into media bias by word choice and labeling.
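To make the notion of extracting and evaluating an entity’s frames together more concrete, the following is a minimal Python sketch, not the annotation scheme itself. It assumes that coreference annotation has already assigned each mention an entity label. The Mention class, its field names, the frames_by_entity helper, and the document identifiers "a1" and "a2" are hypothetical and introduced only for illustration; the expressions are the examples quoted above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    # All field names are illustrative assumptions, not part of the code book.
    doc_id: str      # hypothetical article identifier
    entity_id: str   # cluster label assumed to come from coreference annotation
    head: str        # word choice: the expression chosen for the entity
    labels: tuple    # labeling: attributes attached to the expression


def frames_by_entity(mentions):
    """Group every (labels, head) frame under the entity it refers to."""
    frames = defaultdict(list)
    for m in mentions:
        frames[m.entity_id].append((m.labels, m.head))
    return dict(frames)


# Example expressions taken from the text; the document ids are made up.
mentions = [
    Mention("a1", "biden", "Joe Biden", ()),
    Mention("a1", "biden", "the US president", ()),
    Mention("a2", "biden", "president", ("anxious", "uncertain")),
    Mention("a2", "biden", "Joe Biden", ("crooked",)),
]

frames = frames_by_entity(mentions)
for labels, head in frames["biden"]:
    print(labels, head)
```

Grouping by entity label rather than by document reflects the point made above: a frame can only be judged against the other frames used for the same entity, which presupposes that the coreference step has already linked them.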

As indicated above, automatic coreference resolution shows good results in extracting identity clusters from a document (Liu et al., 2023). However, we have seen that there are near-identity relations between expressions, potentially even across documents, that standard coreference resolution approaches would mostly overlook (Zhukova et al., 2022). Hence, any media bias analysis that depends on coreference resolution would overlook them as well. We hope that building a corpus for diverse cross-document coreference will contribute to the analysis of media bias by providing data that contains the full variety of frames used in news articles. Eventually, we would like to test how media bias can be measured by focusing on diverse coreference in news articles. To answer this last question, though, an additional layer of media bias annotation would have to be added on top of our coreference data (Spinde et al., 2021a,b).