This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Jeremiah Milbauer, Carnegie Mellon University, Pittsburgh PA, USA (email: {jmilbaue | sherryw}@cs.cmu.edu);
(2) Ziqi Ding, Carnegie Mellon University, Pittsburgh PA, USA (email: {ziqiding | zhijinw}@andrew.cmu.edu);
(3) Tongshuang Wu, Carnegie Mellon University, Pittsburgh PA, USA.
Table of Links
- Abstract and Introduction
- Related Work
- The NEWSSENSE Framework
- Pilot Study
- System Overview
- Discussion
- Conclusion, Limitations and Ethics, and References
- A. Appendix: Prompt for Claim Extraction
2 Related Work
This section covers related research across media analytics, sensemaking, and natural language processing. Though some core ideas of this work have been explored in the past, to our knowledge they have never been combined in a single system.
Media Bias and Analytics. Research on media bias includes academic studies of social media sharing patterns (Roberts et al., 2021; Bakshy et al., 2015) and of bias within media publications (Flaxman et al., 2016; Hamborg et al., 2019; Groseclose and Milyo, 2005). Commercial products exist in this area as well, such as the media bias charts of AllSides [5], which classifies political slant into one of five categories, and Ad Fontes Media [6], which models both political slant and factual credibility.
Research on news and social content aggregation has focused primarily on headline detection, timeline construction and clustering (Bouras and Tsogkas, 2012; Laban and Hearst, 2017), and event detection (Atefeh and Khreich, 2015; Kumaran and Allan, 2004). User-oriented products exist in this space as well, such as Google News Stories [7] and Ground.news [8]. Some outlets, such as ProPublica, aggregate their news stories into timelines [9].
Reading Interfaces and Sensemaking. Recent work on reading interfaces has primarily focused on scientific literature, augmenting documents with information about cited papers (Lo et al., 2023; Kang et al., 2022) or augmenting references within the documents themselves (Head et al., 2021).
For the news domain specifically, Laban and Hearst (2017) aggregate articles and extract key quotes to construct a timeline for a given story. We are also aware of an abstract describing work to combine multiple article headlines and ledes into a single digestible form, though no follow-up is available (Glassman et al., 2020).
Fact Verification and NLI. Natural Language Inference (NLI) is the task of classifying the relationship between a pair of sentences as "neutral", "entailment", or "contradiction". Datasets such as SNLI (Bowman et al., 2015) and MNLI (Williams et al., 2017) have become major benchmarks for natural language processing research. Recent work has also considered document-level NLI (Koreeda and Manning, 2021; Chen et al., 2022), cross-document reasoning based on NLI (Schuster et al., 2022), and scalable pairwise reasoning (Milbauer et al., 2023).
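To make the three-way labeling concrete, the following minimal Python sketch shows how an NLI example might be represented. The names `NLILabel`, `NLIExample`, and `count_by_label` are hypothetical, and the sentence pairs are hand-written toy illustrations rather than items drawn from SNLI or MNLI.

```python
from dataclasses import dataclass
from enum import Enum


class NLILabel(Enum):
    """The three relationship classes named in the text."""
    ENTAILMENT = "entailment"
    NEUTRAL = "neutral"
    CONTRADICTION = "contradiction"


@dataclass(frozen=True)
class NLIExample:
    """A sentence pair with its relationship label (hypothetical structure)."""
    premise: str
    hypothesis: str
    label: NLILabel


# Hand-written toy pairs for illustration only; not taken from any dataset.
EXAMPLES = [
    NLIExample(
        "A man is playing a guitar on stage.",
        "A person is playing an instrument.",
        NLILabel.ENTAILMENT,
    ),
    NLIExample(
        "A man is playing a guitar on stage.",
        "The man is a professional musician.",
        NLILabel.NEUTRAL,
    ),
    NLIExample(
        "A man is playing a guitar on stage.",
        "Nobody is on the stage.",
        NLILabel.CONTRADICTION,
    ),
]


def count_by_label(examples):
    """Count how many examples carry each of the three labels."""
    counts = {label: 0 for label in NLILabel}
    for example in examples:
        counts[example.label] += 1
    return counts


if __name__ == "__main__":
    for example in EXAMPLES:
        print(f"{example.label.value}: {example.premise} / {example.hypothesis}")
```

An NLI model would take the premise and hypothesis as input and predict the label; here the labels are assigned by hand purely to show the input and output shape of the task.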
There is also a growing body of work on NLP systems for fact verification and attribution. Recent datasets include FEVER (Thorne et al., 2018) and VitaminC (Schuster et al., 2021), as well as datasets focused on real-world examples of updating, editing, and citing claims in domains like news and Wikipedia (Petroni et al., 2022; Spangher et al., 2022; Iv et al., 2022).
[5] https://www.allsides.com/media-bias/media-bias-rating-methods
[6] https://adfontesmedia.com/interactive-media-bias-chart/
[7] https://news.google.com
[8] https://ground.news/
[9] https://www.propublica.org/series