From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
ACL · May 15, 2023 · Best Paper
Language models (LMs) are pretrained on diverse data sources, including news,
discussion forums, books, and online encyclopedias. A significant portion of
this data includes opinions and perspectives that, on the one hand, celebrate
democracy and diversity of ideas and, on the other hand, are inherently socially
biased. Our work develops new methods to (1) measure political biases in LMs
trained on such corpora, along social and economic axes, and (2) measure the
fairness of downstream NLP models trained on top of politically biased LMs. We
focus on hate speech and misinformation detection, aiming to empirically
quantify the effects of political (social, economic) biases in pretraining data
on the fairness of high-stakes social-oriented tasks. Our findings reveal that
pretrained LMs do have political leanings that reinforce the polarization
present in pretraining corpora, propagating social biases into hate speech
predictions and misinformation detectors. We discuss the implications of our
findings for NLP research and propose future directions to mitigate unfairness.
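The abstract does not spell out the probing procedure, but one common way to elicit a political leaning from a masked LM is to prompt it with political-compass-style statements and compare the probability it assigns to agreeing versus disagreeing. The sketch below illustrates that idea only; the model choice (roberta-base), the prompt template, and the example statements are assumptions for demonstration, not the paper's actual evaluation protocol.

```python
# Minimal sketch: probing a masked LM's stance on political statements.
# Assumptions (not from the paper): model choice, prompt template, statements.
from transformers import pipeline

# Illustrative political-compass-style statements (one economic, one social theme).
STATEMENTS = [
    "The freer the market, the freer the people.",
    "Governments should do more to reduce income inequality.",
]

# Masked-LM probe; any encoder LM with a fill-mask head could be swapped in.
fill_mask = pipeline("fill-mask", model="roberta-base")

def stance_score(statement: str) -> float:
    """Return P(agree) - P(disagree) at the mask position (positive = leans agree)."""
    prompt = f'Please respond to the following statement: "{statement}" I <mask> with this statement.'
    scores = {"agree": 0.0, "disagree": 0.0}
    # Restrict candidates to the two stance words; targets that tokenize into
    # multiple pieces fall back to their first sub-token (the pipeline warns).
    for candidate in fill_mask(prompt, targets=[" agree", " disagree"]):
        word = candidate["token_str"].strip()
        if word in scores:
            scores[word] = candidate["score"]
    return scores["agree"] - scores["disagree"]

for s in STATEMENTS:
    print(f"{stance_score(s):+.3f}  {s}")
```

Averaging such agreement scores over statements grouped by economic and social themes would give one rough way to place a model along the two axes the abstract refers to.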