
Media Bias 102: A Very Short Introduction to Computational Methods for Detecting Media Bias
February 3, 2025
We reviewed, summarized, and organized AI-based and other computational methods for detecting media bias, published between 2019 and 2022, into six categories.
7 Min Read || View Original Paper || Our GitHub
Main Takeaways:
- We developed six categories to systematically organize computational methods for detecting various subtypes of media bias, providing a comprehensive framework for evaluating the field.
- Our review highlights that transformer-based models lead the way with significant improvements in accuracy, while graph-based methods fill the gaps by analyzing social network structures, and traditional NLP approaches remain valuable for their transparency and use as baselines.
- We emphasize the importance of interdisciplinary collaboration, as media bias datasets often lack insights from social science research, leading to low annotator agreement and less accurate annotations.
Hello and Welcome (Again)!
Welcome to Media Bias 102! If you’re joining us from Media Bias 101, you already have a solid understanding of media bias and how it shapes our perceptions of the world around us. In this post, we’ll explore the cutting-edge computational methods designed to detect and combat media bias and the datasets that power these models.
Detecting bias is not just an academic exercise; it is the first crucial step in developing systems that can mitigate bias in media content and even evaluate the potential bias of computer-generated texts, such as responses generated by AI models like ChatGPT, Llama 2, Gemini, or DeepSeek.
In this post, we’ll walk you through six key categories of contributions from the field of computer science that tackle the challenge of detecting media bias:
- Count-Based Approaches
- Word Embedding-Based Techniques
- Transformer-Based Machine Learning (tbML) Models
- Non-Transformer-Based Machine Learning (ntbML) Models
- Non-Neural Network (nNN)-Based Approaches
- Graph-Based Approaches
Along the way, we also provide a brief overview of available media bias datasets. Whether you’re a researcher or an AI engineer looking for resources to start building your media bias detection applications, this post will offer you a deeper look into the field of AI-driven media bias detection.
Our Methodology
We reviewed 3,140 research papers published between 2019 and 2022, narrowing the focus to 96 papers through systematic selection criteria. These papers were then categorized based on the computational techniques used for media bias detection.
This categorization clarifies the most common approaches in the field. To support mutual understanding across various research domains, we introduced the Media Bias Taxonomy in Media Bias 101, which offers a coherent framework of media bias and its various subtypes. If any of these terms seem unclear, feel free to revisit Media Bias 101, where we define these foundational concepts.
Our review highlights key trends in computational media bias detection, organizing the research into three main groups—Traditional Natural Language Processing, Machine Learning (ML) Techniques, and Others—along with six subcategories for better clarity.
- Traditional Natural Language Processing (tNLP):
This category includes techniques for detecting media bias that do not rely on machine learning (ML) or graph-based methods. These methods, which have been part of computational linguistics since the 1960s and 1970s, are often used as baselines when new datasets are introduced due to their explainability and effectiveness. They are also gaining popularity in the social sciences because of their accessibility and simplicity.
Why Do We Need a Baseline in AI Research?
A baseline provides a point of comparison to evaluate new methods. Without it, we can’t tell if a new approach is truly better or just different. For example, when testing an AI model for media bias, comparing its performance to a simpler method like word counting helps determine if the new model is an actual improvement.
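To make this concrete, here is a minimal sketch (our own toy example, not taken from the reviewed papers) comparing a majority-class baseline against a simple word-count classifier with scikit-learn; the texts and labels are invented for illustration:

```python
# Toy illustration of why baselines matter (invented data, not from the paper).
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts = ["the senator bravely defended freedom", "the report states the vote passed",
         "a radical agenda threatens our values", "officials confirmed the new policy"] * 25
labels = [1, 0, 1, 0] * 25  # 1 = biased, 0 = neutral (hypothetical labels)

X_train, X_test, y_train, y_test = train_test_split(texts, labels, random_state=0)

# Majority-class baseline: ignores the text entirely.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Simple word-count model: the kind of method often used as a stronger baseline.
vec = CountVectorizer()
model = LogisticRegression().fit(vec.fit_transform(X_train), y_train)

print("baseline accuracy:  ", accuracy_score(y_test, baseline.predict(X_test)))
print("word-count accuracy:", accuracy_score(y_test, model.predict(vec.transform(X_test))))
```

Only if the new model clearly beats both numbers can we claim a real improvement.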
- Machine Learning (ML) Techniques:
In this category, we explore ML techniques used to detect bias, organized into three subcategories based on the type of model: transformer-based models (tbML), non-transformer-based models (ntbML), and non-neural network models (nNN). Among these, transformer-based models are the most frequently used and are leading the field in terms of effectiveness and applicability.
The figure below illustrates how the six categories are grouped into three main groups. Since this post is based on a review of nearly 100 research papers, we encourage you to explore the original paper for more detailed findings, where we discuss individual studies and their results. Here, we’ve focused on the key background to help you understand the categorization and its pros and cons in media bias detection.
[Figure: the six subcategories grouped into the three main groups (tNLP, ML Techniques, and Others)]
Traditional Natural Language Processing (tNLP)
- Count-Based Approaches:
Count-based methods work by counting word occurrences, assuming that more frequent words are more important for understanding bias. For example, the frequent use of “freedom” might suggest a focus on promoting freedom, hinting at political bias. However, this approach has limitations, as it doesn’t consider word context or relationships between words, such as negation. Despite these drawbacks, count-based methods are simple, easy to implement, and often serve as a baseline for comparing more complex methods; a minimal sketch follows below.
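Here is a rough illustration of the idea (and of its context-blindness); the list of “loaded” terms and the scoring rule are our own hypothetical choices, not from the reviewed papers:

```python
# Minimal count-based sketch: score a text by how often "loaded" terms occur.
# The term list and the rate metric are hypothetical illustrations.
from collections import Counter
import re

LOADED_TERMS = {"freedom", "radical", "regime", "patriot", "elite"}

def loaded_term_rate(text: str) -> float:
    """Share of tokens that appear in the loaded-term list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hits = sum(counts[t] for t in LOADED_TERMS)
    return hits / max(len(tokens), 1)

article = "The radical regime cracked down on freedom."
print(f"loaded-term rate: {loaded_term_rate(article):.2%}")
# Note the limitation: negated uses like "no freedom" count exactly the
# same as positive ones, because context is ignored.
```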
- Word Embedding-based Techniques:
Word embedding-based techniques, on the other hand, detect media bias by looking at word associations. Instead of focusing on biases in pre-trained word embeddings, this method investigates how word relationships in texts might show bias. Researchers create word embeddings from a collection of texts and examine the patterns within them. There are two main types of embeddings (contrasted in the sketch after this list):
- Sparse embeddings (like TF-IDF) focus on counting the frequency of specific words in a document.
- Dense embeddings look at the deeper connections between words, helping to spot biases in how words relate to each other.
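The sketch below contrasts the two representation types. It is illustrative only; the choice of scikit-learn for TF-IDF and of gensim's pretrained GloVe vectors for the dense side are our own assumptions:

```python
# Illustrative contrast between sparse (TF-IDF) and dense representations.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the senator defended freedom", "the lawmaker protected liberty"]

# Sparse: TF-IDF sees almost no shared vocabulary, so similarity stays low
# even though the sentences are near-paraphrases.
tfidf = TfidfVectorizer().fit_transform(docs)
print("TF-IDF similarity:", cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# Dense: pretrained vectors (GloVe via gensim, one possible choice) place
# related words such as "freedom" and "liberty" close together.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # downloads ~65 MB on first use

def mean_vector(doc: str) -> np.ndarray:
    words = [w for w in doc.split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0)

v0, v1 = mean_vector(docs[0]), mean_vector(docs[1])
print("dense similarity: ", cosine_similarity([v0], [v1])[0, 0])
```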
Machine Learning (ML) Techniques
Transformer-Based Machine Learning (tbML):
Transformer-based models (tbML) gained popularity after 2017 due to their ability to efficiently detect media bias. These models analyze all words in a sentence simultaneously, focusing on their relationships to determine key terms and grasp context. While primarily used for detecting linguistic bias, tbML is also effective at identifying political stances, framing bias, racial/group bias, sentiment, and unreliable news, significantly improving accuracy and enabling the detection of more nuanced biases.
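As a minimal sketch of the mechanics, the snippet below runs an off-the-shelf Hugging Face text-classification pipeline. The checkpoint is a generic sentiment model standing in for a bias classifier, since a real system would be fine-tuned on an annotated media bias dataset:

```python
# Minimal tbML sketch with the Hugging Face transformers pipeline. The
# checkpoint is a generic sentiment model used only to show the mechanics;
# a real media bias detector would be fine-tuned on annotated bias data.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

sentences = [
    "The senator heroically crushed the disastrous proposal.",
    "The senate voted 52-48 against the proposal.",
]
for s in sentences:
    print(s, "->", classifier(s))
```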
Non-Transformer-Based Machine Learning (ntbML):
These methods are commonly used to detect media bias, especially at the document level, such as identifying hyperpartisanship or political stances. They also serve as reliable baselines for evaluating the quality of labels in new datasets, providing a solid foundation for initial bias detection. While ntbML methods often target similar biases, they differ in aspects of the detection setup, such as training data, word embeddings, and pseudo-labeling strategies.
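A typical ntbML architecture is a recurrent network. Below is an illustrative bidirectional LSTM classifier in PyTorch; the vocabulary size, dimensions, and two-label setup are all chosen for the example:

```python
# Illustrative ntbML sketch: a bidirectional LSTM sentence classifier in
# PyTorch. Vocabulary size, dimensions, and the two-label setup are assumptions.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=100, hidden_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, token_ids):               # (batch, seq_len) of token ids
        x = self.embed(token_ids)               # (batch, seq_len, embed_dim)
        _, (h, _) = self.lstm(x)                # h: (2, batch, hidden_dim)
        h = torch.cat([h[0], h[1]], dim=1)      # join forward/backward states
        return self.out(h)                      # (batch, n_classes) logits

model = BiLSTMClassifier()
dummy_batch = torch.randint(0, 10_000, (8, 20))  # 8 sequences of 20 token ids
print(model(dummy_batch).shape)                  # torch.Size([8, 2])
```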
Non-Neural Network (nNN)-Based Approaches:
While advanced methods like tbML and deep learning are popular for bias detection, non-neural network (nNN) techniques like LDA, SVM, and regression models are still widely used. These methods are often chosen in studies that introduce new datasets because they provide a solid and well-known baseline for evaluating the quality of the dataset’s labels.
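A minimal sketch of such a baseline, assuming TF-IDF features and a linear SVM in scikit-learn, with invented toy data:

```python
# nNN baseline sketch: TF-IDF features plus a linear SVM (invented toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["radical elites push their agenda", "the committee published its report",
         "a shameful betrayal of the voters", "turnout rose by three percent"]
labels = [1, 0, 1, 0]  # 1 = biased, 0 = neutral (illustrative labels)

svm = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, labels)
print(svm.predict(["a disgraceful scheme by party elites"]))
```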
Others
Graph-Based Approaches:
Graph-based methods analyze social interactions by representing data as a network of connected nodes (such as users, posts, or articles) and edges (like likes, shares, or comments). This approach focuses on the relationships between these entities, helping to uncover how media bias spreads within communities.
It’s particularly useful for studying echo chambers, where users are exposed to similar viewpoints, or how framing bias and political stances are influenced by social networks. Unlike other methods that focus on individual texts or behaviors, graph-based approaches highlight the connections that reveal how biases propagate and impact public opinion.
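As a toy sketch of the idea, the snippet below builds a small interaction graph with networkx and uses greedy modularity maximization to surface tightly knit communities; the users and edges are invented:

```python
# Toy graph-based sketch: find tightly knit sharing communities (possible
# echo chambers) in an interaction network. Users and edges are invented.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
# An edge means "these two users repeatedly share each other's content".
G.add_edges_from([
    ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),   # cluster 1
    ("dave", "erin"), ("erin", "frank"), ("frank", "dave"),   # cluster 2
    ("carol", "dave"),                                        # a single bridge
])

# Greedy modularity maximization groups users who mostly interact internally.
for i, community in enumerate(greedy_modularity_communities(G)):
    print(f"community {i}: {sorted(community)}")
```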
Available Media Bias Datasets
In our review, we identified 123 datasets categorized according to the concepts in the Media Bias Taxonomy. These datasets cover various types of bias, including linguistic bias, framing bias, text-level context bias, reporting-level context bias, cognitive bias, as well as related topics like hate speech and sentiment analysis. We also introduced a new category, General Linguistic Bias, for datasets that don’t specify a particular bias subtype.
For a detailed overview of each dataset, including its size, availability, tasks, labels, and publication summary, we encourage you to refer to our GitHub repository and the original paper.
Moving Forward in Media Bias Detection
Our review highlights several key insights into the current state of media bias detection:
- Transformer-Based Models Lead the Way:
Transformer-based classification approaches are currently state of the art, offering significant improvements in classification accuracy and the ability to detect more granular types of bias.
- Graph-Based Methods Fill the Gaps:
While less popular than transformers, graph-based methods provide unique advantages by analyzing social network content, activities, and structures. They can uncover insights that transformers cannot, such as detecting structural political stances or information-spreading behavior.
- Traditional Approaches Remain Valuable:
Traditional natural language processing (tNLP) methods are simpler, more explainable, and often used as baselines for comparing newer, transformer-based approaches. They are especially useful when transparency in classification decisions is critical.
- Interdisciplinary Insights Are Lacking:
Existing media bias datasets often ignore insights from social science research, leading to low annotator agreement and less accurate annotations. This lack of interdisciplinarity highlights the need for greater awareness of the various types of bias to improve dataset quality and evaluation methods.
- The Need for Further Research:
While our taxonomy provides a solid foundation, future research should critically re-examine the discussed concepts to refine media bias detection systems and their performance evaluation.
For more detailed information on these findings and the datasets used in our review, please refer to our GitHub repository and the original paper.