Resources
Our most recent models are published on Hugging Face.
[Benchmark, GitHub] MBIB – the first Media Bias Identification Benchmark Task and Dataset Collection
[Dataset, Hugging Face] Anno-lexical (lexical bias)
[Dataset, GitHub] BABE – Bias Annotations By Experts
[Dataset, Paper] BAT – Bias And Twitter
[Scale/Questionnaire to measure bias perception] Do You Think It’s Biased? How To Ask For The Perception Of Media Bias (a set of tested questions for assessing media bias perception, suitable for any bias-related research)
[Dataset, Zenodo] MBIC – A Media Bias Annotation Dataset Including Annotator Characteristics
Publications
2025
Spinde, Timo; Lin, Luyang; Hinterreiter, Smi; Echizen, Isao
Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic — A Case Study on Media Bias Proceedings Article Forthcoming
In: Proceedings of the International AAAI Conference on Web and Social Media (ICWSM'25), AAAI, Copenhagen, Denmark, Forthcoming.
Tags: definition extraction, LLM, media bias, relevance classification, taxonomy building
@inproceedings{Spinde2025TaxoMatic,
title = {Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic — A Case Study on Media Bias},
author = {Timo Spinde and Luyang Lin and Smi Hinterreiter and Isao Echizen},
url = {https://media-bias-research.org/wp-content/uploads/2025/04/spinde2025.pdf},
year = {2025},
date = {2025-06-01},
urldate = {2025-06-01},
booktitle = {Proceedings of the International AAAI Conference on Web and Social Media (ICWSM'25)},
volume = {19},
publisher = {AAAI},
address = {Copenhagen, Denmark},
abstract = {This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the framework’s effectiveness, with Claude-3-sonnet achieving the best results in both relevance classification and definition extraction. Future directions include expanding datasets and applying TaxoMatic to additional domains.},
keywords = {definition extraction, LLM, media bias, relevance classification, taxonomy building},
pubstate = {forthcoming},
tppubtype = {inproceedings}
}