Short biography
Filip is a first-year PhD student supervised by Radu Timofte, focused on evaluating and developing mechanistic interpretability methods to detect, quantify, and mitigate biases in Vision-Language Models.
For more info visit his homepage.
Contact
f.kucera <ät> media-bias-research org
References
2026
Kučera, Filip; Mandl, Christoph; Echizen, Isao; Timofte, Radu; Spinde, Timo
SciDef: Automating Definition Extraction from Academic Literature with Large Language Models
arXiv (preprint), 2026.
@misc{kucera2026scidefautomatingdefinitionextraction,
title = {SciDef: Automating Definition Extraction from Academic Literature with Large Language Models},
author = {Filip Kučera and Christoph Mandl and Isao Echizen and Radu Timofte and Timo Spinde},
url = {https://arxiv.org/abs/2602.05413},
year = {2026},
date = {2026-01-01},
urldate = {2026-01-01},
abstract = {Definitions are the foundation for any scientific work, but with a significant increase in publication numbers, gathering definitions relevant to any keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra & DefSim, novel datasets of human-extracted definitions and definition-pairs' similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of definitions from our test-set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them.
Code & datasets are available at https://github.com/Media-Bias-Group/SciDef.},
howpublished = {arXiv (preprint)},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
