TrackList: Tracing Back Query Linguistic Diversity for Head and Tail Knowledge in Open Large Language Models
Ioana Buhnila, Aman Sinha, Mathieu Constant

TL;DR
This paper introduces TrackList, a detailed analysis pipeline, and RefoMed-EN, a new dataset, to evaluate how large language models perform on linguistically diverse queries, revealing a bias toward frequent knowledge.
Contribution
The study presents a novel analysis pipeline and dataset to investigate LLM performance on varied linguistic queries, highlighting frequency-related biases in model responses.
Findings
LLMs perform best on definition questions.
Performance drops on exemplification and technical knowledge.
Models favor frequent over rare or tail knowledge.
Abstract
Large Language Models (LLMs) have proven efficient in giving definition-type answers to user input queries. While giving various types of answers, such as examples and paraphrases, is an easy task for humans, LLMs struggle to provide correct answers to queries other than definition-type ones. In this study, we evaluated this drop in performance using TrackList, a fine-grained linguistic and statistical analysis pipeline to investigate the impact of the pre-training data on LLMs' answers to diverse linguistic queries. We also introduce RefoMed-EN, an English dataset consisting of 6170 human-annotated medical terms alongside their corresponding definitions, denominations, exemplifications, explanations, or paraphrases. We studied whether the high frequency of a concept (head) or low frequency (tail) impacts the language model's performance. We evaluated the quality of the LLM's output using…
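The head versus tail comparison described in the abstract can be illustrated with a minimal sketch. It is not the TrackList pipeline: the median cutoff, the function names, and the toy frequencies and scores below are all assumptions made for illustration, and the paper may define head and tail differently.

```python
from statistics import mean, median


def split_head_tail(freqs):
    """Split terms into head (frequency >= median) and tail (below median).

    The median cutoff is an illustrative assumption, not the paper's definition.
    """
    cutoff = median(freqs.values())
    head = {term for term, freq in freqs.items() if freq >= cutoff}
    tail = set(freqs) - head
    return head, tail


def mean_score_by_bucket(freqs, scores):
    """Average a per-term quality score separately over head and tail terms."""
    head, tail = split_head_tail(freqs)
    return {
        "head": mean(scores[term] for term in head),
        "tail": mean(scores[term] for term in tail),
    }


# Toy, made-up corpus frequencies and answer-quality scores (hypothetical).
freqs = {"diabetes": 9000, "asthma": 7000, "sarcoidosis": 40, "kernicterus": 5}
scores = {"diabetes": 0.9, "asthma": 0.8, "sarcoidosis": 0.5, "kernicterus": 0.4}

result = mean_score_by_bucket(freqs, scores)
print(result)
```

A gap between the two averages on real data would be consistent with a frequency bias. Note that this sketch raises an error if every term has the same frequency, because the tail bucket is then empty.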
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
