Annif at the GermEval-2025 LLMs4Subjects Task: Traditional XMTC Augmented by Efficient LLMs

Osma Suominen; Juho Inkinen; Mona Lehtinen

arXiv:2508.15877·cs.CL·August 25, 2025

TL;DR

This paper describes an enhanced Annif system that combines traditional extreme multi-label classification with efficient large language models to improve subject prediction accuracy and efficiency in bibliographic record indexing.

Contribution

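The paper combines traditional extreme multi-label text classification (XMTC) methods with small, efficient LLMs, which are used for translation, synthetic data generation, and ranking of candidate subjects. The resulting system ranked 1st in Subtask 2 of the GermEval-2025 LLMs4Subjects shared task.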

Findings

01 Ranked 1st in the overall quantitative evaluation of Subtask 2

02 Ranked 1st in the qualitative evaluation of Subtask 2

03 Demonstrated improved efficiency and accuracy

Abstract

This paper presents the Annif system in the LLMs4Subjects shared task (Subtask 2) at GermEval-2025. The task required creating subject predictions for bibliographic records using large language models, with a special focus on computational efficiency. Our system, based on the Annif automated subject indexing toolkit, refines our previous system from the first LLMs4Subjects shared task, which produced excellent results. We further improved the system by using many small and efficient language models for translation and synthetic data generation and by using LLMs for ranking candidate subjects. Our system ranked 1st in both the overall quantitative evaluation and the qualitative evaluation of Subtask 2.
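
The abstract mentions using LLMs to rank candidate subjects proposed by the traditional pipeline. The sketch below illustrates one generic way such a re-ranking step could work: blending a base model's score with an LLM relevance score for each candidate. It is an illustration under stated assumptions, not the authors' implementation. The function name `rerank_candidates`, the linear blending, the `llm_weight` parameter, and the example scores are all hypothetical and do not come from the paper or from the Annif API.

```python
from typing import Dict, List, Tuple


def rerank_candidates(
    base_scores: Dict[str, float],
    llm_scores: Dict[str, float],
    llm_weight: float = 0.5,
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Blend base-model and LLM relevance scores, then keep the top `limit` subjects.

    Hypothetical helper: linear blending is an assumption made for illustration.
    """
    blended = {}
    for subject, base in base_scores.items():
        # Candidates the LLM did not score fall back to the base score alone.
        llm = llm_scores.get(subject, base)
        blended[subject] = (1.0 - llm_weight) * base + llm_weight * llm
    # Sort by blended score, highest first.
    ranked = sorted(blended.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up scores for three candidate subjects of one bibliographic record.
base = {"machine learning": 0.60, "library science": 0.55, "botany": 0.50}
llm = {"machine learning": 0.90, "library science": 0.80, "botany": 0.10}
top = rerank_candidates(base, llm, llm_weight=0.5, limit=2)
```

In this made-up example the LLM score pushes "botany" out of the top two even though its base score is close to the others, which is the kind of effect a ranking step is meant to have.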

Code & Models

Models

🤗 NatLibFi/Annif-LLMs4Subjects-GermEval2025-data (model)

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Text Readability and Simplification