Evaluating Prompt-Based and Fine-Tuned Approaches to Czech Anaphora Resolution

Patrik Stano; Aleš Horák

arXiv:2506.18091·cs.CL·June 24, 2025

TL;DR

This paper compares prompt engineering with large language models against fine-tuning of compact generative models for Czech anaphora resolution, and finds that the fine-tuned models outperform prompting in both accuracy and resource efficiency.

Contribution

It provides a comprehensive evaluation of prompt-based versus fine-tuned approaches for Czech anaphora resolution, highlighting their respective strengths and trade-offs.

Findings

01

Fine-tuned models, particularly mT5-large, achieve up to 88% accuracy.

02

Prompt-based methods reach up to 74.5% accuracy in few-shot settings.

03

Fine-tuned models require fewer resources than the prompted LLMs.

Abstract

Anaphora resolution plays a critical role in natural language understanding, especially in morphologically rich languages like Czech. This paper presents a comparative evaluation of two modern approaches to anaphora resolution on Czech text: prompt engineering with large language models (LLMs) and fine-tuning compact generative models. Using a dataset derived from the Prague Dependency Treebank, we evaluate several instruction-tuned LLMs, including Mistral Large 2 and Llama 3, using a series of prompt templates. We compare them against fine-tuned variants of the mT5 and Mistral models that we trained specifically for Czech anaphora resolution. Our experiments demonstrate that while prompting yields promising few-shot results (up to 74.5% accuracy), the fine-tuned models, particularly mT5-large, outperform them significantly, achieving up to 88% accuracy while requiring fewer…
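To make the evaluation setup concrete, the following is a minimal sketch of how a few-shot prompt could be assembled and how predictions could be scored. It is not the authors' implementation: the prompt wording, the `Example` fields, the `build_prompt` and `accuracy` helpers, and the use of normalized exact match as the accuracy criterion are all assumptions made for illustration, and the Czech sentences are invented rather than taken from the Prague Dependency Treebank.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One anaphora item (hypothetical schema, not the paper's data format)."""
    text: str        # passage containing the anaphor
    anaphor: str     # expression to resolve
    antecedent: str  # gold antecedent


def build_prompt(shots, query):
    """Assemble a few-shot prompt: solved examples first, then the open query."""
    parts = ["Find the antecedent of the given expression in the Czech text."]
    for ex in shots:
        parts.append(
            f"Text: {ex.text}\nExpression: {ex.anaphor}\nAntecedent: {ex.antecedent}"
        )
    # The query is left open so the model completes the antecedent.
    parts.append(f"Text: {query.text}\nExpression: {query.anaphor}\nAntecedent:")
    return "\n\n".join(parts)


def normalize(s):
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(s.lower().split())


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold antecedent after normalization."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return correct / len(gold)


# Invented illustrative items.
shot = Example("Jana koupila knihu. Dala ji Petrovi.", "ji", "knihu")
query = Example("Petr potkal Marii. Pozdravil ji.", "ji", "Marii")
prompt = build_prompt([shot], query)
score = accuracy(["Knihu", "Petr"], ["knihu", "Marii"])
```

The same `accuracy` function could be applied to the outputs of a fine-tuned sequence-to-sequence model, which would keep the two approaches comparable under a single metric; whether the paper scores outputs by exact match or by a looser criterion is not stated in the text above.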
