Evaluating Prompt-Based and Fine-Tuned Approaches to Czech Anaphora Resolution
Patrik Stano, Aleš Horák

TL;DR
This paper compares prompt engineering with large language models against fine-tuning compact generative models for Czech anaphora resolution, showing that the fine-tuned models outperform prompting in both accuracy and resource efficiency.
Contribution
It provides a comparative evaluation of prompt-based and fine-tuned approaches to Czech anaphora resolution on data derived from the Prague Dependency Treebank, highlighting the strengths and trade-offs of each.
Findings
Fine-tuned models, particularly mT5-large, achieve up to 88% accuracy.
Prompt-based methods reach up to 74.5% accuracy in the few-shot setting.
Fine-tuned models require fewer resources.
Abstract
Anaphora resolution plays a critical role in natural language understanding, especially in morphologically rich languages like Czech. This paper presents a comparative evaluation of two modern approaches to anaphora resolution on Czech text: prompt engineering with large language models (LLMs) and fine-tuning compact generative models. Using a dataset derived from the Prague Dependency Treebank, we evaluate several instruction-tuned LLMs, including Mistral Large 2 and Llama 3, using a series of prompt templates. We compare them against fine-tuned variants of the mT5 and Mistral models that we trained specifically for Czech anaphora resolution. Our experiments demonstrate that while prompting yields promising few-shot results (up to 74.5% accuracy), the fine-tuned models, particularly mT5-large, outperform them significantly, achieving up to 88% accuracy while requiring fewer…
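The abstract describes evaluating LLMs with a series of prompt templates and comparing them by accuracy. As an illustration only, the sketch below assumes each evaluation item pairs a Czech text with an anaphor and a gold antecedent string, renders a zero- or few-shot prompt, and scores predictions by exact match after light normalization. The template wording, the `AnaphoraExample` structure, and the scoring rule are hypothetical; they are not the authors' templates or evaluation metric.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AnaphoraExample:
    """Hypothetical evaluation item: a text, the anaphor to resolve, and its gold antecedent."""
    text: str
    anaphor: str
    antecedent: str


# Hypothetical prompt template; the paper's actual templates are not reproduced here.
TEMPLATE = (
    "Text: {text}\n"
    "Question: What does the expression \"{anaphor}\" refer to in the text above?\n"
    "Answer with the antecedent only:"
)


def build_prompt(ex: AnaphoraExample, shots: list[AnaphoraExample] | None = None) -> str:
    """Render a prompt, optionally prefixed with solved few-shot demonstrations."""
    parts = []
    for shot in shots or []:
        # Each demonstration shows the template followed by its gold answer.
        parts.append(TEMPLATE.format(text=shot.text, anaphor=shot.anaphor) + " " + shot.antecedent)
    # The query item is left unanswered for the model to complete.
    parts.append(TEMPLATE.format(text=ex.text, anaphor=ex.anaphor))
    return "\n\n".join(parts)


def normalize(answer: str) -> str:
    """Strip surrounding whitespace, quotes, and punctuation, then lowercase."""
    return answer.strip().strip(".,;:!?\"'").strip().lower()


def exact_match_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions that equal the gold antecedent after normalization."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return hits / len(gold)
```

For example, with the item `AnaphoraExample("Marie potkala Petra. Pozdravila ho.", "ho", "Petra")`, the model is asked what "ho" refers to, and a reply of "Petra." counts as correct after normalization. Exact string match is a deliberately strict choice: in a morphologically rich language like Czech, a model that answers with a different inflected form (e.g. "Petr" instead of "Petra") would be marked wrong, so a real evaluation may need lemma-level or span-level matching instead.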
