Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning
Daniele Rege Cambrin, Giuseppe Gallipoli, Irene Benedetto, Luca Cagliero, Paolo Garza

TL;DR
This paper explores using semantic segmentation loss functions for fine-tuning large language models, achieving significant performance improvements without extra data or human feedback.
Contribution
It introduces a novel application of segmentation loss functions to natural language tasks, demonstrating their effectiveness in improving LLM fine-tuning.
Findings
Focal and Lovász losses outperform cross-entropy in tasks like Math Word Problems and question answering.
Models trained with alternative losses show a +42% mean improvement in exact match relative to cross-entropy training.
The approach offers a scalable, resource-efficient alternative to traditional fine-tuning methods.
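To make the comparison between cross-entropy and focal loss concrete, the following is a minimal sketch of a token-level focal loss using only the Python standard library. It is not the authors' implementation: the function names, the `gamma=2.0` default (the common choice in the computer vision literature), and the `ignore_index=-100` masking convention are assumptions made for illustration. Setting `gamma=0` recovers ordinary cross-entropy, which is the baseline the findings refer to.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def focal_loss(token_logits, target_ids, gamma=2.0, ignore_index=-100):
    """Mean focal loss over target tokens (hypothetical helper).

    Per token: -(1 - p_t) ** gamma * log(p_t), where p_t is the predicted
    probability of the correct token. gamma=0 gives plain cross-entropy.
    Positions whose target equals ignore_index (assumed convention) are skipped.
    """
    total, count = 0.0, 0
    for logits, target in zip(token_logits, target_ids):
        if target == ignore_index:
            continue
        log_p = log_softmax(logits)[target]
        p = math.exp(log_p)
        total += -((1.0 - p) ** gamma) * log_p
        count += 1
    return total / count if count else 0.0


# Toy example: one confidently correct token and one uncertain token.
logits = [[4.0, 0.0, 0.0], [0.5, 0.4, 0.3]]
targets = [0, 1]
cross_entropy = focal_loss(logits, targets, gamma=0.0)
focal = focal_loss(logits, targets, gamma=2.0)
print(f"cross-entropy: {cross_entropy:.4f}, focal (gamma=2): {focal:.4f}")
```

The factor `(1 - p) ** gamma` shrinks the contribution of tokens the model already predicts confidently, so the gradient concentrates on harder tokens. In a real fine-tuning setup this would typically be written against a tensor library and applied to the shifted next-token logits; that wiring is omitted here.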
Abstract
Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: High-Order Consensuses
