Beyond Accuracy Optimization: Computer Vision Losses for Large Language Model Fine-Tuning

Daniele Rege Cambrin; Giuseppe Gallipoli; Irene Benedetto; Luca Cagliero; Paolo Garza

arXiv:2409.13641·cs.CL·December 16, 2024

TL;DR

This paper explores using semantic segmentation loss functions for fine-tuning large language models, achieving significant performance improvements without extra data or human feedback.

Contribution

It introduces a novel application of segmentation loss functions to natural language tasks, demonstrating their effectiveness in improving LLM fine-tuning.

Findings

01. Focal and Lovász losses outperform cross-entropy in tasks like Math Word Problems and question answering.

02. Models trained with alternative losses show a +42% mean improvement in exact match.

03. The approach offers a scalable, resource-efficient alternative to traditional fine-tuning methods.

Abstract

Large Language Models (LLMs) have demonstrated impressive performance across various tasks. However, current training approaches combine standard cross-entropy loss with extensive data, human feedback, or ad hoc methods to enhance performance. These solutions are often not scalable or feasible due to their associated costs, complexity, or resource requirements. This study investigates the use of established semantic segmentation loss functions in natural language generation to create a versatile, practical, and scalable solution for fine-tuning different architectures. We evaluate their effectiveness in solving Math Word Problems and question answering across different models of varying sizes. For the analyzed tasks, we found that the traditional Cross-Entropy loss represents a sub-optimal choice, while models trained to minimize alternative (task-dependent) losses, such as Focal or…
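To make the contrast between Cross-Entropy and Focal loss concrete, here is a minimal sketch in plain Python of both losses applied to next-token logits. It uses the standard Focal loss definition, FL = -(1 - p_t)^γ · log(p_t), where p_t is the probability assigned to the correct token. The function names, the value γ = 2, and the toy logits are illustrative assumptions and are not taken from the paper or its repository, which may configure the losses differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one token: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one token: -(1 - p_t)^gamma * log(p_t).

    gamma is an assumed hyperparameter; gamma = 0 recovers cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def mean_token_loss(loss_fn, batch_logits, targets, **kwargs):
    """Average a per-token loss over a sequence of (logits, target) pairs."""
    losses = [loss_fn(l, t, **kwargs) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy three-token vocabulary (hypothetical values).
    logits = [2.0, 0.5, -1.0]
    for target in (0, 2):
        ce = cross_entropy(logits, target)
        fl = focal_loss(logits, target)
        print(f"target={target}  CE={ce:.4f}  focal={fl:.4f}")
```

With γ > 0, the factor (1 - p_t)^γ shrinks the loss for tokens the model already predicts confidently, so the remaining gradient signal concentrates on tokens it gets wrong. This is the same re-weighting that Focal loss applies to easy examples in vision tasks.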

Code & Models

Repositories

darthreca/segmentation-losses-nlp (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: High-Order Consensuses