Fine-Tuning LLMs with Fine-Grained Human Feedback on Text Spans
Sky CH-Wang, Justin Svegliato, Helen Appel, Jason Eisner

TL;DR
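This paper introduces a fine-tuning method for language models that learns from human feedback on specific spans of a response. The model revises the disliked spans one at a time, and each revision step supplies a localized preference pair, which makes preference tuning more efficient and more effective.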
Contribution
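It proposes feedback-driven improvement chains. Annotators mark liked and disliked spans and explain why, and the base model rewrites the disliked spans one at a time from left to right. Each adjacent pair of steps then becomes a preference pair for direct alignment, in place of standard A/B rankings or full contrastive rewrites.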
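A minimal Python sketch of how an improvement chain could be assembled from span feedback. It rests on a few assumptions: spans arrive as non-overlapping character offsets, and the base model's rewrite step is stood in for by a hypothetical `rewrite` callable. The names `SpanFeedback` and `build_improvement_chain` are illustrative and do not come from the paper. Disliked spans are rewritten one at a time, left to right, and every intermediate response is kept as a step in the chain. In this sketch, liked spans are only recorded, never edited.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpanFeedback:
    """One annotated span of a response (hypothetical schema)."""
    start: int    # character offset, inclusive
    end: int      # character offset, exclusive
    liked: bool   # True for a "liked" span, False for a "disliked" one
    comment: str  # what the annotator liked or disliked about it


def build_improvement_chain(
    response: str,
    feedback: List[SpanFeedback],
    rewrite: Callable[[str, str, str], str],
) -> List[str]:
    """Return [r0, r1, ..., rk]: r0 is the original response and each later
    step rewrites one more disliked span, proceeding left to right.

    `rewrite(current_text, span_text, comment)` is a hypothetical stand-in for
    the base model; it returns replacement text for the span.
    Disliked spans are assumed not to overlap.
    """
    chain = [response]
    current = response
    shift = 0  # net length change from earlier rewrites, used to remap offsets
    disliked = sorted((f for f in feedback if not f.liked), key=lambda f: f.start)
    for fb in disliked:
        start, end = fb.start + shift, fb.end + shift
        old = current[start:end]
        new = rewrite(current, old, fb.comment)
        current = current[:start] + new + current[end:]
        shift += len(new) - len(old)
        chain.append(current)
    return chain


if __name__ == "__main__":
    text = "The sky is green. It is very big."
    feedback = [
        SpanFeedback(0, 7, True, "clear subject"),
        SpanFeedback(24, 32, False, "vague and wordy"),
        SpanFeedback(11, 16, False, "factually wrong"),
    ]
    toy_edits = {"green": "blue", "very big": "vast"}  # toy stand-in for the model
    chain = build_improvement_chain(text, feedback, lambda cur, old, note: toy_edits[old])
    for step in chain:
        print(step)
```

Run as a script, the example prints three strings. The first is the original response. It is followed by one new step per disliked span, "green" to "blue" and then "very big" to "vast". The liked span "The sky" is left as it is.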
Findings
Outperforms direct alignment trained on standard A/B preference rankings or on full contrastive rewrites
Enables more efficient preference learning
Shows that structured, revision-based supervision makes preference tuning more effective
Abstract
We present a method and dataset for fine-tuning language models with preference supervision using feedback-driven improvement chains. Given a model response, an annotator provides fine-grained feedback by marking "liked" and "disliked" spans and specifying what they liked or disliked about them. The base model then rewrites the disliked spans accordingly, proceeding from left to right, forming a sequence of incremental improvements. We construct preference pairs for direct alignment from each adjacent step in the chain, enabling the model to learn from localized, targeted edits. We find that our approach outperforms direct alignment methods based on standard A/B preference ranking or full contrastive rewrites, demonstrating that structured, revision-based supervision leads to more efficient and effective preference tuning.
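The pair construction step can be sketched as follows. Each step in a chain is treated as preferred over the step immediately before it, so a chain with k rewrites yields k pairs, and the two responses in each pair differ in a single edited span. The abstract does not name a specific alignment objective. The loss below is DPO, included only as an assumed example of a direct alignment method. The function names and the `beta` default are illustrative choices.

```python
import math
from typing import Dict, List


def adjacent_preference_pairs(prompt: str, chain: List[str]) -> List[Dict[str, str]]:
    """Turn an improvement chain [r0, r1, ..., rk] into k preference pairs.

    Each later step is 'chosen' over the step immediately before it, so the
    two responses in a pair differ only where one span was rewritten.
    """
    return [
        {"prompt": prompt, "chosen": chain[i + 1], "rejected": chain[i]}
        for i in range(len(chain) - 1)
    ]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair (an assumed objective), from summed sequence log-probs.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
         = softplus(-margin)
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return softplus(-margin)


if __name__ == "__main__":
    chain = [
        "The sky is green. It is very big.",
        "The sky is blue. It is very big.",
        "The sky is blue. It is vast.",
    ]
    for pair in adjacent_preference_pairs("Describe the sky.", chain):
        print("chosen:  ", pair["chosen"])
        print("rejected:", pair["rejected"])
    # Policy equal to the reference gives margin 0, so the loss is log 2.
    print(dpo_loss(-12.0, -12.5, -12.0, -12.5))
```

Pairing only adjacent steps means the difference between chosen and rejected is confined to one targeted edit. This matches the abstract's point about learning from localized edits. A single A/B comparison, by contrast, can differ between its two responses almost everywhere.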
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
