For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs

Wenlong Deng; Qi Zeng; Jiaming Zhang; Minghui Chen; Zixin Ding; Christos Thrampoulidis; Boying Gong; Xiaoxiao Li

arXiv:2508.10180 · cs.CL · April 28, 2026

TL;DR

For-Value introduces a forward-only, efficient data valuation method for large language and vision-language models, enabling scalable and effective data importance estimation without backpropagation.

Contribution

The paper presents a simple closed-form data valuation framework that relies solely on forward passes, avoiding the cost of backpropagation.

Findings

01. For-Value matches or outperforms gradient-based methods in influence detection.

02. It achieves substantial efficiency improvements over existing gradient-based methods.

03. Theoretical analysis links data valuation to the alignment between final hidden representations and last-layer prediction errors.

Abstract

Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them computationally prohibitive for billion-parameter models and precluding batch parallelization. In this work, we introduce For-Value, a forward-only data valuation framework that enables efficient batch-scalable value estimation while maintaining effectiveness. Leveraging the expressive power of pretrained LLMs/VLMs, we theoretically demonstrate that data valuation can be captured by the alignment between the final hidden representations and prediction errors at the last layer. In light of this insight, For-Value computes data value using a simple closed-form expression with a single forward pass, eliminating the need for costly backpropagation and enabling efficient…
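The idea in the abstract, scoring a training example by how well its final hidden representations and last-layer prediction errors align with those of a validation example, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: the function names `prediction_error` and `alignment_score`, the choice of "one-hot target minus softmax probabilities" as the prediction error, and the unweighted sum over all token pairs are hypothetical choices made here for clarity. The hidden states and logits are assumed to have been collected from a single forward pass.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prediction_error(logits, targets):
    """Per-token error: one-hot(target) minus predicted probabilities.

    logits: (T, V) last-layer logits; targets: (T,) integer token ids.
    Returns an array of shape (T, V). (Assumed definition of "error".)
    """
    probs = softmax(logits)
    onehot = np.zeros_like(probs)
    onehot[np.arange(len(targets)), targets] = 1.0
    return onehot - probs

def alignment_score(h_train, err_train, h_val, err_val):
    """Hypothetical forward-only score for one (train, validation) pair.

    Sums, over all token pairs, the product of the hidden-state
    similarity and the prediction-error similarity.
    h_*: (T, d) final hidden states; err_*: (T, V) prediction errors.
    """
    hidden_sim = h_train @ h_val.T      # (T_train, T_val)
    error_sim = err_train @ err_val.T   # (T_train, T_val)
    return float((hidden_sim * error_sim).sum())

if __name__ == "__main__":
    # Random stand-ins for quantities a real model's forward pass would give.
    rng = np.random.default_rng(0)
    d, V = 8, 16
    h_tr, h_va = rng.normal(size=(5, d)), rng.normal(size=(6, d))
    e_tr = prediction_error(rng.normal(size=(5, V)), rng.integers(0, V, size=5))
    e_va = prediction_error(rng.normal(size=(6, V)), rng.integers(0, V, size=6))
    print(alignment_score(h_tr, e_tr, h_va, e_va))
```

Under the assumption that logits are a linear map of the final hidden state, this score equals the inner product between the two examples' cross-entropy gradients with respect to the output projection, yet it needs only matrix products of forward-pass quantities, which is why it batches easily.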

Code & Models

Repositories

vengdeng/For-Value-Efficient-Forward-Only-Data-Valuation-for-finetuning (GitHub)
