Adaptive Data Selection for Multi-Layer Perceptron Training: A Sub-linear Value-Driven Method

Xiyang Zhang; Chen Liang; Haoxuan Qiu; Hongzhi Wang

arXiv:2510.21286 · cs.LG · October 27, 2025

TL;DR

This paper introduces DVC, an adaptive data selection method for training multi-layer perceptrons. It evaluates data contributions hierarchically and adapts as network parameters evolve during training, and it outperforms existing methods in accuracy and F1 score across six datasets.

Contribution

The paper presents DVC, a new budget-aware data selection approach that decomposes data value into layer-wise and global contributions, addressing scalability and nonlinear transformation challenges.

Findings

01. DVC outperforms existing methods in accuracy and F1 score across six datasets.

02. The approach effectively balances exploration and exploitation using an Upper Confidence Bound (UCB) strategy.

03. Hierarchical data evaluation improves training efficiency and model performance.
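
To make the exploration/exploitation trade-off behind the UCB finding concrete, here is a minimal sketch of the textbook UCB1 rule applied to choosing among data sources. It is not the paper's DVC algorithm. The function names `ucb1_select` and `run_ucb`, the exploration constant `c`, and the per-source reward functions are all assumptions made for illustration; in practice the reward would be some measure of how useful a batch drawn from that source turned out to be.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Pick the index of the source with the highest UCB1 score.

    counts[i] is how often source i has been chosen, means[i] is its
    average observed reward, and t is the current round (1-based).
    """
    # Try every source once before comparing scores (avoids division by zero).
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # Score = exploitation term (mean) + exploration bonus that shrinks
    # as a source is sampled more often.
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda i: scores[i])


def run_ucb(reward_fns, rounds, seed=0):
    """Run UCB1 for a fixed number of rounds (the selection budget)."""
    rng = random.Random(seed)
    counts = [0] * len(reward_fns)
    means = [0.0] * len(reward_fns)
    for t in range(1, rounds + 1):
        i = ucb1_select(counts, means, t)
        reward = reward_fns[i](rng)  # hypothetical usefulness signal
        counts[i] += 1
        # Incremental update of the running mean reward for source i.
        means[i] += (reward - means[i]) / counts[i]
    return counts, means


if __name__ == "__main__":
    # Three hypothetical sources whose batches are "useful" with
    # probability 0.2, 0.5, and 0.8 respectively.
    sources = [lambda rng, p=p: float(rng.random() < p) for p in (0.2, 0.5, 0.8)]
    counts, means = run_ucb(sources, rounds=2000)
    print(counts, [round(m, 2) for m in means])
```

The exploration bonus keeps rarely sampled sources in play early on, while the running mean steers most of the budget toward the sources that have paid off so far.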

Abstract

Data selection is a fundamental problem in neural network training, particularly for multi-layer perceptrons (MLPs), where identifying the most valuable training samples from massive, multi-source, heterogeneous data under budget constraints poses significant challenges. Existing data selection methods, including coreset construction, data Shapley values, and influence functions, suffer from critical limitations: they oversimplify nonlinear transformations, ignore informative intermediate representations in hidden layers, or fail to scale to larger MLPs due to high computational complexity. In response, we propose DVC (Data Value Contribution), a novel budget-aware method for evaluating and selecting data for MLP training that accounts for the dynamic evolution of network parameters during training. The DVC method decomposes data contribution into Layer Value…
