Dictionary-Learning-Based Data Pruning for System Identification
Tingna Wang (1, 2), Sikai Zhang (4), Mingming Song (1, 3), Limin Sun (1, 2, 3) ((1) College of Civil Engineering, Tongji University, Shanghai, China, (2) Shanghai Qi Zhi Institute, Shanghai, China, (3) State Key Laboratory of Disaster Reduction in Civil Engineering

TL;DR
This paper introduces mini-batch FastCan, a dictionary-learning-based data pruning method that reduces sample-wise redundancy in system identification. It is tested on one simulated dataset and two benchmark datasets, where it significantly outperforms random pruning.
Contribution
The paper presents a novel sample-wise data pruning technique based on dictionary learning. It targets the sample redundancy that time shifting and nonlinearisation introduce in system identification.
Findings
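Mini-batch FastCan significantly outperforms random pruning.
It is effective at reducing sample-wise redundancy.
It improves the similarity, measured by R-squared, between the coefficients of models trained on pruned data and those of models trained on the full data.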
Abstract
System identification normally involves augmenting time series data by time shifting and nonlinearisation (e.g., a polynomial basis), both of which introduce redundancy in features and samples. Much research focuses on reducing feature-wise redundancy, while less attention is paid to sample-wise redundancy. This paper proposes a novel data pruning method, called mini-batch FastCan, to reduce sample-wise redundancy based on dictionary learning. Time series data is represented by a set of representative samples, called atoms, obtained via dictionary learning. The useful samples are then selected based on their correlation with the atoms. The method is tested on one simulated dataset and two benchmark datasets. The R-squared between the coefficients of models trained on the full datasets and the coefficients of models trained on the pruned datasets is adopted to evaluate the performance of data pruning…
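The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's mini-batch FastCan algorithm. Three things here are assumptions made for illustration: plain k-means stands in for dictionary learning (the cluster centres play the role of atoms), absolute cosine similarity stands in for the correlation between samples and atoms, and the simulated system with its coefficients is hypothetical. All function names are likewise hypothetical.

```python
import numpy as np

def learn_atoms(Z, n_atoms, n_iter=20, seed=0):
    """Stand-in dictionary learning (assumption): k-means, atoms = cluster centres."""
    rng = np.random.default_rng(seed)
    atoms = Z[rng.choice(len(Z), size=n_atoms, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every sample to every atom: (n_samples, n_atoms)
        dist = ((Z[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(n_atoms):
            members = Z[labels == k]
            if len(members) > 0:  # leave an atom unchanged if its cluster is empty
                atoms[k] = members.mean(axis=0)
    return atoms

def select_samples(Z, atoms, n_keep):
    """Pick samples atom by atom, best absolute cosine similarity first (assumption)."""
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    An = atoms / (np.linalg.norm(atoms, axis=1, keepdims=True) + 1e-12)
    sim = np.abs(Zn @ An.T)           # (n_samples, n_atoms)
    order = np.argsort(-sim, axis=0)  # row r holds each atom's r-th best sample
    chosen, seen = [], set()
    for row in order:
        for idx in row:
            idx = int(idx)
            if idx not in seen:
                seen.add(idx)
                chosen.append(idx)
                if len(chosen) == n_keep:
                    return np.array(chosen)
    return np.array(chosen)

def coef_r2(w_full, w_pruned):
    """R-squared of the pruned-data coefficients against the full-data coefficients."""
    ss_res = ((w_full - w_pruned) ** 2).sum()
    ss_tot = ((w_full - w_full.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Hypothetical simulated system (not taken from the paper).
rng = np.random.default_rng(0)
n = 500
u = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * u[t - 1] + 0.1 * u[t - 1] ** 2 + 0.01 * rng.normal()

# Augmentation by time shifting (lagged y and u) and a polynomial term (u squared).
Phi = np.column_stack([y[:-1], u[:-1], u[:-1] ** 2])
target = y[1:]
Z = np.column_stack([Phi, target])  # one row per sample: features plus output

atoms = learn_atoms(Z, n_atoms=10)
idx = select_samples(Z, atoms, n_keep=50)

w_full = np.linalg.lstsq(Phi, target, rcond=None)[0]
w_pruned = np.linalg.lstsq(Phi[idx], target[idx], rcond=None)[0]
r2 = coef_r2(w_full, w_pruned)
print(f"kept {len(idx)} of {len(Z)} samples, coefficient R-squared = {r2:.4f}")
```

Selecting round-robin across atoms, rather than taking the globally highest scores, keeps the retained samples from clustering around a single atom. That choice belongs to this sketch and should not be read as the authors' selection rule.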
Taxonomy
Topics: Time Series Analysis and Forecasting · Algorithms and Data Compression · Advanced Computational Techniques and Applications
Methods: Softmax · Attention Is All You Need · Focus · Pruning
