Calibrated Dataset Condensation for Faster Hyperparameter Search
Mucong Ding, Yuancheng Xu, Tahseen Rabbani, Xiaoyu Liu, Brian Gravelle, Teresa Ranadive, Tai-Ching Tuan, Furong Huang

TL;DR
This paper introduces a hyperparameter-calibrated dataset condensation method that creates synthetic validation datasets to preserve model ranking consistency, significantly accelerating hyperparameter and architecture search on images and graphs.
Contribution
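The paper proposes a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm that matches hyperparameter gradients (hypergradients) between the original and synthetic data, so that the condensed data generalizes better across hyperparameters and makes hyperparameter search more efficient.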
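As a rough illustration of what "matching hyperparameter gradients" could look like, the sketch below measures the alignment of two hypergradient vectors with cosine similarity. The reviews mention a "cos" term between two hypergradients in the paper's Definition 2; the `1 - cos` loss form, the function names, and the toy vectors here are assumptions made for illustration, not the paper's actual objective.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def alignment_loss(hypergrad_original, hypergrad_synthetic):
    """Hypothetical matching loss: 0 when the two hypergradients point in
    the same direction, 1 when orthogonal, 2 when opposite."""
    return 1.0 - cosine_similarity(hypergrad_original, hypergrad_synthetic)


if __name__ == "__main__":
    # Toy hypergradients (made-up numbers): same direction, different scale.
    g_original = [0.2, -0.4, 0.1]
    g_synthetic = [0.1, -0.2, 0.05]
    print(alignment_loss(g_original, g_synthetic))
```

A direction-only loss of this kind ignores the magnitude of the hypergradients, which is enough if the goal is only to move hyperparameters in the same direction on both datasets.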
Findings
Maintains validation-performance rankings across models and hyperparameters.
Speeds up hyperparameter and architecture search.
Effective on both image and graph datasets.
Abstract
Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by…
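The goal stated in the abstract, comparable validation-performance rankings on the condensed and original datasets, can be checked with a rank correlation. The sketch below uses Spearman's rank correlation as one common choice; the accuracy values are invented for illustration and are not results from the paper.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical validation accuracies of four hyperparameter configurations.
original_acc = [0.71, 0.78, 0.74, 0.69]   # evaluated on the original validation set
condensed_acc = [0.55, 0.63, 0.58, 0.57]  # evaluated on the condensed validation set

if __name__ == "__main__":
    print(spearman(original_acc, condensed_acc))
```

A value near 1 means the condensed validation set orders the configurations almost the same way as the original one, even when the absolute accuracies differ.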
Peer Reviews
Decision·Submitted to ICLR 2024
- The paper introduces a novel approach to condensing datasets for architecture and hyperparameter search.
- This paper effectively tackles the important challenges within the field of dataset condensation research.
- The primary motivation behind the main objective is compelling.
- The proposed method shows strong performance in architecture and hyperparameter search across various datasets.
- Some technical sections of the paper are hard to understand.
- On page 5, Definition 2: Is "cos" representing cosine-similarity? If so, why should the term between two hypergradients be zero?
- In section 5.2 on page 6, how can discrete factors like model depth or kernel size be expanded continuously?
- The process of condensing synthetic validation appears intricate and time-intensive. However, the paper lacks a comprehensive analysis of this procedure.
- Could you provide information
1. The hyperparameter gradient matching proposed in the paper is relatively novel and provides new ideas for improving the generalization performance of dataset distillation methods.
2. The paper provides a solid theoretical analysis and a clear explanation of the research issues.
3. The proposed method achieves good results on image datasets and graph datasets.
1. The paper does not explain some concepts when they are first mentioned, such as "supernet".
2. Some ablation experiments are missing, such as the performance of searching directly by compressing the test set using the dataset distillation method.
3. As a dataset distillation method, it lacks a comparison with other dataset distillation methods in terms of compression rate and generalization performance.
1. The proposed problem makes sense and is important: this is an initial work focusing on the hyperparameter search space in data condensation.
2. The solution shows a large improvement in rank correlation on several domains (image and graph).
3. The paper has a good balance between theoretical analysis and empirical understanding.
1. For the image domain, the datasets used are too simple: they have 32x32 pixels and a small number of classes. Usually, the benefit of hyperparameter search is much higher on datasets with large resolution and many classes.
2. The current rank correlation metric is reasonable, but it could be extended to a finer-grained level. For example, we can calculate the rank correlation for each class and aggregate them on average. This will show how the rank correlation matches with that obtained with
Taxonomy
Topics: Machine Learning and Data Classification · Time Series Analysis and Forecasting
