FocusDD: Real-World Scene Infusion for Robust Dataset Distillation
Youbing Hu, Yun Cheng, Olga Saukh, Firat Ozdemir, Anqi Lu, Zhiqiang Cao, Zhijun Li

TL;DR
FocusDD introduces a resolution-independent dataset distillation method that uses key image patches extracted by a pre-trained ViT to create diverse, realistic, and generalizable distilled datasets suitable for both classification and dense tasks like object detection.
Contribution
The paper presents a novel dataset distillation approach that leverages ViT for patch extraction, enabling high-resolution, diverse, and task-generalized distilled datasets, including for object detection.
Findings
Achieves 71.0% and 62.6% accuracy on ImageNet-1K with 100 images per class.
First to use distilled datasets for object detection tasks.
Outperforms state-of-the-art methods on classification benchmarks.
Abstract
Dataset distillation has emerged as a strategy to compress real-world datasets for efficient training. However, it struggles with large-scale and high-resolution datasets, limiting its practicality. This paper introduces a novel resolution-independent dataset distillation method, Focused Dataset Distillation (FocusDD), which achieves diversity and realism in distilled data by identifying key information patches, thereby ensuring the generalization capability of the distilled dataset across different network architectures. Specifically, FocusDD leverages a pre-trained Vision Transformer (ViT) to extract key image patches, which are then synthesized into a single distilled image. These distilled images, which capture multiple targets, are suitable not only for classification tasks but also for dense tasks such as object detection. To further improve the generalization of the distilled…
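The patch-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_k_patches` and `compose_grid`, the fixed square patch grid, and the per-patch `scores` array (a hypothetical stand-in for importance scores that a pre-trained ViT would provide) are all assumptions made for illustration.

```python
import numpy as np


def top_k_patches(image, scores, patch, k):
    """Return the k highest-scoring non-overlapping patches of `image`.

    image:  (H, W, C) array with H and W divisible by `patch`
    scores: (H // patch, W // patch) array with one importance score per
            patch (assumed here; in FocusDD these would come from a ViT)
    """
    cols = scores.shape[1]
    # Flat indices of the k largest scores, highest first.
    order = np.argsort(scores.ravel())[::-1][:k]
    patches = []
    for idx in order:
        r, c = divmod(int(idx), cols)
        patches.append(
            image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        )
    return patches


def compose_grid(patches, grid):
    """Tile grid * grid equally sized patches into one image, row by row."""
    if len(patches) != grid * grid:
        raise ValueError("need exactly grid * grid patches")
    rows = [
        np.concatenate(patches[i * grid:(i + 1) * grid], axis=1)
        for i in range(grid)
    ]
    return np.concatenate(rows, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8, 3))                 # toy 8x8 RGB image
    scores = np.array([[0.1, 0.9], [0.4, 0.7]])   # made-up patch scores
    key = top_k_patches(image, scores, patch=4, k=4)
    distilled = compose_grid(key, grid=2)
    print(distilled.shape)
```

In this toy setup the patches all come from one image; passing patches gathered from several images to `compose_grid` would give a composite containing multiple targets, which is the property the abstract highlights for dense tasks.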
Taxonomy
Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Vision Transformer · Multi-Head Attention
