FocusDD: Real-World Scene Infusion for Robust Dataset Distillation

Youbing Hu; Yun Cheng; Olga Saukh; Firat Ozdemir; Anqi Lu; Zhiqiang Cao; Zhijun Li

arXiv:2501.06405·cs.CV·January 14, 2025

TL;DR

FocusDD introduces a resolution-independent dataset distillation method that uses key image patches extracted by a pre-trained ViT to create diverse, realistic, and generalizable distilled datasets suitable for both classification and dense tasks like object detection.

Contribution

The paper presents a novel dataset distillation approach that leverages ViT for patch extraction, enabling high-resolution, diverse, and task-generalized distilled datasets, including for object detection.

Findings

01

Achieves 71.0% (ResNet50) and 62.6% (MobileNet-v2) validation accuracy on ImageNet-1K with 100 images per class.

02

First to use distilled datasets for object detection tasks.

03

Outperforms state-of-the-art methods on classification benchmarks.

Abstract

Dataset distillation has emerged as a strategy to compress real-world datasets for efficient training. However, it struggles with large-scale and high-resolution datasets, limiting its practicality. This paper introduces a novel resolution-independent dataset distillation method, Focused Dataset Distillation (FocusDD), which achieves diversity and realism in distilled data by identifying key information patches, thereby ensuring the generalization capability of the distilled dataset across different network architectures. Specifically, FocusDD leverages a pre-trained Vision Transformer (ViT) to extract key image patches, which are then synthesized into a single distilled image. These distilled images, which capture multiple targets, are suitable not only for classification tasks but also for dense tasks such as object detection. To further improve the generalization of the distilled…
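
The patch-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a per-patch importance score is already available (in the paper this role is played by a pre-trained ViT; here `scores` is simply passed in), and the function names `split_into_patches`, `top_k_indices`, `compose`, and `focus_composite` are hypothetical. It shows only the generic pattern of keeping the highest-scoring patches and tiling them into one composite image.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def split_into_patches(image: Image, patch: int) -> List[Image]:
    """Cut an H x W image into non-overlapping patch x patch tiles (row-major)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, returned in original spatial order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def compose(tiles: List[Image], grid: int) -> Image:
    """Tile grid * grid equally sized patches into a single composite image."""
    if len(tiles) != grid * grid:
        raise ValueError("need exactly grid * grid tiles")
    composite = []
    for g in range(grid):
        row_tiles = tiles[g * grid:(g + 1) * grid]
        for r in range(len(row_tiles[0])):
            # Concatenate row r of every tile in this grid row.
            composite.append([px for tile in row_tiles for px in tile[r]])
    return composite


def focus_composite(image: Image, scores: Sequence[float], patch: int, grid: int) -> Image:
    """Keep the grid * grid highest-scoring patches and tile them together.

    `scores` is an assumed input with one importance value per patch; how such
    scores are derived from a ViT is outside the scope of this sketch.
    """
    tiles = split_into_patches(image, patch)
    if len(scores) != len(tiles):
        raise ValueError("need exactly one score per patch")
    keep = top_k_indices(scores, grid * grid)
    return compose([tiles[i] for i in keep], grid)
```

For a 4 x 4 image cut into 2 x 2 patches, calling `focus_composite(image, scores, patch=2, grid=1)` returns only the single highest-scoring patch, while `grid=2` keeps all four patches and reproduces the input. Returning the kept patches in spatial order is a choice made here for readability, not something stated in the abstract.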

Taxonomy

Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Vision Transformer · Multi-Head Attention