DORAEMON: A Unified Library for Visual Object Modeling and Representation Learning at Scale
Ke Du, Yimin Peng, Chao Gao, Fan Zhou, Siqiao Xue

TL;DR
DORAEMON is an open-source PyTorch library that unifies visual object modeling and representation learning, providing a scalable, reproducible, and versatile platform for research and deployment across various visual recognition tasks.
Contribution
It introduces a comprehensive, YAML-driven framework that consolidates datasets, models, and training techniques, enabling rapid experimentation and deployment in visual recognition.
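The summary does not describe DORAEMON's configuration schema. This sketch shows only the general pattern that config-driven training frameworks commonly follow, under that assumption. Each config section names a registered component, and the remaining keys become that component's constructor arguments. Every name here (`REGISTRY`, `register`, `build`, `toy_backbone`, `toy_softmax`, `toy_folder`) is hypothetical and is not DORAEMON's API. A plain dict stands in for the parsed YAML file, so the example needs only the standard library.

```python
# Hypothetical sketch of config-driven experiment assembly (NOT DORAEMON's API).
# A plain dict stands in for the parsed YAML file; all names are illustrative.
from typing import Any, Callable, Dict

# kind -> (name -> factory)
REGISTRY: Dict[str, Dict[str, Callable[..., Any]]] = {"model": {}, "loss": {}, "dataset": {}}


def register(kind: str, name: str):
    """Decorator that records a component factory under REGISTRY[kind][name]."""
    def wrap(factory: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[kind][name] = factory
        return factory
    return wrap


@register("model", "toy_backbone")
def toy_backbone(embed_dim: int = 128) -> Dict[str, Any]:
    # A real framework would return an nn.Module; a dict keeps the sketch dependency-free.
    return {"type": "toy_backbone", "embed_dim": embed_dim}


@register("loss", "toy_softmax")
def toy_softmax(num_classes: int) -> Dict[str, Any]:
    return {"type": "toy_softmax", "num_classes": num_classes}


@register("dataset", "toy_folder")
def toy_folder(root: str) -> Dict[str, Any]:
    return {"type": "toy_folder", "root": root}


def build(config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Instantiate each section: its 'name' picks the factory, other keys are kwargs."""
    built = {}
    for kind, section in config.items():
        params = dict(section)      # copy so the caller's config is not mutated
        name = params.pop("name")
        built[kind] = REGISTRY[kind][name](**params)
    return built


# Roughly what yaml.safe_load might return for a hypothetical experiment file.
config = {
    "model": {"name": "toy_backbone", "embed_dim": 256},
    "loss": {"name": "toy_softmax", "num_classes": 1000},
    "dataset": {"name": "toy_folder", "root": "/data/train"},
}
parts = build(config)
print(parts["model"])  # {'type': 'toy_backbone', 'embed_dim': 256}
```

With a registry, swapping a backbone or loss needs only a config edit, not a code change. That is the property a single YAML-driven workflow relies on.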
Findings
Matched or exceeded reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products.
Supports over 1000 pretrained backbones through a timm-compatible interface, alongside modular losses, augmentations, and distributed-training utilities.
Provides one-command export to ONNX and HuggingFace for deployment.
Abstract
DORAEMON is an open-source PyTorch library that unifies visual object modeling and representation learning across diverse scales. A single YAML-driven workflow covers classification, retrieval, and metric learning. More than 1000 pretrained backbones are exposed through a timm-compatible interface, together with modular losses, augmentations, and distributed-training utilities. Reproducible recipes match or exceed reference results on ImageNet-1K, MS-Celeb-1M, and Stanford Online Products, while one-command export to ONNX or HuggingFace bridges research and deployment. By consolidating datasets, models, and training techniques into one platform, DORAEMON offers a scalable foundation for rapid experimentation in visual recognition and representation learning, enabling efficient transfer of research advances to real-world applications. The repository is available at…
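The abstract does not say which retrieval metrics the recipes report. Recall@K is the metric conventionally used on Stanford Online Products. It gives the fraction of queries whose K nearest neighbours in embedding space, excluding the query itself, contain at least one item of the same class. This is a minimal pure-Python sketch of that metric using cosine similarity, not DORAEMON's evaluator. The function names `l2_normalize` and `recall_at_k` are hypothetical.

```python
# Minimal Recall@K sketch (hypothetical helper names, not DORAEMON's evaluator).
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def recall_at_k(embeddings: List[Sequence[float]], labels: List[int], k: int) -> float:
    """Fraction of queries whose k most similar other items include a same-label item."""
    normed = [l2_normalize(e) for e in embeddings]
    hits = 0
    for i, query in enumerate(normed):
        # Cosine similarity of the query to every other item (leave-one-out gallery).
        scored = [
            (sum(a * b for a, b in zip(query, other)), j)
            for j, other in enumerate(normed)
            if j != i
        ]
        scored.sort(key=lambda t: t[0], reverse=True)
        top_k = [j for _, j in scored[:k]]
        hits += any(labels[j] == labels[i] for j in top_k)
    return hits / len(normed)


# Two well-separated classes: every query's nearest neighbour shares its label.
emb = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
lab = [0, 0, 1, 1]
print(recall_at_k(emb, lab, k=1))  # 1.0
```

This brute-force version is quadratic in the number of items. Benchmark-scale evaluation would instead use batched matrix multiplication or an approximate nearest-neighbour index.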
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Face Recognition and Perception · Domain Adaptation and Few-Shot Learning
