MILDNet: A Lightweight Single Scaled Deep Ranking Architecture
Anirudha Vishvakarma

TL;DR
MILDNet is a compact, efficient deep ranking CNN architecture that maintains high performance with significantly fewer parameters and faster inference, suitable for various domains including ecommerce.
Contribution
The paper introduces MILDNet, a novel single-scale deep ranking model that compresses multi-scale CNNs by integrating intermediate layer activations, reducing size and computation while maintaining accuracy.
Findings
MILDNet achieves comparable accuracy to state-of-the-art models with one-third the parameters.
Intermediate layer activations significantly improve image retrieval performance.
The mobile variant of MILDNet is 12 times smaller, suitable for edge devices.
Abstract
Multi-scale deep CNN architectures [1, 2, 3] successfully capture both fine- and coarse-level image descriptors for the visual similarity task, but they come with expensive memory overhead and latency. In this paper, we propose a competing novel CNN architecture, called MILDNet, whose merit is being vastly more compact (about 3 times smaller). Inspired by the fact that successive CNN layers represent the image with increasing levels of abstraction, we compressed our deep ranking model to a single CNN by coupling activations from multiple intermediate layers along with the last layer. Trained on the well-known Street2shop dataset [4], we demonstrate that our approach performs as well as the current state-of-the-art models with only one third of the parameters, model size, and training time, and with a significant reduction in inference time. The significance of intermediate layers on image retrieval task has also…
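The idea of coupling intermediate-layer activations with the last layer can be illustrated with a minimal sketch. This is not the paper's implementation: the use of global average pooling, the L2 normalization step, the toy activation shapes, and the function names (`global_average_pool`, `single_scale_embedding`) are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math


def global_average_pool(feature_map):
    """Average a (channels x height x width) nested list over its spatial axes."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (assumed post-processing step)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


def single_scale_embedding(intermediate_maps, last_layer_vector):
    """Concatenate pooled intermediate activations with the last-layer vector."""
    pooled = []
    for fmap in intermediate_maps:
        pooled.extend(global_average_pool(fmap))
    return l2_normalize(pooled + list(last_layer_vector))


# Hypothetical toy activations standing in for real CNN feature maps.
shallow = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 channels, 2x2
deep = [[[2.0]]]                                                # 1 channel, 1x1
last = [0.0, 4.0]                                               # last-layer output

embedding = single_scale_embedding([shallow, deep], last)
print(len(embedding))  # 2 + 1 + 2 = 5 dimensions
```

In this sketch a single forward pass yields one embedding that mixes shallow (fine-grained) and deep (abstract) features, which is the intuition the abstract gives for replacing separate multi-scale branches with one network.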
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
