MILDNet: A Lightweight Single Scaled Deep Ranking Architecture

Anirudha Vishvakarma

arXiv:1903.00905 · cs.CV · March 15, 2019 · 5 cites

TL;DR

MILDNet is a compact, efficient deep ranking CNN architecture that maintains high performance with significantly fewer parameters and faster inference, suitable for various domains including ecommerce.

Contribution

The paper introduces MILDNet, a novel single-scale deep ranking model that compresses multi-scale CNNs by integrating intermediate layer activations, reducing size and computation while maintaining accuracy.
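The idea of folding intermediate-layer activations into a single embedding can be illustrated with a minimal NumPy sketch. This is not the paper's implementation. The feature-map shapes, the use of global average pooling, and the final L2 normalization are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def global_average_pool(feature_map):
    """Collapse an (H, W, C) activation map into a C-dimensional vector."""
    return feature_map.mean(axis=(0, 1))


def single_scale_embedding(intermediate_maps, last_map):
    """Concatenate pooled intermediate activations with the pooled last layer.

    Assumption: each layer is summarized by global average pooling and the
    concatenated vector is L2-normalized. The source does not specify either.
    """
    pooled = [global_average_pool(m) for m in intermediate_maps]
    pooled.append(global_average_pool(last_map))
    embedding = np.concatenate(pooled)
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding


# Hypothetical activations from one CNN at increasing depth (shapes are made up).
rng = np.random.default_rng(0)
intermediate_maps = [
    rng.random((56, 56, 64)),
    rng.random((28, 28, 128)),
    rng.random((14, 14, 256)),
]
last_map = rng.random((7, 7, 512))

embedding = single_scale_embedding(intermediate_maps, last_map)
print(embedding.shape)  # (960,) = 64 + 128 + 256 + 512 channels
```

Because every layer comes from the same forward pass, the embedding carries both fine-grained and abstract features without running several networks at different scales.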

Findings

01. MILDNet achieves comparable accuracy to state-of-the-art models with one-third the parameters.
02. Intermediate layer activations significantly improve image retrieval performance.
03. The mobile variant of MILDNet is 12 times smaller, suitable for edge devices.

Abstract

Multi-scale deep CNN architectures [1, 2, 3] successfully capture both fine- and coarse-level image descriptors for visual similarity tasks, but they come with expensive memory overhead and latency. In this paper, we propose a competing novel CNN architecture, called MILDNet, whose merit is that it is far more compact (about 3 times smaller). Inspired by the fact that successive CNN layers represent the image with increasing levels of abstraction, we compressed our deep ranking model to a single CNN by coupling activations from multiple intermediate layers with those of the last layer. Training on the well-known Street2shop dataset [4], we demonstrate that our approach performs as well as the current state-of-the-art models with only one third of the parameters, model size, and training time, and with a significant reduction in inference time. The significance of intermediate layers on image retrieval task has also…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications