Visual Recognition Using Directional Distribution Distance
Jianxin Wu, Bin-Bin Gao, and Guoqing Liu

TL;DR
This paper introduces D3, a discriminative method for comparing sets of feature vectors in images and videos, demonstrating superior accuracy and speed over existing generative approaches.
Contribution
It proposes a novel discriminative distribution distance (D3) and a directional total variation distance (DTVD) for more effective set comparison in visual recognition.
Findings
D3 outperforms existing generative methods in accuracy and speed.
Combining D3 with Fisher Vectors (FV) yields synergistic improvements.
D3 is effective in action and image recognition tasks.
Abstract
In computer vision, an entity such as an image or video is often represented as a set of instance vectors, which can be SIFT, motion, or deep learning feature vectors extracted from different parts of that entity. Thus, it is essential to design efficient and effective methods to compare two sets of instance vectors. Existing methods such as FV, VLAD, or Super Vectors have achieved excellent results. However, this paper shows that these methods are designed from a generative perspective, and that a discriminative method can be more effective in categorizing images or videos. The proposed D3 (discriminative distribution distance) method treats the two sets as two distributions and proposes a directional total variation distance (DTVD) to measure how separated they are. Furthermore, a classifier-based method is proposed to estimate DTVD robustly. The D3 method is evaluated…
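To make the idea of a directional total variation distance concrete, here is a minimal sketch and not the paper's implementation. It assumes a Fisher-style projection direction with diagonal covariances and a shared-variance Gaussian model along that direction. Those modeling choices, and the function name `directional_tv_sketch`, are illustrative assumptions that do not come from the abstract.

```python
import math
import numpy as np

def directional_tv_sketch(X, Y, eps=1e-8):
    """Rough separation score in [0, 1] for two sets of instance vectors.

    X has shape (n, d) and Y has shape (m, d). Hypothetical helper:
    the direction and the Gaussian model below are assumptions.
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    var_x, var_y = X.var(axis=0), Y.var(axis=0)

    # Fisher-style discriminative direction with diagonal covariances (assumption).
    w = (mu_x - mu_y) / (var_x + var_y + eps)
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return 0.0  # identical means: no direction separates the sets
    w = w / norm

    # Project both sets onto the direction, giving two 1-D samples.
    px, py = X @ w, Y @ w
    gap = abs(px.mean() - py.mean())
    sigma = math.sqrt(0.5 * (px.var() + py.var()) + eps)

    # Total variation distance between N(m1, s^2) and N(m2, s^2)
    # equals erf(|m1 - m2| / (2 * sqrt(2) * s)).
    return math.erf(gap / (2.0 * math.sqrt(2.0) * sigma))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    near = rng.normal(0.0, 1.0, size=(200, 5))
    far = rng.normal(5.0, 1.0, size=(200, 5))
    print(directional_tv_sketch(near, near))  # same set compared with itself
    print(directional_tv_sketch(near, far))   # well-separated sets
```

A score near 0 means the two sets overlap heavily along the chosen direction, and a score near 1 means they are almost perfectly separable along it.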
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition · Image Retrieval and Classification Techniques
