Random Forests Can Hash
Qiang Qiu, Guillermo Sapiro, Alex Bronstein

TL;DR
This paper presents a novel random forest-based semantic hashing method that enforces hash consistency within trees and uses information theory for code aggregation, significantly improving large-scale data retrieval performance.
Contribution
It introduces a subspace model for splitting functions and an information-theoretic code aggregation approach for random forests in hashing applications.
Findings
Outperforms state-of-the-art hashing methods on large datasets
Enforces hash consistency for data from the same class within trees
Produces near-optimal class-specific hash codes
Abstract
Hash codes are a very efficient data representation, needed to cope with ever-growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forest, a technique that together with deep learning has shown spectacular results in classification, can also be extended to large-scale retrieval. Traditional random forest fails to enforce the consistency of hashes generated from each tree for same-class data, i.e., to preserve the underlying similarity, and it also lacks a principled way to aggregate codes across trees. We start with a simple hashing scheme, where independently trained random trees in a forest act as hashing functions. We then propose a subspace model as the splitting function, and show that it enforces the hash consistency in a tree for data from the…
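The "simple hashing scheme" mentioned in the abstract, where each tree in a forest acts as a hash function, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each tree emits a one-hot code marking the leaf a sample reaches and that the per-tree codes are simply concatenated. The class `RandomTree` and the helpers `forest_hash` and `hamming` are hypothetical names, and the splits are drawn at random rather than trained, purely to keep the example self-contained.

```python
import random


class RandomTree:
    """Complete binary tree of fixed depth with random axis-aligned splits.

    Hypothetical stand-in for a trained tree: split features and thresholds
    are drawn at random here, not learned from class labels.
    """

    def __init__(self, n_features, depth, rng):
        self.depth = depth
        self.n_internal = 2 ** depth - 1
        # One (feature index, threshold) pair per internal node, heap-ordered.
        self.splits = [
            (rng.randrange(n_features), rng.uniform(-1.0, 1.0))
            for _ in range(self.n_internal)
        ]

    def leaf_index(self, x):
        """Route x from the root to a leaf and return the leaf's index."""
        node = 0
        for _ in range(self.depth):
            feature, threshold = self.splits[node]
            # Heap layout: children of node i sit at 2i + 1 and 2i + 2.
            node = 2 * node + (1 if x[feature] <= threshold else 2)
        return node - self.n_internal

    def hash_code(self, x):
        """One-hot code over the 2**depth leaves (assumed encoding)."""
        code = [0] * (2 ** self.depth)
        code[self.leaf_index(x)] = 1
        return code


def forest_hash(trees, x):
    """Concatenate the per-tree codes into a single binary code."""
    code = []
    for tree in trees:
        code.extend(tree.hash_code(x))
    return code


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(u != v for u, v in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    forest = [RandomTree(n_features=4, depth=3, rng=rng) for _ in range(8)]
    code = forest_hash(forest, [0.1, -0.2, 0.3, 0.9])
    print(len(code), sum(code))  # 64 bits in total, one set bit per tree
```

With untrained trees like these, nothing guarantees that two samples of the same class reach the same leaf in a given tree, which is the consistency problem the abstract attributes to traditional random forests.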
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Machine Learning and Data Classification
