TL;DR
This paper introduces a semi-supervised training approach for Bernoulli autoencoders used in semantic hashing. It supervises the binary codes through label predictions, which improves code quality when labels are scarce, and its experiments report gains over existing methods in that setting.
Contribution
The paper proposes a semi-supervised training method for Bernoulli autoencoders that improves hashing performance when labeled data is limited.
Findings
Performance under the pairwise loss degrades as the number of labels decreases.
The proposed supervision, based on label distributions, improves performance when labels are scarce.
The method achieves results comparable to fully supervised models while using less labeled data.
Abstract
Semantic hashing is an emerging technique for large-scale similarity search that represents high-dimensional data with similarity-preserving binary codes, which enable efficient indexing and search. It has recently been shown that variational autoencoders, with Bernoulli latent representations parametrized by neural nets, can be successfully trained to learn such codes in supervised and unsupervised scenarios, improving on more traditional methods thanks to their ability to handle the binary constraints architecturally. However, the scenario where labels are scarce has not been studied yet. This paper investigates the robustness of hashing methods based on variational autoencoders to the lack of supervision, focusing on two semi-supervised approaches currently in use. The first augments the variational autoencoder's training objective to jointly model the distribution over the data…
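To make the retrieval side of semantic hashing concrete, the following is a minimal sketch of nearest-neighbor search over binary codes by Hamming distance. It is not the paper's method: the 8-bit codes are made up for illustration, and the helper names (pack_bits, hamming, search) are hypothetical. In practice a trained encoder, such as the Bernoulli autoencoder discussed above, would produce the codes.

```python
from typing import List, Tuple


def pack_bits(bits: List[int]) -> int:
    """Pack a list of 0/1 values into one integer, first bit most significant."""
    code = 0
    for b in bits:
        code = (code << 1) | (b & 1)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two packed codes differ."""
    return bin(a ^ b).count("1")


def search(query: int, database: List[int], k: int) -> List[Tuple[int, int]]:
    """Return (index, distance) for the k database codes closest to the query."""
    scored = [(i, hamming(query, code)) for i, code in enumerate(database)]
    # Sort by distance, breaking ties by database position.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Hypothetical 8-bit codes; a learned encoder would normally produce these.
database = [
    pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    pack_bits([1, 0, 1, 0, 0, 0, 1, 0]),
    pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),
]
query = pack_bits([1, 0, 1, 1, 0, 0, 1, 1])
top2 = search(query, database, k=2)
print(top2)
```

Because each comparison reduces to an XOR and a bit count, a linear scan like this stays cheap even for large collections, which is the efficiency argument behind hashing-based search.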
