Set2Box: Similarity Preserving Representation Learning of Sets
Geon Lee, Chanyoung Park, Kijung Shin

TL;DR
Set2Box introduces a learning-based set representation method using boxes that enables fast, accurate similarity estimation with reduced storage, outperforming traditional hashing and sketching techniques.
Contribution
The paper presents Set2Box and its enhanced variant Set2Box+, methods that represent sets as boxes, enabling efficient similarity estimation with higher accuracy than baseline approaches.
Findings
Set2Box+ achieves up to 40.8× smaller estimation error than baseline methods.
Set2Box+ requires 60% fewer bits to encode sets than baseline methods.
The approach enables estimation of four similarity measures from one representation.
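One reason a single representation can serve several measures is that common set similarity measures depend only on |A|, |B| and |A ∩ B|. The sketch below computes them exactly, assuming the four are the Jaccard index, overlap coefficient, cosine similarity and Dice coefficient (this page does not name them). The function name `set_similarities` and the example data are illustrative, not from the paper.

```python
import math


def set_similarities(a: set, b: set) -> dict:
    """Exact similarity measures; each uses only |A|, |B| and |A ∩ B|."""
    inter = len(a & b)
    size_a, size_b = len(a), len(b)
    union = size_a + size_b - inter
    smaller = min(size_a, size_b)
    return {
        "jaccard": inter / union if union else 0.0,
        "overlap": inter / smaller if smaller else 0.0,
        "cosine": inter / math.sqrt(size_a * size_b) if size_a and size_b else 0.0,
        "dice": 2 * inter / (size_a + size_b) if size_a + size_b else 0.0,
    }


# Illustrative data: each customer is the set of items she has purchased.
alice = {"milk", "bread", "eggs", "butter"}
bob = {"bread", "eggs", "jam"}
print(set_similarities(alice, bob))
```

Computing `a & b` exactly costs time proportional to the set sizes, which is the expense that compressed representations aim to avoid.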
Abstract
Sets have been used for modeling various types of objects (e.g., a document as the set of keywords in it and a customer as the set of the items that she has purchased). Measuring similarity (e.g., Jaccard Index) between sets has been a key building block of a wide range of applications, including plagiarism detection, recommendation, and graph compression. However, as sets have grown in numbers and sizes, the computational cost and storage required for set similarity computation have become substantial, and this has led to the development of hashing and sketching based solutions. In this work, we propose Set2Box, a learning-based approach for compressed representations of sets from which various similarity measures can be estimated accurately in constant time. The key idea is to represent sets as boxes to precisely capture overlaps of sets. Additionally, based on the proposed box…
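The key idea of representing sets as boxes can be illustrated with a minimal sketch. It assumes that each set is an axis-aligned box in d dimensions, that box volume stands in for set size, and that the volume of the intersection box stands in for |A ∩ B|. The boxes here are hand-picked and hypothetical; Set2Box learns its boxes, and that training step is not shown. The names `Box`, `intersection` and `estimated_jaccard` are illustrative.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its lower and upper corners (hypothetical, hand-picked)."""
    lower: Sequence[float]
    upper: Sequence[float]

    def volume(self) -> float:
        vol = 1.0
        for lo, hi in zip(self.lower, self.upper):
            vol *= max(hi - lo, 0.0)  # an empty side gives zero volume
        return vol


def intersection(a: Box, b: Box) -> Box:
    # Per dimension, the overlap runs from the larger lower bound to the smaller upper bound.
    lower = [max(x, y) for x, y in zip(a.lower, b.lower)]
    upper = [min(x, y) for x, y in zip(a.upper, b.upper)]
    return Box(lower, upper)


def estimated_jaccard(a: Box, b: Box) -> float:
    # Volumes replace |A|, |B| and |A ∩ B|; the cost is O(d), independent of set sizes.
    inter = intersection(a, b).volume()
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


a = Box([0.0, 0.0], [2.0, 2.0])
b = Box([1.0, 1.0], [3.0, 3.0])
print(estimated_jaccard(a, b))  # overlap volume 1, union volume 7
```

The other measures can be estimated the same way by substituting the three volumes into their formulas. Under these assumptions, the cost of each pairwise estimate depends only on the dimension d and not on the number of elements in either set.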
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
