TL;DR
This paper introduces HMTT, a novel short text hashing method that integrates multi-granularity topics and tags to improve semantic similarity preservation, outperforming existing methods on various datasets.
Contribution
The paper proposes a unified approach combining multi-granularity topic selection and tag exploitation for enhanced short text hashing.
Findings
HMTT significantly outperforms baseline methods on evaluation metrics.
Optimal multi-granularity topic selection depends on dataset type.
Incorporating tags improves semantic similarity in hash codes.
Abstract
Owing to the computational and storage efficiency of compact binary codes, hashing has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts, because such texts are sparse and short. Recently, some researchers have tried to use latent topics of a certain granularity to preserve semantic similarity in hash codes beyond keyword matching. However, topics of a single granularity are not adequate to represent the intrinsic semantic information. In this paper, we present a novel unified approach for short text Hashing using Multi-granularity Topics and Tags, dubbed HMTT. In particular, we propose a selection method to choose the optimal multi-granularity topics depending on the type of dataset, and design two distinct hashing strategies to incorporate multi-granularity topics. We also propose a…
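To make the efficiency argument in the abstract concrete, here is a minimal sketch of similarity search over compact binary codes using Hamming distance. It is a generic illustration, not the HMTT method: the function names `hamming` and `search`, the document names, and the 8-bit codes are all hypothetical toy values chosen for this example.

```python
# Illustrative only: generic Hamming-distance retrieval over binary hash codes.
# The codes below are made-up toy values; they are NOT produced by HMTT.

def hamming(a: int, b: int) -> int:
    """Count the differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")

def search(query: int, database: dict[str, int], k: int = 2) -> list[tuple[str, int]]:
    """Return the k items closest to the query, as (name, distance) pairs."""
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query, item[1]), item[0]),  # ties broken by name
    )
    return [(name, hamming(query, code)) for name, code in ranked[:k]]

# Hypothetical 8-bit codes for three short texts.
database = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,
    "doc_c": 0b01001101,
}
query = 0b10110110
results = search(query, database, k=2)
print(results)
```

Each comparison is a single XOR followed by a bit count, which is why binary codes keep both storage and query cost low at scale. How well the nearest codes match semantically similar texts depends entirely on how the codes were learned.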
