Compact Hypercube Embeddings for Fast Text-based Wildlife Observation Retrieval
Ilyass Moummad, Marius Miron, David Robinson, Kawtar Zaher, Hervé Goëau, Olivier Pietquin, Pierre Bonnet, Emmanuel Chemla, Matthieu Geist, Alexis Joly

TL;DR
This paper introduces compact hypercube embeddings for efficient text-based wildlife observation retrieval, enabling scalable search over large multimodal databases with reduced computational costs.
Contribution
It extends cross-view hashing to align natural language with visual and audio data in a shared Hamming space using pretrained models and parameter-efficient fine-tuning.
Findings
Hypercube embeddings achieve competitive retrieval performance.
Hashing improves encoder representations and zero-shot generalization.
The method significantly reduces memory and search costs.
Abstract
Large-scale biodiversity monitoring platforms increasingly rely on multimodal wildlife observations. While recent foundation models enable rich semantic representations across vision, audio, and language, retrieving relevant observations from massive archives remains challenging due to the computational cost of high-dimensional similarity search. In this work, we introduce compact hypercube embeddings for fast text-based wildlife observation retrieval, a framework that enables efficient text-based search over large-scale wildlife image and audio databases using compact binary representations. Building on the cross-view code alignment hashing framework, we extend lightweight hashing beyond a single-modality setup to align natural language descriptions with visual or acoustic observations in a shared Hamming space. Our approach leverages pretrained wildlife foundation models, including…
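To make the idea of search in a shared Hamming space concrete, here is a minimal sketch of retrieval over binary codes. It is not the paper's implementation. The 64-bit code length, the random toy vectors, the sign-threshold binarization, and the helper names (binarize, hamming, search) are all assumptions chosen for illustration. In the paper's setting, the real-valued vectors would come from the text and image or audio encoders.

```python
import random

def binarize(vec):
    """Map a real-valued embedding to an integer bit code: bit i is 1 iff vec[i] > 0.
    Sign thresholding is an assumption here, not necessarily the paper's rule."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of bit positions where two codes differ (XOR, then popcount)."""
    return bin(a ^ b).count("1")

def search(query_code, db_codes, k=3):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(db_codes)),
                   key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]

# Toy database: 1000 random vectors standing in for observation embeddings.
rng = random.Random(0)
n_bits = 64  # hypothetical code length
db_vecs = [[rng.gauss(0, 1) for _ in range(n_bits)] for _ in range(1000)]
db_codes = [binarize(v) for v in db_vecs]

# Hypothetical text query: a copy of item 42 with the first three signs flipped,
# mimicking a text embedding that lands near its matching observation.
query_vec = [-x if i < 3 else x for i, x in enumerate(db_vecs[42])]
query_code = binarize(query_vec)
top = search(query_code, db_codes)

# Ideal storage per item: float32 embedding vs. packed binary code.
# (Python ints carry extra overhead; these are the packed sizes.)
float_bytes = n_bits * 4
code_bytes = n_bits // 8
print(top, hamming(query_code, db_codes[42]), float_bytes // code_bytes)
```

Each comparison reduces to one XOR and a bit count, and a packed 64-bit code takes 8 bytes where a 64-dimensional float32 vector takes 256. That is the general source of the memory and search savings that hashing methods aim for.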
