Hierarchical Vision-Language Retrieval of Educational Metaverse Content in Agriculture
Ali Abdari, Alex Falcon, Giuseppe Serra

TL;DR
This paper introduces a new agricultural Metaverse dataset and a hierarchical vision-language retrieval model to improve search and organization of educational content in immersive environments, demonstrating significant performance gains.
Contribution
The work presents a novel dataset of agricultural virtual museums and a hierarchical model for vision-language retrieval, advancing the organization of educational Metaverse content.
Findings
Achieved up to 62% R@1 and 78% MRR on the new dataset.
Improved existing benchmarks by up to 6% R@1 and 11% MRR.
Validated effectiveness through extensive evaluation.
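For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R@1 (Recall at 1) and MRR (Mean Reciprocal Rank) are commonly computed from a query-by-item similarity matrix. It is not the authors' evaluation code. The function name `retrieval_metrics`, the toy scores, and the assumption that the correct item for query i sits at index i are all illustrative.

```python
def retrieval_metrics(sim):
    """Compute (R@1, MRR) for a square similarity matrix.

    Assumption for this sketch: sim[i][j] is the score of item j for
    query i, and the correct item for query i is item i.
    """
    ranks = []
    for i, row in enumerate(sim):
        correct_score = row[i]
        # Rank of the correct item = 1 + number of items scored strictly higher.
        ranks.append(1 + sum(1 for s in row if s > correct_score))
    n = len(ranks)
    r_at_1 = sum(1 for r in ranks if r == 1) / n  # fraction of queries ranked first
    mrr = sum(1.0 / r for r in ranks) / n         # mean of reciprocal ranks
    return r_at_1, mrr


# Toy scores (hypothetical): correct items are ranked 1st, 2nd, and 3rd.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.3],
    [0.7, 0.6, 0.4],
]
r_at_1, mrr = retrieval_metrics(sim)
print(f"R@1 = {r_at_1:.3f}, MRR = {mrr:.3f}")
```

R@1 only rewards queries whose correct item is ranked first, while MRR also gives partial credit for near misses, which is why the MRR figures reported here are higher than the R@1 figures.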
Abstract
Every day, a large amount of educational content is uploaded online across different areas, including agriculture and gardening. When these videos or materials are grouped meaningfully, they can make learning easier and more effective. One promising way to organize and enrich such content is through the Metaverse, which allows users to explore educational experiences in an interactive and immersive environment. However, searching for relevant Metaverse scenarios and finding those that match users' interests remains a challenging task. A first step in this direction has been taken recently, but existing datasets are small and insufficient for training advanced models. In this work, we make two main contributions: first, we introduce a new dataset containing 457 agricultural-themed virtual museums (AgriMuseums), each enriched with textual descriptions; and second, we propose a hierarchical…
