Hierarchical Vision-Language Retrieval of Educational Metaverse Content in Agriculture

Ali Abdari; Alex Falcon; Giuseppe Serra

arXiv:2508.13713 · cs.CV · August 20, 2025

TL;DR

This paper introduces a new agricultural Metaverse dataset and a hierarchical vision-language retrieval model to improve search and organization of educational content in immersive environments, reaching up to 62% R@1 and 78% MRR on the new dataset.

Contribution

The work presents AgriMuseums, a new dataset of 457 agricultural-themed virtual museums enriched with textual descriptions, and a hierarchical model for vision-language retrieval, advancing the organization of educational Metaverse content.

Findings

1. Achieved up to 62% R@1 and 78% MRR on the new dataset.

2. Improved existing benchmarks by up to 6% R@1 and 11% MRR.

3. Validated effectiveness through extensive evaluation.
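
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R@1 (recall at rank 1) and MRR (mean reciprocal rank) are commonly computed for text-to-scene retrieval. It is not the authors' evaluation code. The function names, the toy similarity scores, and the assumption that query i has exactly one correct item at index i are all illustrative.

```python
from typing import List, Sequence


def ranks_of_correct(similarity: Sequence[Sequence[float]]) -> List[int]:
    """Return the 1-based rank of the correct item for each query.

    Assumption (hypothetical setup): similarity[i][j] scores query i
    against candidate j, and the correct candidate for query i is i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        correct_score = row[i]
        # Rank = 1 + number of candidates scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > correct_score))
    return ranks


def recall_at_k(ranks: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose correct item is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy scores for three text queries against three candidate museums.
similarity = [
    [0.9, 0.1, 0.2],  # correct item ranked first
    [0.8, 0.3, 0.5],  # correct item ranked third
    [0.2, 0.4, 0.7],  # correct item ranked first
]
ranks = ranks_of_correct(similarity)
print("R@1:", recall_at_k(ranks, k=1))
print("MRR:", mean_reciprocal_rank(ranks))
```

R@1 only rewards a query when the correct item is ranked first, while MRR gives partial credit for near misses, which is why the MRR figure is typically the higher of the two.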

Abstract

Every day, a large amount of educational content is uploaded online across different areas, including agriculture and gardening. When these videos or materials are grouped meaningfully, they can make learning easier and more effective. One promising way to organize and enrich such content is through the Metaverse, which allows users to explore educational experiences in an interactive and immersive environment. However, searching for relevant Metaverse scenarios and finding those that match users' interests remains a challenging task. A first step in this direction has been taken recently, but existing datasets are small and insufficient for training advanced models. In this work, we make two main contributions: first, we introduce a new dataset containing 457 agricultural-themed virtual museums (AgriMuseums), each enriched with textual descriptions; and second, we propose a hierarchical…
