MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems

Arda Yüksel; Gabriel Thiem; Susanne Walter; Patrick Felka; Gabriela Alves Werb; Ivan Habernal

arXiv:2604.07956·cs.AI·April 13, 2026

TL;DR

MONETA is a multimodal industry classification benchmark that combines text and geospatial data. Training-free baselines reach 62.10% accuracy with open-source MLLMs and 74.10% with closed-source MLLMs, and more advanced multimodal techniques raise accuracy by up to 22.80%.

Contribution

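The paper presents the first multimodal benchmark for industry classification, drawing on text sources (website, Wikipedia, Wikidata) and geospatial sources (OpenStreetMap and satellite imagery), and reports training-free baselines together with techniques that improve on them.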

Findings

01 Baseline accuracy of 62.10% with open-source MLLMs.

02 Baseline accuracy of 74.10% with closed-source MLLMs.

03 Up to 22.80% accuracy increase with advanced techniques.

Abstract

Industry classification schemes are integral parts of public and corporate databases, as they classify businesses based on economic activity. Because company registers are large, manual annotation is costly, and fine-tuning models after every update to an industry classification scheme requires significant data collection. We replicate manual expert verification by using existing or easily retrievable multimodal resources for industry classification. We present MONETA, the first multimodal industry classification benchmark with text sources (website, Wikipedia, Wikidata) and geospatial sources (OpenStreetMap and satellite imagery). Our dataset comprises 1,000 businesses in Europe with 20 economic activity labels according to EU guidelines (NACE). Our training-free baseline reaches 62.10% and 74.10% accuracy with open-source and closed-source Multimodal Large Language Models (MLLMs), respectively. We observe an increase of…
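
To make the dataset description concrete, the following is a minimal sketch of how one benchmark entry and the accuracy metric might be represented. It is not the authors' code or data format: the `BusinessRecord` class, all of its field names, the single-letter label encoding, and the example businesses are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BusinessRecord:
    """One hypothetical benchmark entry; every field name is an assumption."""
    name: str
    nace_label: str                              # gold label (assumed: one code per business)
    website_text: Optional[str] = None           # text source
    wikipedia_text: Optional[str] = None         # text source
    wikidata_text: Optional[str] = None          # text source
    osm_tags: Dict[str, str] = field(default_factory=dict)  # geospatial source
    satellite_image_path: Optional[str] = None   # geospatial source


def accuracy(records: List[BusinessRecord], predictions: List[str]) -> float:
    """Return the percentage of records whose predicted label equals the gold label."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    if not records:
        return 0.0
    correct = sum(r.nace_label == p for r, p in zip(records, predictions))
    return 100.0 * correct / len(records)


# Made-up example data, not taken from the benchmark.
records = [
    BusinessRecord("Example Bakery", "C", osm_tags={"shop": "bakery"}),
    BusinessRecord("Example Grocer", "G", website_text="Fresh produce daily."),
    BusinessRecord("Example Hotel", "I"),
    BusinessRecord("Example Bank", "K"),
]
predictions = ["C", "G", "I", "G"]  # the last prediction is wrong
score = accuracy(records, predictions)
```

Any source may be missing for a given business, which is why the optional fields default to `None` or an empty mapping; a classifier would then work from whichever modalities are available.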
