Novel Benchmark for NER in the Wastewater and Stormwater Domain
Franco Alberto Cardillo, Franca Debole, Francesca Frontini, Mitra Aelami, Nanée Chahinian, Serge Conrad

TL;DR
This paper introduces a new multilingual benchmark dataset for domain-specific Named Entity Recognition in wastewater management, evaluating current NER methods including large language models to support environmental decision-making.
Contribution
It creates a French-Italian domain-specific NER corpus and assesses state-of-the-art methods, including automated annotation projection for multilingual extension.
Findings
LLM-based NER approaches perform competitively on the benchmark.
The corpus provides a reliable baseline for future research.
Automated annotation projection shows promise for multilingual extension.
Abstract
Effective wastewater and stormwater management is essential for urban sustainability and environmental protection. Extracting structured knowledge from reports and regulations is challenging due to domain-specific terminology and multilingual contexts. This work focuses on domain-specific Named Entity Recognition (NER) as a first step towards effective relation and information extraction to support decision making. A multilingual benchmark is crucial for evaluating these methods. This study develops a French-Italian domain-specific text corpus for wastewater management. It evaluates state-of-the-art NER methods, including LLM-based approaches, to provide a reliable baseline for future strategies and explores automated annotation projection in view of an extension of the corpus to new languages.
Taxonomy
Topics: Underwater Acoustics Research · Fire Detection and Safety Systems · Geophysical Methods and Applications
