MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

Harsh Purohit, Tomoya Nishida, Kota Dohi, Takashi Endo, and Yohei Kawaguchi

arXiv:2507.20666·eess.AS·July 29, 2025

TL;DR

This paper introduces an LLM-based method that generates plausible anomalous sounds from normal machine sounds, so that unsupervised anomalous sound detection (UASD) systems can be evaluated without real anomaly data.

Contribution

It presents a new synthesis approach using LLMs to interpret fault descriptions and select audio transformations, enabling scalable anomaly generation across diverse machine types.

Findings

1. Synthetic anomalies show detection-difficulty trends consistent with those of real anomalies.

2. The LLM-based synthesis method is effective for evaluating UASD systems.

3. The approach reduces reliance on real anomalous sound data.

Abstract

This paper proposes a method for generating machine-type-specific anomalies to evaluate the relative performance of unsupervised anomalous sound detection (UASD) systems across different machine types, even in the absence of real anomaly sound data. Conventional keyword-based data augmentation methods often produce unrealistic sounds due to their reliance on manually defined labels, limiting scalability as machine types and anomaly patterns diversify. Advanced audio generative models, such as MIMII-Gen, show promise but typically depend on anomalous training data, making them less effective when diverse anomalous examples are unavailable. To address these limitations, we propose a novel synthesis approach leveraging large language models (LLMs) to interpret textual descriptions of faults and automatically select audio transformation functions, converting normal machine sounds into…
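The function-calling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the transformation names (`add_tone`, `amplitude_modulate`, `add_impulses`), their parameters, the sample rate, and the hard-coded `plan` are all hypothetical stand-ins. In a real system the plan would be returned by an LLM given a textual fault description and a schema of available functions; here it is written by hand so the example runs without any external service.

```python
import math
import random

SAMPLE_RATE = 16_000  # assumed sample rate in Hz (not taken from the paper)


def add_tone(signal, freq_hz, gain):
    """Mix in a sine tone, e.g. to mimic a high-pitched whine."""
    return [s + gain * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n, s in enumerate(signal)]


def amplitude_modulate(signal, rate_hz, depth):
    """Slowly vary loudness, e.g. to mimic an unstable rotation."""
    return [s * (1.0 + depth * math.sin(2 * math.pi * rate_hz * n / SAMPLE_RATE))
            for n, s in enumerate(signal)]


def add_impulses(signal, period_s, gain):
    """Add periodic clicks, e.g. to mimic knocking."""
    step = max(1, int(period_s * SAMPLE_RATE))
    out = list(signal)
    for n in range(0, len(out), step):
        out[n] += gain
    return out


# Hypothetical registry: the set of functions an LLM would be allowed to call.
TRANSFORMS = {
    "add_tone": add_tone,
    "amplitude_modulate": amplitude_modulate,
    "add_impulses": add_impulses,
}


def apply_plan(signal, plan):
    """Apply a list of {"name", "arguments"} calls in order."""
    out = list(signal)
    for call in plan:
        name = call["name"]
        if name not in TRANSFORMS:
            raise ValueError(f"unknown transformation: {name}")
        out = TRANSFORMS[name](out, **call["arguments"])
    return out


# Stand-in for a normal machine recording: 0.1 s of low-level noise.
rng = random.Random(0)
normal = [rng.uniform(-0.01, 0.01) for _ in range(SAMPLE_RATE // 10)]

# Hand-written plan standing in for an LLM response to a fault description
# such as "periodic knocking with a faint whine" (illustrative only).
plan = [
    {"name": "add_impulses", "arguments": {"period_s": 0.02, "gain": 0.5}},
    {"name": "add_tone", "arguments": {"freq_hz": 3000.0, "gain": 0.05}},
]

anomalous = apply_plan(normal, plan)
```

Keeping the transformations in a plain registry means the LLM only chooses function names and arguments; the signal processing itself stays deterministic and easy to inspect.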
