How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes

Mahnoor Fatima Saad; Ziad Al-Halah

arXiv:2508.02905 · cs.CV · August 6, 2025

TL;DR

This paper presents a novel approach for generating room acoustic profiles conditioned on material configurations, enabling realistic sound simulation for indoor scenes with diverse materials.

Contribution

It introduces the new task of material-controlled acoustic profile generation, proposes an encoder-decoder model to address it, and contributes a new benchmark, the Acoustic Wonderland Dataset.

Findings

1. The model effectively encodes material information and generates high-fidelity room impulse responses (RIRs).

2. It outperforms baselines and state-of-the-art methods in RIR generation.

3. The approach enables diverse acoustic profiles to be generated from user-defined material configurations.

Abstract

How would the sound in a studio change with a carpeted floor and acoustic tiles on the walls? We introduce the task of material-controlled acoustic profile generation, where, given an indoor scene with specific audio-visual characteristics, the goal is to generate a target acoustic profile based on a user-defined material configuration at inference time. We address this task with a novel encoder-decoder approach that encodes the scene's key properties from an audio-visual observation and generates the target Room Impulse Response (RIR) conditioned on the material specifications provided by the user. Our model enables the generation of diverse RIRs based on various material configurations defined dynamically at inference time. To support this task, we create a new benchmark, the Acoustic Wonderland Dataset, designed for developing and evaluating material-aware RIR prediction methods…
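To make the opening question concrete, the sketch below uses the classical Sabine reverberation formula, RT60 = 0.161 · V / Σ(S_i · α_i), to show how swapping surface materials changes a room's decay time, then builds a toy noise-based RIR with that decay and convolves it with a dry signal. This is NOT the paper's learned encoder-decoder; it is a textbook physical approximation offered only as intuition. The absorption coefficients, room dimensions, and the helper names (`ABSORPTION`, `sabine_rt60`, `synthetic_rir`, `auralize`) are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Assumed, illustrative broadband absorption coefficients. Real coefficients
# are frequency dependent and should come from measured material tables.
ABSORPTION = {
    "concrete": 0.02,
    "plaster": 0.05,
    "carpet": 0.30,
    "acoustic_tile": 0.70,
}


def sabine_rt60(volume_m3, surfaces):
    """Sabine reverberation time in seconds: RT60 = 0.161 * V / sum(S_i * alpha_i).

    `surfaces` is a list of (area_m2, material_name) pairs.
    """
    absorption_area = sum(area * ABSORPTION[mat] for area, mat in surfaces)
    return 0.161 * volume_m3 / absorption_area


def synthetic_rir(rt60, sr=16000, seed=0):
    """Toy RIR: white noise under an exponential envelope that drops 60 dB in rt60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    # A 60 dB amplitude drop over rt60 seconds gives the envelope 10**(-3 t / rt60).
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude stand-in for the direct path
    return rir / np.max(np.abs(rir))  # peak-normalize


def auralize(dry, rir):
    """Render how a dry signal would sound in the room via linear convolution with the RIR."""
    return np.convolve(dry, rir)


if __name__ == "__main__":
    volume = 5 * 4 * 3                       # hypothetical 5 m x 4 m x 3 m studio
    walls = 2 * (5 * 3) + 2 * (4 * 3)        # total wall area in m^2
    bare = [(20, "concrete"), (walls, "plaster"), (20, "plaster")]
    treated = [(20, "carpet"), (walls, "acoustic_tile"), (20, "plaster")]
    dry = np.random.default_rng(1).standard_normal(16000)  # 1 s of placeholder "dry" audio
    for name, config in [("bare", bare), ("treated", treated)]:
        rt60 = sabine_rt60(volume, config)
        wet = auralize(dry, synthetic_rir(rt60))
        print(f"{name}: RT60 = {rt60:.2f} s, rendered {wet.size} samples")
```

With these assumed coefficients, carpeting the floor and tiling the walls shortens the estimated RT60 from roughly 2.4 s to roughly 0.2 s. A learned model such as the one described above aims to capture effects that this single-number approximation ignores, such as frequency dependence, early reflections, and the listener and source positions.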
