Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes

Yiming Dou; Wonseok Oh; Yuqing Luo; Antonio Loquercio; Andrew Owens

arXiv:2506.09989 · cs.CV · June 12, 2025

TL;DR

This paper introduces a method to generate realistic sounds of human hand interactions with 3D scenes by training a flow model on recorded action-sound pairs, enabling interactive sound prediction for virtual environments.

Contribution

The paper presents an approach for synthesizing plausible interaction sounds in 3D scenes, using a rectified flow model trained on paired recordings of 3D hand trajectories and the sounds they produce.
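As background, the standard rectified-flow formulation is summarized below. Treating the condition c as the encoded hand-pose trajectory and x_1 as an audio representation is an assumption made for illustration; the paper's exact audio representation, conditioning scheme, and sampler may differ.

```latex
% x_0 ~ N(0, I): noise;  x_1: audio sample;  c: hand-trajectory condition (assumed)
x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \sim \mathcal{U}[0, 1]

\mathcal{L}(\theta) = \mathbb{E}_{x_0, x_1, t}\left[\big\lVert v_\theta(x_t, t, c) - (x_1 - x_0) \big\rVert^2\right]

% Sampling: integrate dx_t/dt = v_\theta(x_t, t, c) from t = 0 to t = 1, e.g. with Euler steps
x_{t + \Delta t} = x_t + \Delta t\, v_\theta(x_t, t, c)
```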

Findings

1. Generated sounds accurately convey material properties and the actions that produced them.
2. Generated sounds are often indistinguishable from real recordings to human observers.
3. The method enables interactive sound prediction in 3D scene reconstructions.

Abstract

We study the problem of making 3D scene reconstructions interactive by asking the following question: can we predict the sounds of human hands physically interacting with a scene? First, we record a video of a human manipulating objects within a 3D scene using their hands. We then use these action-sound pairs to train a rectified flow model to map 3D hand trajectories to their corresponding audio. At test time, a user can query the model for other actions, parameterized as sequences of hand poses, to estimate their corresponding sounds. In our experiments, we find that our generated sounds accurately convey material properties and actions, and that they are often indistinguishable to human observers from real sounds. Project page: https://www.yimingdou.com/hearing_hands/
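The train-then-query pipeline described in the abstract can be sketched in plain Python. This is a minimal illustration of generic rectified-flow training and Euler sampling, not the authors' code: the vector sizes, the `toy_audio` target, and the `oracle_velocity` stand-in for a trained network are all hypothetical, chosen so the mechanics can be checked without a neural network.

```python
import random

# Toy, hypothetical stand-ins: "cond" plays the role of an encoded hand-pose
# trajectory and the target vector plays the role of an audio representation.
# Neither reflects the paper's actual data formats or network.


def interpolate(x0, x1, t):
    """Straight-line path used by rectified flow: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Single-sample Monte Carlo estimate of the rectified-flow training loss.

    velocity_fn(x_t, t, cond) stands in for the learned network.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]    # Gaussian noise endpoint
    t = rng.random()                           # t ~ U[0, 1)
    x_t = interpolate(x0, x1, t)
    target = [b - a for a, b in zip(x0, x1)]   # velocity of the straight path
    pred = velocity_fn(x_t, t, cond)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(x1)


def sample(velocity_fn, cond, dim, steps, rng):
    """Query time: integrate dx/dt = v(x, t, cond) from noise (t=0) to t=1 with Euler steps."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_audio(cond):
    """Hypothetical deterministic 'ground-truth sound' for a condition."""
    return [2.0 * c + 1.0 for c in cond]


def oracle_velocity(x_t, t, cond):
    """Ideal velocity when the target is deterministic: (x1 - x_t) / (1 - t), for t < 1."""
    x1 = toy_audio(cond)
    return [(b - a) / (1.0 - t) for a, b in zip(x_t, x1)]


if __name__ == "__main__":
    rng = random.Random(0)
    cond = [0.1, -0.3, 0.5]
    # The oracle matches the straight-path target, so the loss is ~0 up to rounding.
    print(flow_matching_loss(oracle_velocity, toy_audio(cond), cond, rng))
    # Euler integration with the oracle lands on toy_audio(cond) up to rounding.
    print(sample(oracle_velocity, cond, dim=3, steps=10, rng=rng))
```

With a real model, `velocity_fn` would be a neural network conditioned on the hand-pose sequence, and the number of Euler steps trades sampling speed against accuracy; the oracle here only checks that the loss target and the sampler are consistent with each other.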

Code & Models

Repositories

dou-yiming/hearing_hands (Official)

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Face recognition and analysis