Before Smelling the Video: A Two-Stage Pipeline for Interpretable Video-to-Scent Plans
Kaicheng Wang, Kevin Zhongyang Shao, Ruiqi Chen, Sep Makhsous, Denise Wilson

TL;DR
This paper introduces a two-stage video-to-scent planning pipeline that pairs vision-language semantic extraction with language-model inference to produce interpretable scent plans for video. Survey participants evaluated the plans before any physical scent delivery.
Contribution
It proposes a two-stage pipeline that separates visual semantic extraction (vision-language model) from semantic-to-olfactory inference (large language model), targeting the scalability and interpretability limits of designer-authored triggers and fixed event-to-odor mappings.
Findings
Across two survey studies, participants consistently preferred system-generated scent plans over over-inclusive and naive baselines.
Preferred plans prioritized perceptually salient cues and aligned scent changes with visible actions.
The results support semantic planning as a foundation for future olfactory media systems.
Abstract
Olfactory cues can enhance immersion in interactive media, yet smell remains rare because it is difficult to author and synchronize with dynamic video. Prior olfactory interfaces rely on designer triggers and fixed event-to-odor mappings that do not scale to unconstrained content. This work examines whether semantic planning for smell is intelligible to people before physical scent delivery. We present a video-to-scent planning pipeline that separates visual semantic extraction using a vision-language model from semantic-to-olfactory inference using a large language model. Two survey studies compare system-generated scent plans with over-inclusive and naive baselines. Results show consistent preference for plans that prioritize perceptually salient cues and align scent changes with visible actions, supporting semantic planning as a foundation for future olfactory media systems.
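The two-stage separation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `SceneEvent`, `ScentCue`, `extract_semantics`, `infer_scent`, and `plan_scents` are hypothetical, the first stage returns canned descriptions in place of a real vision-language model, and the second stage uses a small keyword lookup purely as a placeholder for a large language model. The point is only the interface: stage one emits timed text descriptions, and stage two maps each description to an optional scent cue.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SceneEvent:
    """A timed text description, as a vision-language model might produce."""
    start_s: float
    end_s: float
    description: str


@dataclass
class ScentCue:
    """One entry in a scent plan, with the description that motivated it."""
    start_s: float
    end_s: float
    odor: str
    rationale: str


def extract_semantics(video_path: str) -> List[SceneEvent]:
    # Stage 1 placeholder (assumption): canned output instead of a real
    # vision-language model call on the video at video_path.
    return [
        SceneEvent(0.0, 4.0, "a person walks through a pine forest"),
        SceneEvent(4.0, 9.0, "the person checks a phone"),
        SceneEvent(9.0, 15.0, "the person lights a campfire"),
    ]


def infer_scent(event: SceneEvent) -> Optional[ScentCue]:
    # Stage 2 placeholder (assumption): a keyword lookup standing in for a
    # large language model. Events with no salient odor yield no cue.
    keyword_to_odor = {"pine": "pine resin", "campfire": "wood smoke"}
    for keyword, odor in keyword_to_odor.items():
        if keyword in event.description:
            return ScentCue(event.start_s, event.end_s, odor, event.description)
    return None


def plan_scents(
    video_path: str,
    stage1: Callable[[str], List[SceneEvent]] = extract_semantics,
    stage2: Callable[[SceneEvent], Optional[ScentCue]] = infer_scent,
) -> List[ScentCue]:
    # Run the stages in sequence and drop events that produced no cue.
    cues = [stage2(event) for event in stage1(video_path)]
    return [cue for cue in cues if cue is not None]


plan = plan_scents("example.mp4")
for cue in plan:
    print(f"{cue.start_s:>5.1f}s to {cue.end_s:>5.1f}s  {cue.odor}  ({cue.rationale})")
```

Because each stage is passed in as a callable, either placeholder could be swapped for a real model call without changing the other, and the `rationale` field keeps each cue traceable to the visual description that produced it.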
Taxonomy
Topics: Olfactory and Sensory Function Studies · Visual Attention and Saliency Detection · Multisensory perception and integration
