Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU

Rehana Mahfuz; Yinyi Guo; Erik Visser; Phanidhar Chinchili

arXiv:2602.15707 · cs.MM · February 18, 2026

Open Access

TL;DR

This paper introduces a privacy-preserving, real-time conversational assistant for procedural tasks using audio and IMU data, featuring a novel finetuning method to improve dialogue relevance and efficiency.

Contribution

It presents a lightweight, edge-deployable assistant for manual tasks, with a new finetuning approach to enhance dialogue quality and reduce computational load.

Findings

01. More than 30% improvement in F-score for dialogue relevance

02. 16x faster inference after finetuning

03. Effective on edge devices without cloud dependence

Abstract

Real-time conversational assistants for procedural tasks often depend on video input, which can be computationally expensive and compromise user privacy. For the first time, we propose a real-time conversational assistant that provides comprehensive guidance for a procedural task using only lightweight privacy-preserving modalities such as audio and IMU inputs from a user's wearable device to understand the context. This assistant proactively communicates step-by-step instructions to a user performing a furniture assembly task, and answers user questions. We construct a dataset containing conversations where the assistant guides the user in performing the task. On observing that an off-the-shelf language model is a very talkative assistant, we design a novel User Whim Agnostic (UWA) LoRA finetuning method which improves the model's ability to suppress less informative dialogues, while…

Taxonomy

Topics: AI in Service Interactions · Multimodal Machine Learning Applications · Speech and dialogue systems