Proactive Conversational Assistant for a Procedural Manual Task based on Audio and IMU
Rehana Mahfuz, Yinyi Guo, Erik Visser, Phanidhar Chinchili

TL;DR
This paper introduces a privacy-preserving, real-time conversational assistant for procedural tasks using audio and IMU data, featuring a novel finetuning method to improve dialogue relevance and efficiency.
Contribution
It presents a lightweight, edge-deployable assistant for manual tasks, with a new finetuning approach to enhance dialogue quality and reduce computational load.
Findings
>30% improvement in F-score for dialogue relevance
16x faster inference after finetuning
Effective on edge devices without cloud dependence
Abstract
Real-time conversational assistants for procedural tasks often depend on video input, which can be computationally expensive and compromise user privacy. For the first time, we propose a real-time conversational assistant that provides comprehensive guidance for a procedural task using only lightweight privacy-preserving modalities such as audio and IMU inputs from a user's wearable device to understand the context. This assistant proactively communicates step-by-step instructions to a user performing a furniture assembly task, and answers user questions. We construct a dataset containing conversations where the assistant guides the user in performing the task. On observing that an off-the-shelf language model is a very talkative assistant, we design a novel User Whim Agnostic (UWA) LoRA finetuning method which improves the model's ability to suppress less informative dialogues, while…
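The proposed UWA method is a LoRA finetuning variant. The abstract does not explain what makes it "user whim agnostic", so the sketch below covers only the standard LoRA update that any such variant starts from. A frozen weight W is combined with a trainable low-rank correction B·A, scaled by alpha/r. The class name `LoRALinear`, the rank, and the alpha value are illustrative assumptions, not the authors' settings or code.

```python
import numpy as np


class LoRALinear:
    """Standard LoRA adapter around a frozen linear layer (illustrative; not the UWA variant)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pretrained weight, shape (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                    # trainable up-projection, zero-init: adapter starts as a no-op
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in) -> (batch, out); frozen output plus scaled low-rank correction
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B are updated during finetuning; W stays fixed
        return self.A.size + self.B.size


# Usage: with B still zero, the adapted layer reproduces the frozen layer exactly
W = np.random.default_rng(1).normal(size=(64, 128))
layer = LoRALinear(W, rank=4)
x = np.ones((2, 128))
print(np.allclose(layer(x), x @ W.T))        # True
print(layer.trainable_params(), W.size)      # 768 8192
```

Only 768 of the 8,192 weights in this toy layer are trainable. This small update budget is why LoRA-style finetuning suits the edge-deployment setting described above.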
Taxonomy
Topics: AI in Service Interactions · Multimodal Machine Learning Applications · Speech and dialogue systems
