Interactive Multi-Turn Retrieval for Health Videos
Chengzheng Wu, Ke Qiu, Baoming Zhang, Ruiyu Mao, Xulong Tang, and Kaixing Yang

TL;DR
This paper introduces an interactive multi-turn retrieval system for health videos that improves retrieval accuracy by incorporating follow-up query refinements, and it establishes a new benchmark dataset.
Contribution
It proposes DATR, a two-stage retrieval framework that improves health video search through multi-turn query fusion and efficient coarse-to-fine retrieval, along with a new dataset, MHVRC.
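The two ideas named above, fusing several query turns and then retrieving coarse-to-fine, can be illustrated with a minimal sketch. This summary does not describe how DATR actually fuses turns or re-ranks candidates, so everything below is an assumption for illustration only: the recency-weighted average in the hypothetical `fuse_turns` (controlled by a hypothetical `decay` parameter), the cheap cosine top-k over one clip-level embedding per video in `coarse_stage`, and the more expensive per-turn, per-segment matching in `fine_stage` (over a hypothetical `segment_index`). The toy 3-dimensional embeddings stand in for the output of a real text/video encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def fuse_turns(turns: List[Vector], decay: float = 0.5) -> List[float]:
    """Fuse per-turn query embeddings into one unit vector.

    Hypothetical weighting: the latest refinement gets weight 1 and each
    earlier turn is down-weighted by a further factor of `decay`.
    Requires at least one turn.
    """
    n = len(turns)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    fused = [0.0] * len(turns[0])
    for w, turn in zip(weights, turns):
        for j, x in enumerate(turn):
            fused[j] += w * x
    return normalize(fused)


def coarse_stage(query: Vector, video_index: Dict[str, Vector], top_k: int) -> List[str]:
    """Cheap stage: rank every video by one clip-level embedding, keep top_k."""
    ranked = sorted(video_index, key=lambda vid: cosine(query, video_index[vid]), reverse=True)
    return ranked[:top_k]


def fine_stage(
    turns: List[Vector],
    candidates: List[str],
    segment_index: Dict[str, List[Vector]],
) -> List[Tuple[str, float]]:
    """Expensive stage, run only on the shortlist: each turn is matched to its
    best-scoring segment of the video, and the scores are averaged over turns."""
    results = []
    for vid in candidates:
        segments = segment_index[vid]
        score = sum(max(cosine(t, s) for s in segments) for t in turns) / len(turns)
        results.append((vid, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


def retrieve(
    turns: List[Vector],
    video_index: Dict[str, Vector],
    segment_index: Dict[str, List[Vector]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Fuse the dialogue, shortlist coarsely, then re-rank the shortlist finely."""
    query = fuse_turns(turns)
    candidates = coarse_stage(query, video_index, top_k)
    return fine_stage(turns, candidates, segment_index)


if __name__ == "__main__":
    # Toy embeddings: axis 0 ~ "knee exercise", axis 1 ~ "seated posture".
    turns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # initial query, then a refinement
    video_index = {
        "v_standing": [1.0, 0.0, 0.0],
        "v_seated": [0.6, 0.8, 0.0],
        "v_unrelated": [0.0, 0.0, 1.0],
    }
    segment_index = {
        "v_standing": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
        "v_seated": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "v_unrelated": [[0.0, 0.0, 1.0]],
    }
    print(retrieve(turns, video_index, segment_index))
    # prints [('v_seated', 1.0), ('v_standing', 0.5)]
```

The split mirrors the usual reason for a coarse-to-fine design: the cheap clip-level scan touches every video, while the costlier segment-level matching runs only on the `top_k` shortlist. In the toy run, the follow-up "seated" refinement is what lifts `v_seated` above `v_standing`.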
Findings
DATR outperforms strong baselines in retrieval accuracy.
User studies show multi-turn queries better capture procedural semantics.
MHVRC provides a new benchmark for health video retrieval.
Abstract
The growing availability of health-related instructional videos creates new opportunities for clinical training, patient rehabilitation, and health education, yet existing retrieval systems remain largely single-turn: a user submits one query and receives one ranked list. This interaction is brittle in health scenarios, where information needs are often vague at first and become clinically meaningful only after follow-up constraints such as posture, hand placement, contraindications, equipment, or patient condition are specified. We introduce interactive multi-turn semantic retrieval for health videos and construct MHVRC, a Multi-Turn Health Video Retrieval Corpus, by combining video-grounded descriptions from VideoChat-Flash with query refinements generated by DeepSeek. We further propose DATR, a Dialogue-Aware Two-Stage Retrieval framework. DATR first performs efficient coarse…