TL;DR
This paper introduces a multi-modal, knowledge-infused framework for clinical conversation summarization that incorporates visual data and medical knowledge to improve accuracy and relevance in telemedicine report generation.
Contribution
It proposes a novel multi-modal, multi-task framework with knowledge infusion and visual features, along with a new annotated dataset for clinical conversation summarization.
Findings
Visual information significantly improves summary quality.
Knowledge infusion enhances medical entity preservation.
Medical department identification correlates with summary accuracy.
Abstract
With the advancement of telemedicine, researchers and medical practitioners are working hand-in-hand to develop techniques that automate medical operations such as diagnosis report generation. In this paper, we first present a multi-modal clinical conversation summary generation task that takes a clinician-patient interaction (both textual and visual information) and generates a succinct synopsis of the conversation. We propose a knowledge-infused, multi-modal, multi-tasking medical domain identification and clinical conversation summary generation (MM-CliConSummation) framework. It leverages an adapter to infuse knowledge and visual features, and unifies the fused feature vector using a gated mechanism. Furthermore, we developed a multi-modal, multi-intent clinical conversation summarization corpus annotated with intent, symptom, and summary. The extensive set of…
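The abstract mentions unifying fused features "using a gated mechanism" but this page does not give the equations. As a rough illustration only, the sketch below shows one common form of gated fusion: a per-dimension sigmoid gate computed from the concatenated inputs decides how much of each vector to keep. The function name `gated_fusion`, the parameter names, and the gate formula are assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, aux_vec, gate_weights, gate_bias):
    """Blend two equal-length feature vectors with a per-dimension gate.

    Hypothetical sketch, not the paper's formulation.
    text_vec:     textual features (list of floats)
    aux_vec:      auxiliary features, e.g. visual or knowledge (same length)
    gate_weights: one weight row per output dimension, each row spanning
                  the concatenation [text_vec; aux_vec]
    gate_bias:    one bias per output dimension
    """
    if len(text_vec) != len(aux_vec):
        raise ValueError("feature vectors must have the same length")
    concat = list(text_vec) + list(aux_vec)
    fused = []
    for i, (t, a) in enumerate(zip(text_vec, aux_vec)):
        # Gate score is a linear function of both inputs.
        score = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(score)
        # g near 1 keeps the textual feature; g near 0 keeps the auxiliary one.
        fused.append(g * t + (1.0 - g) * a)
    return fused


# With all-zero gate parameters the gate is 0.5, so the result is a plain average.
zero_weights = [[0.0] * 4, [0.0] * 4]
print(gated_fusion([1.0, 2.0], [3.0, 4.0], zero_weights, [0.0, 0.0]))
```

In a trained model the gate parameters would be learned, letting the network lean on visual or knowledge features only where they help.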
Taxonomy
Methods: Adapter
