TemporalDoRA: Temporal PEFT for Robust Surgical Video Question Answering
Luca Carlini, Chiara Lena, Cesare Hassan, Danail Stoyanov, Elena De Momi, Sophia Bano, Mobarak I. Hoque

TL;DR
TemporalDoRA is a temporally aware PEFT method for surgical VideoQA. It places lightweight temporal attention inside the low-rank adaptation bottleneck to improve temporal grounding and robustness to variation in question phrasing.
Contribution
A video-specific PEFT formulation that inserts temporal attention into the low-rank bottleneck of the vision encoder and applies weight decomposition only to the trainable low-rank branch, targeting VideoQA robustness and temporal grounding.
Findings
Improves Out-of-Template performance on surgical VideoQA datasets.
Enhances robustness to linguistic variation in question phrasing.
Validates effectiveness on multiple surgical VideoQA benchmarks.
Abstract
Surgical Video Question Answering (VideoQA) requires accurate temporal grounding while remaining robust to natural variation in how clinicians phrase questions, where linguistic bias can arise. Standard Parameter-Efficient Fine-Tuning (PEFT) methods adapt pretrained projections without explicitly modeling frame-to-frame interactions within the adaptation pathway, limiting their ability to exploit sparse temporal evidence. We introduce TemporalDoRA, a video-specific PEFT formulation that extends Weight-Decomposed Low-Rank Adaptation (DoRA) by (i) inserting lightweight temporal Multi-Head Attention (MHA) inside the low-rank bottleneck of the vision encoder and (ii) selectively applying weight decomposition only to the trainable low-rank branch rather than the full adapted weight. This design enables temporally aware updates while preserving a frozen backbone and stable scaling. By mixing…
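The two design choices named in the abstract can be illustrated with a forward-pass sketch. This is not the authors' implementation: the abstract is truncated and does not give the equations, so the residual connection around the attention, the row-wise normalization of the up-projection, and every name here (temporal_mha, temporal_lowrank_forward, A, B, m) are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_mha(z, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention across the frame axis. z: (T, N, r) -> (T, N, r)."""
    T, N, r = z.shape
    d = r // num_heads
    s = z.transpose(1, 0, 2)  # (N, T, r): each token position attends over frames

    def split(x):  # (N, T, r) -> (N, heads, T, d)
        return x.reshape(N, T, num_heads, d).transpose(0, 2, 1, 3)

    q, k, v = split(s @ Wq), split(s @ Wk), split(s @ Wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d), axis=-1)  # (N, heads, T, T)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(N, T, r)
    return (out @ Wo).transpose(1, 0, 2)  # back to (T, N, r)

def temporal_lowrank_forward(x, W0, A, B, m, attn_params, num_heads=2, eps=1e-6):
    """x: (T, N, d_in) frame tokens; W0: (d_out, d_in) frozen weight;
    A: (r, d_in) down-projection; B: (d_out, r) up-projection; m: (d_out,) magnitude."""
    base = x @ W0.T  # frozen path, left untouched
    z = x @ A.T      # project into the rank-r bottleneck: (T, N, r)
    # Assumption: temporal mixing is added residually inside the bottleneck.
    z = z + temporal_mha(z, *attn_params, num_heads=num_heads)
    # Assumption: the magnitude/direction split is applied to the up-projection
    # of the trainable branch only, never to W0.
    B_dir = B / (np.linalg.norm(B, axis=1, keepdims=True) + eps)
    return base + (z @ B_dir.T) * m

# Hypothetical sizes: 4 frames, 3 tokens per frame, rank 4.
rng = np.random.default_rng(0)
T, N, d_in, d_out, r = 4, 3, 6, 8, 4
x = rng.normal(size=(T, N, d_in))
W0 = rng.normal(size=(d_out, d_in))
A = 0.3 * rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
m = np.ones(d_out)
attn_params = tuple(0.3 * rng.normal(size=(r, r)) for _ in range(4))
y = temporal_lowrank_forward(x, W0, A, B, m, attn_params)
print(y.shape)
```

In this sketch the frozen path treats every frame independently, so any dependence of one frame's output on another frame comes from the trainable branch. Setting B to zero recovers the frozen model's output exactly.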
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
