QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View

Trinh T. L. Vuong; Doanh C. Bui; Jin Tae Kwak

arXiv:2407.13216·cs.CV·July 19, 2024

TL;DR

This paper presents automated solutions for three life-saving intervention tasks in the Trauma THOMPSON (T3) Challenge: action recognition, action anticipation, and Visual Question Answering (VQA). The solutions placed 2nd in action recognition and anticipation and 1st in VQA.

Contribution

The paper introduces a pre-processing strategy that samples and stitches multiple inputs into a single image, an action dictionary-guided training design, and a frame-question cross-attention mechanism for VQA.

Findings

01. Achieved 2nd place in action recognition and anticipation.

02. Achieved 1st place in Visual Question Answering.

03. Proposed momentum- and attention-based knowledge distillation for action recognition and anticipation, and a frame-question cross-attention mechanism for VQA.

Abstract

In this paper, we present our solutions for a spectrum of automation tasks in life-saving intervention procedures within the Trauma THOMPSON (T3) Challenge, encompassing action recognition, action anticipation, and Visual Question Answering (VQA). For action recognition and anticipation, we propose a pre-processing strategy that samples and stitches multiple inputs into a single image and then incorporates momentum- and attention-based knowledge distillation to improve the performance of the two tasks. For training, we present an action dictionary-guided design, which consistently yields the most favorable results across our experiments. In the realm of VQA, we leverage object-level features and deploy co-attention networks to train both object and question features. Notably, we introduce a novel frame-question cross-attention mechanism at the network's core for enhanced performance.…
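The abstract's "samples and stitches multiple inputs into a single image" step can be illustrated with a minimal sketch. The abstract does not say how frames are chosen or arranged, so the evenly spaced sampling, the row-major grid layout, and the names `sample_indices` and `stitch_grid` are all assumptions made for illustration, not the authors' implementation. Frames are plain nested lists so the example has no dependencies.

```python
from typing import List, Sequence

# A frame is an H x W grid of pixel values (nested lists, for illustration only).
Frame = List[List[int]]


def sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick `num_samples` roughly evenly spaced indices from a clip.

    Assumption: uniform temporal sampling; the paper may sample differently.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [num_frames // 2]
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def stitch_grid(frames: Sequence[Frame], rows: int, cols: int) -> Frame:
    """Tile equally sized frames into one image, filling the grid row by row.

    Assumption: a rows x cols grid layout; the paper's layout is not stated.
    """
    if len(frames) != rows * cols:
        raise ValueError("need exactly rows * cols frames")
    stitched: Frame = []
    for r in range(rows):
        tile_row = frames[r * cols:(r + 1) * cols]
        height = len(tile_row[0])
        for y in range(height):
            # Concatenate scanline y of every tile in this grid row.
            line: List[int] = []
            for tile in tile_row:
                line.extend(tile[y])
            stitched.append(line)
    return stitched


# Toy clip: 10 frames of size 2 x 2, each filled with its own frame index.
video = [[[t] * 2 for _ in range(2)] for t in range(10)]
indices = sample_indices(len(video), 4)
image = stitch_grid([video[i] for i in indices], rows=2, cols=2)
```

With the toy clip, `image` is a single 4 x 4 grid whose four quadrants are the sampled frames, so a standard image model could consume several time steps in one input.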

Code & Models

Repositories

quiil/quiil_thompson_solution (PyTorch, official)

Taxonomy

Topics: IoT and Edge/Fog Computing

Methods: Knowledge Distillation