Let Video Teaches You More: Video-to-Image Knowledge Distillation using DEtection TRansformer for Medical Video Lesion Detection

Yuncheng Jiang; Zixun Zhang; Jun Wei; Chun-Mei Feng; Guanbin Li; Xiang Wan; Shuguang Cui; Zhen Li

arXiv:2408.14051 · cs.CV · August 27, 2024

Open Access

TL;DR

This paper introduces V2I-DETR, a novel video-to-image knowledge distillation method that captures temporal context from videos to improve medical lesion detection while maintaining real-time inference speed.

Contribution

It proposes a teacher-student framework that distills video temporal information into image-based models, enhancing accuracy without sacrificing speed.
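The teacher-student idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the V2I-DETR implementation: the names `teacher_context`, `distill_loss` and `total_loss` and the weight `lam` are hypothetical, frame features are plain lists of floats, the teacher's temporal modeling is replaced by an element-wise mean over frames, and the distillation term is a generic mean-squared error between student and teacher features. The summary does not say which features V2I-DETR distills or which loss it uses.

```python
from typing import List, Sequence

Vector = List[float]


def teacher_context(frame_feats: Sequence[Vector]) -> Vector:
    """Hypothetical teacher: fuse features from several frames into one vector.

    An element-wise mean stands in for the real teacher's temporal
    modeling (an assumption, not the paper's design).
    """
    n = len(frame_feats)
    dim = len(frame_feats[0])
    return [sum(f[i] for f in frame_feats) / n for i in range(dim)]


def distill_loss(student_feat: Vector, teacher_feat: Vector) -> float:
    """Mean squared error pulling the single-frame student toward the teacher."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature dimensions must match")
    sq = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(sq) / len(sq)


def total_loss(det_loss: float, student_feat: Vector, teacher_feat: Vector,
               lam: float = 1.0) -> float:
    """Detection loss plus a weighted distillation term (lam is hypothetical)."""
    return det_loss + lam * distill_loss(student_feat, teacher_feat)
```

For a two-frame clip with features `[1.0, 2.0]` and `[3.0, 4.0]`, the fused teacher feature is `[2.0, 3.0]`; a student feature of `[2.0, 2.0]` then gives a distillation loss of 0.5. In a real training loop the teacher output would typically be treated as a fixed target so that only the student is updated, and only the student would run at inference time.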

Findings

1. Outperforms previous state-of-the-art methods significantly.
2. Achieves real-time inference at 30 FPS.
3. Effectively combines video context with image model efficiency.
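The 30 FPS figure translates into a per-frame time budget of 1000 / 30 ≈ 33.3 ms: if frames are processed one at a time, a detector that needs longer than that per frame cannot keep up with a 30 FPS stream. The helper names below are hypothetical, and the latencies in the usage note are made-up numbers for illustration, not measurements from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, to keep pace with a stream."""
    return 1000.0 / fps


def is_realtime(latency_ms: float, fps: float = 30.0) -> bool:
    """True if one per-frame forward pass fits inside the frame budget."""
    return latency_ms <= frame_budget_ms(fps)
```

For example, a hypothetical 25 ms forward pass fits the 30 FPS budget (`is_realtime(25.0)` is `True`), while a 40 ms pass does not.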

Abstract

AI-assisted lesion detection models play a crucial role in the early screening of cancer. However, previous image-based models ignore the inter-frame contextual information present in videos. On the other hand, video-based models capture the inter-frame context but are computationally expensive. To mitigate this contradiction, we delve into Video-to-Image knowledge distillation leveraging DEtection TRansformer (V2I-DETR) for the task of medical video lesion detection. V2I-DETR adopts a teacher-student network paradigm. The teacher network aims at extracting temporal contexts from multiple frames and transferring them to the student network, and the student network is an image-based model dedicated to fast prediction in inference. By distilling multi-frame contexts into a single frame, the proposed V2I-DETR combines the advantages of utilizing temporal contexts from video-based models…
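The difference in inputs between the two networks can be made concrete with frame indices. In this sketch, which is an assumption rather than the paper's sampling scheme, a hypothetical teacher sees a symmetric window of `2 * radius + 1` neighboring frames around frame `t`, with indices clamped at the start and end of the video, while the student sees frame `t` alone. Because the teacher is only needed during training, a window that includes future frames would not change what the student needs at inference time.

```python
from typing import List


def clip_indices(t: int, num_frames: int, radius: int) -> List[int]:
    """Hypothetical teacher input: frames t-radius..t+radius.

    Indices outside the video are clamped to the first or last frame,
    so the window always has 2 * radius + 1 entries.
    """
    if not 0 <= t < num_frames:
        raise IndexError("frame index out of range")
    return [min(max(i, 0), num_frames - 1)
            for i in range(t - radius, t + radius + 1)]


def student_indices(t: int) -> List[int]:
    """Hypothetical student input: the current frame only."""
    return [t]
```

For a 10-frame video with `radius=2`, `clip_indices(5, 10, 2)` returns `[3, 4, 5, 6, 7]`, and at the first frame `clip_indices(0, 10, 2)` returns `[0, 0, 0, 1, 2]`, repeating frame 0 to pad the window.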

Taxonomy

Topics: AI in cancer detection · COVID-19 diagnosis using AI · Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Knowledge Distillation