Seeing More with Less: Video Capsule Endoscopy with Multi-Task Learning

Julia Werner; Oliver Bause; Julius Oexle; Maxime Le Floch; Franz Brinkmann; Jochen Hampe; Oliver Bringmann

arXiv:2507.23479·cs.CV·January 21, 2026

TL;DR

This paper presents a multi-task neural network for video capsule endoscopy that simultaneously localizes the capsule within the gastrointestinal tract and detects anomalies in the small intestine. It achieves high accuracy on both tasks with a model small enough to deploy on resource-constrained devices.

Contribution

The work introduces a novel multi-task model that combines localization and anomaly detection in capsule endoscopy with a limited parameter count, outperforming single-task models.

Findings

01

Achieves 93.63% localization accuracy

02

Achieves 87.48% anomaly detection accuracy

03

Uses only 1 million parameters, enabling deployment on resource-constrained devices
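
To illustrate why sharing one network across both tasks helps with a tight parameter budget, here is a minimal back-of-the-envelope sketch. It is not the authors' architecture: the layer widths, the number of localization classes, the number of anomaly classes, and all function names are hypothetical values chosen only for illustration. It compares one shared backbone with two task heads against two separate single-task models.

```python
# Hypothetical sizes, NOT taken from the paper.
BACKBONE_CHANNELS = [3, 32, 64, 128, 256]  # assumed conv channel widths
NUM_LOCATIONS = 4          # assumed number of GI-tract sections
NUM_ANOMALY_CLASSES = 2    # assumed: normal vs. anomalous
PARAM_BUDGET = 1_000_000   # the 1 million parameter figure quoted above


def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def backbone_params(channels=BACKBONE_CHANNELS):
    """Total parameters of a plain stack of 3x3 convolutions."""
    return sum(conv_params(a, b) for a, b in zip(channels, channels[1:]))


def head_params(channels=BACKBONE_CHANNELS):
    """Parameters of the two classification heads on pooled features."""
    feat = channels[-1]
    return linear_params(feat, NUM_LOCATIONS) + linear_params(feat, NUM_ANOMALY_CLASSES)


def multi_task_params():
    """One shared backbone feeding both heads."""
    return backbone_params() + head_params()


def two_single_task_params():
    """Two independent models, each with its own backbone and one head."""
    return 2 * backbone_params() + head_params()


if __name__ == "__main__":
    shared = multi_task_params()
    separate = two_single_task_params()
    print(f"multi-task model:       {shared:,} parameters")
    print(f"two single-task models: {separate:,} parameters")
    print(f"within budget:          {shared <= PARAM_BUDGET}")
```

Under these assumed sizes, the shared design saves exactly one backbone's worth of parameters, which is the general reason a multi-task model can fit a budget that two separate models would strain.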

Abstract

Video capsule endoscopy has become increasingly important for investigating the small intestine within the gastrointestinal tract. However, a persistent challenge remains the short battery lifetime of such compact sensor edge devices. Integrating artificial intelligence can help overcome this limitation by enabling intelligent real-time decision-making, thereby reducing the energy consumption and prolonging the battery life. However, this remains challenging due to data sparsity and the limited resources of the device restricting the overall model size. In this work, we introduce a multi-task neural network that combines the functionalities of precise self-localization within the gastrointestinal tract with the ability to detect anomalies in the small intestine within a single model. Throughout the development process, we consistently restricted the total number of parameters to ensure…
