Seeing More with Less: Video Capsule Endoscopy with Multi-Task Learning
Julia Werner, Oliver Bause, Julius Oexle, Maxime Le Floch, Franz Brinkmann, Jochen Hampe, Oliver Bringmann

TL;DR
This paper presents a multi-task neural network for video capsule endoscopy that simultaneously localizes the capsule within the gastrointestinal tract and detects anomalies, achieving high accuracy with a model small enough for deployment on resource-constrained devices.
Contribution
The work introduces a novel multi-task model that combines localization and anomaly detection in capsule endoscopy with a limited parameter count, outperforming single-task models.
Findings
Achieves 93.63% localization accuracy
Achieves 87.48% anomaly detection accuracy
Uses only 1 million parameters, enabling deployment on resource-constrained devices
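
A rough sketch of why sharing one backbone between the two tasks helps with a tight parameter budget. It counts the parameters of a small convolutional backbone with two classification heads and compares that with two separate single-task networks. All layer sizes and class counts below are hypothetical and chosen only for illustration; they are not the architecture reported in the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k 2D convolution."""
    return c_in * c_out * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical layer sizes (in_channels, out_channels, kernel), NOT from the paper.
BACKBONE = [(3, 32, 3), (32, 64, 3), (64, 128, 3), (128, 256, 3)]
FEATURES = 256    # assumed feature width after global pooling
N_LOCATIONS = 4   # assumed number of gastrointestinal sections
N_ANOMALY = 2     # assumed: normal vs. anomalous


def backbone_params():
    return sum(conv_params(*layer) for layer in BACKBONE)


def multi_task_params():
    # One shared backbone feeding two lightweight heads.
    return (backbone_params()
            + linear_params(FEATURES, N_LOCATIONS)
            + linear_params(FEATURES, N_ANOMALY))


def two_single_task_params():
    # Each task gets its own backbone and head.
    return (2 * backbone_params()
            + linear_params(FEATURES, N_LOCATIONS)
            + linear_params(FEATURES, N_ANOMALY))


if __name__ == "__main__":
    budget = 1_000_000
    print("multi-task:", multi_task_params(), "within budget:", multi_task_params() <= budget)
    print("two single-task:", two_single_task_params())
```

In this toy configuration the heads are tiny compared with the backbone, so adding a second task to a shared backbone costs almost nothing, whereas running two separate networks roughly doubles the parameter count.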
Abstract
Video capsule endoscopy has become increasingly important for investigating the small intestine within the gastrointestinal tract. However, the short battery lifetime of such compact sensor edge devices remains a persistent challenge. Integrating artificial intelligence can help overcome this limitation by enabling intelligent real-time decision-making, thereby reducing energy consumption and prolonging battery life. This remains difficult, however, because data are sparse and the device's limited resources restrict the overall model size. In this work, we introduce a multi-task neural network that combines precise self-localization within the gastrointestinal tract and anomaly detection in the small intestine in a single model. Throughout the development process, we consistently restricted the total number of parameters to ensure…
