Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"

Jia-Hong Huang; Ting-Wei Wu; Chao-Han Huck Yang; Marcel Worring

arXiv:2105.14538 · cs.CV · June 1, 2021 · 1 citation

TL;DR

This paper introduces a novel context-driven encoding network that effectively integrates image and keyword information to generate accurate and meaningful medical reports for retinal images, outperforming existing models.

Contribution

A new multi-modal encoder-decoder model that leverages interactive image and keyword information for improved retinal image report generation.

Findings

01. Achieves state-of-the-art performance on medical report metrics

02. Improves BLEU-avg by 16%, CIDEr by 10.2%, ROUGE by 8.6%

03. Effectively leverages image and keyword interaction

Abstract

Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency. In this work, we propose a new context-driven encoding network to automatically generate medical reports for retinal images. The proposed model is mainly composed of a multi-modal input encoder and a fused-feature decoder. Our experimental results show that our proposed method is capable of effectively leveraging the interactive information between the input image and context, i.e., keywords in our case. The proposed method creates more accurate and meaningful reports for retinal images than baseline models and achieves state-of-the-art performance. This performance is shown in several commonly used metrics for the medical report generation task: BLEU-avg (+16%), CIDEr (+10.2%), and ROUGE (+8.6%).
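
The BLEU-avg figure quoted above can be made concrete with a small sketch. It assumes that BLEU-avg means the arithmetic mean of BLEU-1 through BLEU-4, that each generated report is scored against a single reference, and that no smoothing is applied; the abstract does not state these details, so treat them as assumptions. The function names and the example sentences are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu_n(candidate, reference, n):
    """BLEU with uniform weights over 1..n-grams, single reference, no smoothing."""
    precisions = [modified_precision(candidate, reference, k) for k in range(1, n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / n
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(log_mean)


def bleu_avg(candidate, reference):
    """Assumed definition: mean of BLEU-1, BLEU-2, BLEU-3, and BLEU-4."""
    return sum(bleu_n(candidate, reference, n) for n in range(1, 5)) / 4


# Hypothetical generated report and reference report, tokenized on whitespace.
generated = "fundus image shows diabetic retinopathy with hemorrhage".split()
reference = "fundus image shows diabetic retinopathy with exudates".split()
print(round(bleu_avg(generated, reference), 4))
```

Published results are usually computed with an established evaluation toolkit over a whole test set, so scores from this per-sentence sketch are not directly comparable to the numbers reported in the abstract.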

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques