Deep Interactive Region Segmentation and Captioning

Ali Sharifi Boroujerdi; Maryam Khanian; Michael Breuss

arXiv:1707.08364 · cs.CV · July 27, 2017

TL;DR

This paper introduces a hybrid deep learning system that allows users to specify regions in images for targeted segmentation and captioning, improving interpretability and accuracy over existing methods.

Contribution

It presents a novel interactive segmentation and captioning architecture combining a specialized FCN and dense captioning, enabling user-guided region processing.

Findings

01. Outperforms state-of-the-art interactive segmentation methods

02. Enhances understanding of dense captioning outputs

03. Improves object detection accuracy with segmentation-based region focus

Abstract

With recent innovations in dense image captioning, it is now possible to describe every object in a scene with a caption, with objects delimited by bounding boxes. However, interpreting such output is not trivial because many of the bounding boxes overlap. Furthermore, current captioning frameworks do not let the user express personal preferences to exclude areas that are not of interest. In this paper, we propose a novel hybrid deep learning architecture for interactive region segmentation and captioning in which the user can specify an arbitrary region of the image to be processed. To this end, a dedicated Fully Convolutional Network (FCN) named Lyncean FCN (LFCN) is trained using our special training data to isolate the User Intention Region (UIR) as the output of an efficient segmentation. In parallel, a dense image captioning model is…
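
The abstract is cut off before it explains how the segmentation and captioning branches are combined, so the following is only an illustrative sketch and not the authors' method. It assumes one plausible combination: given a binary mask standing in for the User Intention Region and a list of captioned bounding boxes from a dense captioner, keep only the captions whose boxes lie mostly inside the mask. The function names, the box format, and the `min_coverage` threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]
Caption = Tuple[Box, str]


def mask_coverage(mask: List[List[int]], box: Box) -> float:
    """Return the fraction of the box's pixels that fall inside the binary mask."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    inside = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return inside / area


def filter_captions(mask: List[List[int]],
                    captions: List[Caption],
                    min_coverage: float = 0.5) -> List[Caption]:
    """Keep captions whose boxes are covered by the mask at least min_coverage.

    The 0.5 default is an assumed value chosen for illustration only.
    """
    return [(box, text) for box, text in captions
            if mask_coverage(mask, box) >= min_coverage]


# Toy 4x6 image: the user-selected region is the left three columns.
mask = [[1, 1, 1, 0, 0, 0] for _ in range(4)]
captions = [
    ((0, 0, 3, 4), "a dog"),         # fully inside the region
    ((2, 0, 6, 4), "a tree"),        # mostly outside the region
    ((1, 1, 4, 3), "a red collar"),  # two thirds inside the region
]
kept = filter_captions(mask, captions)
```

With the toy data above, the "a tree" box is discarded because only a quarter of its pixels overlap the selected region. A filter of this kind would reduce the overlapping-box clutter the abstract describes, under the stated assumptions.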

Taxonomy

Methods: Max Pooling · Convolution · Fully Convolutional Network