Structured Label Inference for Visual Understanding

Nelson Nauata; Hexiang Hu; Guang-Tong Zhou; Zhiwei Deng; Zicheng Liao; Greg Mori

arXiv:1802.06459 · cs.CV · February 20, 2018 · 6 cites

TL;DR

This paper applies graph-based structured label inference, using the Bidirectional Inference Neural Network (BINN) and the Structured Inference Neural Network (SINN) and proposing an LSTM-based extension, to improve multi-label image and video classification and action detection in untrimmed videos by exploiting label relationships.

Contribution

It applies graph-based inference models in label space and proposes an LSTM-based extension, leveraging label structure and activity progression in visual understanding tasks.

Findings

01. Graph-based label inference yields significant accuracy improvements over baseline methods.
02. Effective modeling of label relationships enhances classification.
03. The LSTM extension captures activity progression in videos.

Abstract

Visual data such as images and videos contain a rich source of structured semantic labels as well as a wide range of interacting components. Visual content could be assigned with fine-grained labels describing major components, coarse-grained labels depicting high level abstractions, or a set of labels revealing attributes. Such categorization over different, interacting layers of labels evinces the potential for a graph-based encoding of label information. In this paper, we exploit this rich structure for performing graph-based inference in label space for a number of tasks: multi-label image and video classification and action detection in untrimmed videos. We consider the use of the Bidirectional Inference Neural Network (BINN) and Structured Inference Neural Network (SINN) for performing graph-based inference in label space and propose a Long Short-Term Memory (LSTM) based extension…
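
The idea of inference over interacting layers of labels can be illustrated with a minimal sketch. This is not the paper's BINN or SINN architecture, which the truncated abstract does not specify; it only assumes two label layers (coarse and fine) connected by hand-picked weights, with one top-down and one bottom-up pass whose results are summed. The function names, weights, and activations are all hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def bidirectional_label_inference(activations, w_down, w_up):
    """Illustrative two-pass inference over label layers (an assumed design).

    activations: one list of scores per label layer, ordered coarse to fine.
    w_down[i]:   matrix mapping layer i to layer i + 1 (coarse to fine).
    w_up[i]:     matrix mapping layer i + 1 to layer i (fine to coarse).
    Returns one list of per-label probabilities per layer.
    """
    n = len(activations)

    # Top-down pass: each layer adds the message from the coarser layer.
    down = [list(activations[0])]
    for i in range(1, n):
        message = matvec(w_down[i - 1], down[i - 1])
        down.append([math.tanh(a + m) for a, m in zip(activations[i], message)])

    # Bottom-up pass: each layer adds the message from the finer layer.
    up = [None] * n
    up[n - 1] = list(activations[n - 1])
    for i in range(n - 2, -1, -1):
        message = matvec(w_up[i], up[i + 1])
        up[i] = [math.tanh(a + m) for a, m in zip(activations[i], message)]

    # Combine both directions and squash to independent per-label probabilities.
    return [
        [1.0 / (1.0 + math.exp(-(d + u))) for d, u in zip(d_layer, u_layer)]
        for d_layer, u_layer in zip(down, up)
    ]


# Hypothetical example: 2 coarse labels, 3 fine labels.
activations = [[1.0, -1.0], [0.5, 0.0, -0.5]]
w_down = [[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]  # fine labels 0, 1 under coarse 0
w_up = [[[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]

probs = bidirectional_label_inference(activations, w_down, w_up)

# Baseline without label relations: all inter-layer weights set to zero.
zero_down = [[[0.0, 0.0] for _ in range(3)]]
zero_up = [[[0.0, 0.0, 0.0] for _ in range(2)]]
baseline = bidirectional_label_inference(activations, zero_down, zero_up)
```

In this toy setup, a confident coarse label raises the probability of the fine labels connected to it, and an unlikely coarse label lowers it, relative to the zero-weight baseline. In a learned model the weights would be trained rather than fixed by hand.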

Code & Models

Repositories

daveboat/structured_label_inference (PyTorch)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning