Location-Sensitive Visual Recognition with Cross-IOU Loss

Kaiwen Duan; Lingxi Xie; Honggang Qi; Song Bai; Qingming Huang; Qi Tian

arXiv:2104.04899 · cs.CV · April 13, 2021 · 27 citations

TL;DR

This paper introduces LSNet, a unified deep learning framework for location-sensitive visual recognition tasks such as object detection, instance segmentation, and pose estimation. It is trained with a novel cross-IOU loss designed to fit objects of various scales.

Contribution

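The paper proposes LSNet, a unified network that predicts an anchor point and a set of landmarks for each object, together with a new cross-IOU loss that helps the network fit objects of various scales. The flexibly located landmarks also let the network incorporate richer contextual information.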

Findings

1. Achieved 53.5% box AP on MS-COCO, reported as state of the art among anchor-free detectors
2. Reached 40.2% mask AP for instance segmentation on MS-COCO
3. Demonstrated effective multi-scale human pose detection

Abstract

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks that require localizing the object by internal or boundary landmarks. This paper summarizes these tasks as location-sensitive visual recognition and proposes a unified solution named location-sensitive network (LSNet). Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object. The key to optimizing LSNet lies in the ability to fit various scales, for which we design a novel loss function named cross-IOU loss that computes the cross-IOU of each anchor point-landmark pair to approximate the global IOU between the prediction and ground truth. The flexibly located and accurately predicted landmarks also enable LSNet to incorporate richer contextual information for visual recognition.…
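
To make the idea of a per-pair cross-IOU more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each landmark is stored as a 2D offset from the anchor point, and that each offset is split into four non-negative directional components (x+, x-, y+, y-), with a small fraction `alpha` leaked into the opposite direction. The function names, the component layout, and the value `alpha=0.2` are illustrative assumptions, not details stated in the abstract.

```python
def decompose(dx, dy, alpha=0.2):
    """Split an anchor-to-landmark offset into (x+, x-, y+, y-) components.

    Assumption for this sketch: the direction the offset points in gets the
    full magnitude, and the opposite direction gets alpha * magnitude.
    """
    ax, ay = abs(dx), abs(dy)
    x_pos, x_neg = (ax, alpha * ax) if dx >= 0 else (alpha * ax, ax)
    y_pos, y_neg = (ay, alpha * ay) if dy >= 0 else (alpha * ay, ay)
    return [x_pos, x_neg, y_pos, y_neg]


def cross_iou(pred, gt, alpha=0.2):
    """Ratio of summed component-wise minima to summed maxima for one pair."""
    p = decompose(pred[0], pred[1], alpha)
    g = decompose(gt[0], gt[1], alpha)
    inter = sum(min(a, b) for a, b in zip(p, g))
    union = sum(max(a, b) for a, b in zip(p, g))
    if union == 0:
        # Both offsets are zero, so the two landmarks coincide with the anchor.
        return 1.0
    return inter / union


def cross_iou_loss(pred_offsets, gt_offsets, alpha=0.2):
    """Average of (1 - cross-IOU) over all anchor point-landmark pairs."""
    losses = [1.0 - cross_iou(p, g, alpha)
              for p, g in zip(pred_offsets, gt_offsets)]
    return sum(losses) / len(losses)


# The same relative error at two different scales.
small = cross_iou((5.0, 0.0), (10.0, 0.0))
large = cross_iou((50.0, 0.0), (100.0, 0.0))
print(small, large)
```

Because the measure is a ratio, a prediction that covers half of the ground-truth offset scores about the same whether the object is 10 or 100 pixels wide, which is one way to read the abstract's emphasis on fitting various scales. A plain L1 penalty on the same two cases would differ by a factor of ten.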

Code & Models

Repositories

Duankaiwen/LSNet (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques