# Unsupervised Action Proposal Ranking through Proposal Recombination

**Authors:** Waqas Sultani, Dong Zhang, Mubarak Shah

arXiv: 1704.00758 · 2017-04-05

## TL;DR

This paper introduces an unsupervised method for ranking and selecting high-quality action proposals in videos by recombining sub-proposals using graph optimization, web image-based actionness, and motion cues, improving action detection.

## Contribution

It presents a novel unsupervised approach that ranks action proposals without requiring manual annotations, leveraging web images and motion analysis for better proposal quality.

## Key findings

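- Outperforms several proposal ranking methods on publicly available trimmed and untrimmed datasets.
- Properly ranked proposals yield significantly better action detection than state-of-the-art proposal-based methods.
- Requires neither bounding box annotations nor video-level labels.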

## Abstract

Recently, action proposal methods have played an important role in action recognition tasks, as they reduce the search space dramatically. Most unsupervised action proposal methods tend to generate hundreds of action proposals, many of which are noisy, inconsistent, and unranked, while supervised action proposal methods take advantage of predefined object detectors (e.g., a human detector) to refine and score the action proposals, but they require thousands of manual annotations to train. Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly. In our approach, we first divide each action proposal into sub-proposals and then use a dynamic-programming-based graph optimization scheme to select the optimal combinations of sub-proposals from different proposals and assign each new proposal a score. We propose a new unsupervised image-based actionness detector that leverages web images and employ it as one of the node scores in our graph formulation. Moreover, we capture motion information by estimating the number of motion contours within each action proposal patch. The proposed method is unsupervised: it needs neither bounding box annotations nor video-level labels, which is desirable given the current explosion of large-scale action datasets. Our approach is generic and does not depend on a specific action proposal method. We evaluate our approach on several publicly available trimmed and untrimmed datasets and obtain better performance than several proposal ranking methods. In addition, we demonstrate that properly ranked proposals produce significantly better action detection than state-of-the-art proposal-based methods.
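The recombination step described in the abstract can be illustrated with a Viterbi-style dynamic program over temporal segments. The following is only a minimal sketch under stated assumptions, not the authors' formulation: the function names `iou` and `recombine`, the use of one representative box per sub-proposal, the IoU-based edge score, and the `edge_weight` parameter are all hypothetical. The node scores stand in for whatever combination of actionness and motion-contour cues the paper actually uses, which this summary does not specify.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recombine(
    boxes: Sequence[Sequence[Box]],
    node_scores: Sequence[Sequence[float]],
    edge_weight: float = 1.0,
) -> Tuple[List[int], float]:
    """Pick one sub-proposal per temporal segment, maximizing the sum of
    node scores plus weighted IoU between consecutive picks.

    boxes[t][k]       -- representative box of sub-proposal k in segment t
    node_scores[t][k] -- placeholder quality score (assumed to be given)
    Returns the chosen index per segment and the total path score.
    """
    best = [list(node_scores[0])]  # best[t][k]: best score of a path ending at (t, k)
    back: List[List[int]] = []     # back[t-1][k]: predecessor index of (t, k)
    for t in range(1, len(boxes)):
        cur, ptr = [], []
        for k, box in enumerate(boxes[t]):
            # Score of reaching (t, k) from every sub-proposal in segment t-1.
            cands = [
                best[t - 1][j] + edge_weight * iou(prev, box)
                for j, prev in enumerate(boxes[t - 1])
            ]
            j_best = max(range(len(cands)), key=cands.__getitem__)
            cur.append(cands[j_best] + node_scores[t][k])
            ptr.append(j_best)
        best.append(cur)
        back.append(ptr)

    # Backtrack from the best final node to recover the selected path.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path, total


if __name__ == "__main__":
    # Two input proposals split into three segments (toy, made-up numbers).
    same = (0.0, 0.0, 10.0, 10.0)
    toy_boxes = [[same, same], [same, same], [same, same]]
    toy_scores = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
    print(recombine(toy_boxes, toy_scores))
```

In this toy input the selected path switches between the two source proposals from one segment to the next, which is the sense in which a new proposal is "recombined" from sub-proposals of different proposals. The total path score can then serve as the ranking score of the new proposal; producing several ranked proposals would need an additional step (for example, removing the selected nodes and re-running the program), which is likewise an assumption rather than something stated in the abstract.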

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.00758/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1704.00758/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/1704.00758/full.md

---
Source: https://tomesphere.com/paper/1704.00758