Automatic Video Object Segmentation via Motion-Appearance-Stream Fusion and Instance-aware Segmentation
Sungkwon Choo, Wonkyo Seo, Nam Ik Cho

TL;DR
This paper introduces a novel automatic video object segmentation method that fuses motion and appearance streams at original resolution with instance-aware segmentation, achieving state-of-the-art results without user intervention.
Contribution
The paper proposes a new two-stream fusion network with a recurrent multiscale structure that effectively combines motion and appearance information at original resolution for improved segmentation accuracy.
Findings
Achieves state-of-the-art performance in automatic video object segmentation.
Produces segmentation quality close to that of semi-supervised methods.
Operates without any user intervention.
Abstract
This paper presents a method for automatic video object segmentation based on the fusion of a motion stream, an appearance stream, and instance-aware segmentation. The proposed scheme consists of a two-stream fusion network and an instance segmentation network. The two-stream fusion network in turn consists of motion and appearance stream networks, which extract long-term temporal and spatial information, respectively. Unlike existing two-stream fusion methods, the proposed fusion network blends the two streams at the original resolution to obtain accurate segmentation boundaries. We develop a recurrent bidirectional multiscale structure with skip connections for the stream fusion network to extract long-term temporal information. The multiscale structure also makes it possible to obtain original-resolution features at the end of the network. As a result of two-stream fusion, we have a…
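As a rough illustration of what blending two streams at the original resolution can mean, the toy sketch below upsamples a coarse motion probability map and averages it per pixel with a full-resolution appearance map. It is not the authors' network: the function names (`upsample_nearest`, `fuse_at_original_resolution`, `threshold`), the nearest-neighbour upsampling, the fixed blending weight, and the example values are all assumptions made for illustration.

```python
# Toy sketch only: NOT the paper's fusion network. All names and values are hypothetical.

def upsample_nearest(grid, factor):
    """Upsample a 2-D list by an integer factor using nearest-neighbour repetition."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]  # repeat each value horizontally
        for _ in range(factor):                          # repeat each row vertically
            out.append(list(wide))
    return out


def fuse_at_original_resolution(motion_coarse, appearance_full, factor, weight=0.5):
    """Blend a coarse motion map with a full-resolution appearance map, pixel by pixel."""
    motion_full = upsample_nearest(motion_coarse, factor)
    same_height = len(motion_full) == len(appearance_full)
    same_width = same_height and len(motion_full[0]) == len(appearance_full[0])
    if not same_width:
        raise ValueError("upsampled motion map and appearance map must have the same shape")
    return [
        [weight * m + (1.0 - weight) * a for m, a in zip(m_row, a_row)]
        for m_row, a_row in zip(motion_full, appearance_full)
    ]


def threshold(prob_map, t=0.5):
    """Turn a probability map into a binary mask."""
    return [[1 if p >= t else 0 for p in row] for row in prob_map]


# Assumed example: a 1x2 coarse motion map and a 2x4 full-resolution appearance map.
motion_coarse = [[0.9, 0.1]]
appearance_full = [
    [0.9, 0.0, 0.1, 0.0],
    [0.9, 0.8, 0.2, 0.1],
]
fused = fuse_at_original_resolution(motion_coarse, appearance_full, factor=2)
mask = threshold(fused)
```

In this toy case the coarse motion map alone would mark the whole left half as foreground, while the full-resolution appearance map removes one boundary pixel from the mask. That is the intuition behind fusing at the original resolution rather than at a downsampled one.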
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
