Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting
Vishwanath A. Sindagi, Vishal M. Patel

TL;DR
This paper introduces a novel multi-level feature fusion network with scale-aware training for improved crowd counting in highly congested scenes, outperforming recent methods across multiple datasets.
Contribution
It proposes a multi-level bottom-top and top-bottom fusion approach combined with scale complementary feature blocks and scale-aware ground-truth maps for effective crowd counting.
Findings
Outperforms recent crowd counting methods on three datasets.
Effectively handles scale variation in highly congested scenes.
Demonstrates significant accuracy improvements across multiple benchmarks.
Abstract
Crowd counting presents enormous challenges in the form of large variation in scales within images and across the dataset. These issues are further exacerbated in highly congested scenes. Approaches based on straightforward fusion of multi-scale features from a deep network seem to be obvious solutions to this problem. However, these fusion approaches do not yield significant improvements in the case of crowd counting in congested scenes. This is usually due to their limited abilities in effectively combining the multi-scale features for problems like crowd counting. To overcome this, we focus on how to efficiently leverage information present in different layers of the network. Specifically, we present a network that involves: (i) a multi-level bottom-top and top-bottom fusion (MBTTBF) method to combine information from shallower to deeper layers and vice versa at multiple levels, (ii)…
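To make the bidirectional fusion idea in item (i) concrete, here is a minimal sketch, not the authors' MBTTBF implementation. It assumes three single-channel feature maps whose resolution halves at each deeper level, and it swaps in plain addition, 2x average pooling, and nearest-neighbour upsampling for whatever learned fusion blocks the paper uses. All function names (`downsample2x`, `upsample2x`, `bottom_top`, `top_bottom`, `fuse_both`) are hypothetical.

```python
def downsample2x(x):
    """Average-pool a 2-D map (list of rows) by 2; both sides must be even."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample2x(x):
    """Nearest-neighbour upsample a 2-D map by 2."""
    return [[v for v in row for _ in range(2)] for row in x for _ in range(2)]


def add(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def bottom_top(features):
    """Pass information from shallow (high-res) to deep (low-res) levels."""
    fused = [features[0]]
    for f in features[1:]:
        fused.append(add(f, downsample2x(fused[-1])))
    return fused


def top_bottom(features):
    """Pass information from deep (low-res) to shallow (high-res) levels."""
    fused = [features[-1]]
    for f in reversed(features[:-1]):
        fused.append(add(f, upsample2x(fused[-1])))
    return fused[::-1]  # restore shallow-to-deep order


def fuse_both(features):
    """Average the two directional paths level by level (an assumed merge rule)."""
    bt, tb = bottom_top(features), top_bottom(features)
    return [[[0.5 * (p + q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
            for a, b in zip(bt, tb)]


# Toy input: three levels of constant maps at 8x8, 4x4, and 2x2.
features = [[[1.0] * n for _ in range(n)] for n in (8, 4, 2)]
fused = fuse_both(features)
sizes = [(len(m), len(m[0])) for m in fused]
```

Each output level keeps its own resolution while receiving information from both shallower and deeper levels, which is the property the abstract attributes to fusing in both directions at multiple levels.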
