TL;DR
This paper introduces a mixture density network framework for spatial regression and applies it to object detection and human pose estimation. The mixture improves accuracy and divides the input space into interpretable modes; for object detection, this reduces the need for multi-scale testing.
Contribution
It presents a novel spatial regression framework using mixture density networks tailored for object detection and pose estimation, enhancing performance and interpretability.
Findings
Higher accuracy in object detection and pose estimation.
Mixture components align with object scale and viewpoint.
No mode collapse observed in experiments.
Abstract
Mixture models are well-established learning approaches that, in computer vision, have mostly been applied to inverse or ill-defined problems. However, they are general-purpose divide-and-conquer techniques, splitting the input space into relatively homogeneous subsets in a data-driven manner. Not only ill-defined but also well-defined complex problems should benefit from them. To this end, we devise a framework for spatial regression using mixture density networks. We realize the framework for object detection and human pose estimation. For both tasks, a mixture model yields higher accuracy and divides the input space into interpretable modes. For object detection, mixture components focus on object scale, with the distribution of components closely following that of the ground-truth object scale. This practically alleviates the need for multi-scale testing, providing a superior…
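To make the idea of a mixture density output concrete, here is a minimal sketch of the negative log-likelihood of a Gaussian mixture and of the per-component responsibilities for a single spatial target. It is a generic illustration, not the paper's formulation: the isotropic Gaussian components, the use of raw logits for the mixing weights, and all function names (`mixture_nll`, `responsibilities`, and the helpers) are assumptions made for this example.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_pdf(target, mean, sigma):
    """Log-density of an isotropic Gaussian (assumed here) at `target`."""
    d = len(target)
    sq_dist = sum((t - m) ** 2 for t, m in zip(target, mean))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))


def joint_log_probs(target, logits, means, sigmas):
    """log(pi_k) + log N(target | mean_k, sigma_k) for every component k."""
    log_norm = log_sum_exp(logits)  # turns logits into log mixing weights
    return [(logit - log_norm) + gaussian_log_pdf(target, mu, s)
            for logit, mu, s in zip(logits, means, sigmas)]


def mixture_nll(target, logits, means, sigmas):
    """Negative log-likelihood of `target` under the mixture."""
    return -log_sum_exp(joint_log_probs(target, logits, means, sigmas))


def responsibilities(target, logits, means, sigmas):
    """Posterior probability that each component generated `target`."""
    joint = joint_log_probs(target, logits, means, sigmas)
    total = log_sum_exp(joint)
    return [math.exp(j - total) for j in joint]


# Hypothetical example: two components predicting a 2-D location.
logits = [0.0, 0.0]                   # equal mixing weights
means = [[1.0, 1.0], [10.0, 10.0]]    # one prediction per component
sigmas = [1.0, 1.0]
target = [1.2, 0.9]

loss = mixture_nll(target, logits, means, sigmas)
resp = responsibilities(target, logits, means, sigmas)
```

In this sketch the responsibilities indicate which component "owns" a given target. Inspecting how such assignments correlate with properties like object scale is one way to check whether components have specialized into interpretable modes.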
Videos
Mixture Dense Regression for Object Detection and Human Pose Estimation · YouTube
