FlowDet: Unifying Object Detection and Generative Transport Flows
Enis Baty, C. P. Bridges, Simon Hadfield

TL;DR
FlowDet reformulates object detection as conditional flow matching. The resulting framework is more accurate than earlier diffusion-based detectors, improves faster as inference steps are added, and lets the number of steps be changed without re-training.
Contribution
FlowDet generalizes diffusion-based detection to a broader class of generative transport problems, allowing the number of inference steps to vary without re-training and improving detection performance.
Findings
Outperforms diffusion-based detection systems on COCO and LVIS datasets.
Enables variable inference steps without re-training.
Improves AP by up to 3.6% and AP_{rare} by up to 4.2%.
Abstract
We present FlowDet, the first formulation of object detection using modern Conditional Flow Matching techniques. This work follows from DiffusionDet, which originally framed detection as a generative denoising problem in the bounding box space via diffusion. We revisit and generalise this formulation to a broader class of generative transport problems, while maintaining the ability to vary the number of boxes and inference steps without re-training. In contrast to the curved stochastic transport paths induced by diffusion, FlowDet learns simpler and straighter paths, resulting in faster scaling of detection performance as the number of inference steps grows. We find that this reformulation enables us to outperform diffusion-based detection systems (as well as non-generative baselines) across a wide range of experiments, including various precision/recall operating points using multiple…
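To make the "straighter paths" idea concrete, here is a minimal sketch of flow matching over bounding boxes with a linear interpolation path and a fixed-step Euler sampler. It is an illustration only, not the authors' implementation: the function names, the (cx, cy, w, h) box encoding, the choice of a linear path, and the `oracle_velocity` stand-in for a trained network are all assumptions made for this example.

```python
import numpy as np


def linear_path(x0, x1, t):
    """Point at time t in [0, 1] on the straight line from noise boxes x0 to target boxes x1."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity regression target for the linear path; it does not depend on t."""
    return x1 - x0


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with num_steps equal Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy setup (hypothetical data): two target boxes in (cx, cy, w, h) form.
rng = np.random.default_rng(0)
x1 = np.array([[0.5, 0.5, 0.2, 0.3],
               [0.3, 0.7, 0.1, 0.1]])
x0 = rng.standard_normal(x1.shape)  # boxes drawn from Gaussian noise


def oracle_velocity(x, t):
    """Stand-in for a learned network: the exact velocity of the linear path towards x1."""
    return (x1 - x) / (1.0 - t)


# The step count is chosen at inference time; nothing is re-fitted between calls.
boxes_1_step = euler_sample(oracle_velocity, x0, num_steps=1)
boxes_8_steps = euler_sample(oracle_velocity, x0, num_steps=8)
```

Because the oracle path is exactly straight, one Euler step and eight Euler steps land on the same boxes. A learned velocity field would only approximate this, so extra steps would still be expected to help, but the sketch shows why straighter paths need fewer steps than curved ones.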
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis
