TL;DR
Butter is a novel object detection framework for autonomous driving that enhances hierarchical feature consistency and fusion, leading to improved accuracy and efficiency in detecting pedestrians, vehicles, and traffic signs.
Contribution
Butter introduces two modules, FAFCE (Frequency-Adaptive Feature Consistency Enhancement) and PHFFNet, which refine multi-scale features and integrate hierarchical information to address feature inconsistency and semantic gaps in existing detectors.
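The summary does not describe how PHFFNet combines levels. As a generic point of reference only, here is a minimal NumPy sketch of a standard FPN-style top-down pass, in which coarser, semantically richer maps are upsampled and added to finer ones. The function names, the nearest-neighbour upsampling, the fixed weight `alpha`, and the assumption that every level shares one channel count are all hypothetical choices. This is not PHFFNet itself.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def progressive_fusion(pyramid, alpha=0.5):
    """Hypothetical top-down fusion (not the paper's PHFFNet).

    pyramid: list of (C, H, W) arrays ordered coarse -> fine, where each
    level is assumed to be exactly twice the spatial size of the previous
    one and all levels share the same channel count C.
    Returns the fused maps in the same coarse -> fine order.
    """
    fused = [pyramid[0]]
    for level in pyramid[1:]:
        # Carry the already-fused coarser context down to this resolution.
        top_down = upsample2x(fused[-1])
        if top_down.shape != level.shape:
            raise ValueError(f"shape mismatch: {top_down.shape} vs {level.shape}")
        # Add a weighted copy of the coarser context to the finer level.
        fused.append(level + alpha * top_down)
    return fused


if __name__ == "__main__":
    p = [np.ones((2, 2, 2)), np.zeros((2, 4, 4)), np.zeros((2, 8, 8))]
    for f in progressive_fusion(p):
        print(f.shape, float(f.mean()))
```

Because each level receives the already-fused map from above, coarse context accumulates progressively down the pyramid rather than being injected into every level independently.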
Findings
Outperforms existing methods on BDD100K, KITTI, and Cityscapes datasets.
Improves detection accuracy while reducing model complexity.
Achieves a balance between robustness and computational efficiency.
Abstract
Hierarchical feature representations play a pivotal role in computer vision, particularly in object detection for autonomous driving. Multi-level semantic understanding is crucial for accurately identifying pedestrians, vehicles, and traffic signs in dynamic environments. However, existing architectures, such as YOLO and DETR, struggle to maintain feature consistency across different scales while balancing detection precision and computational efficiency. To address these challenges, we propose Butter, a novel object detection framework designed to enhance hierarchical feature representations for improving detection robustness. Specifically, Butter introduces two key innovations: Frequency-Adaptive Feature Consistency Enhancement (FAFCE) Component, which refines multi-scale feature consistency by leveraging adaptive frequency filtering to enhance structural and boundary precision, and…
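The abstract does not specify FAFCE's filter. Below is a minimal NumPy sketch of one generic form of adaptive frequency filtering. It assumes a fixed radial cutoff that splits each channel's 2-D spectrum into low and high bands, and a per-channel gain on the high band computed from that channel's own high-frequency energy share. `adaptive_frequency_filter`, `cutoff`, `strength`, and the gain rule are hypothetical illustrations, not the authors' component.

```python
import numpy as np


def radial_frequency(h, w):
    """Radial frequency (cycles/sample) of every 2-D FFT bin; 0 at DC."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fy ** 2 + fx ** 2)


def adaptive_frequency_filter(feat, cutoff=0.15, strength=1.0):
    """Hypothetical sketch (not the paper's FAFCE).

    feat: real (C, H, W) feature map. Each channel is split into a low band
    (radial frequency <= cutoff) and a high band. The high band is scaled by
    an input-dependent gain in [1, 1 + strength]: channels whose energy is
    mostly low-frequency (weak edges) receive the larger boost.
    """
    c, h, w = feat.shape
    spec = np.fft.fft2(feat, axes=(-2, -1))
    high_mask = (radial_frequency(h, w) > cutoff).astype(float)
    low = spec * (1.0 - high_mask)  # DC is always in the low band
    high = spec * high_mask

    # Per-channel share of spectral energy that lies in the high band.
    high_energy = np.sum(np.abs(high) ** 2, axis=(-2, -1))
    total_energy = np.sum(np.abs(spec) ** 2, axis=(-2, -1)) + 1e-12
    ratio = high_energy / total_energy

    # Adaptive gain: smaller high-frequency share -> stronger boost.
    gain = 1.0 + strength * (1.0 - ratio)
    filtered = low + gain[:, None, None] * high

    # The mask is symmetric in frequency, so the result is real up to rounding.
    return np.real(np.fft.ifft2(filtered, axes=(-2, -1)))


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((3, 16, 16))
    y = adaptive_frequency_filter(x)
    print(y.shape, np.allclose(y.mean(axis=(1, 2)), x.mean(axis=(1, 2))))
```

Keeping the DC bin in the unscaled low band leaves each channel's mean unchanged, so the filter only reshapes structure and boundaries. In a trained detector, the cutoff and gains would more plausibly come from learnable layers than from this fixed rule.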