# Let’s Go Bananas: Beyond Bounding Box Representations for Fisheye Camera-Based Object Detection in Autonomous Driving

**Authors:** Senthil Yogamani, Ganesh Sistu, Patrick Denny, Jane Courtney

PMC · DOI: 10.3390/s25123735 · Sensors (Basel, Switzerland) · 2025-06-14

## TL;DR

This paper systematically compares object representations for fisheye-camera object detection in autonomous driving and introduces a curved box that follows the banana-shaped distortion of objects near the image periphery.

## Contribution

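The paper builds a YOLO-based detection framework on the public WoodScape dataset to compare object representations. It proposes a curvature-adaptive polygon, a curved box representation refined with vanishing-point constraints, and a camera geometry tensor that adapts the detector to non-linear fisheye distortion.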

## Key findings

- The proposed curvature-adaptive polygon obtained an improvement of 3 mAP (mean average precision) points.
- Curved box representations outperformed standard and oriented bounding boxes by 3 and 1.6 mAP points, respectively.
- The camera geometry tensor further boosted performance by 1.4 mAP points.

## Abstract

Object detection is a mature problem in autonomous driving, with pedestrian detection being one of the first commercially deployed algorithms. It has been extensively studied in the literature. However, object detection is relatively less explored for fisheye cameras used for surround-view near-field sensing. The standard bounding-box representation fails in fisheye cameras due to heavy radial distortion, particularly in the periphery. In this paper, a generic object detection framework is implemented using the base YOLO (You Only Look Once) detector to systematically explore various object representations using the public WoodScape dataset. First, we implement basic representations, namely the standard bounding box, the oriented bounding box, and the ellipse. Secondly, we implement a generic polygon and propose a novel curvature-adaptive polygon, which obtains an improvement of 3 mAP (mean average precision) points. A polygon is expensive to annotate and complex to use in downstream tasks; thus, it is not practical to use it in real-world applications. However, we utilize it to demonstrate that the accuracy gap between the polygon and the bounding box representation is very high due to strong distortion in fisheye cameras. This motivates the design of a distortion-aware optimal representation of the bounding box for fisheye images, which tend to be banana-shaped near the periphery. We derive a novel representation called a curved box and improve it further by leveraging vanishing-point constraints. The proposed curved box representations outperform the bounding box by 3 mAP points and the oriented bounding box by 1.6 mAP points. In addition, the camera geometry tensor is formulated to provide adaptation to non-linear fisheye camera distortion characteristics and improves the performance further by 1.4 mAP points.
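To make the distortion argument concrete, the following is a minimal sketch, not the paper's code. It assumes an idealized equidistant fisheye model (`r = f * theta`) and a hypothetical focal length of 300 pixels; the camera model and calibration actually used with WoodScape are not given in this summary. The helper names (`project_equidistant`, `bow`, `vertical_edge`) are illustrative. The sketch projects a straight vertical edge, such as the side of a standing pedestrian, and measures how far the projected edge bows away from a straight line.

```python
import math

F_PIXELS = 300.0  # hypothetical focal length in pixels (assumption)


def project_pinhole(point, f=F_PIXELS):
    """Rectilinear projection: straight 3D lines stay straight."""
    x, y, z = point
    return (f * x / z, f * y / z)


def project_equidistant(point, f=F_PIXELS):
    """Idealized equidistant fisheye projection, r = f * theta (assumption)."""
    x, y, z = point
    rho = math.hypot(x, y)       # distance from the optical axis
    if rho == 0.0:
        return (0.0, 0.0)        # point on the optical axis maps to the centre
    theta = math.atan2(rho, z)   # angle between the ray and the optical axis
    r = f * theta                # image radius under the equidistant model
    return (r * x / rho, r * y / rho)


def vertical_edge(x, z, half_height=1.0, n=21):
    """Sample n points on a vertical 3D segment at lateral offset x, depth z."""
    step = 2.0 * half_height / (n - 1)
    return [(x, -half_height + step * i, z) for i in range(n)]


def bow(points_2d):
    """Largest distance of any point from the chord joining the end points."""
    (ax, ay), (bx, by) = points_2d[0], points_2d[-1]
    length = math.hypot(bx - ax, by - ay)
    return max(
        abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length
        for px, py in points_2d
    )


if __name__ == "__main__":
    off_axis = vertical_edge(x=2.0, z=1.0)  # edge far to the side of the camera
    on_axis = vertical_edge(x=0.0, z=1.0)   # edge straight ahead
    print("pinhole, off-axis:", bow([project_pinhole(p) for p in off_axis]))
    print("fisheye, off-axis:", bow([project_equidistant(p) for p in off_axis]))
    print("fisheye, on-axis: ", bow([project_equidistant(p) for p in on_axis]))
```

Under these assumptions, the pinhole projection keeps the edge straight, while the off-axis fisheye projection bends it by more than 10 pixels and the on-axis edge stays straight. This is consistent with the abstract's remark that distortion is strongest in the periphery, where a box with straight sides fits objects poorly.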

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12196831/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12196831/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/PMC12196831/full.md

---
Source: https://tomesphere.com/paper/PMC12196831