# A Trinocular System for Pedestrian Localization by Combining Template Matching with Geometric Constraint Optimization

**Authors:** Jinjing Zhao, Sen Huang, Yancheng Li, Jingjing Xu, Shengyong Xu

PMC · DOI: 10.3390/s25195970 · 2025-09-25

## TL;DR

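This paper introduces a trinocular vision system that combines template matching with three-view geometric constraints to localize pedestrians with a mean absolute error of 0.435 m at 3.13 ms per target, outperforming the binocular baselines it is compared against.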

## Contribution

A novel trinocular stereo vision framework that integrates template matching with geometric constraint optimization for improved pedestrian localization.

## Key findings

- The proposed trinocular system achieves a mean absolute error of 0.435 m in pedestrian localization.
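- The system processes each target in 3.13 ms and, combined with the YOLOv8-s detector, maintains real-time operation (>30 Hz) for up to nine pedestrians, with an end-to-end latency of approximately 32.56 ms.
- The method is more accurate than the binocular baselines: Semi-Global Block Matching reaches 0.536 m, and RAFT-Stereo reaches 0.623 m (standard model) and 0.621 m (real-time model) without fine-tuning.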
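
The timing figures can be cross-checked with a small calculation. The sketch below assumes a simple linear latency model (a fixed per-frame overhead plus 3.13 ms per pedestrian) and assumes that the reported 7.52 ms per frame corresponds to a frame containing a single pedestrian. Neither assumption is stated in the abstract; they are one reading that happens to reproduce the reported 32.56 ms for nine pedestrians. All function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
PER_TARGET_MS = 3.13            # localization time per pedestrian
SINGLE_TARGET_FRAME_MS = 7.52   # reported per-frame time (ASSUMED: one pedestrian)
FRAME_BUDGET_MS = 1000.0 / 30   # real-time budget at 30 Hz, about 33.33 ms

# ASSUMPTION: everything that is not per-target work (detection, I/O)
# is a fixed cost per frame.
FIXED_OVERHEAD_MS = SINGLE_TARGET_FRAME_MS - PER_TARGET_MS


def frame_latency_ms(n_targets: int) -> float:
    """End-to-end latency under the assumed linear model."""
    return FIXED_OVERHEAD_MS + n_targets * PER_TARGET_MS


def max_realtime_targets(budget_ms: float = FRAME_BUDGET_MS) -> int:
    """Largest pedestrian count whose latency stays within the budget."""
    return int((budget_ms - FIXED_OVERHEAD_MS) // PER_TARGET_MS)


if __name__ == "__main__":
    for n in (1, 9, 10):
        print(f"{n:2d} pedestrians: {frame_latency_ms(n):.2f} ms")
    print("max at 30 Hz:", max_realtime_targets())
```

Under this model, nine pedestrians fit within the 33.33 ms budget and a tenth would exceed it, which is consistent with the stated nine-person limit.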

## Abstract

Pedestrian localization is a fundamental sensing task for intelligent outdoor systems. To overcome the limitations of accuracy and efficiency in conventional binocular approaches, this study introduces a trinocular stereo vision framework that integrates template matching with geometric constraint optimization. The system employs a trinocular camera configuration arranged in an equilateral triangle, which enables complementary perspectives beyond a standard horizontal baseline. Based on this setup, an initial depth estimate is obtained through multi-scale template matching on the primary binocular pair. The additional vertical viewpoint is then incorporated by enforcing three-view geometric consistency, yielding refined and more reliable depth estimates. We evaluate the method on a custom outdoor trinocular dataset. Experimental results demonstrate that the proposed approach achieves a mean absolute error of 0.435 m with an average processing time of 3.13 ms per target. This performance surpasses both the binocular Semi-Global Block Matching (0.536 m) and RAFT-Stereo (0.623 m for the standard model and 0.621 m for the real-time model without fine-tuning). When combined with the YOLOv8-s detector, the system can localize pedestrians in 7.52 ms per frame, maintaining real-time operation (>30 Hz) for up to nine individuals, with a total end-to-end latency of approximately 32.56 ms.
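
The two-stage idea in the abstract (an initial depth from the horizontal pair, then a check against the third view) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes ideal rectified pinhole cameras with identical focal length and parallel optical axes, pixel coordinates measured relative to the principal point, and a third camera at the apex of an equilateral triangle whose base is the horizontal baseline. The grid search is a stand-in for the paper's geometric constraint optimization, and every name below is hypothetical.

```python
import math


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Standard rectified-stereo relation Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def predict_third_view(x_left, y_left, depth_m, focal_px, baseline_m):
    """Project a left-image point at a given depth into the third camera.

    ASSUMPTION: the third camera sits at offset (B/2, sqrt(3)/2 * B) from
    the left camera; the sign of the vertical offset depends on the image
    axis convention.
    """
    tx = baseline_m / 2.0
    ty = baseline_m * math.sqrt(3.0) / 2.0
    return (x_left - focal_px * tx / depth_m,
            y_left - focal_px * ty / depth_m)


def refine_depth(x_left, y_left, observed_third, initial_depth_m,
                 focal_px, baseline_m, search=0.2, steps=401):
    """Pick the depth near the initial estimate that best explains the
    target position observed in the third view (simple grid search)."""
    lo = initial_depth_m * (1.0 - search)
    hi = initial_depth_m * (1.0 + search)
    best_depth, best_err = initial_depth_m, float("inf")
    for i in range(steps):
        z = lo + i * (hi - lo) / (steps - 1)
        px, py = predict_third_view(x_left, y_left, z, focal_px, baseline_m)
        err = math.hypot(px - observed_third[0], py - observed_third[1])
        if err < best_err:
            best_depth, best_err = z, err
    return best_depth


if __name__ == "__main__":
    f, B, true_z = 1000.0, 0.2, 10.0   # illustrative values, not from the paper
    observed = predict_third_view(50.0, 20.0, true_z, f, B)
    z0 = depth_from_disparity(f, B, 19.0)   # true disparity would be 20 px
    z1 = refine_depth(50.0, 20.0, observed, z0, f, B)
    print(f"initial {z0:.3f} m, refined {z1:.3f} m")
```

In this synthetic case a one-pixel disparity error in the horizontal pair shifts the initial depth by about half a metre, and the third view pulls the estimate back toward the true value. A real system would also have to handle matching noise in the third view, which this sketch ignores.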

## Full-text entities

- **Diseases:** injury to (MESH:D014947), visually impaired (MESH:D014786)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12526798/full.md

---
Source: https://tomesphere.com/paper/PMC12526798