Two-Stream Binocular Network: Accurate Near Field Finger Detection Based On Binocular Images
Yi Wei, Guijin Wang, Cairong Zhang, Hengkai Guo, Xinghao Chen, Huazhong Yang

TL;DR
This paper introduces TSBnet, a two-stream binocular network that detects fingertips directly from binocular images. It shares low-level feature extraction layers between the left and right images, extracts high-level features in two separate streams, and adds a binocular distance measurement layer, reaching an average error of 10.9mm on the authors' test set.
Contribution
The paper presents a new framework for direct fingertip detection from binocular images, including a binocular distance measurement layer and a large dataset for training and evaluation.
Findings
Achieved an average error of 10.9mm on the test set.
Outperformed previous methods by 5.9mm (35.1% improvement).
Built a dataset with 117k training and 10k test image pairs.
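The two improvement figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the 35.1% is measured relative to the previous method's average error, and that this baseline is the reported error plus the reported gain. The variable names are illustrative and do not come from the paper.

```python
# Consistency check of the reported numbers.
# Assumption: the percentage is relative to the previous method's error.
tsbnet_error_mm = 10.9  # reported average error of TSBnet
improvement_mm = 5.9    # reported absolute improvement

# Implied average error of the previous best method.
baseline_error_mm = tsbnet_error_mm + improvement_mm

# Relative improvement over that baseline, in percent.
relative_improvement = 100.0 * improvement_mm / baseline_error_mm

print(f"implied baseline: {baseline_error_mm:.1f} mm")
print(f"relative improvement: {relative_improvement:.1f}%")
```

Under that assumption the implied baseline is 16.8mm, and 5.9 / 16.8 rounds to the stated 35.1%.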
Abstract
Fingertip detection plays an important role in human-computer interaction. Previous works transform binocular images into depth images, then use depth-based hand pose estimation methods to predict the 3D positions of fingertips. In contrast, we propose a new framework, named Two-Stream Binocular Network (TSBnet), to detect fingertips from binocular images directly. TSBnet first shares convolutional layers for the low-level features of the right and left images. It then extracts high-level features separately in two-stream convolutional networks. Further, we add a new layer, the binocular distance measurement layer, to improve the performance of our model. To verify our scheme, we build a binocular hand image dataset containing about 117k pairs of images in the training set and 10k pairs of images in the test set. Our method achieves an average error of 10.9mm on our test set, outperforming…
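The shared-then-separate structure described in the abstract can be illustrated with a toy data-flow sketch. Everything here is an assumption made for illustration: the 1-D "images", the kernel values, and the function names `conv1d` and `forward` are hypothetical and are not the paper's layers, sizes, or weights. The binocular distance measurement layer is left out because the abstract does not say how it works.

```python
# Illustrative data flow only: a shared low-level stage, then two
# separate high-level streams. Not the paper's actual architecture.

def conv1d(signal, kernel):
    """Valid 1-D correlation followed by ReLU, standing in for a conv layer."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out

# One kernel reused for both views: this is the weight sharing.
shared_kernel = [0.5, 0.0, -0.5]
# Separate kernels per view: the two streams (hypothetical values).
left_kernel = [1.0, 1.0]
right_kernel = [1.0, -1.0]

def forward(left_image, right_image):
    # Stage 1: the same weights are applied to each view.
    left_low = conv1d(left_image, shared_kernel)
    right_low = conv1d(right_image, shared_kernel)
    # Stage 2: each view is processed with its own weights.
    left_high = conv1d(left_low, left_kernel)
    right_high = conv1d(right_low, right_kernel)
    # Concatenate both streams; a real model would regress
    # fingertip positions from the fused features.
    return left_high + right_high

left = [4.0, 3.0, 2.0, 1.0, 0.0]
right = [0.0, 1.0, 2.0, 3.0, 4.0]
features = forward(left, right)
print(features)
```

The point of the sketch is the parameter layout: `shared_kernel` is used for both views, while `left_kernel` and `right_kernel` are each used for one view only.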
