ViT Cane: Visual Assistant for the Visually Impaired

Bhavesh Kumar

arXiv:2109.13857 · cs.CV · September 29, 2021 · 1 citation

Open Access

TL;DR

ViT Cane is a real-time obstacle detection system for the visually impaired built on a vision transformer model. It reports better performance than CNN models and was tested in real-world scenarios.

Contribution

The paper introduces an obstacle detection system for the visually impaired that uses a vision transformer, outperforms CNN models, and is designed for easy reproduction.

Findings

01. Higher performance on the COCO dataset than CNN models

02. Effective obstacle avoidance demonstrated in field tests

03. System is portable and easily reproducible

Abstract

Blind and visually challenged people face multiple issues when navigating the world independently. These challenges include finding the shortest path to a destination and detecting obstacles from a distance. To tackle these issues, this paper proposes ViT Cane, which leverages a vision transformer model to detect obstacles in real time. The entire system consists of a Pi Camera Module v2, a Raspberry Pi 4B with 8 GB of RAM, and 4 motors. Using tactile feedback delivered through the 4 motors, the obstacle detection model is highly efficient in helping visually impaired users navigate unknown terrain, and it is designed to be easily reproduced. The paper discusses the utility of a Vision Transformer model in comparison to other CNN-based models for this specific application. Through rigorous testing, the proposed obstacle detection model has achieved higher performance on the Common Objects in Context (COCO)…
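The abstract does not say how detections are translated into motor feedback, so the following is only a minimal sketch under stated assumptions: the camera frame is split into four equal vertical zones, one per motor, and a motor is activated when a sufficiently confident detection is centered in its zone. The `Detection` class, the `motors_to_activate` function, and the `min_score` threshold are all hypothetical names introduced for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

NUM_MOTORS = 4  # the abstract states the system uses 4 motors


@dataclass
class Detection:
    """Hypothetical container for one detector output (not from the paper)."""
    label: str
    score: float   # detector confidence in [0, 1]
    x_min: float   # left edge of the bounding box, in pixels
    x_max: float   # right edge of the bounding box, in pixels


def motors_to_activate(detections, frame_width, min_score=0.5):
    """Return sorted motor indices (0 = leftmost zone, 3 = rightmost zone).

    Assumption: each motor corresponds to one of four equal-width
    vertical zones of the frame. The real mapping may differ.
    """
    zone_width = frame_width / NUM_MOTORS
    active = set()
    for det in detections:
        if det.score < min_score:
            continue  # ignore low-confidence detections
        center_x = (det.x_min + det.x_max) / 2
        zone = int(center_x // zone_width)
        # Clamp so a box touching the frame edge still maps to a valid motor.
        zone = max(0, min(zone, NUM_MOTORS - 1))
        active.add(zone)
    return sorted(active)


if __name__ == "__main__":
    frame = [
        Detection("person", 0.9, 0, 100),
        Detection("chair", 0.3, 500, 600),   # below threshold, skipped
        Detection("car", 0.8, 600, 640),
    ]
    print(motors_to_activate(frame, frame_width=640))
```

On a Raspberry Pi, the returned indices would then be handed to whatever motor driver the build uses; that wiring is deliberately left out here because the source does not describe it.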

Taxonomy

Topics: Tactile and Sensory Interactions · Smart Parking Systems Research · Gaze Tracking and Assistive Technology

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Dense Connections · Byte Pair Encoding · Label Smoothing