AI based signage classification for linguistic landscape studies
Yuqin Jiang, Song Jiang, Jacob Algrim, Trevor Harms, Maxwell Koenen, Xinya Lan, Xingyu Li, Chun-Han Lin, Jia Liu, Jiayang Sun, Henry Zenger

TL;DR
This paper investigates the application of AI-powered language detection and OCR to automate signage classification in urban linguistic landscape studies, demonstrating promising accuracy and highlighting the need for hybrid human-AI workflows.
Contribution
It introduces an AI-based method for signage classification in linguistic landscape (LL) research, combining OCR and language detection, and evaluates its effectiveness on a real-world dataset.
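The summary does not describe the model's internals. To give a rough sense of the language-detection step that follows OCR, the sketch below assigns a coarse script label to a sign's recognized text by counting Han and Latin characters with Python's standard `unicodedata` module. This is a swapped-in rule-based heuristic, not the authors' AI model. The function names `script_of` and `classify_sign`, the label set, and the `min_share` threshold are all hypothetical, and OCR is assumed to have already produced the text string.

```python
import unicodedata
from collections import Counter
from typing import Optional


def script_of(ch: str) -> Optional[str]:
    """Return a coarse script tag for one character, or None for digits, punctuation, spaces."""
    name = unicodedata.name(ch, "")  # default "" avoids ValueError for unnamed characters
    if name.startswith("CJK") and "IDEOGRAPH" in name:
        return "Han"
    if name.startswith("LATIN"):
        return "Latin"
    return None


def classify_sign(ocr_text: str, min_share: float = 0.1) -> str:
    """Label a sign by the scripts present in its OCR text (hypothetical heuristic).

    A script counts as present if it makes up at least `min_share` of the
    script-bearing characters, so a single stray OCR misread does not change
    the label. Keep `min_share` at or below 0.5 so at least one script survives.
    """
    counts = Counter(s for s in map(script_of, ocr_text) if s is not None)
    total = sum(counts.values())
    if total == 0:
        return "no text"
    present = sorted(s for s, n in counts.items() if n / total >= min_share)
    return "+".join(present)


if __name__ == "__main__":
    # Illustrative strings only, not signs from the study's dataset.
    for text in ["新世界 BAKERY", "OPEN", "茶", "123 !!"]:
        print(repr(text), "->", classify_sign(text))
```

Because Latin script covers English, Hawaiian, Vietnamese, and other languages, a real pipeline would still need a genuine language identifier on top of a script count; the threshold only trades sensitivity to small peripheral text against robustness to OCR noise.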
Findings
The AI model achieved 79% overall accuracy in signage classification
Identified common mislabeling issues such as distortion and graffiti
The AI detects peripheral text that human annotators often ignore
Abstract
Linguistic Landscape (LL) research traditionally relies on manual photography and annotation of public signs to examine the distribution of languages in urban space. While such methods yield valuable findings, the process is time-consuming and difficult to scale to large study areas. This study explores the use of an AI-powered language detection method to automate LL analysis. Using Honolulu Chinatown as a case study, we constructed a georeferenced dataset of 1,449 photos collected by researchers and applied AI for optical character recognition (OCR) and language classification. We also manually validated the results to check accuracy. The model achieved an overall accuracy of 79%. Five recurring types of mislabeling were identified: distortion, reflection, degraded surface, graffiti, and hallucination. The analysis also reveals that the AI model treats all regions of an image…
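The manual validation step can be pictured as comparing each AI label with a human-assigned label and tallying the disagreements by cause. The sketch below is a minimal illustration under that assumption: the `Validation` record layout, the `summarize` function, and the toy photo IDs and labels are hypothetical and are not data from the study. Only the five error categories are taken from the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The five mislabeling types named in the abstract.
ERROR_TYPES = {"distortion", "reflection", "degraded surface", "graffiti", "hallucination"}


@dataclass
class Validation:
    """One manually checked photo (hypothetical record layout)."""
    photo_id: str
    ai_label: str
    human_label: str
    error_type: Optional[str] = None  # set by the annotator when the labels disagree

    def __post_init__(self) -> None:
        if self.error_type is not None and self.error_type not in ERROR_TYPES:
            raise ValueError(f"unknown error type: {self.error_type!r}")


def summarize(records: List[Validation]) -> Tuple[float, Counter]:
    """Return overall accuracy and a tally of error types among mislabeled photos."""
    if not records:
        raise ValueError("no validation records")
    correct = sum(r.ai_label == r.human_label for r in records)
    errors = Counter(
        r.error_type or "unspecified"
        for r in records
        if r.ai_label != r.human_label
    )
    return correct / len(records), errors


if __name__ == "__main__":
    # Toy records for illustration only.
    toy = [
        Validation("p1", "Han+Latin", "Han+Latin"),
        Validation("p2", "Latin", "Latin"),
        Validation("p3", "Han", "Han"),
        Validation("p4", "Latin", "Han+Latin", "reflection"),
    ]
    accuracy, errors = summarize(toy)
    print(f"accuracy = {accuracy:.0%}")  # 3 of 4 labels agree -> 75%
    print(errors)
```

Overall accuracy here is simply the share of photos whose AI label matches the human label; per-category error counts make it easier to see which failure modes, such as reflection or graffiti, dominate the remaining errors.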
