Vision-Braille: A Curriculum Learning Toolkit and Braille-Chinese Corpus for Braille Translation

Alan Wu; Ye Yuan; Zhiping Xiao; Ming Zhang

arXiv:2407.06048·cs.CL·April 21, 2026

TL;DR

Vision-Braille is an end-to-end system that translates Chinese Braille in images into written Chinese, using curriculum learning and synthetic data to handle tone omission and resource scarcity.

Contribution

It is the first publicly available system combining OCR and fine-tuned language models for Chinese Braille translation with a novel curriculum learning approach.

Findings

01

Achieves 83.28 BLEU on passage-level translation with 10% tone retention.

02

Constructs a synthetic Braille-Chinese corpus including tone-omission variants.

03

Demonstrates practical application for inclusive education for visually impaired students.
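
The tone-omission variants mentioned in the findings can be illustrated with a small sketch. This is not the authors' corpus-construction code: the `(initial, final, tone)` tuple representation, the function name `omit_tones`, and the independent per-syllable sampling are all assumptions made for illustration. It only shows what "keeping a fraction of tone markers" could mean in practice.

```python
import random


def omit_tones(syllables, retention, seed=0):
    """Return a copy of `syllables` where each tone marker is kept with
    probability `retention` and replaced by None otherwise.

    `syllables` is a list of (initial, final, tone) tuples. This layout is
    a hypothetical stand-in for Braille cells, not the paper's data format.
    """
    rng = random.Random(seed)  # seeded so variants are reproducible
    variant = []
    for initial, final, tone in syllables:
        keep = rng.random() < retention
        variant.append((initial, final, tone if keep else None))
    return variant


# Hypothetical pinyin-like tokens standing in for Braille cells.
sample = [("zh", "ong", "1"), ("g", "uo", "2")]
full = omit_tones(sample, retention=1.0)   # every tone marker kept
bare = omit_tones(sample, retention=0.0)   # every tone marker dropped
```

Under this sketch, a retention of 0.1 would correspond to the 10% tone-retention setting reported above, although the actual sampling procedure may differ.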

Abstract

We present Vision-Braille, the first publicly available end-to-end system for translating Chinese Braille extracted from images into written Chinese. This system addresses the unique challenges of limited annotated resources and tone omission. It integrates a robust Braille OCR pipeline with an LLM fine-tuned for sequence-to-sequence translation. We construct a synthetic Braille-Chinese corpus, including tone-omission variants that mimic authentic Braille writing habits. We fine-tune the model using a four-stage curriculum: starting with sentence-level data with full tone markers, progressing to passage-level data, then applying a tone-omission schedule of decreasing retention, and finally consolidating on passages with heavy tone omission. On passage-level translation with 10% tone retention, Vision-Braille achieves 83.28 BLEU. Vision-Braille offers an inclusive NLP solution that…
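
The four-stage curriculum described in the abstract can be written down as a plain data structure. The sketch below is a guess at its shape, not the released training code: the `Stage` class, the `build_curriculum` helper, the intermediate retention values (0.75, 0.5, 0.25), the choice to run stage 3 on passage-level data, and the reuse of the 10% evaluation setting as the final consolidation retention are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    level: str        # "sentence" or "passage"
    retention: float  # fraction of tone markers kept in the Braille input


def build_curriculum(retention_steps=(0.75, 0.5, 0.25), final_retention=0.1):
    """Build an ordered list of fine-tuning stages.

    The default retention values are hypothetical; the abstract only says
    that retention decreases and ends with heavy tone omission.
    """
    stages = [
        Stage("1-sentence-full-tone", "sentence", 1.0),
        Stage("2-passage-full-tone", "passage", 1.0),
    ]
    # Stage 3: one sub-stage per step of the decreasing-retention schedule.
    stages += [Stage(f"3-omission-{r:.2f}", "passage", r) for r in retention_steps]
    # Stage 4: consolidate on passages with heavy tone omission.
    stages.append(Stage("4-consolidate", "passage", final_retention))
    return stages


for stage in build_curriculum():
    print(stage.name, stage.level, stage.retention)
```

Keeping the schedule as data rather than hard-coding it in a training loop makes it easy to try a different number of omission steps or a different final retention without touching the fine-tuning code.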

Code & Models

Repositories

https://anonymous.4open.science/r/EMNLP_2026_Supp_Code_Data-2F6D

Models

Violet-yo/mt5-small-ft-Chinese-Braille
model · 20 downloads · 1 like

Datasets
