MathReader : Text-to-Speech for Mathematical Documents
Sieun Hyeon, Kyudan Jung, Nam-Joon Kim, Hyun Gon Ryu, Jaeyoung Do

TL;DR
MathReader is a novel text-to-speech system that accurately reads mathematical documents by integrating OCR, a fine-tuned T5 model, and TTS, significantly reducing errors in mathematical content pronunciation.
Contribution
It introduces MathReader, a new approach combining OCR and language modeling to improve TTS accuracy for mathematical documents, outperforming existing solutions.
Findings
MathReader achieved a lower Word Error Rate (WER) than Microsoft Edge and Adobe Acrobat on documents containing mathematical formulas.
Against Microsoft Edge, WER fell from 0.510 to 0.281.
Against Adobe Acrobat, WER fell from 0.617 to 0.281.
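For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The sketch below follows that standard definition; the function name, the lowercase-and-split normalization, and the example sentences are assumptions for illustration and are not taken from the paper's evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Normalization (lowercasing, whitespace split) is an assumption of this
    sketch, not necessarily what the MathReader authors used.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: a reader that skips one word of a spoken formula.
reference = "x squared plus one"
hypothesis = "x plus one"
print(word_error_rate(reference, hypothesis))
```

Under this definition, a reader that skips a formula entirely is charged one deletion per missing reference word, which is why skipped content raises WER.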
Abstract
TTS (Text-to-Speech) document readers from Microsoft, Adobe, Apple, and OpenAI are in service worldwide. They give relatively good results for general plain text, but they sometimes skip content or produce unsatisfactory output for mathematical expressions. This is because most modern academic papers are written in LaTeX, and compiled LaTeX formulas are rendered as distinctive text forms within the document. Traditional TTS document readers output only the text as it is recognized, without considering the mathematical meaning of the formulas. To address this issue, we propose MathReader, which effectively integrates OCR, a fine-tuned T5 model, and TTS. MathReader demonstrated a lower Word Error Rate (WER) than existing TTS document readers, such as Microsoft Edge and Adobe Acrobat, when processing documents containing mathematical formulas. MathReader…
Taxonomy
Topics: Mathematics, Computing, and Information Processing · Mathematics Education and Programs · History and Theory of Mathematics
Methods: Byte Pair Encoding · Gated Linear Unit · Residual Connection · Dropout · SentencePiece · Softmax · Linear Layer · Attention Is All You Need · Inverse Square Root Schedule
