The boundaries of meaning: a case study in neural machine translation
Yuri Balashov

TL;DR
This paper investigates the linguistic and philosophical implications of subword segmentation algorithms in neural machine translation, questioning their transparency and semantic plausibility within language modeling.
Contribution
The paper critically analyzes subword segmentation methods and relates them to linguistic and philosophical debates about meaning, with the aim of making AI systems more transparent.
Findings
Subword algorithms often produce semantically opaque segments.
Translation can be viewed as mapping sequences of subword segments.
The linguistic plausibility of subword-based translation is questioned.
Abstract
The success of deep learning in natural language processing raises intriguing questions about the nature of linguistic meaning and ways in which it can be processed by natural and artificial systems. One such question has to do with subword segmentation algorithms widely employed in language modeling, machine translation, and other tasks since 2016. These algorithms often cut words into semantically opaque pieces, such as 'period', 'on', 't', and 'ist' in 'period|on|t|ist'. The system then represents the resulting segments in a dense vector space, which is expected to model grammatical relations among them. This representation may in turn be used to map 'period|on|t|ist' (English) to 'par|od|ont|iste' (French). Thus, instead of being modeled at the lexical level, translation is reformulated more generally as the task of learning the best bilingual mapping between the sequences of…
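
To make the segmentation step concrete, here is a minimal sketch of byte-pair encoding (BPE), one widely used subword algorithm. The abstract does not say which algorithm produced 'period|on|t|ist', so the choice of BPE, the toy corpus, and the function names `learn_bpe`, `merge_pair`, and `segment` are assumptions made for illustration only. The sketch shows why segments need not align with morphemes: merges are chosen purely by pair frequency, with no reference to meaning.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} dict."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by lexicographic order
        # (an arbitrary choice made here for determinism).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(s, best): f for s, f in vocab.items()}
    return merges


def segment(word, merges):
    """Apply learned merges, in order, to a new word; join segments with '|'."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return "|".join(symbols)


# Hypothetical toy corpus; the frequencies are invented for illustration.
toy_corpus = {"period": 5, "periodic": 3, "dentist": 4, "artist": 4, "on": 6}
toy_merges = learn_bpe(toy_corpus, num_merges=8)
print(segment("periodontist", toy_merges))
```

The segmentation printed depends entirely on the corpus statistics and the number of merges, which is the point at issue: the boundaries reflect frequency, not morphology. Production systems add details omitted here, such as end-of-word markers and much larger corpora.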
