Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models
Bruno Bianchi, Aakash Agrawal, Stanislas Dehaene, Emmanuel Chemla, Yair Lakretz

TL;DR
This paper investigates whether deep neural models can disentangle letter identity and position in written words, revealing current models' limitations compared to human capabilities and introducing a new benchmark for evaluation.
Contribution
The study introduces CompOrth, a novel benchmark for assessing disentanglement and compositionality in visual models of orthography, and evaluates the performance of beta-VAE models on this benchmark.
Findings
Models effectively disentangle surface features like retinal location.
Models fail to disentangle letter position from letter identity.
Models lack understanding of word length and compositional structure.
Abstract
Human readers can accurately count how many letters are in a word (e.g., 7 in "buffalo"), remove a letter from a given position (e.g., "bufflo"), or add a new one. Readers' brains must therefore have learned to disentangle information about a letter's position from information about its identity. Such disentanglement is necessary for the compositional, unbounded ability of humans to create and parse new strings, with any combination of letters appearing in any position. Do modern deep neural models also possess this crucial compositional ability? Here, we tested whether neural models that achieve state-of-the-art disentanglement of features in visual input can also disentangle letter position and letter identity when trained on images of written words. Specifically, we trained beta variational autoencoders (β-VAEs) to reconstruct images of letter strings and evaluated…
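For readers unfamiliar with the β-VAE objective mentioned in the abstract, the following is a minimal sketch of the standard loss: a reconstruction term plus a KL term weighted by β. It is not the authors' implementation. The function name `beta_vae_loss`, the Bernoulli (binary cross-entropy) reconstruction term, and the default `beta=4.0` are assumptions made for illustration; the excerpt above does not state the paper's architecture, loss variant, or β values.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Standard beta-VAE objective (to be minimized), averaged over the batch.

    x, x_recon : arrays of shape (batch, n_pixels), values in [0, 1]
    mu, logvar : arrays of shape (batch, n_latents), the parameters of the
                 diagonal Gaussian posterior q(z | x)
    beta       : weight on the KL term (hypothetical default, not from the paper)
    Returns (total, reconstruction, kl), each a Python float.
    """
    eps = 1e-7
    # Clip to avoid log(0); assumes a Bernoulli likelihood over pixels.
    x_recon = np.clip(x_recon, eps, 1.0 - eps)
    recon = -np.sum(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon), axis=1)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian.
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)
    total = recon + beta * kl
    return float(np.mean(total)), float(np.mean(recon)), float(np.mean(kl))
```

With β = 1 this reduces to the ordinary VAE objective; larger β puts more pressure on the latent code to match the factorized prior, which is the mechanism usually credited with encouraging disentangled latents.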
Taxonomy
Topics: Image Retrieval and Classification Techniques · Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
Methods: Sparse Evolutionary Training
