Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

Odette Scharenborg; Laurent Besacier; Alan Black; Mark Hasegawa-Johnson; Florian Metze; Graham Neubig; Sebastian Stueker; Pierre Godard; Markus Mueller; Lucas Ondel; Shruti Palaskar; Philip Arthur; Francesco Ciannella; Mingxing Du; Elin Larsen; Danny Merkx; Rachid Riad; Liming Wang; Emmanuel Dupoux

arXiv:1802.05092·cs.CL·February 15, 2018

TL;DR

This paper summarizes a workshop on computational methods for discovering linguistic units in unwritten languages, using multi-modal inputs such as images and translated text in place of orthographic transcriptions.

Contribution

The paper presents a multidisciplinary approach to unsupervised discovery of linguistic units (subwords and words) from raw speech, using images and translated text in a well-resourced language in place of orthographic transcriptions.

Findings

01. Exploration of multi-modal data for linguistic unit discovery

02. Unsupervised methods for subword and word identification

03. Potential of images and translated text to replace orthography

Abstract

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.
