Corpus non alignés et ADT. Essai de comparaison entre les présidents français et brésiliens de l'ère contemporaine
Carlos Maciel, Damon Mayaffre, Laurent Vanni

TL;DR
This paper investigates whether a method from ADT (analyse de données textuelles, i.e. statistical textual data analysis) can handle non-aligned bilingual corpora, and whether textual genre makes speeches comparable across languages, using a large corpus of French and Brazilian presidential speeches from 1950-2020.
Contribution
It proposes a methodological path, from simple frequency dictionaries to factorial analysis of word co-occurrence profiles, for comparing presidential speeches of the same genre across languages.
Findings
ADT can be adapted for non-aligned corpora
Genre influences speech comparability across languages
A large bilingual corpus enables cross-national discourse analysis
Abstract
Is there an ADT method that can deal with non-aligned bilingual corpora? Does textual genre constrain discourse strongly enough to make texts written in different languages comparable, provided they belong to the same genre? To answer these two questions, one methodological, the other linguistic, this contribution gathers in a single corpus French and Brazilian presidential speeches of the contemporary era (1950-2020), from de Gaulle to Macron and from Kubitschek to Lula, i.e. 15 million words. A methodological path is proposed, from the simple frequency dictionary to the factorial treatment of the co-occurrence profiles of words, in order to establish a generic transnational presidential discourse.
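The first two steps of the path described in the abstract, the frequency dictionary and the co-occurrence profile of a word, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace tokenization, the window of two words, and the two toy sentences are all assumptions made for illustration, and the factorial step is not covered.

```python
from collections import Counter, defaultdict


def frequency_dictionary(tokens):
    """Count how often each word form occurs in a tokenized text."""
    return Counter(tokens)


def cooccurrence_profiles(tokens, window=2):
    """Map each word to a Counter of the words found within `window` positions of it.

    The window size is an illustrative assumption, not a value from the paper.
    """
    profiles = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                profiles[word][tokens[j]] += 1
    return profiles


# Toy, invented sentences (not taken from the corpus), one per language.
french = "la france est une nation et la nation est libre".split()
portuguese = "o brasil é uma nação e a nação é livre".split()

fr_freq = frequency_dictionary(french)
pt_freq = frequency_dictionary(portuguese)
fr_profiles = cooccurrence_profiles(french)
pt_profiles = cooccurrence_profiles(portuguese)

print(fr_freq.most_common(3))
print(dict(fr_profiles["nation"]))
print(dict(pt_profiles["nação"]))
```

Each language is processed separately, so no sentence alignment is needed. Any cross-language comparison would then operate on the resulting profiles rather than on the raw texts.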
Taxonomy
Topics: Linguistics and Discourse Analysis · Linguistic Studies and Language Acquisition · Linguistics and Terminology Studies
