Learning Multilingual Word Representations using a Bag-of-Words Autoencoder
Stanislas Lauly, Alex Boulanger, Hugo Larochelle

TL;DR
This paper introduces a novel autoencoder approach for learning multilingual word representations without relying on word-level alignments, demonstrating competitive performance in cross-lingual document classification tasks.
Contribution
It presents a new autoencoder model that learns multilingual word embeddings without the need for word-level alignments, simplifying the process.
Findings
Compares favorably with a previously proposed alignment-based method on cross-lingual document classification
Effective in low-resource language scenarios
Simplifies multilingual embedding learning process
Abstract
Recent work on learning multilingual word representations usually relies on the use of word-level alignments (e.g. inferred with the help of GIZA++) between translated sentences, in order to align the word embeddings in different languages. In this workshop paper, we investigate an autoencoder model for learning multilingual word representations that does without such word-level alignments. The autoencoder is trained to reconstruct the bag-of-words representation of a given sentence from an encoded representation extracted from its translation. We evaluate our approach on a multilingual document classification task, where labeled data is available only for one language (e.g. English) while classification must be performed in a different language (e.g. French). In our experiments, we observe that our method compares favorably with a previously proposed method that exploits word-level…
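The training signal described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy vocabularies, the embedding size, the tanh encoder, the softmax decoder, and the names `bag_of_words`, `encode`, and `reconstruction_loss`. The sketch only shows the general idea of encoding a sentence in one language as a sum of word embeddings and scoring how well that encoding reconstructs the bag of words of its translation.

```python
import numpy as np

def bag_of_words(tokens, vocab):
    """Count vector over `vocab` (dict word -> index); unknown words are skipped."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab[t]] += 1.0
    return v

def encode(bow_src, W_src, b):
    """Hypothetical encoder: sum the embeddings of the source words, then apply tanh."""
    return np.tanh(bow_src @ W_src + b)

def reconstruction_loss(h, bow_tgt, V_tgt, c):
    """Hypothetical decoder: softmax over the target vocabulary.
    Returns the negative log-likelihood of the target sentence's words."""
    logits = h @ V_tgt + c
    logits = logits - logits.max()                    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -(bow_tgt * log_probs).sum()

# Toy vocabularies and one sentence pair (illustrative only).
vocab_en = {"the": 0, "cat": 1, "sleeps": 2}
vocab_fr = {"le": 0, "chat": 1, "dort": 2}
dim = 4                                               # assumed embedding size

rng = np.random.default_rng(0)
W_en = rng.normal(scale=0.1, size=(len(vocab_en), dim))  # English word embeddings
b = np.zeros(dim)
V_fr = rng.normal(scale=0.1, size=(dim, len(vocab_fr)))  # French output weights
c = np.zeros(len(vocab_fr))

x_en = bag_of_words(["the", "cat", "sleeps"], vocab_en)
y_fr = bag_of_words(["le", "chat", "dort"], vocab_fr)

h = encode(x_en, W_en, b)                             # encoding of the English sentence
loss = reconstruction_loss(h, y_fr, V_fr, c)          # how well it predicts the French words
```

Minimizing a loss of this kind over many translated sentence pairs (in both directions) needs only sentence-level pairing, which is why no word-level alignment enters the objective.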
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
