The Multilingual Amazon Reviews Corpus
Phillip Keung, Yichao Lu, György Szarvas, Noah A. Smith

TL;DR
This paper introduces the Multilingual Amazon Reviews Corpus, a large, balanced dataset across six languages for multilingual text classification, and evaluates baseline models including zero-shot transfer with multilingual BERT.
Contribution
The paper provides a new multilingual review dataset with balanced ratings and benchmarks for supervised and zero-shot cross-lingual classification tasks.
Findings
Baseline results for supervised classification in multiple languages.
Zero-shot transfer learning performance using multilingual BERT.
Use of mean absolute error (MAE) as a metric for ordinal rating prediction.
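To illustrate why MAE suits ordinal star ratings, here is a minimal Python sketch comparing it with plain accuracy. The function names and the toy predictions are assumptions made for illustration; they are not taken from the paper or its code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold rating exactly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute distance, in stars, between gold and predicted ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (hypothetical): gold star ratings and two sets of predictions.
gold = [1, 2, 3, 4, 5]
near_misses = [2, 2, 3, 4, 4]  # two errors, each off by one star
far_misses = [5, 2, 3, 4, 1]   # two errors, each off by four stars

acc_near = accuracy(gold, near_misses)
acc_far = accuracy(gold, far_misses)
mae_near = mean_absolute_error(gold, near_misses)
mae_far = mean_absolute_error(gold, far_misses)

print(f"accuracy: near={acc_near:.2f} far={acc_far:.2f}")
print(f"MAE:      near={mae_near:.2f} far={mae_far:.2f}")
```

Both prediction sets get the same number of ratings exactly right, so accuracy cannot tell them apart. MAE penalizes the second set more because its errors are further from the gold rating.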
Abstract
We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification. The corpus contains reviews in English, Japanese, German, French, Spanish, and Chinese, which were collected between 2015 and 2019. Each record in the dataset contains the review text, the review title, the star rating, an anonymized reviewer ID, an anonymized product ID, and the coarse-grained product category (e.g., 'books', 'appliances'). The corpus is balanced across the 5 possible star ratings, so each rating constitutes 20% of the reviews in each language. For each language, there are 200,000, 5,000, and 5,000 reviews in the training, development, and test sets, respectively. We report baseline results for supervised text classification and zero-shot cross-lingual transfer learning by fine-tuning a multilingual BERT model on reviews data.…
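The record layout and the balance property described in the abstract can be sketched in Python as follows. The class name, field names, and toy records are hypothetical and chosen only for illustration; the released corpus may use different names. Only the split sizes and the 20% share per rating come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    # Field names are assumptions; the abstract lists the fields, not their names.
    review_body: str
    review_title: str
    stars: int             # 1 to 5
    reviewer_id: str       # anonymized
    product_id: str        # anonymized
    product_category: str  # coarse-grained, e.g. "books"


# Split sizes per language, as stated in the abstract.
SPLIT_SIZES = {"train": 200_000, "dev": 5_000, "test": 5_000}

# With 5 equally frequent ratings, each rating covers one fifth of a split.
REVIEWS_PER_RATING = {split: size // 5 for split, size in SPLIT_SIZES.items()}


def rating_shares(reviews):
    """Return the fraction of reviews carrying each star rating from 1 to 5."""
    if not reviews:
        raise ValueError("need at least one review")
    counts = Counter(r.stars for r in reviews)
    return {star: counts.get(star, 0) / len(reviews) for star in range(1, 6)}


def is_balanced(reviews, tol=1e-9):
    """True if every star rating makes up 20% of the reviews, within tol."""
    return all(abs(share - 0.2) <= tol for share in rating_shares(reviews).values())


# Toy records (hypothetical): two reviews for each star rating.
toy = [
    Review(f"text {i}", f"title {i}", (i % 5) + 1,
           f"reviewer_{i}", f"product_{i}", "books")
    for i in range(10)
]

print(REVIEWS_PER_RATING)
print(is_balanced(toy))
```

Under the stated split sizes, each star rating accounts for 40,000 training reviews and 1,000 reviews in each of the development and test sets, per language.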
Taxonomy
Methods: Linear Layer · Dense Connections · Layer Normalization · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Attention Is All You Need
