NU HLT at CMCL 2022 Shared Task: Multilingual and Crosslingual Prediction of Human Reading Behavior in Universal Language Space

Joseph Marvin Imperial

arXiv:2202.10855·cs.CL·March 1, 2022

TL;DR

This paper introduces a single model for both multilingual and crosslingual reading time prediction. Every word is first converted to a universal representation via the International Phonetic Alphabet (IPA), and the model is then trained on frequency, n-gram, information-theoretic, and psycholinguistic predictors.

Contribution

To the authors' knowledge, it is the first study to use IPA-based preprocessing in a unified model for both multilingual and crosslingual reading time prediction, combined with a comprehensive feature set.

Findings

01. Best MAE scores of 3.8031 for FFDAvg and 3.9065 for TRTAvg, obtained with a finetuned Random Forest model.

02. Unified approach outperforms previous models in multilingual reading prediction.

03. Uses phonological properties of language, via IPA transcription, for crosslingual transfer learning.

Abstract

In this paper, we present a unified model that works for both multilingual and crosslingual prediction of reading times of words in various languages. The secret behind the success of this model is in the preprocessing step, where all words are transformed to their universal language representation via the International Phonetic Alphabet (IPA). To the best of our knowledge, this is the first study to favorably exploit this phonological property of language for the two tasks. Various feature types were extracted for model training, covering basic frequencies, n-grams, information-theoretic predictors, and psycholinguistically motivated predictors. A finetuned Random Forest model obtained the best performance for both tasks, with MAE scores of 3.8031 and 3.9065 for mean first fixation duration (FFDAvg) and mean total reading time (TRTAvg), respectively.
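
The preprocessing and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the `TOY_IPA` lexicon, the function names, and the three features are hypothetical stand-ins chosen for illustration, and a real system would obtain transcriptions from a grapheme-to-phoneme tool and use a much richer feature set. The sketch shows words from different languages being mapped into one IPA symbol space, character n-grams being counted over the transcription, and the mean absolute error (MAE) metric being computed.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only).
# A real pipeline would use a grapheme-to-phoneme tool instead.
TOY_IPA = {
    "cat": "kæt",      # English
    "Katze": "ˈkatsə",  # German
    "gato": "ˈɡato",    # Spanish
}


def to_ipa(word):
    """Return the toy IPA transcription, or the lowercased word if unknown."""
    return TOY_IPA.get(word, word.lower())


def char_ngrams(text, n):
    """Count the character n-grams of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def features(word):
    """Build a small, illustrative feature dictionary from the IPA form."""
    ipa = to_ipa(word)
    bigrams = char_ngrams(ipa, 2)
    return {
        "ipa_length": len(ipa),
        "distinct_bigrams": len(bigrams),
        "total_bigrams": sum(bigrams.values()),
    }


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because every language is represented with the same symbol inventory, the same feature extractor can be applied to training and test words regardless of source language, which is what makes a single model usable for the crosslingual setting. A regressor such as a Random Forest would then be fitted on these feature dictionaries against FFDAvg and TRTAvg targets and scored with `mean_absolute_error`.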

Code & Models

Repositories

imperialite/cmcl2022-unified-eye-tracking-ipa
none · Official

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques

Methods: Masked autoencoder