Difficult for Whom? A Study of Japanese Lexical Complexity
Adam Nohejl, Akio Hayakawa, Yusuke Ide, Taro Watanabe

TL;DR
This study investigates Japanese lexical complexity prediction and complex word identification, revealing challenges in personalization and the limited impact of model adaptation, with implications for language processing tasks involving diverse populations.
Contribution
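The paper verifies that a recent Japanese LCP dataset is representative of its target population, compares models trained on group mean ratings with models trained on individual ratings, and assesses how much model adaptation helps in Japanese lexical complexity tasks.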
Findings
Group mean models perform similarly to individual models in CWI.
Achieving good LCP performance for an individual is difficult.
Finetuned BERT adaptation yields marginal improvements.
Abstract
The tasks of lexical complexity prediction (LCP) and complex word identification (CWI) commonly presuppose that difficult-to-understand words are shared by the target population. Meanwhile, personalization methods have also been proposed to adapt models to individual needs. We verify that a recent Japanese LCP dataset is representative of its target population by partially replicating the annotation. Through a second reannotation, we show that native Chinese speakers perceive the complexity differently because of Sino-Japanese vocabulary. To explore the possibilities of personalization, we compare competitive baselines trained on the group mean ratings and on individual ratings in terms of performance for an individual. We show that the model trained on a group mean performs similarly to an individual model in the CWI task, while achieving good LCP performance for an individual is difficult. We also…
Taxonomy
Topics: Text Readability and Simplification
Methods: Attention Is All You Need · Adam · Linear Layer · Dropout · Layer Normalization · Residual Connection · Linear Warmup With Linear Decay · Dense Connections · Weight Decay
