TL;DR
This paper presents a system combining feature engineering and BERT-based neural networks for lexical complexity prediction, achieving competitive results and providing insights into model interpretability.
Contribution
The novel integration of handcrafted lexical, semantic, syntactic, and phonological features with BERT enhances prediction accuracy, especially in difficult cases.
Findings
Feature engineering improves extreme case predictions.
BERT attention maps reveal learned features.
Ensembled models perform well on multiple subtasks.
Abstract
This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model with a deep neural network model founded on BERT. While BERT itself performs competitively, our feature engineering-based model helps in extreme cases, eg. separating instances of easy and neutral difficulty. Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonological measures. Visualizations of BERT attention maps offer insight into potential features that Transformers models may learn when fine-tuned for lexical complexity prediction. Our ensembled predictions score reasonably well for the single word subtask, and we demonstrate how they can be harnessed to perform well on the multi word expression subtask too.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsAttention Is All You Need · Linear Layer · Refunds@Expedia|||How do I get a full refund from Expedia? · Dropout · Adam · Dense Connections · Softmax · Linear Warmup With Linear Decay · WordPiece · Attention Dropout
