Idiomify -- Building a Collocation-supplemented Reverse Dictionary of English Idioms with Word2Vec for non-native learners
Eu-Bin Kim

TL;DR
Idiomify develops a reverse dictionary of English idioms for non-native learners using collocation models and Word2Vec, enhancing idiom exploration and usage guidance through innovative mining and semantic techniques.
Contribution
The paper introduces a novel approach combining collocation metrics and Word2Vec for building a flexible reverse idiom dictionary tailored for non-native speakers.
Findings
Word2Vec effectively models idiom semantics.
Collocation techniques improve idiom identification.
Hybrid methods can enhance reverse dictionary performance.
Abstract
The aim of Idiomify is to build a collocation-supplemented reverse dictionary of idioms for non-native learners of English. We do so because a reverse dictionary could help non-native speakers explore idioms on demand, and the collocations could guide them in using idioms more appropriately. The cornerstone of the project is a reliable way of mining idioms from corpora, which is challenging because idioms vary extensively in form. We tackle this by automatically deriving matching rules from their base forms. We use Pointwise Mutual Information (PMI) and Term Frequency - Inverse Document Frequency (TF-IDF) to model collocations, since both are popular metrics for pairwise significance. We also try Term Frequency (TF) as the baseline model. As for implementing the reverse-dictionary, three approaches could be taken: inverted index, graphs and distributional…
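To make the PMI collocation metric mentioned in the abstract concrete, here is a minimal sketch that scores (idiom, context word) pairs by PMI, log2 of p(idiom, word) / (p(idiom) p(word)). The function name `pmi_scores`, the input format, and the toy observations are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Score each (idiom, context_word) pair by pointwise mutual information.

    `pairs` is a list of co-occurrence observations. This input format is a
    hypothetical choice for the sketch, not the paper's actual pipeline.
    """
    pair_counts = Counter(pairs)
    idiom_counts = Counter(idiom for idiom, _ in pairs)
    word_counts = Counter(word for _, word in pairs)
    total = len(pairs)

    scores = {}
    for (idiom, word), n in pair_counts.items():
        p_joint = n / total
        p_idiom = idiom_counts[idiom] / total
        p_word = word_counts[word] / total
        # PMI > 0: the pair co-occurs more often than chance would predict.
        scores[(idiom, word)] = math.log2(p_joint / (p_idiom * p_word))
    return scores


# Toy observations, invented for illustration only.
observations = [
    ("beat around the bush", "stop"),
    ("beat around the bush", "stop"),
    ("beat around the bush", "the"),
    ("piece of cake", "exam"),
    ("piece of cake", "the"),
    ("piece of cake", "the"),
]
scores = pmi_scores(observations)
```

In this toy data, "stop" appears only with "beat around the bush" and so receives a positive score, while the frequent word "the" is spread across both idioms and scores below zero for that idiom. Discounting generally frequent words in this way is the usual motivation for preferring PMI over a raw term-frequency baseline.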
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Second Language Acquisition and Learning
Methods: Balanced Selection
