Vector Representations of Idioms in Conversational Systems
Tosin Adewumi, Foteini Liwicki, Marcus Liwicki

TL;DR
This paper shows that training a conversational AI system on idioms makes its responses to idiomatic prompts more relevant, and it reports state-of-the-art results on idiom classification.
Contribution
It introduces a method for enhancing conversational systems with idiom understanding using the PIE-English corpus and demonstrates improved performance over baseline models.
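DialoGPT is a GPT-2-style model that reads a dialogue as one string in which every turn ends with the end-of-sequence token `<|endoftext|>`. The sketch below shows that input format. The idea of pairing each corpus sentence with the next one as a (prompt, response) pair is a hypothetical illustration. It is not the authors' documented procedure for building conversation data from PIE-English, and `pair_consecutive` is a made-up helper.

```python
# Minimal sketch of DialoGPT-style training strings.
# ASSUMPTION: pairing sentence i with sentence i+1 is hypothetical,
# not the paper's documented data-construction method.

EOS = "<|endoftext|>"  # GPT-2 / DialoGPT end-of-sequence token


def to_dialogpt_example(turns):
    """Concatenate dialogue turns, terminating each one with the EOS token."""
    return "".join(turn + EOS for turn in turns)


def pair_consecutive(sentences):
    """Hypothetical helper: treat each sentence as a prompt and the next as its response."""
    return [
        to_dialogpt_example([sentences[i], sentences[i + 1]])
        for i in range(len(sentences) - 1)
    ]


if __name__ == "__main__":
    corpus = [
        "It's raining cats and dogs out there.",
        "Then I'll stay in and hit the books.",
        "Good idea, better safe than sorry.",
    ]
    for example in pair_consecutive(corpus):
        print(example)
```

Ending every turn with the EOS token, including the last one, lets the model learn where a response stops. At inference time the prompt is encoded with a trailing EOS and generation continues from there.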
Findings
98% macro F1 score on idiom classification with T5
71.9% of responses are more fitting when the model is trained on idioms
Model and code are publicly available on Hugging Face
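The 98% classification figure is a macro F1 score. F1 is computed separately for each class and then averaged without weighting by class frequency, so rare idiom classes count as much as common ones. The pure-Python sketch below illustrates the metric. The labels are made up for illustration and are not data from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    f1s = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Illustrative labels only, not results from the paper.
y_true = ["metaphor", "metaphor", "literal", "simile"]
y_pred = ["metaphor", "literal", "literal", "simile"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778 (accuracy here would be 0.75)
```

Here "metaphor" and "literal" each get F1 = 2/3 and "simile" gets 1.0, so the macro average is 7/9. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.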
Abstract
In this study, we demonstrate that an open-domain conversational system trained on idioms or figurative language generates more fitting responses to prompts containing idioms. Idioms are part of everyday speech in many languages and across many cultures, but they pose a great challenge for many Natural Language Processing (NLP) systems, including those for Information Retrieval (IR) and Machine Translation (MT) as well as conversational AI. We use the Potential Idiomatic Expression (PIE)-English idioms corpus for the two tasks we investigate: classification and conversation generation. We achieve a state-of-the-art (SoTA) result of 98% macro F1 score on the classification task using the SoTA T5 model. We experiment with three instances of the SoTA dialogue model, Dialogue Generative Pre-trained Transformer (DialoGPT), for conversation generation. Their performances are…
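T5 treats every task as text-to-text, so classification means training the model to *generate* the class name as a string. The sketch below shows one way to build such input/target pairs and map generated text back to a label. The `classify idiom:` prefix, the label subset, and the fallback label are hypothetical choices for illustration. The real class inventory should be taken from the PIE-English corpus itself.

```python
# Minimal sketch of idiom classification as a T5 text-to-text task.
# ASSUMPTIONS: PREFIX, LABELS and FALLBACK are hypothetical, not from the paper.

PREFIX = "classify idiom: "  # hypothetical task prefix
LABELS = {"metaphor", "simile", "euphemism", "literal"}  # illustrative subset only
FALLBACK = "literal"  # hypothetical label used when generation is not a known class


def to_t5_pair(sentence, label):
    """Build an (input text, target text) pair; the model learns to emit the label."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return PREFIX + sentence, label


def parse_prediction(generated):
    """Normalize T5's generated string and map it back to a class label."""
    label = generated.strip().lower()
    return label if label in LABELS else FALLBACK


if __name__ == "__main__":
    src, tgt = to_t5_pair("She passed away peacefully last night.", "euphemism")
    print(src, "->", tgt)
    print(parse_prediction(" Simile "))
```

In practice the pairs would be tokenized and used to fine-tune a T5 checkpoint (for example `T5ForConditionalGeneration` in Hugging Face `transformers`); that training step is omitted here. The fallback guards against the model generating a string that is not a valid class.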
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Attention Dropout · Gated Linear Unit · Adafactor
