SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation

Shrikant Kendre; Austin Xu; Honglu Zhou; Michael Ryoo; Shafiq Joty; Juan Carlos Niebles

arXiv:2511.17432 · cs.CL · November 24, 2025

TL;DR

SMILE is a new evaluation metric for question-answering tasks that combines lexical and semantic analysis to better align with human judgment, addressing limitations of existing metrics.

Contribution

The paper introduces a composite metric that balances lexical exactness with semantic understanding, improving QA evaluation accuracy.

Findings

01. Highly correlated with human judgments across multiple QA tasks

02. Computationally lightweight compared to LLM-based evaluators

03. Effectively balances lexical and semantic evaluation aspects

Abstract

Traditional evaluation metrics for textual and visual question answering, like ROUGE, METEOR, and Exact Match (EM), focus heavily on n-gram based lexical similarity, often missing the deeper semantic understanding needed for accurate assessment. While measures like BERTScore and MoverScore leverage contextual embeddings to address this limitation, they lack flexibility in balancing sentence-level and keyword-level semantics and ignore lexical similarity, which remains important. Large Language Model (LLM) based evaluators, though powerful, come with drawbacks like high costs, bias, inconsistency, and hallucinations. To address these issues, we introduce SMILE: Semantic Metric Integrating Lexical Exactness, a novel approach that combines sentence-level semantic understanding with keyword-level semantic understanding and easy keyword matching. This composite method balances lexical…
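
The composite idea described in the abstract can be illustrated with a rough sketch. This is not the paper's formulation: the abstract is truncated here and gives no formula, so the function names (`composite_score`, `cosine`, `contains_phrase`), the `weight` parameter, and the way the parts are blended are all assumptions made for illustration. A bag-of-words cosine stands in for the sentence-level embedding similarity, and simple token overlap stands in for keyword-level semantic similarity.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity of two token lists as bag-of-words vectors.

    Toy stand-in for a sentence-embedding similarity (assumption).
    """
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def contains_phrase(pred, ref):
    """Return 1.0 if the reference tokens occur contiguously in the prediction."""
    n = len(ref)
    if n == 0:
        return 0.0
    found = any(pred[i:i + n] == ref for i in range(len(pred) - n + 1))
    return 1.0 if found else 0.0


def composite_score(prediction, reference, weight=0.5):
    """Hypothetical blend of a sentence-level and a keyword-level score."""
    pred, ref = tokens(prediction), tokens(reference)
    sentence_sim = cosine(pred, ref)
    # Keyword-level part: fraction of reference tokens found in the prediction
    # (toy stand-in for keyword-level semantics), averaged with a phrase match.
    pred_set = set(pred)
    keyword_sim = sum(t in pred_set for t in ref) / len(ref) if ref else 0.0
    keyword_score = 0.5 * (keyword_sim + contains_phrase(pred, ref))
    return weight * sentence_sim + (1 - weight) * keyword_score


if __name__ == "__main__":
    print(composite_score("The capital is Paris", "Paris"))
    print(composite_score("The capital is London", "Paris"))
```

In this sketch, a verbose but correct answer still earns keyword-level credit even though its sentence-level similarity to a short reference is diluted, which is the kind of balance the abstract describes; the real metric's components and weighting should be taken from the paper itself.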

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning