New Methods & Metrics for LFQA tasks
Suchismit Mahapatra, Vladimir Blagojevic, Pablo Bertorello, Prasanna Kumar

TL;DR
This paper introduces new methods and metrics for long-form question answering (LFQA), addressing three obstacles to progress in the field: train/validation/test dataset overlap, the lack of automatic evaluation metrics, and generated answers that are not grounded in the retrieved documents.
Contribution
It proposes novel NLI/NLG methods and metrics specifically designed to tackle key challenges in LFQA, such as dataset overlap and answer grounding.
Findings
Reduced dataset overlap issues
Introduced automatic evaluation metrics for LFQA
Enhanced grounding of answers in retrieved documents
Abstract
Long-form question answering (LFQA) tasks require retrieving the documents pertinent to a query and using them to form a paragraph-length answer. Despite considerable progress in LFQA modeling, fundamental issues impede its progress: i) train/validation/test dataset overlap, ii) absence of automatic metrics, and iii) generated answers not being "grounded" in retrieved documents. This work addresses every one of these critical bottlenecks, contributing natural language inference/generation (NLI/NLG) methods and metrics that make significant strides toward their alleviation.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
