Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of NLP Systems
Jialiang Xu, Mengyu Zhou, Xinyi He, Shi Han, Dongmei Zhang

TL;DR
This paper diagnoses the numerical capabilities of NLP systems in numerical question answering, revealing significant vulnerabilities to dataset perturbations and exploring data augmentation as a potential remedy.
Contribution
It systematically evaluates numerical capabilities in QA systems, introduces dataset perturbations for diagnosis, and investigates data augmentation to improve robustness.
Findings
Existing systems are highly sensitive to dataset perturbations.
Perturbations cause large accuracy drops, e.g., Graph2Tree loses 53.83% absolute accuracy under the "Extra" perturbation on ASDiv-a.
Data augmentation can partially mitigate robustness issues.
Abstract
Numerical Question Answering is the task of answering questions that require numerical capabilities. Previous works introduce general adversarial attacks to Numerical Question Answering, but do not systematically explore the numerical capabilities specific to the topic. In this paper, we propose to conduct numerical capability diagnosis on a series of Numerical Question Answering systems and datasets. A series of numerical capabilities are highlighted, and corresponding dataset perturbations are designed. Empirical results indicate that existing systems are severely challenged by these perturbations. E.g., Graph2Tree experienced a 53.83% absolute accuracy drop against the "Extra" perturbation on ASDiv-a, and BART experienced a 13.80% accuracy drop against the "Language" perturbation on the numerical subset of DROP. As a counteracting approach, we also investigate the effectiveness of…
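To make the notion of an "absolute accuracy drop" under a dataset perturbation concrete, the following is a minimal sketch and not the paper's implementation. The perturbation shown (prepending an irrelevant sentence that contains a number) is a hypothetical illustration and is not claimed to match the paper's "Extra" perturbation; `naive_solver`, `add_distractor`, and the two sample questions are likewise assumptions made up for this example.

```python
import re
from typing import Callable, List, Tuple

# (question text, gold numeric answer); hypothetical toy data, not from ASDiv-a or DROP
Example = Tuple[str, float]


def add_distractor(question: str) -> str:
    """Hypothetical perturbation: prepend an irrelevant sentence containing a number."""
    return "There are 7 days in a week. " + question


def naive_solver(question: str) -> float:
    """Toy stand-in for a QA system: adds up every number found in the text."""
    return sum(float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", question))


def accuracy(solver: Callable[[str], float], data: List[Example]) -> float:
    """Fraction of questions whose predicted answer matches the gold answer."""
    correct = sum(1 for q, gold in data if abs(solver(q) - gold) < 1e-6)
    return correct / len(data)


def absolute_accuracy_drop(
    solver: Callable[[str], float],
    data: List[Example],
    perturb: Callable[[str], str],
) -> float:
    """Accuracy on the original data minus accuracy on the perturbed data, in percentage points."""
    perturbed = [(perturb(q), gold) for q, gold in data]
    return (accuracy(solver, data) - accuracy(solver, perturbed)) * 100


data: List[Example] = [
    ("Tom has 3 apples and buys 4 more. How many apples does he have?", 7.0),
    ("A shelf holds 10 books and 5 magazines. How many items are on it?", 15.0),
]

drop = absolute_accuracy_drop(naive_solver, data, add_distractor)
print(f"Absolute accuracy drop: {drop:.2f} points")
```

The toy solver answers both original questions correctly but fails on both perturbed ones, because it cannot tell the distractor number from the relevant ones. The drop is reported as a difference in percentage points rather than as a ratio of the original accuracy, which is what "absolute" means here.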
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Byte Pair Encoding · Residual Connection · Dropout · Dense Connections
