Are Training Resources Insufficient? Predict First Then Explain!
Myeongjun Jang, Thomas Lukasiewicz

TL;DR
This paper compares the explain-then-predict (EtP) and predict-then-explain (PtE) architectures, demonstrating through theoretical analysis and experiments that PtE is more data-efficient and training-efficient, especially when explanation data are limited.
Contribution
The paper introduces the predict-then-explain (PtE) architecture as a more data- and training-efficient alternative to the traditional explain-then-predict (EtP) structure.
Findings
PtE is more data-efficient with limited explanation data
PtE consistently requires less training time than EtP
Experimental results confirm theoretical advantages of PtE
Abstract
Natural language free-text explanation generation is an efficient approach to train explainable language processing models for commonsense-knowledge-requiring tasks. The predominant form of these models is the explain-then-predict (EtP) structure, which first generates explanations and uses them for making decisions. The performance of EtP models is highly dependent on that of the explainer by the nature of their structure. Therefore, large-sized explanation data are required to train a good explainer model. However, annotating explanations is expensive. Also, recent works reveal that free-text explanations might not convey sufficient information for decision making. These facts cast doubts on the effectiveness of EtP models. In this paper, we argue that the predict-then-explain (PtE) architecture is a more efficient approach from a modelling perspective. Our main…
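The difference between the two structures described in the abstract is the order in which the explainer and the predictor run. The sketch below is only an illustration of that data flow, not the authors' implementation: `toy_explainer` and `toy_predictor` are hypothetical stand-ins for trained models, and the assumption that the PtE explainer is conditioned on the predicted label is inferred from the architecture's name rather than stated in the text above.

```python
from typing import Optional, Tuple


def toy_explainer(text: str, label: Optional[str] = None) -> str:
    """Hypothetical explainer: returns a canned rationale string."""
    if label is None:
        # EtP case: the explanation is generated from the input alone.
        return f"rationale for: {text}"
    # PtE case (assumed): the explanation is conditioned on the predicted label.
    return f"rationale for labelling '{text}' as {label}"


def toy_predictor(source: str) -> str:
    """Hypothetical predictor: a keyword rule standing in for a classifier."""
    return "positive" if "good" in source.lower() else "negative"


def explain_then_predict(text: str) -> Tuple[str, str]:
    # EtP: generate the explanation first, then decide from it.
    # The label can only be as good as the explanation it is based on.
    explanation = toy_explainer(text)
    label = toy_predictor(explanation)
    return label, explanation


def predict_then_explain(text: str) -> Tuple[str, str]:
    # PtE: decide from the input first, then explain the decision.
    # The label does not depend on the quality of the explainer.
    label = toy_predictor(text)
    explanation = toy_explainer(text, label)
    return label, explanation


if __name__ == "__main__":
    print(explain_then_predict("a good movie"))
    print(predict_then_explain("a good movie"))
```

In the EtP function the predictor only ever sees the explainer's output, which mirrors the abstract's point that EtP performance depends heavily on the explainer. In the PtE function the prediction is made before any explanation exists.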
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
