Learning to Order Sub-questions for Complex Question Answering
Yunan Zhang, Xiang Cheng, Yufeng Zhang, Zihan Wang, Zhengqi Fang, Xiaoyan Wang, Zhenya Huang, Chengxiang Zhai

TL;DR
This paper introduces a reinforcement learning method to optimize the order of sub-questions in complex question answering, significantly improving accuracy by balancing risk and utility during reasoning.
Contribution
It presents a novel RL-based approach to dynamically determine the optimal sequence of sub-questions, enhancing complex question answering performance.
Findings
Improved accuracy in complex question answering tasks.
RL-based ordering outperforms arbitrary ordering strategies.
Method is general and compatible with existing QA systems.
Abstract
Answering complex questions involving multiple entities and relations is a challenging task. Logically, the answer to a complex question should be derived by decomposing the complex question into multiple simple sub-questions and then answering those sub-questions. Existing work has followed this strategy but has not attempted to optimize the order in which those sub-questions are answered. As a result, the sub-questions are answered in an arbitrary order, leading to a larger search space and a higher risk of missing an answer. In this paper, we propose a novel reinforcement learning (RL) approach to answering complex questions that can learn a policy to dynamically decide which sub-question should be answered at each stage of reasoning. We leverage the expected value-variance criterion to enable the learned policy to balance between the risk and utility of answering a sub-question.…
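Illustration
The abstract does not spell out how the expected value-variance criterion is computed, so the following is only a minimal sketch of a generic mean-variance rule, not the authors' formulation. It assumes each pending sub-question comes with a handful of sampled return estimates and scores it as mean minus a weighted variance. The names mean_variance_score, pick_next_subquestion, risk_weight, and the sample values are all hypothetical.

```python
from statistics import mean, pvariance


def mean_variance_score(sampled_returns, risk_weight=1.0):
    """Generic mean-variance score: expected utility minus a variance penalty.

    This is an assumed form for illustration, not the paper's exact criterion.
    """
    return mean(sampled_returns) - risk_weight * pvariance(sampled_returns)


def pick_next_subquestion(candidates, risk_weight=1.0):
    """Return the name of the pending sub-question with the highest score."""
    return max(
        candidates,
        key=lambda name: mean_variance_score(candidates[name], risk_weight),
    )


# Hypothetical sampled returns for two pending sub-questions.
candidates = {
    "sq_a": [0.9, 0.1, 0.9, 0.1],    # higher mean, but high variance (risky)
    "sq_b": [0.45, 0.5, 0.45, 0.5],  # slightly lower mean, low variance (safe)
}

# Ignoring risk favors the higher mean; penalizing variance favors the safer one.
risk_neutral_choice = pick_next_subquestion(candidates, risk_weight=0.0)
risk_averse_choice = pick_next_subquestion(candidates, risk_weight=1.0)
```

With these made-up numbers, a risk weight of zero selects sq_a on its higher mean, while a risk weight of one selects sq_b because the variance penalty outweighs the small gap in expected utility. In the paper this trade-off is learned as part of an RL policy rather than fixed by hand as it is here.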
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
