Are LLMs Aware that Some Questions are not Open-ended?
Dongjie Yang, Hai Zhao

TL;DR
This paper investigates whether Large Language Models (LLMs) recognize the nature of questions they face and proposes a method to improve their question awareness, reducing hallucinations and enhancing response quality.
Contribution
The paper introduces Question Awareness Temperature Sampling (QuATS), a novel technique that adaptively adjusts output distributions to improve LLMs' question awareness without manual tuning.
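To make the idea of adaptively adjusting the output distribution concrete, here is a minimal sketch of temperature-scaled sampling in which the temperature depends on a per-question score. It is not the authors' implementation: the `determinacy` score, the linear mapping in `pick_temperature`, and the bounds `t_min` and `t_max` are all hypothetical choices made for illustration, since this summary does not describe how QuATS derives its temperature.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_temperature(determinacy, t_min=0.2, t_max=1.0):
    """Hypothetical mapping: determinacy 1.0 (one right answer) gives t_min,
    determinacy 0.0 (fully open-ended) gives t_max."""
    d = min(max(determinacy, 0.0), 1.0)  # clamp to [0, 1]
    return t_min + (1.0 - d) * (t_max - t_min)


def sample_token(logits, determinacy, rng=random):
    """Sample one token index using a question-dependent temperature."""
    probs = softmax_with_temperature(logits, pick_temperature(determinacy))
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
    for d in (0.0, 1.0):
        t = pick_temperature(d)
        print(d, t, softmax_with_temperature(toy_logits, t))
```

A lower temperature concentrates probability on the top-scoring token, which suits questions with a limited set of answers; a higher temperature keeps more alternatives in play for open-ended ones.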
Findings
LLMs often lack awareness of question types, leading to hallucinations.
QuATS improves LLM performance across multiple benchmarks.
Automatic temperature adjustment enhances response appropriateness.
Abstract
Large Language Models (LLMs) have shown the impressive capability of answering questions in a wide range of scenarios. However, when LLMs face different types of questions, it is worth exploring whether LLMs are aware that some questions have limited answers and need to respond more deterministically but some do not. We refer to this as question awareness of LLMs. The lack of question awareness in LLMs leads to two phenomena that LLMs are: (1) too casual to answer non-open-ended questions or (2) too boring to answer open-ended questions. In this paper, we first evaluate the question awareness in LLMs. The experimental results show that LLMs have the issues of lacking awareness of questions in certain domains, e.g. factual knowledge, resulting in hallucinations during the generation. To mitigate these, we propose a method called Question Awareness Temperature Sampling (QuATS). This…
Peer Reviews
Decision: Submitted to ICLR 2024
1. Novel Focus: The paper tackles "question awareness" in LLMs, exploring their ability to discern between open-ended and non-open-ended questions.
2. Methodological Contribution: The introduction of the Question Awareness Temperature (QAT) sampling method is novel.
1. The experimental setting is not clear.
2. Some benchmarks are missing for evaluation.
3. Some important implementation details are missing.
- This paper calls attention to question awareness, which is useful for detecting when “hallucination” or creativity is needed and when factuality matters more in answering user queries.
- This paper uses average kurtosis as a metric to evaluate the “determinacy” of the answer. I am not fully convinced that this metric is reliable or that it truly reflects determinacy, but it is an interesting choice.
- The paper introduced QAT, an adaptive way to tune the temperat
- Better definition of question awareness/determinacy: Why does average kurtosis reflect determinacy? Is determinacy equivalent to the “certainty” of the model? How do these two concepts link together? I am not convinced by the claim in Section 2.4 that LLMs have fundamental question awareness in some scenarios; these tasks may be directly or indirectly present in their training data, so the results do not necessarily mean that the models know they need to choose more deterministically.
- QAT has a phase
The paper has the following strengths:
1. QAT is a fairly simple procedure that could have benefits beyond just open-endedness of questions posed to the chatbot.
2. QAT shows improvements over non-QAT in a variety of LLMs (Table 1)
The paper has several weaknesses.
1. The labelling of open-endedness itself comes from GPT-4, thus invalidating one of the core propositions that LLMs are not very good at identifying open-ended questions.
2. The prompt used for the GPT classification is simplistic and would result in classification based on the overall topic and appearance rather than factual open-endedness. As an example, consider a question that is open-ended but seemingly about science (e.g. concerned with origin of life or question
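Several of the reviews above refer to average kurtosis as the measure of "determinacy". As a rough illustration only, the sketch below computes the Pearson (non-excess) kurtosis of a next-token probability vector and averages it over generation steps. How the paper actually computes the metric is not stated in this summary, so treating the vocabulary probabilities as the sample, the zero-variance convention, and the function names are all assumptions.

```python
def kurtosis(values):
    """Pearson (non-excess) kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        # Assumption: a perfectly flat distribution has no spread to measure,
        # so return 0.0 by convention instead of dividing by zero.
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def average_kurtosis(step_distributions):
    """Mean kurtosis over the probability vectors of each generation step."""
    scores = [kurtosis(probs) for probs in step_distributions]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    peaked = [0.97, 0.01, 0.01, 0.01]  # toy distribution dominated by one token
    spread = [0.40, 0.30, 0.20, 0.10]  # toy distribution with several live options
    print(kurtosis(peaked), kurtosis(spread))
    print(average_kurtosis([peaked, spread]))
```

Under these assumptions a distribution dominated by a single token scores higher than one that spreads its mass, which is the intuition behind reading kurtosis as determinacy; whether that reading is reliable is the question the reviewers raise.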
Taxonomy
Topics: Library Science and Information Systems · Digital Rights Management and Security · Artificial Intelligence in Law
Methods: Attentive Walk-Aggregating Graph Neural Network
