Missing vs. Unused Knowledge Hypothesis for Language Model Bottlenecks in Patent Understanding
Siyang Wu, Honglin Bao, Nadav Kunievsky, James A. Evans

TL;DR
This paper investigates why language models struggle with patent understanding. It finds that most errors come from failing to use knowledge the model already has rather than from lacking that knowledge, and it proposes a framework for diagnosing which of the two is responsible.
Contribution
Introduces a framework that decomposes model errors into missing knowledge and unused knowledge, and analyzes how model size affects both the complexity of the questions a model generates and how well it deploys its knowledge.
Findings
Most errors stem from failure to deploy existing knowledge.
Smaller models generate simpler, more effective questions.
Larger models produce more complex questions that are less effective.
Abstract
While large language models (LLMs) excel at factual recall, the real challenge lies in knowledge application. A gap persists between their ability to answer complex questions and their effectiveness in performing tasks that require that knowledge. We investigate this gap using a patent classification problem that requires deep conceptual understanding to distinguish semantically similar but objectively different patents written in dense, strategic technical language. We find that LLMs often struggle with this distinction. To diagnose the source of these failures, we introduce a framework that decomposes model errors into two categories: missing knowledge and unused knowledge. Our method prompts models to generate clarifying questions and compares three settings -- raw performance, self-answered questions that activate internal knowledge, and externally provided answers that supply…
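The three-setting comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: the `Outcome` record, the `diagnose` and `decompose` helpers, the category labels, and the toy data are all hypothetical. The sketch assumes an error counts as unused knowledge when the model's own answers to its clarifying questions fix it, and as missing knowledge when only externally provided answers fix it.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Outcome(NamedTuple):
    """Correctness of one example under each of the three settings (hypothetical record)."""
    raw: bool            # model answers with no clarifying questions
    self_answered: bool  # model answers its own clarifying questions first
    external: bool       # clarifying questions are answered by an external source


def diagnose(o: Outcome) -> str:
    """Assign one example to a category (assumed decision rule, not the paper's)."""
    if o.raw:
        return "correct"
    if o.self_answered:
        # The model already held the knowledge; answering its own questions surfaced it.
        return "unused_knowledge"
    if o.external:
        # Only outside answers fixed the error, so the knowledge was absent.
        return "missing_knowledge"
    return "unresolved"


def decompose(outcomes: Iterable[Outcome]) -> Counter:
    """Count how many examples fall into each category."""
    return Counter(diagnose(o) for o in outcomes)


# Toy data, invented purely to exercise the functions.
outcomes = [
    Outcome(raw=True, self_answered=True, external=True),
    Outcome(raw=False, self_answered=True, external=True),
    Outcome(raw=False, self_answered=True, external=True),
    Outcome(raw=False, self_answered=False, external=True),
    Outcome(raw=False, self_answered=False, external=False),
]
counts = decompose(outcomes)
print(dict(counts))
```

A per-example decision rule keeps the categories mutually exclusive, so the counts sum to the number of examples. A real evaluation would also need to handle examples that are correct in the raw setting but wrong after questions are added, which this sketch simply treats as correct.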
Taxonomy
Topics: Artificial Intelligence in Law · Law, AI, and Intellectual Property · Open Source Software Innovations
