TL;DR
This paper introduces a zero-shot benchmark to evaluate whether large language models can address the philosophical Frame and Symbol Grounding Problems, revealing that some models demonstrate promising capacities in these complex cognitive tasks.
Contribution
It develops novel benchmark tasks for the Frame and Symbol Grounding Problems and assesses 13 LLMs, providing insights into their abilities to handle these philosophical challenges.
Findings
Open-source models show performance variability based on size and tuning
Several closed models consistently perform well on the benchmarks
Results suggest some LLMs may address longstanding philosophical problems
Abstract
Recent advances in large language models (LLMs) have revitalized philosophical debates surrounding artificial intelligence. Two of the most fundamental challenges, the Frame Problem and the Symbol Grounding Problem, have historically been viewed as unsolvable within traditional symbolic AI systems. This study investigates whether modern LLMs possess the cognitive capacities required to address these problems. To do so, I designed two benchmark tasks reflecting the philosophical core of each problem, administered them under zero-shot conditions to 13 prominent LLMs (both closed and open-source), and assessed the quality of the models' outputs across five trials each. Responses were scored along multiple criteria, including contextual reasoning, semantic coherence, and information filtering. The results demonstrate that while open-source models showed variability in…
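The scoring protocol described in the abstract (several criteria, five trials per model) could be aggregated along the following lines. This is only a minimal sketch under stated assumptions: the abstract does not give the scoring scale, the weighting, or the full list of criteria, so the numeric scale, the equal weighting, the function name `aggregate_trials`, and the example scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean

# Three criteria named in the abstract. The abstract says "including",
# so the paper may use more; this list is an assumption.
CRITERIA = ("contextual_reasoning", "semantic_coherence", "information_filtering")


def aggregate_trials(trials):
    """Average each criterion over all trials, then average the criteria.

    `trials` is a list of dicts mapping criterion name to a numeric score.
    Equal weighting of criteria is an assumption, not the paper's method.
    """
    per_criterion = {c: mean(t[c] for t in trials) for c in CRITERIA}
    overall = mean(per_criterion.values())
    return per_criterion, overall


# Hypothetical scores for one model on one task over five trials
# (illustrative placeholders, not results from the paper).
example_trials = [
    {"contextual_reasoning": 8, "semantic_coherence": 7, "information_filtering": 6},
    {"contextual_reasoning": 7, "semantic_coherence": 7, "information_filtering": 5},
    {"contextual_reasoning": 9, "semantic_coherence": 8, "information_filtering": 6},
    {"contextual_reasoning": 8, "semantic_coherence": 6, "information_filtering": 7},
    {"contextual_reasoning": 8, "semantic_coherence": 7, "information_filtering": 6},
]

if __name__ == "__main__":
    per_criterion, overall = aggregate_trials(example_trials)
    print(per_criterion)
    print(overall)
```

Averaging per criterion before combining keeps the individual dimensions visible, which matters when a model filters information well but reasons poorly about context, or the reverse.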
