SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis
Kathakoli Sengupta, Kai Ao, Paola Cascante-Bonilla

TL;DR
SceneCritic is a symbolic evaluator for indoor scene layouts that verifies spatial coherence using a structured ontology, outperforming traditional VLM-based judges in assessing 3D scene synthesis quality.
Contribution
The paper introduces SceneCritic, a structured, ontology-based evaluator for indoor scenes, and demonstrates that it is more effective than existing VLM- and LLM-based evaluation methods.
Findings
SceneCritic aligns better with human judgments than VLM-based evaluators.
Text-only LLMs can outperform VLMs on semantic layout quality.
Image-based VLM refinement effectively improves semantic and orientation accuracy.
Abstract
Large Language Models (LLMs) and Vision-Language Models (VLMs) increasingly generate indoor scenes through intermediate structures such as layouts and scene graphs, yet evaluation still relies on LLM or VLM judges that score rendered views, making judgments sensitive to viewpoint, prompt phrasing, and hallucination. When the evaluator is unstable, it becomes difficult to determine whether a model has produced a spatially plausible scene or whether the output score reflects the choice of viewpoint, rendering, or prompt. We introduce SceneCritic, a symbolic evaluator for floor-plan-level layouts. SceneCritic's constraints are grounded in SceneOnto, a structured spatial ontology we construct by aggregating indoor scene priors from 3D-FRONT, ScanNet, and Visual Genome. SceneCritic traverses this ontology to jointly verify semantic, orientation, and geometric coherence across object…
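To make the idea of a symbolic, floor-plan-level check concrete, here is a minimal Python sketch. It is not SceneCritic's implementation and does not use SceneOnto. The `Obj` record, the axis-aligned footprint model, the 45-degree facing tolerance, and the `must_face` rule list are all assumptions made for illustration. It only shows how geometric and orientation constraints can be verified directly on a layout, without rendering a view, and how each violation can be reported per object or per relationship.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical layout record: axis-aligned footprint on the floor plan."""
    name: str
    x: float          # footprint center, metres
    y: float
    w: float          # extent along x
    d: float          # extent along y
    yaw: float = 0.0  # facing direction in radians, 0 = +x axis


def in_room(o, room_w, room_d):
    # Geometric check: the whole footprint lies inside the room rectangle.
    return (o.w / 2 <= o.x <= room_w - o.w / 2
            and o.d / 2 <= o.y <= room_d - o.d / 2)


def overlaps(a, b):
    # Geometric check: axis-aligned footprints intersect with positive area.
    return (abs(a.x - b.x) < (a.w + b.w) / 2
            and abs(a.y - b.y) < (a.d + b.d) / 2)


def faces(a, b, tol_deg=45.0):
    # Orientation check: a's facing direction points at b's center,
    # within an assumed angular tolerance.
    to_b = math.atan2(b.y - a.y, b.x - a.x)
    diff = abs((to_b - a.yaw + math.pi) % (2 * math.pi) - math.pi)
    return diff <= math.radians(tol_deg)


def check_layout(objs, room_w, room_d, must_face):
    """Return a list of violation tuples; an empty list means all checks passed."""
    violations = []
    for o in objs:
        if not in_room(o, room_w, room_d):
            violations.append(("out_of_bounds", o.name))
    for i, a in enumerate(objs):
        for b in objs[i + 1:]:
            if overlaps(a, b):
                violations.append(("overlap", a.name, b.name))
    by_name = {o.name: o for o in objs}
    for src, dst in must_face:  # assumed rule format: (object, target) pairs
        if not faces(by_name[src], by_name[dst]):
            violations.append(("orientation", src, dst))
    return violations
```

Because every check is a deterministic function of the layout, the result does not depend on camera viewpoint or prompt wording, which is the instability the abstract attributes to judges that score rendered views. In this sketch the rules are hard-coded; an ontology-grounded evaluator would instead derive which pairs to test and which relationships are expected from its spatial priors.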
