Structure-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation for Web Information Systems

Boxiang Zhao, Qince Li, Zhonghao Wang, Zelin Cao, Yi Wang, Peng Cheng, Bo Lin

arXiv:2601.19923 · cs.CL · May 18, 2026

TL;DR

Structure-BiEval is a self-supervised framework that evaluates the structural fidelity of LLM-generated Web data by decoupling structure from content, addressing the limitations of traditional text metrics.

Contribution

The paper proposes an annotation-free evaluation method built on deterministic intermediate representations and uses it to benchmark 15 LLMs on structured Web data.
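This page does not describe the intermediate representation itself. As a minimal sketch, assuming JSON payloads, one deterministic way to decouple structure from content is to split a parsed document into a content-free skeleton (nesting plus leaf types) and a map from JSON Pointer paths to leaf values. The names `decouple` and `content_exact_match` are hypothetical, and the exact-match score is only a simple stand-in, not the paper's content metric.

```python
import json


def _escape(key):
    # JSON Pointer (RFC 6901) escaping so keys containing "/" or "~" stay unambiguous.
    return key.replace("~", "~0").replace("/", "~1")


def decouple(value):
    """Split parsed JSON into (skeleton, content).

    skeleton: same nesting as `value`, with every leaf replaced by its Python type name.
    content:  dict mapping the JSON Pointer of each leaf to its value.
    """
    content = {}

    def walk(node, pointer):
        if isinstance(node, dict):
            # Sorted keys make the skeleton's key order deterministic.
            return {k: walk(node[k], f"{pointer}/{_escape(k)}") for k in sorted(node)}
        if isinstance(node, list):
            return [walk(item, f"{pointer}/{i}") for i, item in enumerate(node)]
        content[pointer] = node      # leaf: keep the value, keyed by its location
        return type(node).__name__   # skeleton keeps only the leaf's type

    return walk(value, ""), content


def content_exact_match(reference, prediction):
    """Hypothetical stand-in: fraction of reference leaves reproduced at the same location."""
    _, ref = decouple(reference)
    _, pred = decouple(prediction)
    if not ref:
        return 1.0
    hits = sum(1 for ptr, v in ref.items() if ptr in pred and pred[ptr] == v)
    return hits / len(ref)


if __name__ == "__main__":
    reference = json.loads('{"user": {"name": "Alice", "tags": ["a", "b"]}}')
    prediction = json.loads('{"user": {"tags": ["a", "c"], "name": "Alice"}}')
    print(decouple(reference)[0] == decouple(prediction)[0])  # prints True: same structure
    print(content_exact_match(reference, prediction))        # prints 0.6666666666666666
```

Because the skeleton ignores values and the content map ignores key order, a prediction can score perfectly on one track and poorly on the other, which is the kind of separation a dual-track evaluation needs.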

Findings

01. Structural performance varies significantly across LLMs.
02. Mid-sized models can outperform larger models at formatting Web data.
03. Deeply nested, recursive structures remain challenging at every model scale.

Abstract

As Large Language Models (LLMs) evolve into the core of Web-based autonomous agents and complex Web Information Systems, their ability to faithfully translate natural language into rigorous structured formats has become paramount, as this capability is critical for Web API invocation and data exchange. However, evaluating this structural fidelity in Web-native payloads remains a challenge: traditional text metrics fail to capture topological consistency in semi-structured Web data, while manual evaluation is prohibitively costly. To address this, we propose Structure-BiEval, a novel self-supervised framework for quantitative, annotation-free assessment tailored for Web data engineering. By leveraging deterministic Intermediate Representations, our framework effectively decouples structure from content, utilizing Content Semantic Accuracy and Normalized Tree Edit Distance as precise…
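The abstract names Normalized Tree Edit Distance as a structural metric but this page does not give its definition. A minimal sketch, assuming JSON payloads, a content-free labeled tree (leaf values replaced by type tags, object keys sorted), unit costs for insert, delete, and relabel, and normalization by the larger tree's node count, might look like this; `to_skeleton`, `forest_distance`, and `normalized_ted` are hypothetical names, not the paper's implementation.

```python
import json
from functools import lru_cache

# A node is (label, children), where children is a tuple of nodes; a forest is a tuple of nodes.


def to_skeleton(value):
    """Map parsed JSON to a content-free labeled ordered tree."""
    if isinstance(value, dict):
        # Sorted keys so that key order does not change the tree (an assumption).
        return ("object", tuple(("key:" + k, (to_skeleton(value[k]),)) for k in sorted(value)))
    if isinstance(value, list):
        return ("array", tuple(to_skeleton(v) for v in value))
    if isinstance(value, bool):  # check bool before number: bool is a subclass of int
        return ("boolean", ())
    if isinstance(value, (int, float)):
        return ("number", ())
    if isinstance(value, str):
        return ("string", ())
    return ("null", ())


def size(forest):
    """Total number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost ordered tree edit distance between forests f and g (recursive definition)."""
    if not f and not g:
        return 0
    if not f:
        return size(g)
    if not g:
        return size(f)
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    return min(
        forest_distance(f[:-1] + v_children, g) + 1,  # delete rightmost root of f
        forest_distance(f, g[:-1] + w_children) + 1,  # insert rightmost root of g
        forest_distance(v_children, w_children)       # match the two rightmost roots
        + forest_distance(f[:-1], g[:-1])
        + (0 if v_label == w_label else 1),           # relabel cost
    )


def normalized_ted(a, b):
    """Tree edit distance divided by the node count of the larger skeleton."""
    ta, tb = to_skeleton(a), to_skeleton(b)
    return forest_distance((ta,), (tb,)) / max(size((ta,)), size((tb,)))


if __name__ == "__main__":
    reference = json.loads('{"name": "Alice", "tags": ["a", "b"]}')
    prediction = json.loads('{"name": "Bob", "tags": ["x"]}')
    print(normalized_ted(reference, prediction))  # prints 0.14285714285714285
```

In the example, "Alice" versus "Bob" costs nothing because values never enter the skeleton; the only structural edit is the missing array element (1 edit over 7 nodes). The memoized recursion is the textbook definition and is fine for small payloads; for large documents the Zhang-Shasha algorithm computes the same distance more efficiently.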
