Structured Object Language Modeling (SoLM): Native Structured Objects Generation Conforming to Complex Schemas with Self-Supervised Denoising
Amir Tavanaei, Kee Kiat Koo, Hayreddin Ceker, Shaobai Jiang, Qi Li, Julien Han, Karim Bouyarmane

TL;DR
This paper introduces Structured Object Language Modeling (SoLM), a self-supervised denoising approach for generating complex, schema-conforming structured objects with high consistency and efficiency, outperforming prompt-engineered models.
Contribution
The paper presents a novel self-supervised training method for native structured object generation using LLMs, eliminating the need for prompt engineering and improving cost-efficiency.
Findings
Outperforms prompt-engineered LLMs like Claude 3 and Mixtral-8x7B
Provides a cost-effective approach with strong baseline results
Enhances structured object generation with self-supervised denoising
Abstract
In this paper, we study the problem of generating structured objects that conform to a complex schema, with intricate dependencies between the different components (facets) of the object. The facets of the object (attributes, fields, columns, properties) can be a mix of short, structured, type-constrained facts, or long natural-language descriptions. The object has to be self-consistent between the different facets in the redundant information it carries (relative consistency), while being grounded with respect to world knowledge (absolute consistency). We frame the problem as a Language Modeling problem (Structured Object Language Modeling) and train an LLM to perform the task natively, without requiring instructions or prompt-engineering. We propose a self-supervised denoising method to train the model from an existing dataset of such objects. The input query can be the existing…
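To make the idea of self-supervised denoising over structured objects concrete, here is a minimal Python sketch of how training pairs might be built from an existing dataset of objects. It is not the authors' implementation. The noise function (randomly dropping facets), the names `corrupt` and `make_pair`, the JSON serialization, and the sample object are all assumptions made for illustration; the abstract does not specify any of them.

```python
import json
import random


def corrupt(obj, drop_prob, rng):
    """Return a noisy copy of obj with some facets removed.

    Assumption: noise is modeled as independent facet dropout.
    The paper's actual noise function is not given in the abstract.
    """
    return {key: value for key, value in obj.items() if rng.random() >= drop_prob}


def make_pair(obj, drop_prob=0.3, seed=0):
    """Build one (noisy input, clean target) pair as JSON strings."""
    rng = random.Random(seed)
    noisy = corrupt(obj, drop_prob, rng)
    source = json.dumps(noisy, sort_keys=True)
    target = json.dumps(obj, sort_keys=True)
    return source, target


# Hypothetical object mixing short typed facets with a long free-text facet.
example = {
    "title": "Stainless steel water bottle, 750 ml",
    "material": "stainless steel",
    "capacity_ml": 750,
    "description": "A 750 ml bottle made of stainless steel.",
}

source, target = make_pair(example)
print("input :", source)
print("target:", target)
```

A model trained on such pairs would learn to reconstruct the full object from a partial or corrupted one, without any instruction prompt. The redundancy between facets in the sample object (material and capacity appear both as typed fields and inside the description) is the kind of cross-facet agreement the abstract calls relative consistency.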
Taxonomy
Topics: Natural Language Processing Techniques
