LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface
Michael Hind, Basel Shbita, Bo Wu, Farhan Ahmed, Chad DeLuca, Nathan Fulton, David Cox, Dan Gutfreund

TL;DR
This paper introduces LLMON, a markup language designed to encode structure and semantics in prompts, improving LLM understanding, safety, and security during training and inference.
Contribution
The paper presents the design and implementation of LLMON, an LLM-native markup language that enhances communication of structure and semantics to LLMs.
Findings
Preliminary evidence shows LLMON improves model accuracy.
LLMON enhances safety and security in LLM applications.
The approach enables better training and inference strategies.
Abstract
Textual Large Language Models (LLMs) provide a simple and familiar interface: a string of text is used for both input and output. However, the information conveyed to an LLM often has a richer structure and semantics, which is not conveyed in a string. For example, most prompts contain both instructions ("Summarize this paper into a paragraph") and data (the paper to summarize), but these are usually not distinguished when passed to the model. This can lead to model confusion and security risks, such as prompt injection attacks. This work addresses this shortcoming by introducing an LLM-native mark-up language, LLMON (LLM Object Notation, pronounced "Lemon"), that enables the structure and semantic metadata of the text to be communicated in a natural way to an LLM. This information can then be used during model training, model prompting, and inference implementation, leading to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
