Automated Meta Prompt Engineering for Alignment with the Theory of Mind
Aaron Baughman, Rahul Agarwal, Eduardo Morales, Gozde Akay

TL;DR
This paper presents a novel meta-prompting approach using agentic reinforcement learning to align AI-generated content with human mental expectations, demonstrated through live sports event content creation.
Contribution
It introduces a method for optimizing LLMs to anticipate and incorporate human edits, improving content alignment with human mental models in real-time settings.
Findings
Achieved 53.8% alignment with human content reviewers.
Increased content quality by extending tennis action coverage.
Deployed successfully at US Open 2024 and other live events.
Abstract
We introduce a method of meta-prompting that jointly produces fluent text for complex tasks while optimizing the similarity of neural states between a human's mental expectation and a Large Language Model's (LLM) neural processing. A technique of agentic reinforcement learning is applied, in which an LLM-as-a-Judge (LLMaaJ) teaches another LLM, through in-context learning, how to produce content by interpreting the intended and unintended traits of the generated text. To measure human mental beliefs around content production, users modify long-form AI-generated text articles before publication at the US Open 2024 tennis Grand Slam. An LLMaaJ can then solve the Theory of Mind (ToM) alignment problem by anticipating and including human edits within the creation of text from an LLM. Throughout experimentation and by interpreting the results of a live production system, the expectations of human…
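The judge-teaches-generator loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`similarity`, `judge`, `refine_prompt`, `toy_generate`) is hypothetical, the generator is a stub rather than a real LLM call, and `difflib.SequenceMatcher` is only a surface-level stand-in for whatever alignment measure the paper uses. The sketch shows one idea only: feedback derived from a human-edited article is folded back into the prompt, so that the next draft moves closer to what the editor wanted.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def similarity(draft: str, human_edit: str) -> float:
    """Surface similarity in [0, 1]; an assumed stand-in for an alignment score."""
    return SequenceMatcher(None, draft, human_edit).ratio()


def judge(draft: str, human_edit: str) -> str:
    """Hypothetical judge: report words the human editor added and removed."""
    draft_words, edit_words = set(draft.split()), set(human_edit.split())
    added = sorted(edit_words - draft_words)
    removed = sorted(draft_words - edit_words)
    return (
        f"Include: {', '.join(added) or 'nothing'}. "
        f"Avoid: {', '.join(removed) or 'nothing'}."
    )


def refine_prompt(
    generate: Callable[[str], str],
    base_prompt: str,
    human_edit: str,
    rounds: int = 3,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Iteratively rewrite the prompt using judge feedback (in-context only)."""
    prompt = base_prompt
    history: List[Tuple[str, float]] = []
    for _ in range(rounds):
        draft = generate(prompt)
        score = similarity(draft, human_edit)
        history.append((draft, score))
        if score == 1.0:  # draft already matches the human-edited text
            break
        # Fold the judge's feedback into the next prompt; no weights are updated.
        prompt = f"{base_prompt}\nReviewer feedback: {judge(draft, human_edit)}"
    return prompt, history


def toy_generate(prompt: str) -> str:
    """Stub generator: it only 'improves' once feedback appears in the prompt."""
    if "Reviewer feedback" in prompt:
        return "The champion served an ace to win the match."
    return "The player hit the ball to win the match."


if __name__ == "__main__":
    target = "The champion served an ace to win the match."
    final_prompt, history = refine_prompt(toy_generate, "Write a match recap.", target)
    for draft, score in history:
        print(f"{score:.2f}  {draft}")
```

In a real system the stub would be replaced by calls to a generator model and a judge model, and the similarity function by a measure suited to long-form articles; those choices are outside what the abstract specifies.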
