TL;DR
Nautilus Compass is a black-box persona drift detector for production LLM agents. It works only on prompt text, needs no access to model weights, and provides tamper-evident audit logs.
Contribution
It introduces a novel black-box persona drift detection method that works solely at the prompt-text layer, applicable to closed API LLMs, with a comprehensive system implementation.
Findings
Achieves ROC AUC 0.83 for drift detection on real session traces.
Outperforms baseline retrieval pipelines on LongMemEval-S and EverMemBench-Dynamic.
Reproduction cost is approximately $3.50, significantly cheaper than some existing systems.
Abstract
Production LLM coding agents drift over long sessions: they forget user-specified constraints, slip into mistakes the user already flagged, and confabulate prior agreements. White-box approaches such as persona vectors require model weights and so cannot be applied to the closed APIs (Claude, GPT-4) that most users actually interact with. We present Nautilus Compass, a black-box persona drift detector and agent memory layer for production coding agents. The method operates entirely at the prompt-text layer: it computes cosine similarity between BGE-m3 embeddings of user prompts and behavioral anchor texts, then aggregates the similarities with a weighted top-k mean. Compass is, to our knowledge, the only public agent memory layer (among Mem0, Letta, Cognee, Zep, MemOS, and smrti; verified May 2026) that does not call an LLM at index time to extract facts or build a graph; raw conversation text is embedded directly. The system…
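The scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the weighting scheme, the value of k, or how the score maps to a drift decision, so the geometric rank decay, the default k, and the function names below are all assumptions made for illustration. The toy 3-dimensional vectors stand in for BGE-m3 embeddings, which a real pipeline would compute from the prompt and anchor texts.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_top_k_mean(similarities: Sequence[float], k: int = 3, decay: float = 0.5) -> float:
    """Weighted mean of the k largest similarities.

    ASSUMPTION: weights decay geometrically with rank (1, decay, decay^2, ...).
    The source only says "weighted top-k mean" and does not give the weights.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    top = sorted(similarities, reverse=True)[:k]
    if not top:
        return 0.0
    weights = [decay ** rank for rank in range(len(top))]
    return sum(w * s for w, s in zip(weights, top)) / sum(weights)


def anchor_score(
    prompt_vec: Sequence[float],
    anchor_vecs: Sequence[Sequence[float]],
    k: int = 3,
    decay: float = 0.5,
) -> float:
    """Score one prompt embedding against a set of anchor embeddings (hypothetical helper)."""
    sims = [cosine_similarity(prompt_vec, anchor) for anchor in anchor_vecs]
    return weighted_top_k_mean(sims, k=k, decay=decay)


# Toy vectors standing in for BGE-m3 embeddings of a prompt and three anchor texts.
prompt = [1.0, 0.0, 0.0]
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
score = anchor_score(prompt, anchors, k=2, decay=0.5)
print(round(score, 4))
```

With these toy inputs the two largest similarities are 1.0 and about 0.707, so the weighted mean leans toward the closest anchor. Whether a high score indicates drift or adherence depends on what the anchor texts describe, which the abstract leaves open.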
