Entropy-Based Measurement of Value Drift and Alignment Work in Large Language Models
Samih Fadli

TL;DR
This paper introduces an entropy-based framework to measure and monitor value drift and alignment in large language models, enabling dynamic safety assessment during deployment.
Contribution
It operationalizes the Second Law of Intelligence for LLMs by defining ethical entropy, training a classifier, and developing a monitoring pipeline for real-time oversight.
Findings
Instruction-tuned models suppress drift and reduce ethical entropy by roughly eighty percent relative to base models.
Entropy monitoring can detect value drift in real time.
Alignment work correlates with entropy suppression.
Abstract
Large language model safety is usually assessed with static benchmarks, but key failures are dynamic: value drift under distribution shift, jailbreak attacks, and slow degradation of alignment in deployment. Building on a recent Second Law of Intelligence that treats ethical entropy as a state variable which tends to increase unless countered by alignment work, we make this framework operational for large language models. We define a five-way behavioral taxonomy, train a classifier to estimate ethical entropy S(t) from model transcripts, and measure entropy dynamics for base and instruction-tuned variants of four frontier models across stress tests. Base models show sustained entropy growth, while tuned variants suppress drift and reduce ethical entropy by roughly eighty percent. From these trajectories we estimate an effective alignment work rate gamma_eff and embed S(t) and gamma_eff…
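As a rough illustration of how an entropy signal S(t) could be tracked from classifier output, the sketch below computes the Shannon entropy of the empirical label distribution over a sliding window of transcript labels. This is an assumption made for illustration: the abstract does not state the paper's exact definition of ethical entropy, the five categories are represented here by placeholder integers 0 to 4, and the function names `shannon_entropy` and `entropy_trajectory` are hypothetical.

```python
import math
from collections import Counter

NUM_CATEGORIES = 5  # five-way taxonomy; category names are not given, so use 0..4


def shannon_entropy(labels, num_categories=NUM_CATEGORIES):
    """Shannon entropy (in nats) of the empirical distribution of labels.

    Assumption: this stands in for ethical entropy; the paper's own
    definition may differ.
    """
    if not labels:
        raise ValueError("need at least one label")
    for label in labels:
        if label not in range(num_categories):
            raise ValueError(f"label {label!r} outside 0..{num_categories - 1}")
    n = len(labels)
    counts = Counter(labels)
    # Categories with zero count contribute nothing to the sum.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_trajectory(labels, window):
    """Entropy of each length-`window` slice, one value per time step."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        shannon_entropy(labels[i - window:i])
        for i in range(window, len(labels) + 1)
    ]
```

Under this assumed definition, a model whose outputs stay in one category yields an entropy of zero, while outputs spread uniformly over all five categories yield the maximum, `math.log(5)`. A monitoring pipeline could compare the trajectory against a threshold, though how the paper derives gamma_eff from such trajectories is not described in the text available here.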
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing
