The Model Agreed, But Didn't Learn: Diagnosing Surface Compliance in Large Language Models