Measure what Matters: Psychometric Evaluation of AI with Situational Judgment Tests