MANTA: Multi-turn Assessment for Nonhuman Thinking & Alignment

Allen Lu; Isabella Luong; Joyee Chen

arXiv:2605.16301·cs.CY·May 19, 2026

MANTA: Multi-turn Assessment for Nonhuman Thinking & Alignment

Allen Lu, Isabella Luong, Joyee Chen

PDF

1 Repo

TL;DR

MANTA is a dynamic multi-turn evaluation framework that stress-tests large language models' animal welfare alignment across realistic scenarios using adversarial follow-up questions.

Contribution

It introduces a novel multi-turn assessment method that dynamically generates pressure turns, revealing nuanced model behaviors and weaknesses in welfare reasoning.

Findings

01

Turn 1 welfare framing is reliable; Turn 2 introduces variance.

02

Evidence-based capacity attribution is the weakest dimension.

03

AI governance scenarios elicit stronger welfare reasoning.

Abstract

Single-turn benchmarks such as AnimalHarmBench (AHB) have established important baselines for measuring animal welfare alignment in large language models (LLMs), but they miss a critical failure mode: models that respond appropriately when unpressured may capitulate when follow-up conversational turns introduce economic, social, or authority-based arguments. We introduce MANTA (Multi-turn Assessment for Nonhuman Thinking and Alignment), a dynamic multi-turn evaluation framework built on the Inspect AI platform that stress-tests frontier LLMs across realistic professional and everyday scenarios using adversarially generated follow-up questions. Unlike static benchmarks, MANTA generates pressure turns dynamically from each model's actual responses, producing targeted and realistic adversarial pressure. The framework evaluates models across up to 13 AHB-derived scoring dimensions on a…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Mycelium-tools/manta
github

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.