$\tau^2$-Bench: Evaluating Conversational Agents in a Dual-Control Environment