$\pi$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows