CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Lei Ba; Qinbin Li; Songze Li

arXiv:2602.19547·cs.CR·February 24, 2026

CIBER: A Comprehensive Benchmark for Security Evaluation of Code Interpreter Agents

Lei Ba, Qinbin Li, Songze Li

PDF

Open Access

TL;DR

CIBER is a comprehensive benchmark designed to evaluate the security vulnerabilities of code interpreter agents against various adversarial attacks, revealing insights into model robustness and security gaps.

Contribution

This paper introduces CIBER, a novel automated benchmark that assesses security risks of code interpreter agents through dynamic attack scenarios and state-aware evaluation.

Findings

01

Structural integration improves security performance.

02

High intelligence increases vulnerability to complex prompts.

03

Natural language disguises are more effective than code snippets.

Abstract

LLM-based code interpreter agents are increasingly deployed in critical workflows, yet their robustness against risks introduced by their code execution capabilities remains underexplored. Existing benchmarks are limited to static datasets or simulated environments, failing to capture the security risks arising from dynamic code execution, tool interactions, and multi-turn context. To bridge this gap, we introduce CIBER, an automated benchmark that combines dynamic attack generation, isolated secure sandboxing, and state-aware evaluation to systematically assess the vulnerability of code interpreter agents against four major types of adversarial attacks: Direct/Indirect Prompt Injection, Memory Poisoning, and Prompt-based Backdoor. We evaluate six foundation models across two representative code interpreter agents (OpenInterpreter and OpenCodeInterpreter), incorporating a controlled…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSecurity and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques