Adversarial Circuit Evaluation

Niels uit de Bos; Adrià Garriga-Alonso

arXiv:2407.15166·cs.LG·July 23, 2024

TL;DR

This paper evaluates the fidelity of three circuits from the literature (IOI, greater-than, and docstring) by searching for the inputs on which each circuit's output diverges most from the full model's. The IOI and docstring circuits diverge substantially even on benign inputs from the original task, which indicates that more robust circuits are needed.

Contribution

It introduces an adversarial evaluation method for neural network circuits and shows that two of the three circuits evaluated (IOI and docstring) fail to match the full model's behavior on some benign inputs.

Findings

01

IOI and docstring circuits diverge significantly from the full model on benign inputs.

02

Current circuits are insufficient for safety-critical applications.

03

Adversarial testing reveals vulnerabilities in circuit representations.

Abstract

Circuits are supposed to accurately describe how a neural network performs a specific task, but do they really? We evaluate three circuits found in the literature (IOI, greater-than, and docstring) in an adversarial manner, considering inputs where the circuit's behavior maximally diverges from the full model. Concretely, we measure the KL divergence between the full model's output and the circuit's output, calculated through resample ablation, and we analyze the worst-performing inputs. Our results show that the circuits for the IOI and docstring tasks fail to behave similarly to the full model even on completely benign inputs from the original task, indicating that more robust circuits are needed for safety-critical applications.
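The metric described in the abstract can be illustrated with a minimal sketch. It assumes the output logits of the full model and of the circuit are already available as arrays of shape (inputs, vocabulary); it does not implement resample ablation, and the direction KL(model || circuit), the function names, and the toy data are all assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last (vocabulary) axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def kl_divergence(model_logits, circuit_logits):
    # KL(model || circuit) for each input (direction is an assumption).
    log_p = log_softmax(model_logits)
    log_q = log_softmax(circuit_logits)
    return (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)

def worst_inputs(model_logits, circuit_logits, k=3):
    # Indices and KL values of the k most divergent inputs, worst first.
    kl = kl_divergence(model_logits, circuit_logits)
    order = np.argsort(-kl)[:k]
    return order, kl[order]

# Hypothetical toy data: 8 inputs, vocabulary of 5 tokens.
rng = np.random.default_rng(0)
model_logits = rng.normal(size=(8, 5))
# The stand-in "circuit" tracks the model closely on most inputs...
circuit_logits = model_logits + rng.normal(scale=0.1, size=(8, 5))
# ...but prefers a different token than the model on input 2.
model_logits[2] = np.array([3.0, 0.0, 0.0, 0.0, 0.0])
circuit_logits[2] = np.array([0.0, 0.0, 0.0, 0.0, 3.0])

idx, vals = worst_inputs(model_logits, circuit_logits)
print("worst inputs:", idx)
print("KL values:", vals)
```

Sorting by per-input divergence, rather than reporting only the mean, is what makes the evaluation adversarial: a circuit can have a low average KL while still failing badly on a few inputs.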

Taxonomy

Topics: Integrated Circuits and Semiconductor Failure Analysis · Electrostatic Discharge in Electronics · VLSI and Analog Circuit Testing