BIRD: Behavior Induction via Representation-structure Distillation

Galen Pogoncheff; Michael Beyeler

arXiv:2505.23933 · cs.LG · June 2, 2025

Open Access 3 Reviews

TL;DR

BIRD is a novel framework that transfers human-aligned behaviors to models by matching internal representation structures, significantly improving out-of-distribution robustness and offering practical insights for model alignment.

Contribution

Introduces BIRD, a flexible representation-structure distillation method for transferring aligned behaviors, outperforming existing techniques in robustness and providing insights for teacher model selection.

Findings

1. BIRD improves robust accuracy by up to 16% over the next strongest baseline.
2. It remains effective even when the teacher is simpler and smaller than the student.
3. Representation properties explain up to 85% of the variance in transfer success.

Abstract

Human-aligned deep learning models exhibit behaviors consistent with human values, such as robustness, fairness, and honesty. Transferring these behavioral properties to models trained on different tasks or data distributions remains challenging: aligned behavior is easily forgotten during fine-tuning, and collecting task-specific data that preserves this behavior can be prohibitively costly. We introduce BIRD (Behavior Induction via Representation-structure Distillation), a flexible framework for transferring aligned behavior by matching the internal representation structure of a student model to that of a teacher. Applied to out-of-distribution robustness in image classification, BIRD outperforms fine-tuning, transfer learning, and continual learning methods, improving robust accuracy by up to 16% over the next strongest baseline. It remains effective even when the teacher is trained…
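The abstract does not spell out the matching objective. The reviewers below refer to CKA computed from Gram matrices over a batch, so the following is a minimal NumPy sketch that assumes a linear-kernel CKA objective between teacher and student activations taken at one layer each. The names `centered_gram`, `linear_cka`, and `structure_loss` are hypothetical and this is not the authors' code. Because CKA compares B×B Gram matrices built from the same B inputs, teacher and student features may have different widths.

```python
import numpy as np


def centered_gram(feats: np.ndarray) -> np.ndarray:
    """Linear-kernel Gram matrix X X^T, double-centered with H = I - 11^T / B."""
    b = feats.shape[0]
    h = np.eye(b) - np.ones((b, b)) / b
    return h @ (feats @ feats.T) @ h


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two batches of representations.

    x: (B, D_x) and y: (B, D_y) must come from the same B inputs in the same
    order; the feature widths D_x and D_y may differ.
    """
    kx, ky = centered_gram(x), centered_gram(y)
    # Frobenius inner products of centered Gram matrices (HSIC up to a constant).
    cross = np.sum(kx * ky)
    return float(cross / np.sqrt(np.sum(kx * kx) * np.sum(ky * ky)))


def structure_loss(teacher_feats: np.ndarray, student_feats: np.ndarray) -> float:
    """Hypothetical alignment term: 0 when the two batch geometries match."""
    return 1.0 - linear_cka(teacher_feats, student_feats)


rng = np.random.default_rng(0)
teacher = rng.normal(size=(64, 32))
# Rotated, rescaled, zero-padded copy of the teacher: same geometry, wider features.
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
student_same = np.hstack([3.0 * teacher @ q, np.zeros((64, 96))])
student_rand = rng.normal(size=(64, 128))
print(structure_loss(teacher, student_same))  # ~0: identical structure
print(structure_loss(teacher, student_rand))  # > 0: unrelated structure
```

In actual training the same computation would have to run inside an autodiff framework so gradients reach the student; how BIRD weights such a term against the student's task loss is not stated in this section.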

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating: 6 · Confidence: 5

Strengths

1. I believe the main idea is quite solid: using the geometry of the representation is well justified, and the authors show that it also yields practical utility beyond the intuitive justification. The overall hypothesis has high potential impact.
2. Identifying predictive factors is a valuable contribution. The high explained variance (up to 85%) gives practitioners a principled way to select teachers.
3. The paper is well written and clearly explained, which helps with reproducibility.
4.

Weaknesses

There is a lack of ablation studies on some key aspects, or at least some important metrics are not reported.
Major:
1. The batch size B is a fundamental hyperparameter for CKA, but it is not reported anywhere. Performance could be highly sensitive to this parameter.
Minor:
2. The choice of kernel (e.g., linear vs. RBF) can drastically change the geometry being compared. No ablation or discussion explains why this specific similarity measure was chosen over others.
3. The paper does not
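Both of Reviewer 01's concerns can be made concrete with a small experiment. The NumPy sketch below computes CKA with a linear and an RBF kernel and repeats it at several batch sizes B. The median-distance RBF bandwidth, the synthetic data, and all function names are illustrative assumptions, not the paper's settings. With this standard (biased) HSIC-based estimator, CKA between two independent random representations is strongly inflated when B is small relative to the feature width, which is one reason an unreported B makes results hard to interpret.

```python
import numpy as np


def center(k: np.ndarray) -> np.ndarray:
    """Double-center a B x B kernel matrix: H K H with H = I - 11^T / B."""
    b = k.shape[0]
    h = np.eye(b) - np.ones((b, b)) / b
    return h @ k @ h


def linear_kernel(x: np.ndarray) -> np.ndarray:
    return x @ x.T


def rbf_kernel(x: np.ndarray, sigma_frac: float = 1.0) -> np.ndarray:
    """Gaussian kernel; bandwidth = sigma_frac * median pairwise distance (assumed heuristic)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    iu = np.triu_indices_from(d2, k=1)
    sigma = sigma_frac * np.median(np.sqrt(d2[iu]))
    return np.exp(-d2 / (2.0 * sigma ** 2))


def cka(kx: np.ndarray, ky: np.ndarray) -> float:
    """CKA from two (uncentered) kernel matrices over the same batch."""
    kx, ky = center(kx), center(ky)
    return float(np.sum(kx * ky) / np.sqrt(np.sum(kx * kx) * np.sum(ky * ky)))


rng = np.random.default_rng(0)
x = rng.normal(size=(1024, 64))
y_related = np.tanh(x @ rng.normal(size=(64, 64)) / 8.0)  # nonlinear function of x
y_indep = rng.normal(size=(1024, 64))                      # unrelated features

for b in (16, 64, 256, 1024):
    xs, yr, yi = x[:b], y_related[:b], y_indep[:b]
    print(
        f"B={b:4d}  "
        f"related: linear={cka(linear_kernel(xs), linear_kernel(yr)):.3f} "
        f"rbf={cka(rbf_kernel(xs), rbf_kernel(yr)):.3f}  "
        f"independent: linear={cka(linear_kernel(xs), linear_kernel(yi)):.3f}"
    )
```

Exact values depend on the seed; the qualitative pattern (independent features scoring far higher at B=16 than at B=1024) is the point. An unbiased HSIC estimator is one known way to reduce this small-batch bias.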

Reviewer 02 · Rating: 4 · Confidence: 1

Strengths

1. BIRD does not require the teacher and student to share an input space, output space, task, or architecture.
2. The paper successfully shows that BIRD is not just a vision technique. Its application to DPO (safety) and soft-label distillation demonstrates its potential as a general tool.

Weaknesses

1. The method relies on selecting a single "guiding" and "guided" layer. This selection was based on a heuristic, and the authors acknowledge that multi-layer extensions are a direction for future work.
2. The experimental setup is not representative of modern, real-world applications. CIFAR-10, CIFAR-100, and TinyImageNet, with all images downsampled to 32×32 pixels, would be considered a "toy problem" by 2026 standards. It is highly uncertain whether robustness features learned

Reviewer 03 · Rating: 6 · Confidence: 3

Strengths

Flexibility and Scalability: Unlike traditional knowledge distillation or continual learning, BIRD does not require shared tasks, data, or output spaces, allowing transfer from small, simple teachers (e.g., a CIFAR-10-trained MobileNetV2) to 25× larger students on more complex datasets such as TinyImageNet.
Principled Teacher Selection: The identification of three interpretable, computable properties (quantifying task and behavioral relevance in representations) makes the method actionable and predictable,

Weaknesses

Computational Overhead: Computing Gram matrices over batches adds overhead during training and may scale poorly for very large models or high-dimensional representations.
Layer Selection Sensitivity: The method relies on choosing specific "guiding" and "guided" layers; while the predictive properties help, this introduces hyperparameters and may not generalize across all model families.
Limited to Encoded Behaviors: Assumes behaviors are fully captured in representation structure (e.g., geometry via Gram matrice
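On the overhead point: with a linear kernel, the Gram-matrix form is not the only way to compute CKA. Because the centered Gram matrix equals Xc Xc^T for column-centered features Xc, the same value can be obtained from D_x × D_y cross-covariance matrices, trading O(B²·D) work and O(B²) memory for roughly O(B·D_x·D_y) work. The NumPy sketch below (function names hypothetical) checks that the two forms agree. Whether BIRD's implementation uses either form is not stated here, and the trick does not apply to nonlinear kernels such as RBF, which need the full B×B Gram matrix.

```python
import numpy as np


def linear_cka_gram(x: np.ndarray, y: np.ndarray) -> float:
    """Gram form: two B x B matrices, O(B^2 * D) time and O(B^2) memory."""
    b = x.shape[0]
    h = np.eye(b) - np.ones((b, b)) / b
    kx = h @ (x @ x.T) @ h
    ky = h @ (y @ y.T) @ h
    return float(np.sum(kx * ky) / np.sqrt(np.sum(kx * kx) * np.sum(ky * ky)))


def linear_cka_features(x: np.ndarray, y: np.ndarray) -> float:
    """Feature form: uses <Xc Xc^T, Yc Yc^T>_F = ||Yc^T Xc||_F^2.

    Time is roughly O(B * D_x * D_y) and memory O(D_x * D_y), so this form is
    cheaper when feature widths are small relative to the batch size.
    """
    xc = x - x.mean(axis=0, keepdims=True)  # column-centering equals H @ x
    yc = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(yc.T @ xc, "fro") ** 2
    # ||Xc Xc^T||_F == ||Xc^T Xc||_F (same nonzero singular values).
    self_x = np.linalg.norm(xc.T @ xc, "fro")
    self_y = np.linalg.norm(yc.T @ yc, "fro")
    return float(cross / (self_x * self_y))


rng = np.random.default_rng(1)
teacher = rng.normal(size=(256, 48))
student = teacher @ rng.normal(size=(48, 96)) + 0.5 * rng.normal(size=(256, 96))
print(linear_cka_gram(teacher, student), linear_cka_features(teacher, student))
```

Note that the Gram form already grows only linearly in feature width D, so for high-dimensional representations with a modest batch size it is usually the cheaper of the two; its B² term is what dominates as batches grow.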

Taxonomy

Topics: Evolutionary Algorithms and Applications