Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

Haoran Zhang; Seohyeon Cha; Hasan Burhan Beytur; Kevin S Chan; Gustavo de Veciana; Haris Vikalo

arXiv:2603.04247 · cs.LG · March 5, 2026

Open Access

TL;DR

This paper introduces a variance-reduced online learning algorithm for multi-layer hierarchical inference systems with partial, policy-dependent feedback, improving stability and performance in resource-constrained environments.

Contribution

The paper develops a variance-reduction technique, integrated with Lyapunov optimization, for stable online routing under sparse feedback and long-term resource constraints.

Findings

01. The proposed method achieves lower regret than standard importance-weighted algorithms.

02. It maintains stability and unbiased loss estimation despite decaying feedback probabilities.

03. Experimental results show improved performance on large-scale multi-task workloads.

Abstract

Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the…
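The variance amplification described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's algorithm. It assumes, purely for illustration, that every layer offloads with the same independent probability, so the chance of reaching the terminal oracle is that probability raised to the depth. The names `observe_prob`, `iw_estimate`, `iw_variance`, and `empirical_mean` are hypothetical. For a fixed loss, the standard importance-weighted estimator stays unbiased, but its variance grows as loss² · (1 − p) / p when the observation probability p shrinks.

```python
import random


def observe_prob(offload_prob: float, depth: int) -> float:
    """Chance that a task is offloaded through `depth` layers and reaches the oracle.

    Assumption (not from the paper): each layer offloads independently
    with the same probability `offload_prob`.
    """
    return offload_prob ** depth


def iw_estimate(loss: float, p_obs: float, rng: random.Random) -> float:
    """Importance-weighted estimate: loss / p_obs when feedback is observed, else 0."""
    observed = rng.random() < p_obs
    return loss / p_obs if observed else 0.0


def iw_variance(loss: float, p_obs: float) -> float:
    """Exact variance of the estimator for a fixed loss: loss^2 * (1 - p) / p."""
    return loss ** 2 * (1.0 - p_obs) / p_obs


def empirical_mean(loss: float, p_obs: float, n: int, seed: int = 0) -> float:
    """Average of n independent importance-weighted estimates (checks unbiasedness)."""
    rng = random.Random(seed)
    return sum(iw_estimate(loss, p_obs, rng) for _ in range(n)) / n


if __name__ == "__main__":
    # Hypothetical setting: unit loss, 50% offload probability per layer.
    for depth in (1, 2, 4, 8):
        p = observe_prob(0.5, depth)
        print(f"depth={depth} p_obs={p:.4f} "
              f"variance={iw_variance(1.0, p):.1f} "
              f"mean={empirical_mean(1.0, p, 100_000):.3f}")
```

In this toy setting the empirical mean stays near the true loss at every depth, while the variance rises sharply as the observation probability falls. This is the instability that motivates variance reduction.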

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization