TL;DR
SecureRouter introduces an encrypted routing framework that adaptively selects model sizes for secure transformer inference, significantly reducing latency while maintaining accuracy.
Contribution
The paper presents a novel input-adaptive encrypted routing system that pairs a secure router with an MPC-optimized model pool to make secure AI inference more efficient.
Findings
Achieves 1.95x latency reduction over prior methods
Incurs negligible accuracy loss with adaptive model selection
Provides an open-source implementation for practical deployment
Abstract
Cryptographically secure neural network inference typically relies on secure computing techniques such as Secure Multi-Party Computation (MPC), enabling cloud servers to process client inputs without decrypting them. Although prior privacy-preserving inference systems co-design network optimizations with MPC, they remain slow and costly, limiting real-world deployment. A major bottleneck is their use of a single, fixed transformer model for all encrypted inputs, ignoring that different inputs require different model sizes to balance efficiency and accuracy. We present SecureRouter, an end-to-end encrypted routing and inference framework that accelerates secure transformer inference through input-adaptive model selection under encryption. SecureRouter establishes a unified encrypted pipeline that integrates a secure router with an MPC-optimized model pool, enabling coordinated routing,…
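The idea of input-adaptive model selection can be illustrated with a minimal plaintext sketch. It is not the authors' protocol: in SecureRouter the routing decision is made under encryption, which this example does not model. The names `PoolModel`, `route`, `relative_cost`, and `threshold`, and the rule "pick the cheapest model whose router score clears a threshold", are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PoolModel:
    """One entry in a hypothetical model pool."""
    name: str
    relative_cost: float  # assumed latency cost relative to the smallest model


def route(scores: Sequence[float], pool: Sequence[PoolModel],
          threshold: float = 0.5) -> PoolModel:
    """Return the cheapest model whose router score clears the threshold.

    scores[i] is an assumed router estimate that pool[i] handles the input
    correctly. If no model clears the threshold, fall back to the most
    expensive model in the pool.
    """
    if not pool or len(scores) != len(pool):
        raise ValueError("scores and pool must be non-empty and equal length")
    # Consider candidates from cheapest to most expensive.
    ranked = sorted(zip(pool, scores), key=lambda pair: pair[0].relative_cost)
    for model, score in ranked:
        if score >= threshold:
            return model
    return ranked[-1][0]


# Hypothetical pool; the costs are made-up numbers, not results from the paper.
POOL = [
    PoolModel("small", 1.0),
    PoolModel("medium", 2.5),
    PoolModel("large", 6.0),
]

easy_input_scores = [0.90, 0.95, 0.99]   # every model is expected to succeed
hard_input_scores = [0.10, 0.20, 0.30]   # no model clears the threshold

print(route(easy_input_scores, POOL).name)
print(route(hard_input_scores, POOL).name)
```

Under these assumptions, easy inputs are served by the small model and only hard inputs pay the cost of the large one, which is the intuition behind replacing a single fixed model with a routed pool.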
