Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models

Tianchen Zhang; Gururaj Saileshwar; David Lie

arXiv:2412.15431·cs.LG·December 23, 2024

TL;DR

This paper uncovers a timing side-channel in large language models that leaks sensitive input information through output token count, demonstrating practical attacks on translation and classification tasks and proposing mitigations.

Contribution

It introduces a novel timing side-channel based on output token count in LLMs and demonstrates its effectiveness in extracting sensitive input data.

Findings

01

Over 75% precision in identifying the target language in translation tasks across three models (Tower, M2M100, MBart50).

02

More than 70% precision in leaking the input class in text classification tasks.

03

Effective mitigations are proposed against the token count side-channel.

Abstract

This paper demonstrates a new side-channel that enables an adversary to extract sensitive information about inference inputs in large language models (LLMs) based on the number of output tokens in the LLM response. We construct attacks using this side-channel in two common LLM tasks: recovering the target language in machine translation tasks and recovering the output class in classification tasks. In addition, due to the auto-regressive generation mechanism in LLMs, an adversary can recover the output token count reliably using a timing channel, even over the network against a popular closed-source commercial LLM. Our experiments show that an adversary can learn the output language in translation tasks with more than 75% precision across three different models (Tower, M2M100, MBart50). Using this side-channel, we also show the input class in text classification tasks can be leaked out…
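The two steps the abstract describes, recovering the output token count from response time and then inferring a sensitive attribute from that count, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple linear latency model (a fixed prefill cost plus a constant cost per generated token) and a nearest-mean classifier. All function names, timing constants, and per-language token counts below are hypothetical values chosen for illustration.

```python
from statistics import mean


def estimate_token_count(latency_s, prefill_s, per_token_s):
    """Invert an assumed linear model: latency ~ prefill + n_tokens * per_token."""
    decode_time = max(latency_s - prefill_s, 0.0)
    return round(decode_time / per_token_s)


def build_profile(samples):
    """Map each candidate label to the mean output token count seen for it."""
    return {label: mean(counts) for label, counts in samples.items()}


def guess_label(token_count, profile):
    """Pick the label whose mean token count is closest to the observation."""
    return min(profile, key=lambda label: abs(profile[label] - token_count))


# Hypothetical profiling data: output token counts an attacker might collect
# by translating known sentences into each target language.
PROFILE = build_profile({
    "de": [28, 30, 32],
    "zh": [18, 20, 22],
    "ru": [38, 40, 42],
})

# Hypothetical observation: one response took 1.05 s end to end, with an
# assumed 0.05 s prefill cost and 0.025 s per generated token.
observed = estimate_token_count(latency_s=1.05, prefill_s=0.05, per_token_s=0.025)
guess = guess_label(observed, PROFILE)
print(observed, guess)
```

In practice, network jitter and batching would blur the latency measurement, so a real attacker would likely need repeated observations and a more robust classifier than this nearest-mean rule.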

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling