Long Context, Less Focus: A Scaling Gap in LLMs Revealed through Privacy and Personalization
Shangding Gu

TL;DR
This paper introduces PAPerBench, a large-scale benchmark that systematically evaluates how increasing context length in LLMs affects personalization quality and privacy risk. Evaluations show that both personalization and privacy protection degrade consistently as context grows.
Contribution
The paper presents a new benchmark and a comprehensive analysis demonstrating a scaling gap in LLMs: longer contexts lead to weaker personalization and greater privacy leakage. Theoretical insights on attention dilution support the empirical results.
Findings
Personalization performance degrades as context length increases
Privacy risks increase as context length grows
Attention dilution explains the scaling gap in LLMs
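The attention-dilution finding can be illustrated with a toy calculation. The sketch below is not the paper's theoretical analysis, which is not reproduced on this page. It assumes a single softmax attention head in which one relevant token has a logit that exceeds every other token's logit by a fixed gap, and all remaining tokens share the same logit. The function name `weight_on_relevant_token` and the parameter `logit_gap` are hypothetical, chosen only for illustration.

```python
import math


def weight_on_relevant_token(context_len: int, logit_gap: float = 5.0) -> float:
    """Softmax weight on one relevant token among `context_len` tokens.

    Toy assumption (not from the paper): the relevant token's logit is
    `logit_gap` higher than the logit shared by all other tokens.
    """
    relevant = math.exp(logit_gap)
    # Each of the other (context_len - 1) tokens contributes exp(0) = 1.
    return relevant / (relevant + (context_len - 1))


if __name__ == "__main__":
    # Context lengths chosen to span the 1K to 256K range of the benchmark.
    for n in (1_000, 16_000, 256_000):
        print(f"{n:>7} tokens: weight on relevant token = "
              f"{weight_on_relevant_token(n):.6f}")
```

Under these assumptions, the weight on the relevant token shrinks roughly in proportion to 1/n once n is large compared with exp(logit_gap), so a fixed logit advantage buys less attention mass as the context grows.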
Abstract
Large language models (LLMs) are increasingly deployed in privacy-critical and personalization-oriented scenarios, yet the role of context length in shaping privacy leakage and personalization effectiveness remains largely unexplored. We introduce a large-scale benchmark, PAPerBench, to systematically study how increasing context length influences both personalization quality and privacy protection in LLMs. The benchmark comprises approximately 29,000 instances with context lengths ranging from 1K to 256K tokens, yielding a total of 377K evaluation questions. It jointly evaluates personalization performance and privacy risks across diverse scenarios, enabling controlled analysis of long-context model behavior. Extensive evaluations across state-of-the-art LLMs reveal consistent performance degradation in both personalization and privacy as context length increases. We further provide a…
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Adversarial Robustness in Machine Learning
