Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length
Jingxuan Chen, Mohammad Taher Pilehvar, Jose Camacho-Collados

TL;DR
This paper investigates how the performance of Large Language Models (LLMs) declines when a single input contains multiple instances, and finds that instance count affects results more strongly than context length, especially at larger scales.
Contribution
It provides a comprehensive evaluation of multi-instance processing in LLMs, revealing the dominant effect of instance count on performance degradation.
Findings
Performance degrades slightly at small instance counts (approximately 20-100).
Performance collapses at larger instance counts.
Instance count has a stronger effect on degradation than context length.
Abstract
Users often rely on Large Language Models (LLMs) for processing multiple documents or performing analysis over a number of instances. For example, analysing the overall sentiment of a number of movie reviews requires an LLM to process the sentiment of each review individually in order to provide a final aggregated answer. While LLM performance on such individual tasks is generally high, there has been little research on how LLMs perform when dealing with multi-instance inputs. In this paper, we perform a comprehensive evaluation of the multi-instance processing (MIP) ability of LLMs for tasks in which they excel individually. The results show that all LLMs follow a pattern of slight performance degradation for small numbers of instances (approximately 20-100), followed by a performance collapse on larger instance counts. Crucially, our analysis shows that while context length is…
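To make the multi-instance setting in the abstract concrete, the following is a minimal Python sketch of the movie-review example: several reviews are packed into one prompt, and the model's aggregated answer is compared with the count derived from per-review labels. The prompt wording, the function names (`build_mip_prompt`, `gold_positive_count`, `is_correct`), the sample reviews, and the exact-match scoring are all illustrative assumptions, not the paper's actual protocol.

```python
from collections import Counter


def build_mip_prompt(reviews):
    """Pack several reviews into one prompt that asks for an aggregated answer.

    Hypothetical prompt template, for illustration only.
    """
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(reviews, start=1))
    return (
        "Classify each review below as positive or negative, "
        "then report how many are positive.\n\n"
        f"{numbered}\n\nNumber of positive reviews:"
    )


def gold_positive_count(labels):
    """Aggregate the per-instance gold labels into the expected final answer."""
    return Counter(labels)["positive"]


def is_correct(model_answer, labels):
    """Exact-match scoring of the aggregated answer (an assumed metric)."""
    try:
        return int(model_answer.strip()) == gold_positive_count(labels)
    except ValueError:
        # A non-numeric answer counts as wrong.
        return False


# Toy data; a real evaluation would vary len(reviews) to study degradation.
reviews = ["Loved it.", "Dull and far too long.", "A small masterpiece."]
labels = ["positive", "negative", "positive"]
prompt = build_mip_prompt(reviews)
```

Under these assumptions, increasing the number of reviews raises both the instance count and the context length together, so separating the two effects would need an additional control, such as holding the total length roughly fixed while changing the number of reviews.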
