Addressing the Ecological Fallacy in Larger LMs with Human Context

Nikita Soni; Dhruv Vijay Kunjadiya; Pratham Piyush Shah; Dikshya Mohanty; H. Andrew Schwartz; Niranjan Balasubramanian

arXiv:2603.05928 · cs.CL · March 9, 2026

TL;DR

This paper shows that modeling an author's language context during pre-training and fine-tuning improves the performance of a larger language model (an 8B Llama model) across multiple tasks, addressing the ecological fallacy.

Contribution

It applies the HuLM pre-training objective and HuFT (Human-aware Fine-Tuning) to incorporate author context into an 8B Llama model, showing improvements over standard pre-training and fine-tuning.

Findings

01 Fine-tuning with author context improves model performance.

02 Pre-training with HuLM enhances generalization across tasks.

03 Modeling author context is crucial for better language understanding.

Abstract

Language model training and inference ignore a fundamental linguistic fact: there is a dependence between multiple sequences of text written by the same person. Prior work has shown that addressing this form of ecological fallacy can greatly improve the performance of multiple smaller (~124M) GPT-based models. In this work, we ask if addressing the ecological fallacy by modeling the author's language context with a specific LM task (called HuLM) can provide similar benefits for a larger-scale model, an 8B Llama model. To this end, we explore variants that process an author's language in the context of their other temporally ordered texts. We study the effect of pre-training with this author context using the HuLM objective, as well as using it during fine-tuning with author context (HuFT: Human-aware Fine-Tuning). Empirical comparisons show that addressing the…
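As a rough illustration of what "an author's language in the context of their other temporally ordered texts" could look like as data preparation, the sketch below groups texts by author, sorts them by time, and joins them into context blocks. It is not the paper's implementation: the record layout, the `SEP` separator string, the `max_texts` block size, and the function name `build_author_contexts` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (author_id, timestamp, text).
Record = Tuple[str, int, str]

# Assumed separator between an author's texts; not taken from the paper.
SEP = " <|sep|> "


def build_author_contexts(
    records: List[Record], max_texts: int = 4
) -> Dict[str, List[str]]:
    """Group texts by author, order them by time, and join them into blocks.

    Each block holds up to `max_texts` consecutive texts from one author,
    so a model reading a block sees a text alongside that author's
    neighboring texts rather than in isolation.
    """
    by_author = defaultdict(list)
    for author_id, timestamp, text in records:
        by_author[author_id].append((timestamp, text))

    contexts: Dict[str, List[str]] = {}
    for author_id, items in by_author.items():
        # Temporal ordering: earliest text first.
        items.sort(key=lambda pair: pair[0])
        texts = [text for _, text in items]
        # Split the ordered texts into fixed-size blocks.
        contexts[author_id] = [
            SEP.join(texts[i : i + max_texts])
            for i in range(0, len(texts), max_texts)
        ]
    return contexts
```

Blocks built this way could then be tokenized and fed to a causal LM in place of isolated documents; how the actual HuLM and HuFT variants construct and consume author context is not specified in the text above.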

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Natural Language Processing Techniques