Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies

Changye Li; Zhecheng Sheng; Trevor Cohen; Serguei Pakhomov

arXiv:2406.02830 · cs.CL · June 6, 2024

TL;DR

This study investigates how larger language models demonstrate greater resilience to induced linguistic anomalies related to dementia, suggesting their potential to model neurodegenerative processes through attention mechanisms.

Contribution

It introduces a novel bidirectional attention head ablation method that parallels human cognitive reserve, revealing size-dependent resilience in transformer models.
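To make "attention head ablation" concrete, here is a minimal NumPy sketch that zeroes selected heads in a toy causal multi-head self-attention layer. It is a generic illustration of head masking, not the paper's bidirectional ablation procedure, whose details are not given on this page; the function name `multi_head_self_attention`, the random weights, and the dimensions are all hypothetical. Zeroing a head's output before the output projection is equivalent to scaling that head's attention weights by zero.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads,
                              head_mask=None, causal=True):
    """Toy multi-head self-attention with optional head ablation.

    x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model).
    head_mask: sequence of num_heads values in {0, 1}; 0 ablates that head.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    if causal:
        # Autoregressive (GPT-style) masking: a token cannot attend to later tokens.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    heads = weights @ v  # (heads, seq, d_head)
    if head_mask is not None:
        # Ablation: multiply each head's output by its 0/1 mask value.
        heads = heads * np.asarray(head_mask, dtype=heads.dtype)[:, None, None]
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
full = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=4)
ablated = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=4,
                                    head_mask=[1, 0, 1, 1])
print(np.abs(full - ablated).max() > 0)  # True: masking head 1 changes the output
```

Because the output projection is linear in the concatenated heads, the contributions of complementary head subsets add up to the unmasked output, which makes it straightforward to measure how much a given set of masked heads matters.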

Findings

01

Larger GPT-2 models require more attention heads to be masked before their performance degrades.

02

Attention mechanisms may serve as an analogue to human cognitive reserve.

03

Resilience varies with model size, indicating potential for modeling neurodegeneration.

Abstract

As artificial neural networks grow in complexity, understanding their inner workings becomes increasingly challenging, which is particularly important in healthcare applications. The intrinsic evaluation metric of autoregressive neural language models (NLMs), perplexity (PPL), can reflect how "surprised" an NLM is at novel input. PPL has been widely used to understand the behavior of NLMs. Previous findings show that changes in PPL when masking attention layers in pre-trained transformer-based NLMs reflect linguistic anomalies associated with Alzheimer's disease dementia. Building upon this, we explore a novel bidirectional attention head ablation method that exhibits properties attributed to the concepts of cognitive and brain reserve in human brain studies, which postulate that people with more neurons in the brain and more efficient processing are more resilient to…
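Perplexity, the "surprise" measure the abstract relies on, is the exponential of the mean negative log-likelihood a model assigns to the observed tokens: PPL = exp(-(1/N) Σ log p(w_i | w_<i)). The sketch below computes it from per-token natural-log probabilities; the function name `perplexity` and the toy inputs are hypothetical, and in practice the log probabilities would come from a language model such as GPT-2 scoring a transcript.

```python
import math
from typing import Sequence

def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from the natural-log probabilities a model assigned
    to each observed token: exp of the mean negative log-likelihood."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that gives every token probability 0.25 is as "surprised"
# as a uniform guess among 4 options, so its perplexity is 4.
print(round(perplexity([math.log(0.25)] * 8), 6))  # 4.0

# Lower probabilities on the observed tokens mean higher perplexity.
confident = [math.log(0.9)] * 4
surprised = [math.log(0.05)] * 4
print(perplexity(confident) < perplexity(surprised))  # True
```

Comparing this value for the same text before and after masking parts of a model is one way to quantify how much an ablation degrades it.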

Code & Models

Repositories

linguisticanomalies/artificial-neural-reserve
PyTorch · Official

Videos

Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies · Underline

Taxonomy

Topics: Topic Modeling · Interpreting and Communication in Healthcare

Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Weight Decay · Linear Warmup With Cosine Annealing · Attention Dropout · Linear Layer · Byte Pair Encoding · Adam