An entropic explanation of insistence on sameness in autism
Przemysław Śliwiński

TL;DR
This paper proposes a new theory explaining autism's insistence on sameness using information theory, suggesting it's a way to reduce uncertainty and surprise.
Contribution
A novel information theory framework defines autism as an impairment in cognitive functions and links insistence on sameness to entropy minimization.
Findings
Insistence on sameness is explained as a strategy to restrict stimuli to known patterns, minimizing entropy.
The framework allows quantifying psychological concepts like anxiety and comfort zone using entropy metrics.
Therapy guidelines and robotic caregiver programs can be derived from the framework without direct experimentation on autistic individuals.
Abstract
An information theory-based framework is proposed in attempt to explain insistence on sameness in autism as an instance of a general behavior pattern in which an individual tries to reduce surprise and uncertainty. It offers a new definition of autism as an impairment in which cognitive functions are restricted to discrimination, memorization and prediction of tangible properties of the environment. An analogy between insistence on sameness and constrained minimization of the entropy metric is observed and examined for a set of assumptions that describe cognitive limitations of a person with autism. The metric is given by the formula DH(R, M) = H(R|M)+H(M|R), where R represents sequences of random stimuli, M is a memory that stores and retrieves them, and where H(·|·) denotes their conditional entropies interpreted as surprise and uncertainty, respectively. It is first inferred that…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2|
|
|
|---|---|
|
| An accessible information theory textbook, where basic definitions of the |
|
| A review paper that discusses a hypothesis that the |
|
| A diagnostic manual that presents diagnostic criteria, verbal descriptions of behavior traits associated with autism (incl. |
|
| Introduction, analysis and early experimental verification of a link between |
|
| An attempt to create a computational model of autism perception based on mostly verbal description of the functional |
|
| Discussion of a possibly multifold (positive and negative) impact of noise on autism traits and the associated cognitive abilities. |
|
| Reviews Bayesian approaches to autism. Introduces a model based on assumption that underlies a significant impact of the sensory information on building and updating the internal representation of environment. |
|
| A study that links |
|
| A proposal of several remedies to the problem of multifold increase in autism diagnoses, incl. narrowing its definition and combining the experience of clinicians with qualitative features. |
|
| A comprehensive, extensive and detailed compendium describing autism spectrum disorder-related topics; in particular, |
|
| A survey presenting |
|
| A review paper which discusses in a formalized fashion the universality of |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAutism Spectrum Disorder Research · Cognitive Science and Education Research · Neural and Behavioral Psychology Studies
Introduction
1
Etiology and pathogenesis of autism remain unclear as does its definition (Wiggins et al., 2009; American Psychiatric Association, DSM-5 Task Force, 2013; Davis and Plaisted-Grant, 2015; Sauer et al., 2021; Volkmar, 2021; Jourdon et al., 2023; Shiraishi et al., 2024; Assary et al., 2024; Mahé et al., 2025). There is no a consensus about the behavior traits that are associated with individuals with autism either (Mottron and Bzdok, 2020; Chapman and Veit, 2021; Perochon et al., 2023). While behavior, in general, appears to be driven by a variety of templates and patterns (see e.g., Lai, 2023 and cf. Bailey et al., 2024), we focus on insistence on sameness, a phenomenon referred also to as stereotyped, repetitive, and the autistic-like behavior or the autism-like trait (cf. e.g., Uljarević et al., 2017; Volkmar, 2021; Lyu et al., 2024; Kamp-Becker, 2024) observed in non-verbal low-functioning individuals (the term low-functioning is used to denote severe (profound) instances of the Level 3 of Autism Spectrum Disorder, cf. American Psychiatric Association, DSM-5 Task Force, 2013; Betty et al., 2015; Wang et al., 2022; Singer et al., 2023).
We propose a new insight into that phenomenon and examine its resemblance to a problem of constrained minimization of uncertainty, where the following entropy metric (see e.g., MacKay, 2003):
is used as a measure of uncertainty. Random variables, R and M, represent the real-world environment and the memory of an individual, respectively. The terms H(R|M) and H(M|R) are conditional entropies that quantify the levels of surprise and uncertainty.
The motivation for such an approach is to devise a mathematically rigorous framework that offers an explanation of the insistence on sameness which is more precise and less arbitrary than the now dominant language-based descriptions (cf. e.g., American Psychiatric Association, DSM-5 Task Force, 2013; Mottron and Bzdok, 2020; Chapman and Veit, 2021; Volkmar, 2021; Wang et al., 2022), enables formulation of learning therapies as solutions to a constrained optimization problem and facilitates their implementations as programs for e.g., robot-based daily life assistants.
The framework is inspired by a variety of results and observations from information theory, neuroscience and psychiatry (see Table 1) and its contribution is twofold, theoretical (a relatively simple and verifiable explanation of insistence on sameness as a consequence of cognitive restrictions of a person with autism) and practical (a set of design guidelines for therapies focused on low-functioning non-verbal individuals), and may lead to a more precise definition of the disorder and better personalized therapies.
Clearly, there are some limitations of the framework: it is a functional model, i.e., it does not define the underlying biological implementation. Is also requires new and carefully designed personalized validation experiments to be developed. In turn, limiting the scope of the framework to low-functioning non-verbal individuals with autism is a deliberate attempt to make it both formal and accurate.
In the following sections we present the framework, examine its basic properties and discuss its relations with insistence on sameness. Then, we devise the guidelines for learning therapies aimed at increasing autonomy of persons with autism, formulate a therapy as an optimization problem and discuss two framework validation approaches. Finally, we conjecture that insistence on sameness is a special case of a general minimizing uncertainty pattern (cf. Friston, 2010) and conclude the work with the proposal of a new definition of autism.
Methods
2
The framework consists of three components: the entropy metric (1), a stimuli processing loop in Figure 1, and a set of Assumptions 1 –4 that represent the cognitive constraints of a person with autism.
Stimuli processing loop.
Entropy metric
2.1
The metric (1) measures an entropy distance between two discrete random variables R and M. To illustrate its basic properties, we will consider two opposite scenarios:
R fully determines M (and vice versa). In this case knowledge of one variable removes the entire uncertainty about the other and thus both conditional entropy terms are zero (and so is the metric),
Both R and M are independent of each other. Knowing either of them carries no information about the other (so they maintain their original uncertainties):
Hence, in order to reduce the entropic distance between R and M one can learn about R (and store that knowledge in M), or constrain R to the already known M.
Stimuli processing loop
2.2
The loop (see Figure 1) serves as a model of stimuli processing. Its two phases, perception, and prediction are associated with the components of the metric, H(R|M = mn−1) and H(M|R = rn), that quantify how surprise and uncertainty are affected by the incoming stimulus rn. If levels of surprise or uncertainty exceed the respective thresholds, Tso or Ta, a sensory overload or anxiety state can occur.
Assumptions
2.3
To specify the cognitive constraints of a non-verbal low-functioning individual with autism, we assume that:
The stimuli, rn, n = 0, 1, …, represent a perceived environment and form sequences. The mixture of all sequences is denoted by R.A memory, M, is capable of storing sequences and retrieving their elements, mn, n = 0, 1, ….Each memory item, mn = ρ(rn), is a result of a classification of a single stimulus, rn, by a nearest neighbor algorithm.Both sensory overload and anxiety thresholds, Tso, Ta>0, are random variables.
Comments on assumptions
2.4
Each stimulus, rn, can be interpreted as a “snapshot” of the environment and mn = ρ(rn) as its counterpart perceived by the senses and located in memory M (cf. e.g., Nobre, 2022). The sequences in the memory are represented by chains/directed graphs (see Figure 2).
An instance of a graph composed of memorized stimuli sequences, mn = ρ(rn), n = 0, 1, … with a set Mn+1={mn+1′,mn+1′′} consisting of a pair of alternative events.
Remark 2.1. All stimuli are treated in a unified way, as raw (random) signals, and their sources (e.g., external/exogenous or internal/endogenous) are not distinguished. Moreover, their semantics (i.e., the non-tangible, abstract, cultural meanings and associations) are not available. Thus, only their perceptible properties are taken into account and treated as equally important; i.e., all their abstract properties and relations (see e.g., Bäck, 2014), incl. importance, hierarchy, being a part/whole are ignored.
Example 2.1. An abstract notion of nothing cannot be represented by any perceptible stimulus. This observation can also be derived from the namesake property of a nearest neighbor algorithm (cf. Assumption 3), where the classification routine always selects the closest remembered (known) item, regardless of how "far" it is in terms of a distance function.
Remark 2.2. Assumptions 2–3 define a cognitive model with the inference abilities limited to replication of the memorized stimuli and their sequences. They also imply a non-vanishing (persistent) character of the memory.
Remark 2.3. Defining the thresholds, Tso, Ta, as random variables allows them to be unknown, varying in time, specific for an individual and driven by the factors not included in the model (cf. e.g. Kim and Kim, 2023; Lai, 2023).
Analysis
2.5
The goal of this section is to inspect how the metric changes during perception and prediction phases; cf. the processing loop in Figure 1. The amount of surprise in the former, given the memorized stimulus, mn−1, is
while the level of uncertainty in the latter, given the incoming stimulus, rn, is
so that during perception we need to consider only the first term, H(R|M = mn−1), and only the other, H(M|R = rn), during prediction. We examine them in the following three cases; cf. Section 2.1:
The first, deterministic, takes place when in perception phase knowledge of mn−1 determines rn so that there is no surprise, and when there is no uncertainty during prediction, because knowledge of rn determines mn+1. Then, both terms are zero:
The second case occurs when, in the perception phase, mn−1 determines a set Rn of possible stimuli or, when in prediction, the set of memorized possibilities Mn+1 occurs given the stimulus rn (cf. Figure 2). The respective metric terms are:
The last case encompasses, amongst others, the learning-like situations when rn is a new (unknown) stimulus, so that the current knowledge of mn−1 carries no information about it during perception and the stimulus itself does not reduce the level of prediction uncertainty:
It should be noted here that because the sets of known options, Rn and Mn+1, are subsets of R and M, the amounts of surprise or uncertainty in 2 seem to always be smaller than in 3 (when unknown stimuli occur). Nevertheless, for some realizations of mn−1 and rn the following inequalities may hold (see e.g., MacKay, 2003, Ch. 8):
and hence the surprise and uncertainty levels in (5) can be larger than in (6).
Example 2.2. Such a situation can take place when, in prediction phase, the set Mn+1 consists of multiple options, , some ν, that are equally probable; (cf. Figure 2). The amount of uncertainty is then increased (predictions are the least efficient) and can easier lead to disorientation and anxiety (especially when ν is large).
So far, we have conditioned the metric terms on the last realizations of the stimulus rn (or on the latest memory item mn−1), as if the amounts of surprise and uncertainty are independent of the past [like in Markov chains (see e.g., Ekroot and Cover, 1993)]. However, to better comply with Assumptions 2 –3 and Remarks 2.1–2.2, one should take into account the previous events as well.
Example 2.3. If, for instance, in the perception phase, we condition the surprise on the set Mn−1 = {mn−1, Mn−2, …} of all previous items in a sequence, then the term H(R|M = Mn−1) can quantify a disappointment that the sequence does not continue in a way it has been memorized.
Classification
2.5.1
The influence of the classification phase (i.e., the assignment mn = ρ(rn) in Figure 1) has so far been ignored. This is because it resembles the impact of prediction. First, observe that the metric reduces to the same term; cf. (3):
Next, note that if the stimulus, rn, is already known, then it fully determines the corresponding mn (cf. Assumption 2 and Remarks 2.1–2.2). In turn, if rn is new, it carries no information about the memory. Hence, cf. (4) and (6):
Remark 2.4. Although the presence of randomness, even in a fixed and arranged environment, seems to be inevitable (cf. e.g., Davis and Plaisted-Grant, 2015; Walker et al., 2023), the modified environments and/or the sequences can still be perceived as known as long as those random changes do not affect the results of classification; cf. Remark 2.1.
Results
3
Analogies with insistence on sameness
3.1
In case of non-verbal low-functioning individuals with autism, both sensory overload and anxiety can lead to violent, (self-)aggressive behaviors [sometimes labeled as meltdowns or tantrums (Volkmar, 2021)]. Insistence on sameness can be seen as a set of actions aimed at decreasing the risk of these events, either by observing (learning) the environment (i.e., making M better approximate R), or by constraining it to the already known one (that is, making R better resemble M). For example (cf. Uljarević et al., 2017):
Pedantry, together with a routinized and repetitive behavior [inc. strict following only known sequences of activities and keeping the known environments unchanged; cf. an aberrant precision in (Lawson et al., 2014)], are the means to reduce the levels of surprise or uncertainty; see Ex. 2.2, 2.3 and Remark 2.4.Ritualistic and stereotyped behaviors stem from the lack of the ability to ignore items; see Remarks 2.1-2.2 and cf. Guideline 3.Restrictive and rigid behavior includes actions like forcing deterministic scenarios in order to further control surprise and uncertainty [and to effectively zero their levels as in (4)], to avoid disappointments and to select a preferred (the most frequent) option in multiple option cases; cf. (5), (7) and Ex. 2.3.Vigilance and observance reduce the number of options, that is, the uncertainty level caused by them in (5); see Ex. 2.2.Aversion to learning is a way to avoid potentially high surprise/uncertainty levels caused by new options or new stimuli; cf. (5)-(7).
Therapeutic guidelines
3.2
In practical terms, the goal of the learning therapy is twofold:
Increasing autonomy of an individual (by learning new vital activities in a supervised way),Reduction of aversion to exploration (by controlling uncertainty of the new environments designed for semi-supervised and unsupervised activities).
Assumptions 1-4, Remarks 2.1–2.3, and the subsequent analysis in Section 2.5, imply that the therapy is a long-term incessant process with several limitations on how it can be implemented (in particular, they exclude rule-based learning; cf. Remarks 2.1-2.2 and Betty et al., 2015; Galitsky, 2016):
Only tangible objects and their realistically simulated counterparts should be used in learning environments and in sequences of activities.Artifacts such as pictures (especially, in simplified forms of drawings or labels), but also spoken words and sentences, should a priori be assumed as unrelated with the tangible items they represent or refer to. Hence, if necessary, these relations need to be learned separately. Abstract notions, like being a child or a parent, a teacher or a caretaker, and relations like love, trust or friendship, are not available either.The meaning of tangible items depends on the context (created by other items and/or their past sequences; cf. Remark 2.2). Such a correlative-like reasoning leads to ”cum/post hoc ergo propter hoc” fallacies; see Example 3.2.Remembering the exact sequences (of events and activities; cf. Remarks 2.1 and 2.2) implies they are treated as a whole and that ”shortcuts” in sequences or other ”by-the-way” activities will likely be perceived as new (and will increase levels of surprise and/or uncertainty); cf. Ex. 2.3 and (Lawson et al., 2014; Noel et al., 2025).Multiple choice situations (branches in sequences; see Figure 2) introduce uncertainty even if the environment remains unchanged; cf. Cases 2–3, and Exs. 2.2–2.3 and 3.1.
Example 3.1. In order to reduce the risk of anxiety induced by predictions in multiple option/choices situations, several countermeasures could be applied: making all sequences distinct (with the help of amulet- or talisman-like artifacts that will allow to distinguish the otherwise same activities), or locating the branches at the sequence beginnings (to reduce these uncertainties as early as possible), or limiting the numbers of possible options in them.
Example 3.2. An instance of an immediate reward will likely turn into a part of a routine while its delayed version will become a part of (possibly unrelated) sequence it will occur in. This is because the delayed gratification concept relies on an abstract concept of an award and its causal relation with the past events.
Self-stimulation activities and comfort zone
3.3
The guidelines 1–5 are aimed at reducing risk of tantrums during learning/therapy. However, in case of non-verbal individuals, it is difficult to assess whether the levels of surprise and/or uncertainty are sufficiently low (w.r.t. their thresholds TsoandTa) to safely start a new sequence of activities. The problem is of special importance at the beginning of the therapy when the environments and activities the individuals are familiar with (inc. their routines stored in M) are not known. Nevertheless, if the stimuli are treated in a unified way, so that their exogenous or endogenous nature is not distinguished (cf. Assumption 1 and Remark 2.1), we can conjecture that (cf. Davis and Plaisted-Grant, 2015):
Conjecture 3.1. Self-stimulation activities [inc. ”flapping, waving, finger wiggling, mouth opening, orofacial movements, head nodding” (see Volkmar, 2021)], are a sign of low levels of perceived surprise and uncertainty.
The rationale behind this claim is following: without external stimuli (in a state of sensory deprivation), the internal ones become dominant. If sufficiently random, they can carry an amount of entropy large enough to exceed either sensory overload or anxiety threshold. Self-stimulations can therefore (in case of low-functioning individuals, in particular) be a way to generate known (zero- or low-entropy) stimuli that keep the surprise/uncertainty levels below these thresholds (i.e., inside a ”comfort zone”).
Remark 3.1. One of the Reviewers raised a question about the role of white noise. That is an interesting and open problem as one can consider two potentially opposite, negative and positive, effects:
Presence of a white noise in stimuli increases their conditional entropy and the risk of sensory overload. It can affect the results of the classification and increase the risk of anxiety as well; cf. Remark 2.4.In turn, as pointed out by (Davis and Plaisted-Grant 2015), noise can help avoid getting stuck in local minima during optimization and, for instance, enable generalization (abstraction) abilities. In our framework, such effect can be associated with jumping between the known sequences or finding the shortcuts within them.
White noise, if present in exogenous visual or aural stimuli, could be detected with the help of basic audio-video codecs [the more noisy environment, the less effective entropy coders; (see MacKay, 2003, Ch. 1)].
Learning therapy as optimization problem
3.4
In the framework's terms, learning a low-functioning person with autism is equivalent to presenting new stimuli and their sequences. Therefore, a learning therapy with the goals 1–2, can be formulated as the following optimization problem: Problem 2.1. Given an initial state of a memory M, find R such that
subject to
given Assumptions 1–4 and the processing loop in Figure 1.
The objective function (8) is the mutual information of R and M, and is related to the entropy metric in (1) via identity DH(R, M) = H(R, M)−I(R; M), where H(R, M) is the joint entropy of R and M (see MacKay, 2003, Ch. 8; and Friston, 2010; Ramstead et al., 2023). The practical value of such a formulation is that it allows learning therapies and daily care routines to be defined as optimization algorithms and implemented as programs for robotic live-in caregivers.
Discussion
4
Validation
4.1
One can consider the following two (complementary) approaches to validate/reject the framework's assumptions and the resulting analogies:
A traditional (direct) variant, where the realistic environments and interactively generated sequences of new activities are presented through a “digital window” (e.g., a wall-like display with a haptic interface). The intensity (and the pace the sequences are presented with) depends, in accordance to recommendations 1 –5, on the levels of surprise and anxiety assessed from e.g., the observed behavior or from a frequency of interactions between the individual and the generated world. The long-term therapeutic effects, like increased autonomy in these environments (with possible help of appropriately programmed robotic live-in caregivers), is evaluated in a traditional way (e.g., in a form of progress questionnaires) by parents and therapists.
This is a standard approach that requires an access to individuals with autism living in their homes (or other familiar environments; cf. the analogies 1–5). Getting permissions for such arrangements is usually difficult (especially at early validation stages). Moreover, in order to satisfy reproducibility conditions, each experiment needs a tedious and time-consuming individual setup (cf. Assumption 2 and Ex. 2.3). Therefore, as an alternative (as a preliminary test phase) we propose:
A Turing test-like approach, in which the framework is used to create avatars (“digital twins”) residing in realistic virtual environments, where their simulated behavior is observed and validated by therapists or by early-detection computer tools; cf. (Dillion et al., 2023) and (Perochon et al., 2023), respectively.
Remark 4.1. Validation methods which are useful to assess higher-functioning individuals and which (implicitly) assume an individual is able to perceive abstracts and operate on them, e.g., by examining their propensity to ignore opportunity costs (see e.g., Da Silva et al., 2025), cannot be used here.
Final remarks
4.2
We conclude the work with a claim that reducing uncertainty is not specific to autism.
Conjecture 4.1. The phenomenon of insistence on sameness in autism is a special case of the generic behavior pattern described in Friston (2010) where “Adaptive agents must occupy a limited repertoire of states and therefore minimize the long-term average of surprise associated with sensory exchanges with the world [and that] action under the free-energy principle reduces to suppressing sensory prediction errors that depend on predicted (expected or desired) movement trajectories.”
Therefore, a distinctive character of insistence on sameness should rather be attributed to cognitive limitations of a person with autism. Thus, if correct, the claim corroborates the following definition of the disorder; cf. (Mottron and Bzdok, 2020), Assumptions 2–3 and Remark 2.1:
Definition 4.1. Autism is an impairment in which cognitive functions are restricted to discrimination, memorization and prediction of tangible properties of the environment.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1American Psychiatric Association DSM-5 Task Force. (2013). Diagnostic and Statistical Manual of Mental Disorders, 5th Edn. American Psychiatric Publishing, Inc. Available online at: 10.1176/appi.books.9780890425596 (Accessed December 15, 2025). · doi ↗
- 2Assary E. Oginni O. A. Morneau-Vaillancourt G. Krebs G. Peel A. J. Palaiologou E. . (2024). Genetics of environmental sensitivity and its association with variations in emotional problems, autistic traits, and wellbeing. Mol. Psychiatryp 29, 2438–2446. doi: 10.1038/s 41380-024-02508-638499655 PMC 11412899 · doi ↗ · pubmed ↗
- 3Bäck A. (2014). Aristotle's Theory of Abstraction, vol. 73. Cham: Springer. doi: 10.1007/978-3-319-04759-1 · doi ↗
- 4Bailey D. H. Jung A. J. Beltz A. M. Eronen M. I. Gische C Hamaker E. L. . (2024). Causal inference on human behaviour. Nat. Hum. Behav. 8, 1448–1459. doi: 10.1038/s 41562-024-01939-z 39179747 · doi ↗ · pubmed ↗
- 5Betty J. S. Ho P. V. Carter M. (2015). Cognitive–behavioural approach for children with autism spectrum disorder: a literature review. J. Intellect. Dev. Disabil. 40, 213–229. doi: 10.3109/13668250.2015.1023181 · doi ↗
- 6Chapman R. Veit W. (2021). The essence of autism: fact or artefact? Mol. Psychiatry 26, 1440–1441. doi: 10.1038/s 41380-020-00959-133293687 · doi ↗ · pubmed ↗
- 7Da Silva S. Fiebig M. Matsushita R. (2025). Opportunity costs, cognitive biases, and autism. J. Mind Med. Sci. 12:11. doi: 10.3390/jmms 12010011 · doi ↗
- 8Davis G. Plaisted-Grant K. (2015). Low endogenous neural noise in autism. Autism 19, 351–362. doi: 10.1177/136236131455219825248666 · doi ↗ · pubmed ↗
