Cerebellar Single‐Pulse TMS Differentially Affects Early and Late Error Processing in Reinforcement Learning
Dana M. Huvermann, Adam M. Berlijn, Stefan J. Groiss, Manfred Mittelstaedt, Alfons Schnitzler, Christian Bellebaum, Martina Minnerop, Dagmar Timmann, Jutta Peterburs

TL;DR
This study shows that disrupting cerebellar function with TMS changes how the brain processes errors during learning, affecting fast and conscious error detection differently.
Contribution
The study provides causal evidence that the cerebellum contributes to error processing in reinforcement learning contexts.
Findings
Cerebellar spTMS reduced error‐correct differentiation in the ERN late in the task.
Cerebellar spTMS applied post‐stimulus increased error‐correct differentiation in the Pe.
The cerebellum influences both fast and conscious error processing during reinforcement learning.
Abstract
There is increasing evidence that the cerebellum contributes to feedback processing in reinforcement learning. As yet, it has not been investigated whether the cerebellum also contributes to error processing in reinforcement learning. Studies have shown, however, that the cerebellum is involved in the processing of response errors in non‐reinforcement learning contexts, for example, in response conflict tasks. In the present study, we aimed to extend these findings to the processing of response errors, which slowly emerges as a result of reinforcement learning. To this end, we inhibited the cerebellum via single‐pulse transcranial magnetic stimulation (spTMS) and recorded cerebral electroencephalography (EEG) measures associated with error processing. If input from the cerebellum is required for error processing, error‐correct differentiation should be decreased for cerebellar compared…
Funding: Deutsche Forschungsgemeinschaft (funder ID 10.13039/501100001659); Bernd Fink‐Foundation
1 Introduction
Understanding how organisms optimize their behavior in dynamic environments is crucial not only to improve learning processes but also to advance our understanding of disorders associated with maladaptive learning, such as addiction and depression (Gueguen et al. 2021; Chen et al. 2015). Reinforcement learning is a basic form of learning in which behavior is shaped by its consequences/outcomes, that is, rewards that reinforce and punishments that inhibit a specific behavior (Sutton and Barto 2018). Initially, in an unfamiliar context, information about actions and outcomes must be gathered on a trial‐and‐error basis. With learning, actions are then chosen based on their predicted outcomes. Learning success thus strongly depends on the accuracy of outcome predictions. While improving these predictions, the individual gets a better understanding of which action is correct and which is false. Ultimately, the individual is able to identify an error already at the stage of action execution, rather than having to wait for external feedback/the outcome. This shift from outcome‐level processing to response‐level processing underlying the distinction between right and wrong responses throughout the learning process could be shown in a reinforcement learning task by recording brain activity using electroencephalography (EEG; Eppinger et al. 2008; Bellebaum and Colosio 2014).
Processing of both actions/responses and outcomes has been predominantly linked to structures in the fore‐ and midbrain (Corlett et al. 2022). In EEG studies, error processing has been shown to emerge with learning/task progression when an understanding of correct and false responses has been developed (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008). In later stages of a learning task, a more pronounced negative deflection in the response‐locked signal is typically found for errors relative to correct responses (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008), that is, the error‐related negativity (ERN; Falkenstein et al. 1991; Gehring et al. 1993). The ERN has a frontocentral scalp distribution and typically peaks within 100 ms post‐response. Its origin lies primarily in the anterior cingulate cortex (ACC, Dehaene et al. 1994; Miltner et al. 2003; Iannaccone et al. 2015, but also see Herrmann et al. 2004) which has been associated with error processing (Hester et al. 2004). The ERN is followed by the more posterior error positivity (Pe, peaking 200–400 ms post‐response, Falkenstein et al. 1991; Wessel 2012). ERN and Pe have been proposed to be functionally distinct (Wessel 2012), with the ERN reflecting a fast‐paced mismatch between the actual and desired response (Coles et al. 2001; Nieuwenhuis et al. 2001), and the Pe reflecting more conscious error processing (Nieuwenhuis et al. 2001; Ridderinkhof et al. 2009). On the other hand, feedback processing, as reflected in the feedback‐related negativity (FRN), is typically found at early stages of reinforcement learning where participants strongly depend on external feedback to perform the task accurately (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008). 
The FRN has been described as a functional equivalent of the ERN during feedback processing, as both seem to contribute toward an adjustment of behavior toward error correction (Gentsch et al. 2009). In addition, there seems to be a high overlap in topography and neural generators (Gentsch et al. 2009; Holroyd and Coles 2002; Potts et al. 2011).
Interestingly, recent studies in rodents (Kostadinov and Häusser 2022) and humans (Huvermann et al. 2025; Rustemeier et al. 2016; Berlijn et al. 2025) have provided evidence for a potentially supportive role of the cerebellum in feedback processing during reinforcement learning (Peterburs and Desmond 2016). The cerebellum is best known for predictive processes in the context of motor control (Popa and Ebner 2019) but in the last decades increasingly also for cognitive processes (Berlijn et al. 2024b; Sokolov et al. 2017). The cerebellum is thought to support both motor and cognitive function by predicting outcomes via internal forward models (Popa and Ebner 2019; Wolpert et al. 1998; Tanaka et al. 2020), connecting with a wide range of cerebral brain areas, including the ACC, in a closed‐loop fashion (Ramnani 2012; Schmahmann and Pandya 1997; Glickstein et al. 1985; Kruithof et al. 2023; Habas 2021; Bostan and Strick 2018). Cerebellar dysfunction might thus influence feedback processing as reflected in the FRN via maladaptive support of ACC function. Indeed, in recent studies (Huvermann et al. 2025; Berlijn et al. 2025), we found that cerebellar lesions, degeneration, and TMS disrupted feedback processing in the sense that the prediction error was not represented in the FRN.
These previous studies (Huvermann et al. 2025; Rustemeier et al. 2016; Berlijn et al. 2025) have focused on the role of the cerebellum at the outcome stage. However, prediction at the response stage (i.e., error processing), as described above, is also a prominent part of reinforcement learning. Cerebellar damage and disruption of cerebellar function by non‐invasive brain stimulation have already been associated with deficits in error processing in response conflict tasks (Peterburs et al. 2012, 2015; Berlijn et al. 2024a; Tunc et al. 2019). Specifically, differentiation between errors and correct responses in the ERN was consistently reduced for cerebellar dysfunction (Peterburs et al. 2012, 2015; Berlijn et al. 2024a, only on trend level in Tunc et al. 2019). For the Pe, findings are more heterogeneous, with most studies not finding effects of cerebellar dysfunction, except for one study in cerebellar post‐acute stroke which showed increased error‐correct differentiation that was interpreted as compensatory for deficient error processing in the ERN (Peterburs et al. 2012). Response conflict tasks, however, contain no feedback and can instead be performed based on the initial instructions. For example, in a flanker task, participants need to indicate the direction of a central arrow in the presence of flanking arrows. Predictions thus do not evolve slowly with learning as in reinforcement learning.
In summary, previous studies support a cerebellar role in outcome processing in reinforcement learning and error processing in response conflict tasks. This is consistent with the proposed role of the cerebellum in performance monitoring, that is, in functions which support adaptive behavior, to which both reinforcement learning and error processing contribute (Peterburs and Desmond 2016). Error and feedback processing are closely intertwined, and it seems conceivable that in reinforcement learning tasks, disrupted feedback processing (on which participants rely in particular early in the task) caused by cerebellar dysfunction leads to changes in error processing (which emerges later in the task, with learning from feedback). These changes may be similar to those found for error processing under cerebellar dysfunction in response conflict tasks (Peterburs et al. 2012, 2015; Berlijn et al. 2024a; Tunc et al. 2019).
In the current study, we aimed to examine if aberrant feedback processing in cerebellar dysfunction transfers to the response phase with learning progression in a reinforcement learning task. We disrupted cerebellar function by non‐invasive brain stimulation in young adults. Single‐pulse TMS (spTMS) excites the subjacent neuronal populations followed by a prolonged period of reduced activity (Romero et al. 2019), potentially leading to inhibition or facilitation depending on various factors including stimulation site and timing (Shirota and Ugawa 2024; Luber and Lisanby 2014). For cerebellar stimulation, an inhibitory effect of spTMS on cortical function is mostly assumed (Desmond et al. 2005; Schutter and van Honk 2006; Viñas‐Guasch et al. 2023, but also see Du et al. 2018, for a review see Fernandez et al. 2020). We analyzed data from a previous study by our group (Huvermann et al. 2025) which were collected in young, healthy adults who received cerebellar spTMS while performing a probabilistic feedback learning task with trial‐by‐trial feedback. Importantly, overall learning performance was not affected by the TMS, in theory enabling error processing as the task progresses and learning takes place (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008). ERN and Pe were analyzed as EEG indices of error processing. In accordance with previous work in response conflict tasks (Peterburs et al. 2012, 2015; Berlijn et al. 2024a), we expected to see reduced or absent error‐correct differentiation in the ERN for cerebellar TMS (Iannaccone et al. 2015, but also see Berlijn et al. 2024b). We expected to see this effect more strongly later in the task when response‐outcome contingencies have been learnt and error processing is more pronounced (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008). 
However, we did not expect to see distinct compensatory mechanisms indexed by an increased Pe as observed in cerebellar stroke patients (Peterburs et al. 2012) due to the immediate effect of spTMS. Two stimulation timings were used, to differentiate direct disruption of error processing (via post‐stimulus/pre‐response TMS) from indirect effects of disrupted feedback processing on error processing (pre‐feedback TMS) due to maladjusted predictive processes.
In line with the hypotheses, we found decreased error‐correct differentiation in the ERN for cerebellar TMS. In addition, error‐correct differentiation in the Pe was increased for cerebellar stimulation while behavioral performance was overall preserved.
2 Material and Methods
The present study was part of a larger investigation of cerebellar contributions to reinforcement learning and presents novel, follow‐up analyses of data reported previously by our group (Huvermann et al. 2025). There, we focused on outcome/feedback processing and thus did not analyze response‐locked ERPs. We performed two studies on reinforcement learning, one with cerebellar stroke patients and respective controls, the other with healthy adults using cerebellar (vs. vertex) spTMS. The present work is focused on the spTMS study, because older adults typically show only weak error‐correct differentiation in the response‐locked ERP in reinforcement learning (Eppinger et al. 2008; Pietschmann et al. 2008; Herbert et al. 2011). However, analogous analyses were also performed for data from the patient study and are provided in Supplementary Analysis S1.
2.1 Participants
Sample characteristics are detailed in Huvermann et al. (2025). Data from 24 healthy participants (7 men, 17 women; mean age 23.3 years, SD = 2.9 years, age range 19–30 years) entered the analyses. According to the Edinburgh Handedness Inventory (Oldfield 1971) scores, 20 participants were right‐handed, two left‐handed, and two ambidextrous.
All participants gave written informed consent prior to participation. The study was conducted in accordance with the ethical principles for medical research involving human subjects outlined in the Declaration of Helsinki and approved by the Ethics Committees at the Faculty of Medicine of Heinrich‐Heine‐University Düsseldorf (2018‐240_1) and the University Hospital Essen (18‐8477‐BO).
2.2 Procedure
Please see Huvermann et al. (2025) for a detailed description. In brief, cerebellar and vertex TMS took place in separate sessions at least 48 h apart to decrease repetition effects in the task. After EEG and EMG preparations and motor threshold estimation, the double cone TMS coil was positioned and secured to the participant's head (see Figure 1). Before and after the experimental task, an additional cognitive task was performed for which results are reported in Berlijn et al. (2024a).
FIGURE 1. Experimental setup. Depending on the session, TMS was applied to either the left cerebellum (1 cm below and 3 cm to the left of the inion) or the vertex using a double cone coil. EEG and EMG were recorded simultaneously. Reproduced from Huvermann et al. (2025) with permission.
Participants completed a probabilistic feedback learning task (Eppinger et al. 2008; Bellebaum and Colosio 2014). Figure 2 illustrates the sequence and time course of stimulus presentation in each trial. The task consisted of 6 blocks of 56 trials, thus 336 trials in total. Each trial began with a fixation cross, followed by one of four stimuli (Chinese characters). Participants responded by pressing the left or right button on a response box within a response window of 1000 ms. Choices were highlighted on the screen, followed by a black screen before feedback was displayed, with “+20ct” in green font as positive feedback or “−10ct” in red font as negative feedback. Two stimuli were linked to random feedback (50% positive and 50% negative, independent of response), while the other two stimuli were linked to contingent feedback. Here, correct responses were followed by positive feedback in 80% of the cases and by negative feedback in 20% of the cases (vice versa for errors). Contingencies could thus be learnt. TMS was delivered 100 ms post‐stimulus for one stimulus and 100 ms pre‐feedback for the other.
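The feedback schedule described above can be illustrated with a minimal Python sketch. It is not the authors' task code: the function names, the string labels, and the use of a seeded pseudo-random generator are assumptions made for illustration, and the ASCII string "-10ct" stands in for the displayed negative feedback. Only the probabilities (80/20 for contingent stimuli, 50/50 for random stimuli) and the block structure come from the text.

```python
import random

# Block structure as stated in the text: 6 blocks of 56 trials.
N_BLOCKS = 6
TRIALS_PER_BLOCK = 56

# Probability of positive feedback. For contingent stimuli it depends on
# whether the response was correct; for random stimuli it is 0.5 regardless
# of the response (the correctness flag is ignored there).
P_POSITIVE = {
    ("contingent", True): 0.8,
    ("contingent", False): 0.2,
    ("random", True): 0.5,
    ("random", False): 0.5,
}


def draw_feedback(stimulus_type, response_correct, rng):
    """Return the feedback string for a single trial."""
    p = P_POSITIVE[(stimulus_type, response_correct)]
    return "+20ct" if rng.random() < p else "-10ct"


def positive_rate(stimulus_type, response_correct, n_trials=20000, seed=0):
    """Empirical rate of positive feedback over many simulated trials."""
    rng = random.Random(seed)
    hits = sum(
        draw_feedback(stimulus_type, response_correct, rng) == "+20ct"
        for _ in range(n_trials)
    )
    return hits / n_trials
```

Simulating many trials with `positive_rate("contingent", True)` should give a rate close to 0.8, which is why the correct response can be learnt from feedback for contingent stimuli but not for random ones.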
FIGURE 2. Time course in one trial in the experimental task. (A) Time course of stimulus presentation and timing of TMS pulses in one trial in the experimental task. First, a fixation cross was presented for 500–1000 ms. Subsequently, one of four stimuli was presented for 500 ms together with flanking rectangles representing the response options. Participants responded by pressing the left or right button on a response pad up until 500 ms after the stimulus was presented. The respective rectangle was highlighted for 200 ms. After 500 ms of blank screen, positive (“+20 ct”) or negative feedback (“−10 ct”) was presented for 1000 ms. Participants had to learn by trial and error which of the two options was more likely to result in positive/negative feedback, separately for each of the four stimuli. Feedback for two stimuli had an 80% contingency, and a 50% contingency for the other two. Stimulation in a particular trial was applied either 100 ms post‐stimulus or 100 ms pre‐feedback. The task consisted of 336 trials. (B) Time course of stimulus presentation and timing of the TMS pulse in relation to ERN and Pe time windows in one trial in the experimental task. ERN was quantified in the time window between 0 and 100 ms following the response while Pe was quantified in the time window between 200 and 400 ms following the response. Distance to the post‐stimulus TMS pulse thus differed and depended on response time in the respective trial, while the pre‐feedback TMS pulse always occurred after ERN and Pe. Note that the TMS pulse in a particular trial was given either post‐stimulus or pre‐feedback.
TMS was applied at 120% of motor threshold using a Magstim Double Cone Coil and a Magstim BiStim² unit (Magstim Co., Whitland, United Kingdom). A fast‐paced task flow was enabled by alternating stimulation between two BiStim units. Stimulation was applied either to the left lateral cerebellum (1 cm below and 3 cm to the left of the inion; cf. Hardwick et al. 2014; Théoret et al. 2001; Torriero et al. 2004) or to the vertex as a control site (at electrode position Cz; Jung et al. 2016; Pizem et al. 2022). Stimulation of the left cerebellar hemisphere is consistent with its implication in processing visual–spatial information (Stoodley and Schmahmann 2009) and with stronger activations of the left hemisphere in a previous fMRI study using a similar feedback learning task (Peterburs et al. 2018). Following spontaneous reports of side effects in the initial testing sessions, a post‐experimental questionnaire was introduced in which participants were asked to rate symptoms associated with TMS [see Huvermann et al. (2025) for more details]. No significant differences between vertex and cerebellar stimulation were observed regarding headaches, neck pain, toothaches, inattentiveness, discomfort, phosphene ratings, or free field responses for other symptoms (all p ≥ 0.343, see Figure 3).
FIGURE 3. Ratings of side effects in the post‐experimental questionnaire, as reported in Huvermann et al. (2025). Means and standard errors are shown in red, individual ratings are shown in black.
2.3 EEG Recording and Preprocessing
Data were recorded at 1000 Hz from 30 passive Ag/AgCl multitrode electrodes positioned according to the 10–20 system (Chatrian et al. 1985), using a BrainAmp MR amplifier and BrainVision Recorder 1.21 (Brain Products GmbH, Gilching, Germany). Impedances were kept below 5 kΩ.
For preprocessing, the ARTIST algorithm by Wu et al. (2018) based on EEGLAB (v2022.1; Delorme and Makeig 2004) was used. This algorithm decreases artifacts in the EEG signal caused by TMS pulses [see Huvermann et al. (2025) for a detailed description of preprocessing procedures].
Using BrainVision Analyzer 2 software (version 2.2, Brain Products GmbH, Gilching, Germany), data were segmented around responses, starting 200 ms before and ending 500 ms after the response. Next, a baseline correction was performed using the time window from 200 to 100 ms before response onset. Data were then exported for further processing in MATLAB. Although data were analyzed on a single‐trial basis, we additionally averaged the data according to conditions (stimulation site, TMS timing, response type) to extract peak latencies of the ERP components of interest (described below). Only trials for stimuli with learnable contingencies (i.e., 80–20) were included.
Peak detection was performed on the averaged data and separately for each condition for the ERN and Pe using MATLAB. The time windows and electrode sites that had been pre‐registered based on previous related studies (Peterburs et al. 2012, 2015; Berlijn et al. 2024a; Tunc et al. 2019) were used. For the ERN, peak detection was performed at FCz in the time window starting at response onset and ending 100 ms thereafter. For the Pe, we used the maximal positive peak within the time window between 200 and 400 ms at Pz. For the single‐trial data, the mean amplitude in a time window around the respective latency determined by the peak detection on the averaged data for each condition was extracted (20 ms for ERN; 40 ms for Pe, Albrecht and Bellebaum 2023; Meadows et al. 2016).
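The quantification steps described above (baseline correction, peak detection on the condition average, single‐trial mean amplitude around the detected latency) can be sketched in Python as follows. This is an illustrative sketch, not the authors' MATLAB pipeline. It assumes one sample per millisecond (1000 Hz), an epoch stored as a list covering −200 to +500 ms inclusive, and that the 20 ms (ERN) and 40 ms (Pe) windows denote total widths centred on the peak latency; the text does not say whether the window is a total width or a half‐width. All function names are hypothetical.

```python
EPOCH_START_MS = -200  # epoch runs from -200 ms to +500 ms around the response


def idx(t_ms):
    """Sample index for a time in ms (valid only at 1000 Hz, 1 sample per ms)."""
    return t_ms - EPOCH_START_MS


def window(signal, start_ms, end_ms):
    """Samples between start_ms and end_ms, both inclusive."""
    return signal[idx(start_ms): idx(end_ms) + 1]


def baseline_correct(epoch, start_ms=-200, end_ms=-100):
    """Subtract the mean of the pre-response baseline window from every sample."""
    base = window(epoch, start_ms, end_ms)
    base_mean = sum(base) / len(base)
    return [v - base_mean for v in epoch]


def peak_latency(average, start_ms, end_ms, polarity):
    """Latency (ms) of the most negative or most positive sample in a window."""
    seg = window(average, start_ms, end_ms)
    pick = min if polarity == "negative" else max
    best = pick(range(len(seg)), key=seg.__getitem__)
    return start_ms + best


def mean_amplitude(trial, latency_ms, width_ms):
    """Mean single-trial amplitude in a window centred on latency_ms.

    Assumption: width_ms is the total window width (e.g., 20 ms = +/- 10 ms).
    """
    half = width_ms // 2
    seg = window(trial, latency_ms - half, latency_ms + half)
    return sum(seg) / len(seg)
```

Under these assumptions, the ERN latency would be obtained with `peak_latency(avg_fcz, 0, 100, "negative")` and the Pe latency with `peak_latency(avg_pz, 200, 400, "positive")`, where `avg_fcz` and `avg_pz` are hypothetical condition averages. The single‐trial values then come from `mean_amplitude` with widths of 20 and 40 ms, respectively.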
2.4 Statistical Data Analysis
Data were analyzed in R (version 4.2.3, R Core Team 2023) using RStudio (version 2023.3.0.386, Posit Team 2023). Analyses of accuracy and choice switching (i.e., choosing a different response than before following e.g., positive/negative feedback) are reported in Huvermann et al. (2025). As data were not clearly separable into pre‐ and post‐learning for a majority of participants, we opted for a single trial‐based analysis approach using linear mixed effects (LME) models including the trial‐by‐trial factor trial number, thus capturing the course of error‐correct differentiation across the experiment. This also overcame a common concern of unequal numbers of trials for errors and correct responses as well as (too) few error trials per condition (Olvet and Hajcak 2009; Pontifex et al. 2010; Larson et al. 2010) due to learning throughout the task. Exclusion of participants with too few error trials would systematically exclude good learners (Clayson et al. 2025), and including an equal number of trials in the analysis does not resolve the concern (Fischer et al. 2017). Multilevel approaches using single‐trial data, however, overcome these limitations by taking into account different numbers of data points per factor level and being relatively robust to large numbers of missing data points (Clayson et al. 2025; Bolker 2015; Krueger and Tian 2004).
The packages lme4 (version 1.1‐32, Bates et al. 2015) and lmerTest (version 3.1‐3, Kuznetsova et al. 2017) were used for LME modeling. We used restricted maximum likelihood with p‐values computed using the Satterthwaite approximation to evaluate significance, following Luke (2017). Participants with a Cook's distance (Cook 1977) above 4/(n − p − 1) were identified as outliers (using the influence.ME package, version 0.9‐9, Nieuwenhuis et al. 2012). We strived for a maximal random effects structure but, in case of a singular fit, gradually reduced random effects, starting with main effects and then lower‐grade interactions, until fit was ensured. Significant interactions were followed up using simple slope analyses (interactions package, version 1.1.5, Long 2019). p‐values were Bonferroni‐corrected according to the number of simple slopes.
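The two decision rules mentioned above, the Cook's distance cutoff and the Bonferroni correction, are simple enough to sketch in Python. The sketch is illustrative only: the analyses themselves were run in R, the function names are hypothetical, and what n and p count in the cutoff is not spelled out in the text, so the values used in the usage note are assumptions.

```python
def cooks_cutoff(n, p):
    """Outlier cutoff 4 / (n - p - 1) for Cook's distance."""
    if n - p - 1 <= 0:
        raise ValueError("n - p - 1 must be positive")
    return 4.0 / (n - p - 1)


def flag_outliers(distances, n, p):
    """Return the IDs whose Cook's distance exceeds the cutoff.

    distances: dict mapping a participant ID to its Cook's distance.
    """
    cutoff = cooks_cutoff(n, p)
    return [pid for pid, d in distances.items() if d > cutoff]


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For instance, assuming n = 24 participants and, hypothetically, p = 15 predictors, the cutoff would be 4/8 = 0.5. The cap at 1 in `bonferroni` corresponds to corrected values being reported as p > 0.999.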
LME analyses were conducted with the categorical fixed effects response type (−0.5: error, 0.5: correct), stimulation site (−0.5: vertex, 0.5: cerebellum), TMS timing (−0.5: post‐stimulus, 0.5: pre‐feedback), and the continuous factor trial number, which was scaled via the built‐in scale function. We also included all interactions of these factors as fixed effects. No participants were identified as outliers based on Cook's distance. The model equation for both ERN and Pe was as follows:
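The sketch below is not the authors' model equation. It only illustrates, in Python, how the fixed‐effects predictors described in this paragraph could be coded: effect codes of ±0.5 for the three categorical factors, a standardized trial number (R's `scale` centres on the mean and divides by the sample standard deviation), and all interactions among the four predictors. The random‐effects structure is not specified in this text and is left out. Variable and function names are hypothetical, and standardizing over trial numbers 1 to 336 is an assumption.

```python
from itertools import combinations
from math import prod
from statistics import mean, stdev

# Effect codes as given in the text.
CODES = {
    "response_type": {"error": -0.5, "correct": 0.5},
    "stimulation_site": {"vertex": -0.5, "cerebellum": 0.5},
    "tms_timing": {"post-stimulus": -0.5, "pre-feedback": 0.5},
}
PREDICTORS = ["response_type", "stimulation_site", "tms_timing", "trial_number"]


def scale(values):
    """Standardize like R's scale(): centre on the mean, divide by sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def design_row(trial):
    """Fixed-effects design row for one trial.

    trial: dict with the three categorical labels and an already scaled
    'trial_number'. Returns the intercept, the 4 main effects, and all
    11 interaction terms (16 columns in total).
    """
    x = {name: CODES[name][trial[name]] for name in CODES}
    x["trial_number"] = trial["trial_number"]
    row = {"(Intercept)": 1.0}
    for k in range(1, len(PREDICTORS) + 1):
        for combo in combinations(PREDICTORS, k):
            row[":".join(combo)] = prod(x[name] for name in combo)
    return row
```

With ±0.5 coding, each main‐effect coefficient is the difference between the two levels averaged over the other factors, which is how the reported β for response type can be read.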
Note that we also performed a complementary analysis using action value modeling, as commonly conducted for analyses involving prediction error modeling in reinforcement learning contexts (see e.g., McDougle et al. 2019; Ichikawa et al. 2010). Analyses involved a new measure, Q diff, which reflects the relative subjective action value (action value of the unchosen choice subtracted from the action value of the chosen option). It should thus offer a better measure for subjective error processing than the objective response type (see Supplementary Analysis S2 for further details), especially because, due to the probabilistic nature of our task, errors and correct responses are not as clearly defined as in response‐conflict tasks. Relative action values have been shown to be more reliable than absolute action values (Katahira et al. 2017), although the procedure of action value/prediction error estimation in general has been shown to be highly correlated to subjective measures (Ichikawa et al. 2010).
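To make the Q diff measure concrete, the following Python sketch computes a relative action value (chosen minus unchosen) with a basic delta‐rule update. It is a generic illustration under stated assumptions, not the model from Supplementary Analysis S2: the learning rate `alpha`, the initial value `q_init`, the reward coding, and the choice to read out Q diff before the update are all hypothetical, and a real analysis would keep separate values per stimulus.

```python
def q_diff_series(choices, rewards, alpha=0.3, q_init=0.0):
    """Relative action value of the chosen option on each trial.

    choices: sequence of "left" / "right" responses for one stimulus.
    rewards: sequence of numeric outcomes (coding is an assumption).
    Returns Q(chosen) - Q(unchosen), evaluated before the outcome is learnt.
    """
    q = {"left": q_init, "right": q_init}
    diffs = []
    for choice, reward in zip(choices, rewards):
        other = "right" if choice == "left" else "left"
        diffs.append(q[choice] - q[other])
        # Delta rule: move the chosen value toward the observed outcome.
        q[choice] += alpha * (reward - q[choice])
    return diffs
```

A positive Q diff means the participant picked the option they currently value more, and a negative Q diff marks a choice of the subjectively worse option. This is the sense in which the measure can stand in for a subjectively defined error.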
3 Results
3.1 Accuracy
While a main effect of block, that is, a general learning effect, was found (F(3.18, 73.05) = 6.21, p < 0.001), no differences between cerebellar and vertex TMS emerged, p ≥ 0.461. These results indicate that error rates decreased over the course of the task and were not affected by cerebellar TMS. On average, 9.6 errors per block, stimulation site, and participant were committed (SD = 4.4 errors). The full results concerning accuracy are reported in Huvermann et al. (2025).
3.2 ERN: Effects of Response Type (Error/Correct)
Grand averages for the ERPs at FCz time‐locked to individual ERN latencies for correct responses and errors (i.e., response type) according to stimulation site, TMS timing, and trial number (early, late experiment) are provided in Figure 4A.
FIGURE 4. (A) Grand‐average ERPs at FCz locked to individual ERN latencies per condition (response type × stimulation site × TMS timing): early and late in the task according to response type (correct, error), stimulation site (cerebellum, vertex), and TMS timing (post‐stimulus, pre‐feedback). Blue lines denote correct responses, red lines errors. Colored bands display standard errors. See Figure S2 for a response‐locked grand‐average ERP. (B) Slope estimates for ERN amplitude predicted by response type and modulated by stimulation site and trial number (early, late experiment). Red lines denote cerebellar stimulation and blue lines vertex stimulation. Colored bands indicate 95% confidence intervals. **p < 0.001. n error = 2702, n correct = 4777.
The ERN was more negative for errors compared to correct responses (β = 0.81, SE = 0.16, t(7451.23) = 4.94, p < 0.001). This effect was further modulated by trial number (β = 0.51, SE = 0.16, t(7394.78) = 3.22, p = 0.001). While response types did not differ in ERN amplitude early on (β = 0.29, SE = 0.22, t = 1.30, p = 0.386), errors as compared to correct responses were associated with increased negativity late in the task (β = 1.35, SE = 0.24, t = 5.75, p < 0.001).
Importantly, this interaction was further modulated by stimulation site (β = −0.80, SE = 0.32, t(7431.03) = 2.53, p = 0.012; see Figure 4B). Follow‐up simple‐slope analyses showed that for both cerebellar and vertex TMS, response types were not distinguished in the ERN early in the task (both p ≥ 0.453). However, late in the task, the ERN was more pronounced for errors than correct responses for vertex TMS (β = 1.94, SE = 0.32, t = 6.01, p < 0.001) but not for cerebellar TMS (β = 0.75, SE = 0.34, t = 2.22, p = 0.106).
Additionally, a trend‐level interaction between response type, stimulation site, and TMS timing emerged (β = 1.22, SE = 0.65, t(7448.34) = 1.88, p = 0.060; see Figure S1 for the slope plots). Descriptively, response types were distinguished in the ERN for vertex TMS and pre‐feedback cerebellar TMS but not when stimulating the cerebellum post‐stimulus.
Complete inferential statistics are provided in Table S1. Effects that include the TMS timing factor independent of stimulation site are reported in Supplementary Analysis S3.
3.3 Pe: Effects of Response Type (Error/Correct)
Grand averages for the response‐locked ERPs at Pz for correct responses and errors (i.e., response type) according to stimulation site, TMS timing, and trial number (early, late experiment) are provided in Figure 5A.
FIGURE 5. (A) Grand‐average response‐locked ERPs early and late in the task at Pz according to response type (correct, error), stimulation site (cerebellum, vertex) and TMS timing (post‐stimulus, pre‐feedback). Blue lines denote correct responses, red lines errors. Colored bands display standard errors. (B) Slope estimates for Pe amplitude predicted by response type and modulated by stimulation site and TMS timing. Red lines denote cerebellar stimulation and blue lines vertex stimulation. Colored bands indicate 95% confidence intervals. *p < 0.05, **p < 0.001. n error = 2769, n correct = 5122.
The Pe was more pronounced for errors compared to correct responses (β = −0.99, SE = 0.15, t(7848.81) = 6.68, p < 0.001), and late compared to early in the experiment (β = 0.24, SE = 0.07, t(7734.12) = 3.37, p = 0.001).
Importantly, the effect of response type was modulated by stimulation site and TMS timing (β = 2.07, SE = 0.58, t(7855.11) = 3.55, p < 0.001; see Figure 5B). Post hoc simple slope analyses showed that the Pe differentiated errors and correct responses for vertex TMS applied both pre‐feedback (β = −0.76, SE = 0.29, t = 2.63, p = 0.034) and post‐stimulus (β = −0.74, SE = 0.29, t = 2.54, p = 0.044). For cerebellar TMS, Pe amplitudes did not differ between errors and correct responses when TMS was applied pre‐feedback (β = 0.20, SE = 0.29, t = 0.70, p > 0.999). However, a strong response type effect emerged for cerebellar TMS applied post‐stimulus (β = −2.25, SE = 0.30, t = 7.58, p < 0.001), with more positive amplitudes for errors compared to correct responses. To check whether this response type differentiation in the Pe for post‐stimulus TMS was truly stronger for cerebellar compared to vertex TMS, we checked the interaction effect (stimulation site × response type) for post‐stimulus TMS trials via simple slope analysis, which proved to be significant (β = −1.51, SE = 0.41, t(7835.18) = 3.64, p < 0.001). Notably, the interaction effect did not reach significance for pre‐feedback TMS trials (β = 0.55, SE = 0.41, t(7838.69) = 1.34, p = 0.182), indicating that the differences within post‐stimulus TMS were more decisive for the triple interaction.
Complete inferential statistics can be found in Table S2. Effects that include the TMS timing factor independent of stimulation site are reported in Supplementary Analysis S3.
3.4 Control Analysis: Predictability of Pe by ERN
In an additional analysis, we explored whether the effects of spTMS on ERN and Pe were separate effects or whether spTMS only had an effect on the ERN, which in turn influenced Pe amplitude. The amplitudes of ERN and Pe correlated significantly with each other (r = −0.04, t(7477) = 3.35, p < 0.001), although the correlation strength was very low (Cohen 1988; Evans 1996; Gignac and Szodorai 2016; Funder and Ozer 2019). To check whether the pattern in the Pe is explainable by ERN amplitudes without considering TMS effects, we fitted two additional models: one with the factors response type, trial number, and ERN amplitude (thus disregarding effects of the TMS), and one with the factors response type, trial number, stimulation site, TMS timing, and ERN (thus including both the effects of TMS and ERN). Both models included all interaction terms in the fixed effects. The model including the TMS effects provided a better fit (χ²(16) = 116.7, p < 0.001), and the triple interaction between response type, stimulation site, and TMS timing remained significant even when ERN was included as an additional factor (β = 2.64, SE = 0.65, t(7429.14) = 4.05, p < 0.001). To examine whether, conversely, the ERN amplitude adds information to the analysis of the Pe, we compared the original model to the model with the ERN as an additional factor. The model fit improved when adding the ERN (χ²(16) = 72.60, p < 0.001), indicating that the ERN amplitude does explain variance in the Pe amplitude that cannot be explained solely by the other factors. We did not perform the same analysis with ERN amplitude as dependent and Pe as an independent variable, as the Pe occurs after the ERN, preventing effects of the Pe onto the ERN (at least within the same trial).
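The model comparisons above are likelihood‐ratio tests of nested models, which refer a χ² statistic to a χ² distribution. As a hedged illustration of how such a p‐value is obtained, the following Python sketch uses the closed‐form survival function that exists for even degrees of freedom. The function names are hypothetical, and the sketch does not refit any model; it only converts a test statistic into a p‐value.

```python
from math import exp


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return exp(-half) * total
```

Calling `chi2_sf_even_df(72.60, 16)` or `chi2_sf_even_df(116.7, 16)` returns values far below 0.001, so both reported comparisons indicate a significantly better fit for the larger model.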
While we used the objective correctness of the responses as a predictor in these analyses (i.e., response type), subjective perception of which action is better/worse might have differed from this, especially considering that responses were associated with outcomes over time in the experiment, that not all participants learned the contingencies and that errors and correct responses were not as clearly defined as in response‐conflict tasks due to the probabilistic nature of action‐outcome associations. We therefore conducted an additional analysis using a measure that reflects the subjective, relative, instead of objective valuation of the chosen option. We computed the Q diff, that is, the modeled subjective value of the unchosen option subtracted from the value of the chosen option (see Supplementary Analysis S2). This measure thus reflects to what degree the chosen option was perceived as the better/worse option, thereby reflecting intra‐ and interindividual differences in learning and action‐outcome representation (Katahira et al. 2017). Importantly, this analysis yielded a comparable result pattern (see Supplementary Analysis S2).
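The idea behind Q diff can be illustrated with a minimal sketch. The code below is not the authors' model, which is specified in Supplementary Analysis S2; it assumes a generic two‐option delta‐rule learner, and the function name, learning rate `alpha`, and initial value `q_init` are hypothetical choices for illustration only.

```python
def q_diff_per_trial(choices, rewards, alpha=0.3, q_init=0.0):
    """Return Q(chosen) - Q(unchosen) for each trial, before the update.

    choices: sequence of 0/1 indices of the chosen option
    rewards: sequence of outcomes received on each trial
    alpha:   learning rate of the delta rule (assumed, not from the paper)
    """
    q = [q_init, q_init]
    diffs = []
    for choice, reward in zip(choices, rewards):
        unchosen = 1 - choice
        # Value difference at the time of the response.
        diffs.append(q[choice] - q[unchosen])
        # Delta-rule update of the chosen option only.
        q[choice] += alpha * (reward - q[choice])
    return diffs


example_choices = [0, 0, 1, 0]
example_rewards = [1.0, 1.0, 0.0, 1.0]
example_diffs = q_diff_per_trial(example_choices, example_rewards, alpha=0.5)
print(example_diffs)  # [0.0, 0.5, -0.75, 0.75]
```

Positive values indicate that the chosen option was valued more highly than the alternative at the time of the response, and negative values indicate the reverse, which is what allows Q diff to stand in for a graded, subjective counterpart of the error/correct classification.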
Discussion
4
In the present study, healthy young adults learnt stimulus–response‐feedback associations while single‐pulse TMS (spTMS) was applied to the cerebellum or a control site (vertex) either post‐stimulus (i.e., pre‐response) or pre‐feedback. Response‐related ERP components (ERN and Pe) were analyzed to investigate whether cerebellar output was necessary for error processing in the forebrain during reinforcement learning. Given that feedback processing during reinforcement learning was compromised in cerebellar dysfunction (Huvermann et al. 2025), we expected aberrant error processing for cerebellar TMS. Results in the current study indicate that this is likely the case: Error‐correct differentiation in the ERN was blunted by cerebellar TMS, while being intact for vertex TMS. Error‐correct differentiation in the Pe, on the other hand, was unexpectedly enhanced for post‐stimulus cerebellar TMS.
Consistent with patterns observed in patients with cerebellar damage/dysfunction in a response conflict task (i.e., reduced error‐correct differentiation in the ERN, Peterburs et al. 2012, 2015; Berlijn et al. 2024a), we found reduced error‐correct differentiation in the ERN under cerebellar spTMS. However, the overall result pattern with unaffected reinforcement learning (Rustemeier et al. 2016; Thoma et al. 2008), reduced error‐correct differentiation in the ERN, and increased error‐correct differentiation in the Pe (Peterburs et al. 2012) resembled results observed in patients with cerebellar stroke. The consistency in results between reinforcement learning and response conflict tasks suggests that the cerebellum is involved in error processing in both task contexts in a similar way, in line with its proposed function in performance monitoring (Peterburs and Desmond 2016). Of note, long‐term compensation and/or functional reorganization in stroke recovery have been proposed to support preserved task performance for these patients in a response conflict task (Peterburs et al. 2012). Such effects were previously not observed in patients with progressive cerebellar degeneration who showed an altered ERN, increased error rates, but unchanged Pe in a response conflict task (Peterburs et al. 2015). For the present study, we had expected that cerebellar spTMS disrupts cerebral processing instantaneously (Romero et al. 2019). Long‐term compensation should therefore not be relevant. Instead, increased error‐correct differentiation in the Pe in the presence of reduced differentiation in the ERN was observed instantaneously, giving rise to questions on the underlying mechanisms.
First, it is debatable whether the observed pattern truly represents a compensatory mechanism, or whether the increased differentiation in the Pe could also be the result of hypermetria. This might be the case in terms of a mismatch in salience which is one parameter that correlates with Pe amplitude (Overbeek et al. 2005). Perceived error salience as measured in the Pe might thus be larger than would be appropriate under cerebellar compared to control spTMS. Dysmetria is a common deficit observed in cerebellar disorders (Manto 2009) and has also been suggested as a deficit in cognitive processes (Schmahmann 1998). Future studies could test this by using different error severities/saliencies in their study. An interpretation in terms of hypermetria would indicate that TMS affected ERN and Pe separately from each other. While this might be the case, an indirect effect of TMS on the Pe via the ERN is also conceivable. An additional control analysis indicated that effects within the Pe amplitude are at least partially explainable by ERN amplitude. An indirect effect of TMS on Pe via ERN would be more consistent with an interpretation of the pattern in the Pe in terms of compensation. However, indirect effects via propagation of the TMS stimulus to further brain areas within the same network, as shown for repetitive TMS (Hussain and Freedberg 2025), could provide a further possibility.
While the ERN is generated mostly by the ACC (Debener et al. 2005; Ridderinkhof et al. 2004; Hester et al. 2005; van Veen and Carter 2002; van Boxtel et al. 2005, but also see Herrmann et al. 2004), neural generators for the Pe are less clear and appear not to be limited to the ACC (Overbeek et al. 2005; Hester et al. 2005). This wider network might have allowed the Pe to be less or differently affected by cerebellar spTMS, although more conscious error processing as reflected in the Pe may be more effortful and slower. This unexpectedly increased error coding in the Pe might have compensated for deficits in the ERN, allowing unimpeded behavioral performance. Intact behavioral performance had not been expected, given the instantaneous disruptive effect of spTMS, which in theory does not allow for long‐term compensation as seen in stroke patients (Peterburs et al. 2012). Alternatively, differences in properties of the underlying learning mechanism (potentially caused by the deficits in feedback processing/FRN) might also have altered error processing later in the task, with decreased use of systems underlying the ERN and increased reliance on systems underlying the Pe, eventually leading to more Pe‐driven error processing. However, these differences in error processing might not always correspond to intact behavioral performance. Relying on later rather than earlier error processing (i.e., on the Pe instead of the ERN) could be unfavorable in everyday tasks that require swift processing, for example, fast‐paced sequences of responses as in sports or music. It is also possible that this potentially compensatory process is not available in all learning contexts, for example, in more complex tasks. Notably, despite overall preserved learning performance, we did find decreased behavioral flexibility (choice switching; see Huvermann et al. 2025), in line with previous findings (Thoma et al. 2008), which might be related to deficits in the ERN.
Concerning the type of cerebellar output essential for ERN but not Pe, our results do not offer a clear answer. In the present dataset, feedback processing was already shown to be impaired (Huvermann et al. 2025). This might have led to impaired adjustments of prediction, resulting in deficits in error processing at the subsequent response stage. However, the effect of cerebellar TMS on ERN and Pe only occurred for post‐stimulus TMS (trend‐level for ERN), which fits better with a perturbation of information processing directly at the response stage. This will likely include predictive processes, as the ERN relies on the rapid matching of representations of the desired and the actual response based on internal information (i.e., an efference copy). The interaction between response type and stimulation in the ERN was found only late during the experiment, which also supports predictive processes, as these predictions can only form throughout the learning process. Previous studies in healthy adults have shown that error processing in the ERN is stronger after learning, while before learning, feedback processing is more dominant (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008). Perturbed predictive processes would also be consistent with the finding that stimulation timing did not appear to significantly affect feedback processing (Huvermann et al. 2025), as the predictive information is required for updating of predictions at the feedback stage. This mechanism might have affected feedback processing similarly to pre‐feedback TMS. The ERN and the FRN (Miltner et al. 1997) are thought to share the ACC as a neural generator (ERN: Dehaene et al. 1994; Miltner et al. 2003; Iannaccone et al. 2015, FRN: Foti et al. 2015; Hauser et al. 2014; Nieuwenhuis et al. 2004), thus potentially being affected in a similar way.
Considering the increased error‐correct differentiation in the Pe as a compensatory process, two explanations are possible: The Pe, reflecting more conscious error processing (Wessel 2012; Hester et al. 2005), might either not rely as strongly on cerebellar information, or might also simply be outside the time window of the disruptive effect of the TMS pulse.
Of note, the Pe in our study is not as pronounced as the positive peaks typically found in response conflict paradigms. However, a distinction between errors and correct responses is visible, and a posterior positivity was also evident in topographical plots of the difference signal for cerebellar post‐stimulus TMS (Figure S3). In two previous studies that examined the Pe in feedback learning paradigms, the Pe peak also seemed to be less prominent in the grand averages (Unger et al. 2012; Zhuang et al. 2021), which might be a characteristic of the Pe in reinforcement learning tasks. This might be due to errors being more ambiguous in feedback learning tasks. However, the Pe is often not analyzed in feedback learning tasks (Eppinger et al. 2008; Bellebaum and Colosio 2014; Pietschmann et al. 2008; Herbert et al. 2011).
Finally, subjective perception of action values might differ from the objective classification as error/correct. An additional analysis based on action values (Q diff; Katahira et al. 2017) yielded result patterns consistent with the original results for both ERN and Pe, with reduced Q diff differentiation in ERN and increased Q diff differentiation in Pe for cerebellar TMS. This demonstrates that the original findings extend to subjective perception of action value, which might be an interesting measure for future studies.
Limitations
5
We used an active control site (vertex TMS) instead of sham TMS. While vertex is a common control site in cognitive tasks (e.g., Cao et al. 2021; Ciricugno et al. 2020; Kalbe et al. 2010), at least one study (Jung et al. 2016) showed that vertex TMS reduced activity in the ACC, the likely generator of the ERN (Dehaene et al. 1994; Debener et al. 2005; Ridderinkhof et al. 2004). While we used inverted stimulation, which showed considerably less and non‐significant ACC deactivation (Jung et al. 2016), and did not find abnormal ERN patterns during vertex stimulation, we cannot rule out that vertex stimulation affected processing. Unlike Jung et al. (2016), we used a more deeply stimulating double cone coil instead of a figure‐of‐eight coil. Feedback‐related ERP components with neural generators within the ACC seemed to be affected by vertex TMS (Huvermann et al. 2025). Unfortunately, there currently seems to be no well‐tested, better suited site for control stimulation. Sham TMS does not seem ideal, as it provides a very different experience regarding vibrations, coil clicks, and magnetic field build‐up (Duecker and Sack 2015). Even though we assessed potential side effects of the TMS stimulation and found no significant differences between vertex and cerebellar stimulation (see Figure 3), we cannot exclude that other, uncaptured differences in the experience of stimulation between the two sites, such as stimulation of the neck muscles, emerged and contributed to the findings described above. Future studies may want to include several control sites in between‐subject designs.
Moreover, we only stimulated the left cerebellum. Given that a learning task was used, it was not feasible to repeat the task several times to incorporate other stimulation sites, as repetition effects would have predominated. Future studies should investigate the effect of spTMS on other cerebellar regions in feedback learning using between‐subjects designs.
Last, stimulation was applied either 100 ms post‐stimulus or 100 ms pre‐feedback. There is currently no established time window of cerebellar‐brain inhibition in the cognitive domain as available for the motor domain (Ugawa et al. 1991). Given that stimulation was applied 100 ms post‐stimulus, it usually occurred several hundred milliseconds before the response. Berlijn et al. (2024a) varied stimulation timing around the ERN peak in a Go/NoGo flanker task and found that stimulation at or closely after the calculated peak latency, but not shortly before, decreased error‐correct differentiation, showcasing the time sensitivity of cerebello‐cerebral communication in cognition. This might depend on the task at hand, as in the current study, stimulation before responses also led to altered error processing. Future studies need to explore these temporal dynamics in more detail, for example, by implementing continuous manipulation of stimulation timings.
Conclusions
6
The present findings show that cerebellar TMS alters cerebral error processing in reinforcement learning. Error processing was decreased by cerebellar TMS in the ERN and increased in the Pe. This pattern closely resembles altered error processing in cerebellar stroke patients as shown in a previous study in a response conflict task. It remains unclear whether the increased Pe in concert with preserved behavioral performance reflects a compensatory process. Processing was affected more strongly by stimulation closer in time to response execution (i.e., post‐stimulus/pre‐response). Taken together, the present study adds to a growing body of evidence showing that the cerebellum plays an important role in error processing and performance monitoring in general, whereby it directly contributes to reinforcement learning and adaptive control of behavior.
Author Contributions
Dana M. Huvermann: conceptualization, methodology, software, formal analysis, investigation, data curation, writing – original draft, visualization, project administration. Adam M. Berlijn: conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing – review and editing, project administration. Stefan J. Groiss: conceptualization, methodology, resources, writing – review and editing, supervision, project administration. Manfred Mittelstaedt: software, resources, writing – review and editing. Alfons Schnitzler: resources, writing – review and editing. Christian Bellebaum: conceptualization, methodology, validation, resources, writing – review and editing, supervision. Martina Minnerop: conceptualization, writing – review and editing, supervision, funding acquisition. Dagmar Timmann: conceptualization, resources, writing – review and editing, supervision, funding acquisition. Jutta Peterburs: conceptualization, methodology, validation, resources, writing – review and editing, supervision, funding acquisition.
Ethics Statement
The study was conducted in accordance with the ethical principles for medical research involving human subjects outlined in the Declaration of Helsinki and approved by the Ethics Committees at the Faculty of Medicine of Heinrich‐Heine‐University Düsseldorf (2018‐240_1) and the University Hospital Essen (18‐8477‐BO).
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1: psyp70178‐sup‐0001‐Supinfo.docx.
- 1. Albrecht, C., and C. Bellebaum. 2023. “Slip or Fallacy? Effects of Error Severity on Own and Observed Pitch Error Processing in Pianists.” Cognitive, Affective, & Behavioral Neuroscience 23, no. 4: 1076–1094. doi:10.3758/s13415-023-01097-1. PMCID: PMC10400674. PMID: 37198385.
- 2. Bates, D., M. Mächler, B. Bolker, and S. Walker. 2015. “Fitting Linear Mixed‐Effects Models Using lme4.” Journal of Statistical Software 67, no. 1: 1–48.
- 3. Bellebaum, C., and M. Colosio. 2014. “From Feedback‐ to Response‐Based Performance Monitoring in Active and Observational Learning.” Journal of Cognitive Neuroscience 26, no. 9: 2111–2127. doi:10.1162/jocn_a_00612. PMID: 24666168.
- 4. Berlijn, A. M., D. M. Huvermann, E. Bechler, et al. 2025. “Impaired Reinforcement Learning and Coding of Prediction Errors in Patients With Cerebellar Degeneration ‐ a Study With EEG and Voxel‐Based Morphometry.” Cognitive, Affective, & Behavioral Neuroscience 25: 1126–1146. doi:10.3758/s13415-025-01303-2. PMCID: PMC12356735. PMID: 40437311.
- 5. Berlijn, A. M., D. M. Huvermann, S. J. Groiss, et al. 2024a. “The Effect of Cerebellar TMS on Error Processing: A Combined Single‐Pulse TMS and ERP Study.” Imaging Neuroscience 2: 1–19.
- 6. Berlijn, A. M., D. M. Huvermann, S. Schneider, et al. 2024b. “The Role of the Human Cerebellum for Learning From and Processing of External Feedback in Non‐Motor Learning: A Systematic Review.” Cerebellum 23: 1532–1551. doi:10.1007/s12311-024-01669-y. PMCID: PMC11269477. PMID: 38379034.
- 7. Bolker, B. M. 2015. “Linear and Generalized Linear Mixed Models.” Ecological Statistics: Contemporary Theory and Application 2015: 309–333.
- 8. Bostan, A. C., and P. L. Strick. 2018. “The Basal Ganglia and the Cerebellum: Nodes in an Integrated Network.” Nature Reviews Neuroscience 19, no. 6: 338–350. doi:10.1038/s41583-018-0002-7. PMCID: PMC6503669. PMID: 29643480.
