A Cognition-Affect Integrated Model of Emotion

Sudhakar Mishra; U.S.Tiwary

arXiv:1907.02557·q-bio.NC·May 5, 2020


TL;DR

This paper proposes an integrated model of emotion combining cognition and affect, emphasizing the importance of cortical cognitive functions and brain hierarchies in emotion generation and classification.

Contribution

It introduces a novel cognition-affect integrated model of emotion supported by neural decoding and transfer learning, highlighting the interaction of cognitive functions with core affect.

Findings

01

Core affect alone is insufficient for emotion variety without cognitive integration.

02

Cognition and affect mutually modulate during emotion generation.

03

Brain hierarchies influence emotional responses through hierarchical activities.



Taxonomy

Topics: Cognitive Science and Education Research · Emotion and Mood Recognition · Neural dynamics and brain function

Full text

Affiliation: Indian Institute of Information Technology, Allahabad, Center for Cognitive Computing, Allahabad, 211012, India

Email: [email protected]

A Cognition-Affect Integrated Model of Emotion

Sudhakar Mishra

Uma Shanker Tiwary

Abstract

The focus of the efforts for defining and modelling emotion is broadly shifting from classical definite marker theory to statistically context situated conceptual theory. However, the role of context processing and its interaction with the affect is still not comprehensively explored and modelled. With the help of neural decoding of functional networks, we have decoded cognitive functions for 12 different basic and complex emotion conditions. Using transfer learning in a deep neural architecture, we arrived at the conclusion that the core affect is unable to provide varieties of emotions unless coupled with cortical cognitive functions such as autobiographical memory, default mode network (DMN), self-referential, social, theory of mind (ToM) and salient event detection. Following our results, in this article, we present a ’cognition-affect integrated model of emotion’ which includes many cortical and subcortical regions and their interactions. Our model suggests three testable hypotheses. First, affect and physiological sensations alone are inconsequential in defining or classifying emotions until integrated with the domain-general cognitive systems. Second, cognition and affect modulate each other throughout the generation of a meaningful instance which is situated in the current context. And, finally, the structural and temporal hierarchies in the brain’s organization and anatomical projections play an important role in emotion responses in terms of hierarchical activities and their durations. The model, along with the analytical and anatomical support, is presented. The article concludes with future research questions.

keywords:

Cognition, Affect, MVPA analysis, Functional Networks, Neural decoding, Emotion Model

Introduction

The age-old question ’What is an emotion’ has been investigated since the time of Aristotle. The dominant and popular classical theory of emotions given by Darwin [1], followed by Russell[2], Paul Ekman[3] and Jaak Panksepp[4], advocates the universality of basic emotions. In parallel, the less recognized Cognitive-Appraisal theory by Lazarus[5], Arnold[6] and others [7] considered the appraisal of context along with many other dimensions. These developments were followed by the Social Constructionist theory of emotions[8] (please see box-1), which considered emotions as socially constructed concepts.

Although the research community has proposed a role for cognition and context perception in emotion, it is not clear which cognitive functions are involved in subjective context processing and in contextualizing the affect. Also, there is no evidence based on statistical pattern analysis that quantifies the contribution of affect and cognitive functions in creating emotions. Finally, the structural underpinning of the cognition-affect interaction, which might give rise to emotion function, has not been considered comprehensively while modelling and theorizing emotion. We attempted to fill these gaps by a series of network analyses on statistically significant multivoxel correlational functional connections, neural decoding, MVPA analysis, and an anatomical description of cortical processing, which led us to propose a new model called ’cognition-affect integrated model of emotion’. In simplest words, the model can be stated as "the core affect is shaped by socio-cognitive processes into an emotion situated in the dynamically learned context". We give evidence for our model at three levels: first, we demonstrate cognitive domain-general functions and their contributions at the cortico-cortical connection level; second, at the mesoscopic affect-cognition interaction level; and third, at the microscopic level of information processing within a context. In the rest of the article, the experiment and results are presented first, followed by the model. Next, the functional modules of the model are discussed, detailing the role of the various modules in the construction of emotions along with their neural basis. Finally, the structural organization and mechanism supporting the construction of emotion perception are explained, followed by conclusions and future research directions.

Whilst part of what we perceive comes through our senses from the object before us, another part (and it may be the larger part) always comes out of our own head [james1890principles].

While emotion was once considered as the property of the limbic system, advances in emotion research support a more multi-faceted account which also includes activity in the neocortex in constructing emotion. Theories of emotion can be categorized in three major classes: classical/basic theory of emotion [4, 9], appraisal theory of emotion[5, 7], and conceptual act theory or constructionist theory of emotion [2, 10, 11, 12].

In general, the classical theory of emotion encapsulates all the theoretical proposals associating emotions with certain bio-markers or with some evolutionarily preserved circuits mediating the emotional feelings. This universality of the nature of emotion was taken up by Paul Ekman [3], who reported cross-cultural similarity of facial expressions. On the other hand, the somatic-marker hypothesis by Damasio, which was originally inspired by the James-Lange theory, hypothesized somatic biomarkers of emotions (called affect programs) whose activity was associated with the emotion-eliciting cues encountered by the organism [9]. Jaak Panksepp proposed seven primary affective circuits which were discovered mostly from animal models and claimed to be preserved in the evolutionary process [4]. In short, the classical view of emotions emphasizes the following characteristics: emotions are unique mental states; emotions are caused by special mechanisms [9] and specific brain circuits [4]; emotions have unique manifestations in the face [3], voice and body state [13]; and emotions are unique in their responses [3, 13]. Basic emotions were assumed to be universal in nature [14, 15], and variations in emotions are considered secondary and tertiary versions of basic core emotions [4]. Emotions have an evolutionary root and share their neural circuits with non-human animals [4]. This determinism in emotion processing was challenged by the researchers who proposed appraisal theories of emotions.

Appraisal theory was proposed [6] and developed [5, 7] to explain how different emotions may emerge from the same event, in different individuals, and on different occasions. Appraisal theories see emotions as a process (not episodes) which involves cooperation among dimensions [16] including a) appraisal (evaluation of the context and subjective interaction with it to produce values for different variables), b) motivation (action tendencies), c) somatic (bodily sensations), d) motor (expressive and instrumental behaviour), and e) feeling (subjective experience). However, it is uncertain how the process of emotion operates on these dimensions. Another approach to understanding emotion as a process is that emotions are concepts or categories which are constructed from past experiences and beliefs, just like other perceptions.

The social constructionist theory [8, 17, 12] views emotion as a cognitively constructed social reality [18] which is situated in the subjective experience of the context and subjective conceptualization [19, 20, 21]. This view also helped to explain the variations in the emotional experiences within an emotion category, across the subject, context and time, which is a radically different notion from the classical theory of emotion. This theory explicitly maintained the position that emotion is a socio-cultural reality. Emotions are socially and culturally adopted conceptual categories encapsulating varieties of social events and situated emotional feelings and expressions. The conceptual act theory or constructive theory of emotion encapsulates a fair amount of qualitative arguments for the role of cognitive functions in creating emotion but lacks in the quantification of contributions by these cognitive functions, the inclusion of affective processing and qualitative structural and anatomical support.

The recently proposed active inference theory [22] also supports the view that emotion is not different from perception, as both are principally framed in the structural and functional asymmetry of cortical organization. The constructionist theory of emotion, together with the hierarchical active inference model of insular processing [23, 24], led to the theory in which emotions are interoceptive inferences [22, 17]. These inferences are drawn from the internal representations learned in the past [25, 26, 27]. This theory of emotion suggests that the generated internal representation relays the active interoceptive predictions, which are consciously perceived as emotions.

In this work, we propose a new model of emotion. This framework provides strong support for the claim that cognition and affect interact in a loop to construct an emotional episode in a given context. Although our results come from functional connectivity analysis of source-localized signals and from the decoded cognitive functions that contribute significantly to emotion categorization, the evidence is extended to include subcortical nuclei and the structural principle of the brain organization[53].

Subcortical nuclei send feedforward long-range signals which modify cortical stimulus processing and perception. In this way, the internal physiology and subjective values get integrated with the ongoing stimulus processing. In turn, these nuclei receive feedback projections from cortical areas, making synaptic connections and neural assemblies that, in general, codify the social cue as per its characteristics (for example, related to reward, arousal and so on).

Results

We have considered online data for emotions[28]. The EEG and some peripheral physiological (GSR, EMG, Temperature, EOG, Respiration and Plethysmography) data were recorded while subjects were watching multimedia stimuli for 12 different emotions. The available EEG data were pre-processed with EEGLAB script[29]. We have followed Makoto’s pre-processing pipeline to pre-process the signal. Then the network calculation, neural decoding and MVPA analysis were done(for details see methods).

Connectome and functional decoding:

The different voxel-pairwise connections between two regions are merged together and represented as a connection between the peak significant voxels, for the purpose of visualization and further functional decoding. (See methods for methodological details.)

The connectome presented in figs. 6(f), 6(g), 6(e) and 6(d) (see also supplementary fig S5 for other functions) shows functional connections between voxel pairs, each of which represents two connected regions. The functional connectome is based on PLV measures on voxel pairs. These voxels were calculated using a source localization technique (for complete details, please see methods). All these voxel pairs are connected due to strong, significant functional correlations ($p<0.0002$, FDR corrected). We calculated the set of connections between two regions. To reduce the calculation overhead, these connections were merged by calculating the probability of significant connections (the number of significant connections divided by the total number of connections). This probability constituted the weight of the connection between the pair of voxels that represents the two regions. Further, these sets of connections, across all the emotions, were utilized for functional decoding using meta-analysis based on the Neurosynth database[30]. Error, cognitive control, and conflict-related voxel coordinates were combined to form the saliency network (fig 6(g)). This is based on the literature[31, 32, 33, 34] and on the overlapping connections shown in table-S4 (supplementary section), which shows that the cognitive control, error, and saliency networks overlap more with each other than with the other functions.
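The two quantities described above, the phase locking value (PLV) between a pair of voxel time series and the region-to-region weight obtained as the fraction of significant voxel-pair connections, can be sketched as follows. This is only an illustrative sketch under assumptions: the function names are hypothetical, the phase is taken from an FFT-based analytic signal, and the paper's own filtering, windowing and significance testing (described in its methods) are not reproduced here.

```python
import numpy as np

def analytic_phase(x):
    """Instantaneous phase of a real 1-D signal via an FFT-based Hilbert transform."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Build the one-sided spectrum multiplier of the analytic signal.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.angle(np.fft.ifft(spectrum * h))

def plv(x, y):
    """Phase locking value: length of the mean phase-difference vector, in [0, 1]."""
    dphi = analytic_phase(x) - analytic_phase(y)
    return float(np.abs(np.mean(np.exp(1j * dphi))))

def connection_weight(significant_mask):
    """Fraction of voxel-pair connections between two regions that are significant."""
    mask = np.asarray(significant_mask, dtype=bool)
    return float(mask.sum() / mask.size)

# Hypothetical example: two 10 Hz sinusoids with a constant phase lag.
t = np.arange(1000) / 250.0
x = np.sin(2 * np.pi * 10 * t)
y = np.sin(2 * np.pi * 10 * t + 0.5)
print(plv(x, y))
print(connection_weight([True, False, True, True]))
```

A constant phase lag gives a PLV close to 1, while unrelated phases give values near 0; the weight is simply a proportion, so it also lies between 0 and 1.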

The Neurosynth database was utilized for decoding the terms associated with each voxel pair. Neurosynth provides meta-analyses of many brain functions by analyzing a vast number of research articles (approx. 14371 studies)[30]. For each calculated functional connection between a voxel pair, the co-activated set of voxels within the 5mm distance cut-off was included. This step was done to compensate for the source localization error of sLORETA, which is reported to be approximately 1.45mm( $\pm$ 3.71mm)[35]. The resultant co-activation map was decoded for the associated terms along with the posterior probability of these associated terms (or functions, used here interchangeably) based on the association test (FDR corrected with an expected rate of 0.01). For all the connected voxel pairs, the associated terms with posterior probability of more than 0.3 were considered. 18 terms with some subcategories and overlaps among each other (see Supplementary table-S4 for overlap and figs. 6(d), 6(e), 6(g), 6(f), 6(h), 6(i), 6(k) and 6(j) for functional connections among and activation of different brain regions) were considered for the MVPA classification using transfer learning and a convolutional neural network approach (fig 11(b)). For these functions, the meta-analytic co-activation maps were extracted from the Neurosynth database and plotted along the axial or transverse plane (see the statistical maps in figs. 6(j), 6(k), 6(i) and 6(h)). This way of calculation, to some extent, reduced the limitation that source localization reaches only the cortical layers, and allowed us to consider structures from subcortical regions too, which were essential to propose the model. The subcortical structures included in the model were based on meta-analysis of valence, arousal and pain (physiological sensations). These three terms were considered as representing affect and affective sensations.
The subcortical structures which are active for valence, arousal and physiological sensation are the amygdala, putamen, cerebellum, caudate nucleus, thalamus, pallidum, vermis and hippocampus (Please see supplementary section ’Active regions for valence, arousal and pain’).
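The two selection rules used in the decoding step, including co-activated voxels within a 5mm cut-off of a localized voxel and keeping only terms whose posterior probability exceeds 0.3, can be illustrated with a small sketch. The helper names, the example coordinates and the example term probabilities are hypothetical, the cut-off is assumed to be inclusive, and the actual decoding against the Neurosynth database is not reproduced.

```python
import numpy as np

def voxels_within_cutoff(seed_mm, coords_mm, cutoff_mm=5.0):
    """Indices of voxels whose Euclidean distance to the seed is <= cutoff_mm."""
    coords = np.asarray(coords_mm, dtype=float)
    dist = np.linalg.norm(coords - np.asarray(seed_mm, dtype=float), axis=1)
    return np.flatnonzero(dist <= cutoff_mm)

def keep_terms(term_posteriors, threshold=0.3):
    """Keep decoded terms whose posterior probability is strictly above threshold."""
    return {term: p for term, p in term_posteriors.items() if p > threshold}

# Hypothetical coordinates (mm) and decoded posterior probabilities.
coords = [[0, 0, 0], [3, 4, 0], [6, 0, 0]]
print(voxels_within_cutoff([0, 0, 0], coords))
print(keep_terms({"autobiographical": 0.45, "visual": 0.30, "motor": 0.12}))
```

The neighbourhood step widens each localized voxel to absorb localization error, and the threshold step then discards weakly associated terms before the functions are passed on to classification.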


As shown in the connectome(figs. 6(d), 6(e), 6(g), 6(f), 6(h), 6(i), 6(k) and 6(j)), we are getting some regions which are mediating the peripheral connections and working as hubs. These regions include precuneus(7), PCC(31), ACC(32), MidTG(21), IPL (40), PreG(6), STG(22) and LG (18). These hubs are reported in the domain-general systems which mediate multimodal information among brain regions[36], between uni-modal processing regions and sub-cortical structures[37] and thus contribute in cognitive construction[12, 36]. Mostly, we are getting functional connectivity of precuneus with all other parts of the brain. The role of precuneus is implicated in autobiographical memory-related tasks[38], in the tasks related to first-person perspective[39], mental imagery and navigation[40], conscious agency[41] and also in the resting state condition[42]. Precuneus is reported to have widespread connections from several cortical and subcortical regions which includes lateral parietal regions, lateral and medial frontal cortex, temporal pole, temporo-parieto-occipital, thalamus, and brain stem nuclei[43]. To our knowledge, no study has reported the connection of precuneus with sensory cortices directly, but it has strong functional and structural connections with associative regions, which makes it the site for higher-order processing integration.

The other peripheral cortical affective regions, which include infFG(47), supTG(38), Ins(13), PosG(3), and mAntC(32), are connected with the hubs (fig 7).

It is clear from figs. 7(b), 7(c), 7(d), 7(e), 7(f) and 7(g) and table 7(a) that physiological or affective sensations alone do not carry enough information to categorize complex emotions. In fact, affective sensation is able to categorize only happy and sentimental emotions with more than 50% accuracy (fig 8). Sentimental emotion occurs more often for females, and happy emotion for males (see the table in this section, ’Connectome and functional decoding’).

For males, positive emotions dominate the activity in core affective cortical regions, whereas for females these regions are activated mostly for negative emotions.

| Regions | Female | Male |
|---|---|---|
| InfFG47 | Sentimental, cheerful, joy, lovely, mellow | Terrible, happy, love, lovely |
| STG38 | Melancholy, sentimental, terrible, exciting, lovely | Sad, happy, love |
| Ins13 | Melancholy, sad, terrible, happy, love, exciting, mellow | Cheerful, happy, hate, exciting |
| PosG3 | Hate, sad, exciting, happy, mellow, cheerful | |
| mAntC32 | Sentimental, cheerful, joy, lovely, mellow | Hate, terrible, cheerful, exciting, love, mellow |

Core affect related cortical regions with activity for some emotions (for male and female): The table shows cortical regions which are related to affect. Due to some emotion stimuli, activity in these regions for male and female cases is observed. For males mostly positive and for females mostly negative emotions are causing activity in affective cortical regions.

Deep Neural Network based MVPA analysis using transfer learning:

In the constructive mechanism framework, these modules together create an internal model representation. The internal representations are an approximation of the causal structure of the world, which makes sensory predictions about the future input to the sensory organs from the body and from the external world.

| Category | No. of voxels | Accuracy (%) |
|---|---|---|
| Voxels from pairwise connections after FDR correction | 1230 | 79.29 |
| Attention Network | 281 | 48.44 |
| Auditory | 546 | 23.88 |
| Autobiographical Memory | 767 | 60.94 |
| Cognitive Control | 180 | 44.42 |
| Comprehension | 624 | 40.85 |
| Default Network | 742 | 66.52 |
| Error | 79 | 35.71 |
| Language | 577 | 37.50 |
| Motor | 641 | 26.12 |
| Motor Imagery | 857 | 30.36 |
| Physiological sensation (valence/pain/arousal/somato-sensory) | 570/752 | 26 to 31 |
| Salience Network | 197 | 63.17 |
| Self-referential | 450 | 56.47 |
| Social | 754 | 51.79 |
| Theory of Mind | 729 | 58.04 |
| Visual | 624 | 23 |
| Working Memory | 602 | 45.76 |
| Randomly selected voxels | 1230 | 29 |

MVPA analysis using deep learning: Using deep learning (for architecture see fig 11(b)), we classified emotions into 12 unique categories for 18 functions. We extracted these functions by neural decoding on voxel pairwise functional connections, which were calculated using PLV and survived the permutation test. In total, 615 voxel pairs (1230 voxels) for 14 emotion conditions are considered. Since we had in total only 1230 voxels making functional pairs with each other, the share of voxels for each cognitive function was very small, and with so few voxels we could not train the model. Hence, we trained the model using transfer learning, as described in the methods section, with 1230 nodes in the input layer to match the original set of voxel nodes in the functional connectome results. We then used the Neurosynth database to find the voxels which were consistently active, across a large cognitive neuroscience database, for each decoded function. In this way, the consistently active voxels were extracted for all the 19 decoded cognitive functions. The trained model was tested on the set of voxels related to each individual function. Out of 19 different functions, we obtained comparable accuracy for autobiographical memory, default mode network, salience network, theory of mind and self-referential. To cross-check that the sets of voxels associated with the functions mentioned in the table are not random, a random set of voxels was picked, which gave only 29% accuracy.

In this way, the trained machine was quite robust in detecting general features of electrophysiological data.

SS-1 is different from SS-2 in the sense that SS-1 was used to find the significant functional connections in our source-localized voxel pairs, whereas SS-2 points to the consistent activity of the regions across the studies considered in the Neurosynth database meta-analysis. For SS-1 and SS-2, please see methods.

We had signals for 6239 voxels, and after calculating the statistical significance of phase-synchronized connections among voxels (with FDR correction) we obtained connections among 1230 voxels. The rest of the 6239 voxels were used to perform generalized learning. From these remaining voxels, 200 sets of 1230 voxels (the input size of one set is 448x1230x99) were created to train the deep model on the general structure of the input. This trained model was used to find the classification accuracy with different inputs (in terms of different sets of voxels). When the model was given the set of voxels (input size: 448x1230x99) that made significant synchronized connections and survived FDR correction, the achieved accuracy was 79.29%. In contrast, with the random set (of the same input size as the calculated set), the accuracy was merely at chance. We used the same pre-trained model to probe the influence of 19 brain systems on emotion discrimination. The results support the hypothesis that emotions are constructive in nature and involve many cognitive processes. This statement is based on the fact that the brain systems which gave above-chance accuracy in the emotion discrimination task were autobiographical memory (60.94%), default mode network (66.52%), salience network (63.17%), self-referential (56.47%), theory of mind (58.04%) and social (51.79%).
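The construction of the training sets described above (200 sets of 1230 voxels drawn from the voxels that are not part of the significant connectome) might look like the following sketch. It assumes that each set is sampled uniformly at random without replacement within the set and that different sets may overlap, which is unavoidable because 6239 - 1230 = 5009 remaining voxels cannot supply 200 disjoint sets of 1230; the function name and the seed are hypothetical and not taken from the paper.

```python
import numpy as np

def build_training_sets(n_total=6239, significant_idx=(), set_size=1230,
                        n_sets=200, seed=0):
    """Draw n_sets random voxel index sets from the non-significant voxels."""
    rng = np.random.default_rng(seed)
    # Pool of candidate voxels: everything except the significant ones.
    pool = np.setdiff1d(np.arange(n_total), np.asarray(significant_idx, dtype=int))
    # Each set has unique voxels; sets are allowed to overlap with one another.
    return [rng.choice(pool, size=set_size, replace=False) for _ in range(n_sets)]

# Hypothetical example: pretend the first 1230 voxels are the significant ones.
significant = np.arange(1230)
sets = build_training_sets(significant_idx=significant)
print(len(sets), sets[0].shape)
```

Keeping the set size equal to 1230 is what lets the same input layer be reused when the pre-trained model is later probed with the significant voxels or with the function-specific voxel sets.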

We extracted these functions by neural decoding on voxel pairwise functional connections, which were calculated using PLV and survived the permutation test. In total, 615 voxel pairs (1230 voxels) for 14 emotion conditions were considered. Since we had in total only 1230 voxels making functional connections with each other, the share of voxels for each cognitive function was very small, and with so few voxels we could not train the model. Hence, we trained the model using transfer learning, as described in the methods section, with 1230 nodes in the input layer (fig 11(b)) to match the original set of voxel nodes in the functional connectome results (for all voxel pairs see table-S2 in the supplementary section). We then used the Neurosynth database to find the voxels which were consistently active, across a large cognitive neuroscience database, for each decoded function. In this way, the consistently active voxels were extracted for all the 19 decoded cognitive functions. The trained model was tested on the set of voxels related to each individual function.

Our network analysis and functional decoding show that the main functional components involved include the salience network (SN), default mode network (DMN), autobiographical memory (AM), self-referential, working memory (WM) & attention, and social or theory of mind (ToM). These cognitive functions make significant contributions to distinguishing one emotion from another, as shown in table 7(a). Evidence for these sub-modules comes from our statistical significance analysis of functional connections as well as from the deep learning-based emotion classification (for each of the above-mentioned modules individually).

The 18 decoded functions (table in fig 7(a)) were considered for the MVPA classification using transfer learning and convolution neural network approach( figs. 11(b) and 11(c)). We utilized transfer learning[44, 45] for training the deep learning model since we had limited data. To train the model( figs. 11(b) and 11(c)), we utilized four different kinds of data from the physionet databank. First, EEG data on mental arithmetic task[46], second, EEG data for motor movement and imagery[47], third, ERP based BCI recording on target and non-target set of characters[48], and fourth, MAMEM Steady-State Visually Evoked Potential(SSVEP) EEG database[49]. All these datasets had different categories to be classified. Our main intent behind taking all these different EEG datasets is to let the model get familiarized and set its parameters for the general characteristics of electrophysiological signals by learning filter kernels and transfer this learning to a new task. The concept of transfer learning resembles the human learning in a way that humans see many examples on moment to moment basis(for example, images) and get trained for separating two objects based on their general features only(for example, edges, corners, blobs, textures and so on) even without knowing specific name of these objects[50]. Before training the model, EEG channels were source localized to find out source voxel activity using sLORETA. Activity in 6239 voxels was calculated using source localization method. One of the limitations of transfer learning is that the input layer should have the same number of nodes. For the final testing, we had only 1230 voxels(these 1230 voxels are as per our functional connection analysis), and that’s why we made chunks of 1230 voxels during training so that we have the same number of nodes in the input layer. The selection of these voxels, during training, was done randomly. 
In this way, we created 200 batches of training input for each of the considered datasets (the batch mentioned here should not be confused with the deep learning batch size, which is normally used during training to deal with computational and convergence time). That means a total of 800 batches of training input with a varying number of samples. The model trained on one batch was carried over to the next batch, and so on. In this way, the general-purpose EEG model that we trained learned very general features and structures of EEG. The trained model was then fine-tuned with the set of 1230 voxels randomly picked from our source-localized data with 6239 voxels (although we obtained very low accuracy for these random sets of voxels, we succeeded in tuning the model to the brain signals for emotion stimulation). The fine-tuned model was applied to a testing set of data with the limited number of samples from the emotion experiment. Since the model had already learned the general structures, a very large number of new samples was not needed to learn the new task. In conclusion, the transfer learning technique was used here to deal with the classification challenge on a limited dataset with different output. The details about feature calculation, input size, and software and hardware specifications are discussed in the methods section.

Based on our results, we claim that higher-order functions like salience detection, autobiographical memory (including self-referential), theory of mind/social and default mode network are crucial for providing meaning, and thus for classifying the affect in the frame of context and making the emotion a reality. Affect or physiological sensation could not classify 12 different emotions (namely happy, fun, exciting, love, lovely, mellow, hate, melancholy, sad, sentimental, shock, and terrible) with more than a chance accuracy of 31%, although they might contribute to salient feature selection through long-range projections (see fig 6(e) & 10(a)). The presented computational analysis provides strong support for the notion that meaning-making domain-general cognitive functions contribute significantly to emotional experiences. The association of affect with the context can be either conscious or at the subliminal level. This association could be implicit, with the contribution of the physiological neural map during very early sensory processing (for example, projections of the physiological neural map onto sensory cortices[51]). It could also be delayed, with the delayed interpretation of a complex social situation (for example, a model of the mind, in which case the context interpretation will be more explicit[52]).

Both the table in the section ’Deep Neural Network based MVPA analysis using transfer learning’ and figure 8 agree in that the highest number of true positive cases is observed for the default mode network (DMN) and the lowest (among the presented confusion matrices) for the attention network.

At both the conscious and subconscious levels, memory plays an essential role, as anatomical projections are restructured over time[deng2010new], accommodating the learning of the context in neural assemblies. This learning is used to anticipate future responses. The mechanism gives an advantage to the species in two ways: first, metabolic stabilization by a priori allostatic regulation (destabilization can occur due to the uncertainty of the environment) and second, a more precise response before any harmful event can occur.


Discussion

Earlier models of emotion were more inclined towards its classical and deterministic aspects [4, 3, 13]. With the continuous development in understanding [5, 7, 2, 10, 11, 12], later models started treating emotion as a non-deterministic and distributed phenomenon. Following the same line of development, we have proposed here a conceptual model of emotion supported by analytical observations of long-range cortico-cortical and cortico-subcortical coactivations, calculated cognitive functions, and a description in the framework of the brain’s organization (functional asymmetry, the laminar organization, microscopic descriptions). As the laminar organization [53, 54] and functional asymmetry [55] are reported to be the general structural and functional organization of the brain, we considered this concept for the processing of emotions too.

The presented layered model (see fig 9) signifies the interaction between affect and cognition in a loop to construct an event of emotion. Based on our results and the model, we suggest that communication among different brain regions, which are responsible for social context and self-related event processing (for example places, objects, goals and so on), salient feature detection, attention, reward/punishment, hedonic value, and physiological sensations (all discussed separately in the next section), takes place to create an event of emotion.

Our model explicitly explains the nature of emotion, as opposed to its universality, in that emotion itself is part of the process that rests on the brain’s dynamic connectivity organization. These dynamic interactions construct an affective subjective experience, which is called an emotion. Our model also argues beyond the appraisal model in that emotion is not merely a reaction to the appraised stimulus but is encoded in experience. Unlike the social constructionist model of emotions, our model is inferred from calculated cognitive functions (using neural decoding and MVPA) rather than from speculative arguments about the involvement of different cognitive functions.

Using MVPA, we found that affective sensation plays little role in categorizing emotions; it has to be contextualized with domain-general and socio-cultural processing (table in fig 7(a) & fig 7) to categorize emotions. Emotions have no distinct, clearly defined boundaries based on physiology, expressions or anatomy; they are fuzzy and statistical in nature, and they are learned. Rather than being characterized by specific biomarkers, emotions can be categorized based on MVPA with statistical representations which vary less within and more among different categories of emotions. These statistical representations are learned contexts for an emotion. These learned representations encode various levels of detailed and abstract interoceptive and exteroceptive information [56] of subjective importance across the subcortex and cortex with behavioural goals during learning (learning due to sufficient supervised or unsupervised encounters) all the way up to highly generalized, amodal[57] and abstract concepts[58]. When any cue (internally predicted and/or externally presented) that is informative enough to activate the onset of an episodic event is encountered, the sequence of events in time[59, 60] within the delineated spatial boundary[61] of the environmental context (altogether creating an integrated state of emotional episode [58]) is recalled in order. These contextual cues are encoded in chunks of neural assemblies[62] and maintained in the cortical hierarchy[58]. These sequences of events might fulfil different subjective behavioural goals, which are the result of the conspecific social arrangement for better human survival in a culturally created environment. This dynamic, culturally created environment drives the development of cognition throughout life (due to the lifelong interaction of an individual with the uncertainty in the environment)[63]. 
This ongoing learning of socio-cognitive perception shapes affects into the spectrum of emotions. With cognitive learning, affects find their socio-culturally defined meaning, representations and expressions. In this way, the core affect is nurtured over its innate nature to maintain the allostatic stability of the organism. The culturally agreed and nurtured affective representation is remembered for foraging the contextually situated sequence of events in the future when a similar context is encountered.
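The idea that statistical representations vary less within a category than among categories can be made concrete with a small sketch. It is a toy illustration under stated assumptions: the two-dimensional "activation patterns", the two category names and the use of Euclidean distance are all hypothetical choices made for the example, not the features or the metric used in our MVPA.

```python
from itertools import combinations
from math import dist
from statistics import mean

# ASSUMPTION: invented 2-D patterns standing in for multivariate activations.
patterns = {
    "happy": [(1.0, 0.9), (1.2, 1.1), (0.8, 1.0)],
    "sad": [(-1.0, -0.9), (-1.1, -1.2), (-0.9, -1.0)],
}


def mean_within_distance(data):
    """Average distance between pairs of samples that share a category."""
    distances = [
        dist(a, b)
        for samples in data.values()
        for a, b in combinations(samples, 2)
    ]
    return mean(distances)


def mean_between_distance(data):
    """Average distance between pairs of samples from different categories."""
    distances = [
        dist(a, b)
        for (_, first), (_, second) in combinations(data.items(), 2)
        for a in first
        for b in second
    ]
    return mean(distances)


within = mean_within_distance(patterns)
between = mean_between_distance(patterns)
print(f"within={within:.2f}, between={between:.2f}")
```

A category is separable in this statistical sense when the within-category value is clearly smaller than the between-category value, even though no single feature acts as a fixed biomarker.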

The text that follows describes the different submodules that contribute significantly to constructing an emotion: the salience network, social/ToM processing, autobiographical memory/self-referential/DMN processing, affect and interoception. We also discuss the integration of information within and between subcortical and cortical systems, taking into account the structural layout and anatomical projections. Due to lack of space, some other details regarding the functions, interactions and anatomical projections discussed below are included in the supplementary section ’Decoded cognitive functions from the functional connections’.

elated modulatory signals from amygdala and other subcortical structures.


**The cognition-affect integrated model of emotion:** The four-layer structure depicts functional specificity ranging between the core-affect-specific subcortical nuclei and the context-processing-specific cortical regions. The neocortical regions are connected via long-range connections (figs. 6(f), 6(d), 6(g) and 6(e) and supplementary Fig S5) and interact with the prelimbic and limbic memory system [179, 180, 181] to process the context. On the other hand, subcortical nuclei react to reward, arousal, pleasure and physiological sensations. These two systems interact with each other in a loop via short-range and long-range projections. This learnt cortico-cortical and cortico-subcortical interaction loop, at the local (fig 11) and global scale (fig 7), constructs an integrated state called an emotion. The affect and context can be dissociated at various levels of hierarchical interaction and information processing (fig 10), as the context has a structural and temporal hierarchy. Following the hierarchical structural arrangement, there are moments where affect dominates. In this case, the contextual consideration is minimal and primary, which results in a quick and implicit response (possibly closely related to survival reflexes). On the other hand, with varying degrees of consideration of this hierarchical structure of the context, varying degrees of emotional response can take place. Since emotion is a learnt concept associated with well-being, it can be stored for future reference and recalled as anticipation in order to achieve allostatic gain. The connections among cortical regions are as per our results and are depicted in square brackets. These connections are for the set of cognitive functions (default mode network, salience network, social/ToM, autobiographical memory, physiological sensations, or affect). 
The subcortical regions were calculated using the Neurosynth meta-analysis for the affect-related functions (valence, arousal, physiological sensation/pain) which we obtained in our cortical functional network decoding. The interaction between cortex and subcortex in emotion/affect/sensation conditions is reported in many studies [151, 182, 183, 184, 185, 186, 187, 188, 115, 122, 119, 189, 190, 191, 192, 142]. Abbreviations: NA: nucleus accumbens; LC: locus coeruleus; sg: subgenual; ACC: anterior cingulate cortex; IPL: inferior parietal lobule; mPFC: medial prefrontal cortex; SMA: supplementary motor area; STG: superior temporal gyrus; TPJ: temporo-parietal junction; PIC: posterior insular cortex; MIC: middle insular cortex; IFG: inferior frontal gyrus; MCC: medial cingulate cortex; pg: pregenual; RSC: retrosplenial cortex; PCC: posterior cingulate cortex; MTL: medial temporal lobe. Sg, Ig1, Ig2, and A represent granularity and are described in fig 9(a).

Decoded cognitive functions from the functional connections

The cognition- and affect-related functions presented in fig 8 contribute to varying degrees in categorizing emotions. Among them, some functions contribute significantly to constructing and situating the emotion in a context.


Cortical structural and temporal functional organization underlying feedforward and feedback information integration in the cognition-affect integrated model of emotion. (a) Mesoscale cortical representation of active regions with granular structure: Hierarchical agranular, increasingly granular and fully expressed granular structures and anatomical projections, adapted from[53, 54]; the depicted cortical regions are from our results. (b) Interlayer Communication: The diagram depicts interlayer communication among granular and slightly granular layers. The interlayer communication is between nearby layers and remote layers[55]. The projections of feedback pathways diffuse among layer-1, 2/3 and 5 for nearby regions and progressively concentrate towards layer-1 and layer-6 for remote regions, to modulate the centre-surround pattern and converge the cortical-subcortical loops. In contrast, the projections of feedforward pathways are diffused among layer-1, 2/3 and 6 for nearby regions and progressively project to interior layers-2/3/4 and 5 for remote regions, to perform the stimulus-driven activity. (c) **Structural and Temporal Hierarchy of Cortical Processing:** Structural hierarchy is mostly the result of the degree to which a granular layer is present in the region, whereas temporal hierarchy is associated with the invariance property, with higher-level cortical regions showing higher invariance (less dynamic) than lower-level regions, which encode more dynamic, less invariant and implicit representations. Adapted with permission from[193] (permission from Nature Neuroscience to reproduce the figure from[193]). The structural and temporal hierarchical representation forms the basis of hierarchical context representation, with more time-variant details in granular layers and less time-variant details in the agranular layers.

Social Processing and Theory of Mind:

Evaluation of social context is distributed among regions which are responsible for spatial, object, face, and temporal sequence processing. These constituent elements of the environment work as cues to retrieve the concept, which can be positive or negative based on the associated reward and social gain. Reward, attention, and physiological affective sensations modulate the recurrent circuits (fig 10(b)) to facilitate salient feature detection at the different levels of the representation hierarchy (fig 9(c)). Social and emotional behaviour are intertwined and can be broken down into a series of processing constructs[64]. Recognition of intention (ToM) and acquisition of social-emotional value lead to the modulation of low-level affective and physiological activity. The low-level activity, in turn, causes changes in higher-order cognitive processing and representations, which leads to conceptual inference about emotion. For example, decoding another’s intentions as harmful may cause intense physiological activity and the release of stress hormones. This affective modulation influences the cortical processing[65] and representations, which amounts to inferring the concept of fear in the current situation. So, the emotion itself is a subjective meaning projected onto the social context, contributed by subjective, physiological saliency. The context is provided by large-scale brain networks. These contextual representations are acquired in the service of allostatic regulation[66] to meet the demands of the situation in the socio-cultural environment, for example, a fight-or-flight response in the above situation.

They are learned and anticipated regulatory mechanisms regulating the internal milieu[krusemark2013sense, shin2009expanded, hoemann2017mixed], perception[barbas2011sensory] and memory[zheng2017amygdala] in order to gain social advantage[somerville2006anterior, schulkin2011social, wilson2017constructing].

Emotions are also learned as concepts and social phenomena that serve to extract the optimal benefit for oneself and/or others. Emotion learning matures with the development of long-range connections and dendritic arborization[67] (fig 11). An increasing number of dendritic spines supporting long-range and short-range feedback projections[68, 69, 70, 71, 72, 73, 51, 74, 75, 76, 77, 78] attests to these optimal representations. These feedback (top-down) projections (fig 9(b) & 11) modulate the intermediate processing[55], sensory processing[68, 76] and actions[68] as per the internal representations of the exact or similar situations[79] experienced in the past. Previous studies have reported activity in the temporal pole (38), IPL40, IFG47 and CinG32 in affective ToM, and in the insula in the integration of cognition and affect (stimulus valence)[80, 81]. Activity in these regions is found in our results (fig 6(f)) [the regions with significant activity based on the Neurosynth meta-analysis database are included in supplementary tables S6 & S7]. Besides these regions, we also observed activity in the precuneus, which has not been seriously explored. We suggest that the precuneus plays an important role in cognition-affect interaction. Since the precuneus is also designated as a hub and communicates with other associative cortical regions, it might be orchestrating the cognition-affect interaction.

The neural bases, functional connectome and activation map for social processing are shown in the corresponding figure. Engagement with[golland2017neural] and perception of[redcay2019using] social context cause affective modulation by core-affect regions. In the anatomical labelling of social and affective functions, we observed overlapping regions, which hints at social modulation of affect.

Working Memory:

Working memory is the ability to maintain short-term neural activity[lee2016multi] in mind and manipulate it mentally to mediate the meaning-making process. It is the process memory which is active during the online processing of context[90]. Working memory is distributed across most, if not all, regions of the neocortex[christophel2017distributed] (as shown in fig LABEL:fig:connect_wm). Working memory representations range from low-level visual features such as orientation, color, motion or complex visual patterns [harrison2009decoding, serences2009stimulus, christophel2012decoding, riggall2012relationship, pratte2014spatial] and audio features from the primary auditory cortex[linke2011stimulus, kumar2016brain] to complex visual and phonological patterns in parietal areas to abstract representations in the frontal cortex[ester2015parietal, kumar2016brain, jerde2012prioritized, lee2013goal, spitzer2010oscillatory, spitzer2011stimulus, spitzer2012supramodal, christophel2013decoding]. The inactive form of this memory is long-term memory (LTM), which exists in the form of neural ensembles to be activated by the appropriate cues (fig 10(b)). Inactive LTMs, by contrast, are simply the long-lasting structural features of a circuit that are not currently affecting the processing of incoming information in that circuit. The process-memory framework of working memory is concerned with the active memory that is intrinsic to the emotion-perception-related processing taking place within a circuit. Different types of information being processed activate different types of process memory in order to integrate the context from past experiences with the current sensory information. The distributed nature and information-specific activation of neural ensembles also support massive parallelism and the local, fast availability of information from past experience. 
The gain control of these past experiences given the incoming information cue is mediated by top-down projections including attention, motivation, intrinsic and extrinsic goals, interoceptive intensity and valence.

Working memory can also operate on non-conscious information[soto2011working, rosenthal2010visuospatial, dutta2014neural, sklar2012reading, hassin2009implicit], contrary to the belief that it can only be allocated to conscious information. Research on subliminal processing has provided intriguing evidence that reading, doing arithmetic, complex visuospatial learning, and working memory operations may occur independently of conscious awareness of the critical information[soto2011working, rosenthal2010visuospatial, dutta2014neural, sklar2012reading], suggesting that, at least under certain conditions, non-conscious information can be committed to the working memory systems. These recent investigations raise the question of how the brain can perform computations using non-conscious information in the service of high-level cognition. [soto2011working] demonstrated that even when attentional resources are constrained by distracters, working memory may operate in a rather autonomous fashion, independently of both conscious awareness and attention. Using neuroimaging and neurostimulation techniques, [dutta2014neural] provided evidence that the dorsolateral and anterior prefrontal cortex can operate on non-conscious information in a manner that goes beyond automatic forms of sensorimotor priming and which may support implicit working memory processes and higher-level cognitive functions.

Autobiographical or episodic memory and concept cells:

An episodic event and concept are acquired together with their physical and temporal structure in the environment. Activation of an internal representation of a specific context can activate neural patterns down the path and modulate regions related to core affect, e.g. the basolateral amygdala (BLA) and central amygdala (CeA)[82]. Using anatomical labelling of the regions calculated as consistently active (in the Neurosynth meta-analysis) for autobiographical memory (AM) and for core affect, we observed that some regions, including bi-GRe, bi-Hipp, bi-PIns, bi-PHG, rTMP and bi-MTG, overlap for these two functionalities (abbreviations are in supplementary table S7). We also observed overlap in cortical functional connections in our functional connectivity analysis, which includes Pre7, Pre31, ACC32, IPL40, MTG21 and CinG31. We considered DMN, AM and self-referential processing as one group here, since in our results they show maximum overlap (see table-S4 in the supplementary section) and they are reported to be closely related in the literature[83].

As shown in the model (see fig 9), the MTL system comprises several regions including the hippocampus, entorhinal cortex, perirhinal cortex and parahippocampal cortex. The parahippocampal and perirhinal cortices receive direct inputs from cortical sensory areas and send this information to the entorhinal cortex, which, in turn, projects to the hippocampus (top of the MTL hierarchical structure). The hierarchy of processing in different regions of the MTL is presented in terms of selectivity, response latency, response units, and modality-specific invariance and encoding. Indeed, there is an increase in the selectivity of neurons across the MTL, with the lowest selectivity found in the parahippocampal cortex and the highest in the hippocampus[84, 85]. Communication between the MTL system and sensory regions activates older memories for the perception of new events. The abstract memory representations, in terms of concepts, in the MTL system and the local, hierarchically ordered complex feature representations in the cortical regions interact to construct the whole-brain event. So, hierarchy is the fundamental principle of the structural and functional organization of cortical processing, encapsulating the hidden hierarchical structure of physical information and objects in the real world[79].

Space, time and contextual details are represented in a hierarchical manner[58] in terms of different degrees (general and abstract to specific and implicit details) of spatial[62, 86], temporal[62, 86] and contextual details[86], respectively. The higher-level allocentric representation of space as an internal spatial map, information related to the temporal gap between the sequence of events[60, 59], and integrated concept representations of mnemonic items (such as people, objects, and landmarks)[87] are mostly reported to be encoded in place and grid cells, time cells (encoded in the order of activation) and concept cells, respectively, in the MTL system[88]. It is evident from the place cells[61, 89], grid cells[61] and concept cells that abstract or coarse-level information is represented in an exclusive manner and combines many low-level representations or events[88] (for example, low-level affective and sensory details). To represent medium-level (entities in the scenes) and low-level (entity-specific details) information, the memory network extends to interconnected cortical regions such as EC, perirhinal, mPFC, PPC, mPC, lateral temporal cortex, insula and so on (as shown in figs. 6(d), 6(h), 6(l) and 9). This memory network encodes domain- or function-specific memories which can range over the specific-to-abstract scale (for example, unimodal to amodal). Besides the structure, an alternative way of understanding this hierarchy is in terms of the duration or frequency of change in the neural activity in a particular region (fig 9(c)). For example, in the auditory modality, the neural dynamics in the auditory cortex change with each change in phoneme or letter, whereas cortical columns in pSTG or TPJ may respond to a word, for which the temporal frequency of change is lower than in the auditory cortex, and so on[90]. 
The temporal hierarchy is also responsible for variation in emotion response considering the implicit to abstract level processing in the hierarchy. The encoded experience in the spatial and temporal hierarchy is reinstated to construct a predicted event(e.g. emotional event) which is anticipating the representation and sensations in the sensory organs[91].
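The difference in frequency of change between levels of the temporal hierarchy can be illustrated with a toy sequence. The sketch below is only a schematic of the idea, not a model of cortical dynamics: the phoneme and word labels are invented, and counting label changes between consecutive time steps is an assumed stand-in for the rate at which neural activity changes at each level.

```python
# Toy stream: each time step carries a fast-changing label (phoneme) and a
# slow-changing label (the word that the phoneme belongs to).
# ASSUMPTION: the labels are invented purely for illustration.
stream = [
    ("k", "cat"), ("a", "cat"), ("t", "cat"),
    ("s", "sat"), ("a", "sat"), ("t", "sat"),
]


def change_count(sequence):
    """Number of time steps at which the label differs from the previous one."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


phonemes = [phoneme for phoneme, _ in stream]
words = [word for _, word in stream]

phoneme_changes = change_count(phonemes)  # lower level: changes at almost every step
word_changes = change_count(words)        # higher level: changes only at word boundaries

print(f"phoneme-level changes: {phoneme_changes}")
print(f"word-level changes: {word_changes}")
```

Over the same stretch of input, the lower-level label changes several times while the higher-level label changes once, which is the sense in which higher levels are more invariant.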

Salience processing and attention:

Salience processing can be influenced by minimally conscious physiological states, goals and drives (bottom-up), on the one hand, and by the conscious effect of previous experiences, goals and memories (top-down), on the other. In the former condition, visceral and autonomic activity and the state of physiological homeostasis can influence what is perceived to be salient. In the latter condition, conscious salience processing can be goal-directed and dependent on top-down attention and cognitive control processes. In goal-directed salience processing, the salience network also includes the ACC and PCC/precuneus (see fig 6(g)), which are reported sites for attention[92]. The functional coactivation pattern between the salience network, pain, valence and arousal includes overlapping regions such as bi-ACgG, bi-AIns, bi-MFC, bi-MTG, bi-PIns, rPut, and lPHG, as shown in fig 6(k) (abbreviations are in supplementary table S7).

The salience network (which also includes interoceptive information) plays a vital role in emotion, cognition, and perception. The thalamic nucleus is a sensory relay station which receives lamina-1 axons and projects mainly to the primary interoceptive cortex and to a lesser extent to the somatosensory cortex. The interoceptive signals are further received by sites of decreasing granularity (the mid-insula and anterior insula, in order of less differentiated structure), where integration with other input modalities takes place[93]. The salience network involves the anterior insula because different input modalities integrate there[94, 95] and it projects an output signal about subjective physiological significance to other parts of the cortex[96]. The anterior insula also has mirror neurons, which can mimic the structure and motion of the physical world and embody it in physiological processing. A salient event is distinguished from non-salient events in that it has significance for subjective well-being[92, 94, 96, 97]. The salient information about subjective well-being and allostatic stability plays a decisive role in attention, cognitive control and error processing (table in fig 7(a) & supplementary table-S4). Subsequently, these systems, via long-range connections, modulate the functional microcircuits in other brain regions and contribute to feature selectivity (fig 10(a))[98, 76] as per the top-down and bottom-up saliency.

Affect, Physiological State of the Body and Affect-Interoception-Context Interaction:

There is plenty of evidence that the subcortical input (thalamoamygdala[99], thalamocortical[100]) provides preliminary input to create mental representations based on coarse information, encoding the expectation of a series of actions. Subcortical nuclei, like the amygdala and pulvinar, directly interact with the dorsal stream and frontoparietal attention network. The amygdala also modulates the response of the ventral visual stream, orbitofrontal cortex and anterior cingulate cortex. Besides the direct connections, the amygdala, accumbens, pulvinar, and superior colliculus modulate the cortex by modulating brain stem nuclei[101, 102, 103]. Tracing of the temporal structure of acoustic events reveals the role of thalamocortical connections in the automatic encoding of event-based temporal structure with high temporal precision, whereas the striato-thalamocortical connections engage in the attention-dependent evaluation of longer-range intervals[100]. Subcortical and lower-level perception of emotional stimuli is reported under audio-visual stimulation[104, 105], and this subcortical coding of relevant changes in audio-visual signals as event markers may facilitate the activity in the cortical hierarchy. For example, for the visual modality in blindsight patients, activity in the amygdala, pulvinar, and superior colliculi takes place[104]. The early-stage activity in the amygdala and hippocampus modulates later-stage cortical activities (for example, feedback modulation of sensory processing).

Different social contexts [106], with the same amygdala neural ensemble [107], can cause different emotions. On the other hand, the same context with different amygdala activation might evoke a different intensity of the same categorical emotion, as different stimulus intensities can evoke different neuronal ensembles in the amygdala [107]. For example, the basolateral amygdala (BLA) mediates associative learning for both fear and reward[108, 109, 110, 111]. Different BLA projections to the nucleus accumbens (NAc), the medial aspect of the central amygdala and the ventral hippocampus [112] distinctly alter motivated behaviour. In the functional microcircuits (fig 10(b)), some neurons get excited and some get inhibited, composing the neuronal ensemble for a valence (positive or negative) conditioned stimulus [112]. Within the BLA, the neural responses to cues that predict rewarding (BLA-NAc projecting neural population and BLA-vHPC) and aversive (BLA-CeA projecting neural population and BLA-vHPC) outcomes differ depending on the anatomical projection target of each neuronal sub-population [113].

A given neural module may not be permanently dedicated to just one affective function, but it may have multiple affective modes as states or conditions change[114]. These affective states/conditions facilitate affective modules with different affective valence-related functions. The affective modules(for example, approach, avoidance, reward, and punishment) are the functions of the neuro-biological patterns which are represented in the brain as the global context, internal physiological state of the body and the pattern of the electrical excitation in terms of frequency and time within the module. These patterns influence the affective mode of the module. Affective valence is a generated response which depends on the situations and the interplay between subcortical and cortical processing structures[114]. Most sites within the nucleus accumbens medial shell and amygdala are not permanently tuned to one affective valence function but rather have multiple modes that can dynamically flip to generate motivation for opposite affect as the conditions change[115, 114]. Central amygdala (CeA) output signal can give rise to very different behavioural responses depending on the functional states of other brain areas reflecting external and internal factors, such as context, anxiety, hunger or thirst[115]. Also, neuromodulators enable cortical circuits to process specific stimuli differentially and modify synaptic strengths in order to maintain short- or long-term memory traces of significant perceptual events and behavioural episodes. One of the major subcortical neuromodulatory systems for attention and arousal is the noradrenergic Locus Coeruleus(LC-NA)[116]. Activity in the LC-GABA neurons controls the arousal by either activating or inhibiting LC-NA neurons[117]. By preferentially targeting LC-GABA neurons, non-coincident inputs set thresholds for NA activation and enable modulation of tonic LC activity during different contexts[117]. 
The activity in LC-NA neurons in turn regulates attention[118, 116, 119], feature selectivity[118, 119] and salience processing across the cortical[120] and thalamic nuclei[121, 119]. The LC gets input from regions including the CeA, PFC, hypothalamus and vagus nerve[118] and projects to the hippocampus, cingulate cortex, sensory cortices, somatosensory cortex, cerebellum[122], thalamus[121, 119], amygdala[118] and prefrontal cortex[118] (fig 9). The contextual modulation of the LC is received via prefrontal regions and tunes the LC activity as per the environmental and cognitive contexts[118]. For instance, the LC response to a distractor, an unexpected event, is attenuated when the subject is focused on the task at hand, but the LC response to an awaited task-relevant cue is enhanced.

We have also observed functional activity in the precuneus (BA7) in every calculated cognitive and affective function. The precuneus, as a cortical hub, has connections with many other cortical and subcortical parts of the brain[43]. Acquired autobiographical memories about events involve complex, multimodal and affectively salient memories embedded in a rich context of personal, social and environmental information. During regeneration, different levels of spatio-temporal information are remapped. Based on our results, we suggest that the precuneus mediates connections among associative cortical and subcortical brain regions and plays an essential role in cognition-affect interaction. Although the functionality of the precuneus has been explored, for example, in self-related processing, memory-related processing, attention, navigation and other tasks[43], it demands more serious research attention. We emphasize its importance, as its widespread functional connections make it an integrating hub and elucidate its importance in emotion emergence.

Cognition-Affect Interaction: The Anatomical Layout According to the Principles of Brain Organization

In the framework of statistical and hierarchical representation, the internally generated context is just a probabilistic activation of distributed pattern [123] throughout the cortex which is encoding the hidden causes of sensory experience or consequences [124]. The causal pattern of activation is embedded in the structural and functional asymmetry[24] in terms of the laminar projections and spatio-temporal hierarchy[125]. The hierarchical structure of information, which is distributed as a pattern, encapsulates abstract to event-specific sensory and emotional experience.

*Hierarchical structure and feedforward-feedback projections:*

In fig 9(a), the sketch of the brain is depicted with colour-coded rectangles illustrating the cellular organization of the regions[53] (these are the regions we found in our study). Agranular (see right side of fig 10(b)) and slightly granular regions lack, or have only a negligibly developed, granular layer 4, respectively. The granular layer 4 (left side of fig 10(b)) contains fine granule cells and is found in the primary sensory cortices. High-frequency burst activity in granule cells induces short-lived facilitation to ensure signalling within the first few spikes, which is rapidly followed by a reduction in neurotransmitter release[126]. The fast rate coding of granule cells may help the sensory cortex integrate changes in input at a faster rate and transmit them to complex cells after performing spatiotemporal filtering[126].

A general information propagation scheme among the granular and agranular columns follows the distance rule (see fig 10(b)). Information from the sensory cortex, which has fully developed layers, follows pathways of decreasing granularity towards the agranular cortices (feedforward pathway). Conversely, information from the agranular cortex (mainly limbic cortex) fans out progressively to increasingly granular cortical regions and finally terminates on the granular sensory cortex (feedback pathway). The projections of feedback pathways are diffused among layer-1, 2/3 and 5 for nearby regions and progressively concentrate on layer-1 and layer-6 (fig 9(b)), where they modulate the representation by influencing the centre-surround pattern. Corticothalamic neurons in layer-6 project back to thalamic nuclei. These projections create a cortico-thalamic loop (fig 10(b)) which contributes significantly to integrating higher-order cognitive functions.

On the contrary, the projections of feedforward pathways are diffused among layer-1, 2/3 and 6 for nearby regions and progressively target the interior layers 2/3, 4 and 5 (fig 9(b)), projecting the stimulus-driven pattern in the centre-surround configuration. Starting from the sensory cortex, which has a fully expressed granular layer 4, the driving feedforward signal (modulated by feedback pathways at every stage) follows the structural gradient of decreasing granularity, towards the superficial layer (layer-2/3) and the deep layer (layer-5) of agranular cortices and on to the limbic system[127]. These feedforward and feedback pathways create cortico-cortical and cortico-subcortical loops; for example, the cortico-striato-thalamic loop forms segregated sub-cortical loops which integrate cognitive and affective aspects of behaviour[128].
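The granularity rule described in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three cortical types and their numeric ranks, the function name `projection_type`, and the label "lateral" for projections between areas of equal granularity are hypothetical choices made here, not part of the analysis in this article.

```python
# Illustrative sketch: classify a projection by the laminar granularity of
# its source and target areas. The numeric ranks are an assumption used only
# to order the cortical types from least to most developed layer 4.
GRANULARITY = {
    "agranular": 0,          # lacks layer 4 (mainly limbic cortex)
    "slightly_granular": 1,  # negligibly developed layer 4
    "granular": 2,           # fully developed layer 4 (primary sensory cortex)
}


def projection_type(source_type: str, target_type: str) -> str:
    """Return 'feedforward', 'feedback' or 'lateral' for a projection.

    Feedforward: from more granular to less granular cortex.
    Feedback: from less granular to more granular cortex.
    Lateral: equal granularity (a label assumed here for completeness).
    """
    source_rank = GRANULARITY[source_type]
    target_rank = GRANULARITY[target_type]
    if source_rank > target_rank:
        return "feedforward"
    if source_rank < target_rank:
        return "feedback"
    return "lateral"


def classify_pathway(area_types):
    """Classify every consecutive step along a chain of cortical types."""
    return [projection_type(a, b) for a, b in zip(area_types, area_types[1:])]


if __name__ == "__main__":
    # Sensory cortex towards limbic cortex, step by step.
    print(classify_pathway(["granular", "slightly_granular", "agranular"]))
    # Limbic cortex back towards sensory cortex.
    print(classify_pathway(["agranular", "slightly_granular", "granular"]))
```

The sketch captures only the direction of a projection; it does not model the laminar targets (layer-1, 2/3, 5 or 6) or the dependence on distance.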

*Functional microcircuits and neural ensembles:*

Asymmetry in anatomical projections also favours the contextual and cognitive modulation of lower-order sensations. For both the feedforward and feedback projections, the I/E asymmetry between two regions is greater when they are far apart than when they are nearby. Moreover, it is more asymmetric for the feedforward projections than for the feedback projections[55] (fig 10(b)). A large proportion of inhibitory activity due to long-range projections in L-2/3 regulates the encoding of information by inhibiting the neural ensemble of L-2/3, and thus might bias processing towards inference-based internal representations of concepts and contexts. In the feedforward direction, because the inhibition ratio increases with distance, the excitatory activity (due to sensory input) in higher regions is less than the inhibitory activity, which might help codify complex abstract representations with fewer lower-level details than in the nearby sensory regions[129, 76, 130, 131, 132, 133]. However, we recognize that this interpretation of the structure-function relationship might be incomplete, and further research is needed to decode functional information out of structural arrangements.

The long-range projections from higher regions modulate the functional microcircuits of lower-level regions and vice versa. In the laminar organization of cortical columns, the recurrent circuits (fig 10(b)) involve both excitatory and inhibitory neurons which, in general, create the centre-surround pattern. These recurrent circuits[74] are driven by bottom-up projections and modulated by top-down projections. Top-down projections enhance the firing rates of putative inhibitory interneurons [76]. Different types of inhibitory interneurons play different roles in the top-down modulation. For example, in response to focal Cg axon activation, SOM+ and PV+ neurons inhibit pyramidal neurons over a broad cortical area (with SOM+ neurons as a major source of surround inhibition at 200 $\mu m$), whereas VIP+ neurons selectively enhance the responses at 0 $\mu m$ by localized inhibition of SOM+ neurons[134, 135] (facilitating the centre by disinhibiting selected pyramidal cells). In general, the long-range projections target VIP+ neurons more than other neuron types[136] in layer 2/3. The disinhibitory effect of VIP+ neurons on pyramidal neurons is reported in somatosensory[137, 131], visual[137, 138, 139], auditory[137, 136, 139, 130], mPFC[136], cross-modality sensory projections[140], and learning and memory[132, 141, 133]. This centre facilitation and surround suppression by top-down modulation is equivalent to the bottom-up centre-surround mechanism. Moreover, individual neurons of remote regions (for example, Cg[76]) project selectively to restricted sets of neurons in the target regions (for example, top-down projections allow targeted spatial modulation in V1[76]). The gain effect of this attentional activity, according to the "feature-similarity gain modulation principle", depends on the similarity or difference between the attended feature (bottom-up activity) and the preferred feature (top-down activity) of the neural population[129].
In the case of similarity, the centre-surround effects due to top-down and bottom-up activity match and scale the tuning curve in a multiplicative manner. On the contrary, when the centre-surround effects mismatch, the tuning curve is suppressed[76]. Apart from the cortico-cortical projections, limbic cortices issue widespread projections from their deep layers that reach eulaminate areas by terminating in layer-1[142], and thus modulate cortical representations.
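The multiplicative scaling described by the feature-similarity gain principle can be illustrated with a toy tuning curve. The following is a minimal sketch, not a fitted model: the Gaussian tuning shape, the cosine form of the gain, and all parameter values (`width_deg`, `peak_rate`, `strength`) are assumptions introduced here for illustration.

```python
import math


def tuning_curve(stimulus_deg, preferred_deg, width_deg=30.0, peak_rate=20.0):
    """Gaussian tuning curve (arbitrary rate units) over a circular feature."""
    # Wrap the difference into [-180, 180) so the feature axis is circular.
    diff = (stimulus_deg - preferred_deg + 180.0) % 360.0 - 180.0
    return peak_rate * math.exp(-0.5 * (diff / width_deg) ** 2)


def similarity_gain(attended_deg, preferred_deg, strength=0.3):
    """Gain above 1 when attended and preferred features match, below 1 when
    they are opposite. The cosine form is an assumed, illustrative choice."""
    return 1.0 + strength * math.cos(math.radians(attended_deg - preferred_deg))


def modulated_response(stimulus_deg, preferred_deg, attended_deg, strength=0.3):
    """Scale the whole tuning curve by a single multiplicative gain factor."""
    gain = similarity_gain(attended_deg, preferred_deg, strength)
    return gain * tuning_curve(stimulus_deg, preferred_deg)


if __name__ == "__main__":
    preferred = 90.0
    for attended in (90.0, 270.0):  # matching vs. opposite attended feature
        responses = [modulated_response(s, preferred, attended)
                     for s in (60.0, 90.0, 120.0)]
        print(attended, [round(r, 2) for r in responses])
```

Because the gain does not depend on the stimulus, the ratio between the modulated and unmodulated response is the same at every point of the curve, which is what "scaling in a multiplicative manner" means here; a gain below 1 corresponds to suppression in the mismatch case.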

Long-range anatomical projections, layer-specific targeted feedforward and feedback projections, cortico-subcortical loops, neural ensembles in the form of centre-surround patterns, and their modulation by the global state of the brain together represent the functional organization of the brain in a structural or anatomical frame. The information, encoded in topological (local functional circuits) and distributed global neural patterns, is communicated via anatomical neural projections to provide the basis for cognition-affect interaction.

Concluding Remarks and Future Perspectives

Based on our analysis and results, we infer that emotions are the product of the interaction of cognition and affect in a loop. Emotions cannot be attributed to a definite marker; rather, they are dynamically determined statistical and fuzzy responses within the frame of the contexts existing at that moment. Emotion processing utilizes large-scale brain networks[143, 144] which are related to cognitive functions, for example, the salience network, the autobiographical-event (episodic memory) system and the social processing system[145, 66]. So, emotions are constructs or concepts which aim to avoid danger and approach social gain in order to achieve learned allostatic stability. Emotions follow a structural and temporal hierarchical organization [146]. A variety of activities at the lower sensory and autonomic control levels can belong to the same emotion category, and different emotions can share some common features at the sensory and lower levels [18].


The cognition-affect integrated model of emotion: Simply Summarized

Resembling the conceptual act theory, we suggest that emotions are a conceptual category[17] just like other conceptual categories.

We propose that emotion is a concept that regulates the internal milieu in order to achieve social/self-gain and/or accommodate social loss in a given social context or situation. Emotion is a concept situated in the socio-cultural context and associated with the physiological activity of the body. The concept of emotion regulates the allostatic demand imbued in the socio-cultural context. Emotions are subjective events involving self and society, and they are recalled to anticipate and facilitate allostatic regulation in advance of an upcoming event, given the self- and/or other-concerning cues in the socio-cultural context. Allostasis is the mechanism[66] which efficiently regulates the physiological parameters of the body within a learned homeostatic range, tuned by past knowledge about the situation or task. In other words, it is experience-based homeostasis with a dynamically adjusted set-point. The allostatic error is used to predict what the parameter is most likely to be, thus preparing the system to match it more effectively[66]. This prediction is integrated with the currently sensed state to optimize the prediction for future use.

One neural signal which might be optimizing this foraging is the reward signal, and the most studied candidate for this reward signalling is dopamine pulses from the ventral midbrain. The reward system contributes to foraging for the emotion concept that is appropriate in the current situation, based on past knowledge.

Emotions are not different from other conceptual processing in terms of conceptual representation situated in the context, and they have the following properties:

No biomarkers.
Statistical and fuzzy in nature.
Constructed with the help of meaning-making cognitive systems.
Utilize feedforward and feedback mechanisms, just like other perceptions.
Long-range connections modulate the activity of centre-surround patterns.

The ‘cognition-affect integrated model of emotion’ presented in this article provides quantitative and qualitative evidence for the involvement of meaning-making cognitive processes in creating the continuum of emotions from affective sensations. We observe that core affect alone cannot categorize emotions unless it is coupled with the other cortical functions; in fact, the whole cortex takes part in the process along with the subcortical regions. By considering the principles of brain organization, we aim to show that anatomical structure supports cognition-affect integration even at the micro-structural level. Placing our explanation in the framework of the brain’s structural and functional organization makes the notion of cognition-affect interaction more concrete.

The context plays an essential role in shaping affect into emotion. However, the amount of contextual detail considered can be smaller or larger, which influences the delay in emotional reactions. The function of a neural ensemble is dynamically specified by the global activity. Based on the differences in laminar structures of cortical columns, time-scales of processing, laminar specificity of projections and the interplay of excitatory-inhibitory projections, a hierarchical structure in the cortex processes and encodes different levels of complexity and implicit and explicit representations (more abstract and explicit at higher levels, more general and implicit at lower levels; fig 9(a) & 9(c)).

Activity is very unlikely to follow one fixed route of connections between different regions. However, given a specific condition over many trials, it is highly likely that a similar route of subjective activity will be followed. That is, context is likely to bring specificity to the activity.

The activation of a learned episode, which holds ready-made information about the past, present and future sequence of events[dragoi2011preplay, 58, 89, kraus2013hippocampal, eichenbaum2014time, 87], causes a learnt and self-generated prediction of the external context, internal physiological representations and innate/social value. This prediction activates different levels of detail along the route in order to anticipate, a priori, the hierarchical structure of the environment and the sensory consequences of interacting with it. This hierarchical and dynamical whole-brain representation, imbued with the social/innate affective value that facilitates allostatic stability, is the emerged and constructed emotion.

The figure in Concluding Remarks and Future Perspectives gives a simplified summary picture of the cognition-affect integrated model of emotion. During development, the organism is exposed to different conditions and situations. Reactions to these conditions shape the anatomical connections and refine them for faster responses and sharper spatial contrast[pouille2009input, hendrickson2015interactions]. Primary affect is innate, instinctual and reflexive in response, whereas socially appropriate affect is learnt, conditioned and influenced by social advantage. The conditioned affective response, in order to gain social advantage, selectively activates innate physiological neuronal ensembles, based on social conditions learnt in the past, to stabilize the physiological parameters. The embodied association between social context and internal physiology is learnt through social interaction and by observation in the cultural environment. The socio-culturally conditioned modulation of neuronal ensembles representing innate physiology is not due to a single event; rather, many such events conditionally modulate the selective activity of innate neural ensembles through cortico-cortical and cortico-subcortical long-range projections. These different events are conceptualized as social concepts called emotions. Since these concepts are grounded in the socio-cultural environment, their construction and physiological implications vary from culture to culture and among individuals, depending on how they have adapted to these environments. With the presented model and its three levels of description, viz. the subcortical-cortical loop, mesoscale-level connections and functional microcircuits, the classical notion of deterministic markers of emotion progressively breaks down.

In addition, each neural cell is part of the complex microcircuits and global network comprised of many cells which can broadly be categorized as excitatory and inhibitory.

• Where and to what extent does the classical notion of basic emotions and their universality fit in this model? In other words, how context-dependent are “basic” emotions?

• What role does the structural hierarchy play in the relevance and quality of the emotional response?

• What role does the hierarchy of temporal processing in cortical regions play in emotional response time?

• What cognition-affect interaction dynamics can differentiate an emotional event from a non-emotional event at the level of functional connections?

• What is the contribution of integration hubs and their interactions to cognition-affect modulation and the evolution of emotion?

The internal physiological changes serve only to incorporate the changes in the physical world. Emotion is a cognitive concept, but it is more precise and sharper than ordinary cognition because it is trained by the physical world (for example, consider a child laughing when you suddenly appear out of nowhere during play). So, during development, emotions (perhaps some basic emotions) arise through training from sudden and unexpected changes in the physical world, and they acquire a more continuous spectrum as society and culture regulate and agree on the abstraction of certain physical postures and physiological sensations into concepts. However, this raises the question of whether social emotions are conceptual manifestations of some basic emotional categories, which are also learnt through nurturing but are limited in the conceptual sense. For example, a boy who has always lived with a single kind of expression towards disagreeable activity, namely aggression, will apply this concept and physiological pattern to all disagreeable activity. As a result, he will have fewer concepts for the different degrees of aggression, which may be practised with different kinds of nurturing in different socio-cultural environments. That means the boy will arrive at a similar physiological and sensory context whenever he encounters a disagreeable context. This default and sharp context for a learnt concept allows a more precise time integration window, which means that the surround inhibition is very strong. For example, in the case of autism, the feedback projection must be very strong and inhibitory, inhibiting the surround very effectively and causing less dispersion over the sensory information. Open question: how does the feedback projection in autism differ from that in a neurotypical person?

The ongoing activity is the act of synaptic homeostasis. The ongoing activity in these networks serves just to maintain the learnt homeostatic state and keep these cells and the organism alive. The brain is a highly organized organ and, due to learning and experience, it has an innate and learnt underlying structure of connections. The most well-studied and recognized patterns in this organized structural connectivity are termed feedforward and feedback connections or pathways. The ongoing activity and the feedback projections (participating in the ongoing spontaneous activity) are the act of maintaining the state of homeostasis of synaptic connections (the notion of spontaneous synaptic homeostasis needs to be made more precise), which as a result delivers important consequences termed default contextual activity (DMN) and situated context (situation-specific context). The job of this context representation can be interpreted as inhibiting or resisting change so that the organism can remain in the state of homeostasis. A change in the feedforward connections that is significant enough that it cannot be inhibited by the imposed and maintained context at an individual level in the hierarchy allows new synaptic strengths to form at these levels over many trials and to be adopted as a new context. I suspect that the surprise and novelty of an event (its difference from what is predicted by the context) impose the aroused change (what "aroused change" means in terms of synaptic activity remains to be specified) with a more robust and fast long-term potentiation effect (that is, high spiking activity causes high synaptic strength). This suggests that synaptic activity and synaptic strength should differ between emotional and purely cognitive (non-emotional) tasks. Reducing this to valence and arousal makes sense because these can represent the whole cognitive spectrum.

The intralayer, interareal and interlayer projections and the local neural ensembles of excitatory-inhibitory connections can be interpreted as supporting the constructive model of concepts and of event variations within each conceptual category at different levels of the hierarchy. The feedforward connections (lower-to-higher in the hierarchy), which target layer-2/3 and layer-4, have a higher inhibitory-to-excitatory (I/E) ratio than the feedback connections (higher-to-lower in the hierarchy), which target layer-5 and layer-1. These feedforward projections carry sensory information to higher regions, and feedforward inhibition due to both pathways normalizes the input to the context and expands the cortical dynamic range[pouille2009input]. Greater synaptic depression in the feedforward connections facilitates the gain for some stimulus features over others. Apart from the intralayer inhibition, the dendrites of pyramidal neurons in layer-2/3 also stretch to layer-1, which has a strong inhibitory effect. Pyramidal neurons in layer-5 project to layer-1, causing inhibition of layer-2/3 pyramidal neurons. Following the feedforward path of information from simple to more amodal to abstract to conceptual constructs, at any level, if the functional microcircuit representing the feedforward information is similar to what is projected top-down, this will cause inhibition of the functional microcircuit and weaken the representation of contextual and background activity. On the other hand, synaptic depression in layer-2/3 increases gamma-frequency bursts, and thus affects the temporal window for integration and the spatial contrast of information (which may be called salient information)[hasse2017corticogeniculate]. Pursuing this reasoning leads to the notion that, for the same concept, there could be many events differing from each other at different levels of detail but falling under the same concept.
So, the constituents of a whole event (one of many) of the conceptual category, involving abstract to unisensory level details, are all distributed and can be excited at different levels due to projections from layer-2/3 to layer-5. So, if the feedforward projections carry a stimulus feature (at any level) which differs from what is projected, this will cause modification of the internal representations at different levels. On the other hand, the inhibitory projections from layer-5 to layer-1 will have an inhibitory effect on stimulus features (at any level). So, both feedforward and feedback projections serve to build the internal representation of the inner and external world. The core cognitive systems are abstract-level, amodal systems which are activated for conceptual construction and perceptual inference.

In the renascent guise of ‘predictive coding’ or ‘predictive processing’, perceptual content is seen as resulting from probabilistic, knowledge-driven inference on the external causes of sensory signals. This inference is drawn from an internal statistical model of the world as the cause of sensory sensations. Based on the inferred cause, the sensory encoding is predicted and then matched against the upcoming sensory information; a match strengthens the internal model (collecting evidence for the internal structure). If the predicted sensory encoding and the upcoming sensory input do not match, learning takes place according to the motivation. For example, a child’s motivation is to explore new things, so he/she will insist on doing so. To fulfil this motivation, the child will try different internal models (crying, shouting, speaking). The internal model which fulfils the desire gains evidence for its success, and a new association is learnt with this model. If the desire is not fulfilled, the child will try to learn a new model by observing. As we get older, we acquire the capability of sensory reconstruction, which is not available to a newborn child. Due to bimodal and trimodal neurons, afferent sensory information is integrated with somatosensory and motor action formats. Feedback projections are more important than feedforward projections since they create these experiences and event/object-specific associations.

A child who has never seen an angry expression will never learn it and will never associate any life event with it, because his/her brain has never been in an environment that embeds angry expressions. The structure of the brain mimics the temporal and contextual structure of the world it inhabits. Using this learnt structure (internal model), the human brain infers actions which encode sensory and interoceptive patterns. Action is always embedded in the course of perception. In the perception-action loop, the action which an agent predicts is based on the perceptions it has made in the past; if the action holds true under the current change, no new learning takes place. On the contrary, if there is a difference between the causal structure of the environment and the causal structure of the internally generated model, new learning takes place. Action is the causal explanation of the incoming input. It predicts which sensation will come next, as imbued in the internally created causal structure. This learning is hierarchical, with different levels of information resolution. In this way, the whole system becomes information-efficient. For example, during learning, subcortical and cortical structures encode different levels of information (coarse and detailed, respectively), which come together when reconstructing the internal model of the environment. Extending this theory to the internal model of the causal structure of the body, if the body structure is changed by some action, the change will be sensed in the primary interoceptive sensory areas. Why does a child fall at an early age? Because its motor program has not matured. Maturation of the motor program is not only the learnt coordination of different limbs but also, inherently, the learning of visceromotor activity to fulfil the metabolic demand beforehand so that action can take place.
When the motor representation is mature enough, there is an association between changes in visceromotor activity and sensorimotor activity. So, even imagining the motor activity can bring about the visceromotor change, as per the ideomotor theory. The internal model is nurtured on the structure laid down by nature. Just as one sees what he/she wants to see, one feels what he/she wants to feel. Emotional and social expressions are also part of the environment which the brain inhabits, and the brain has an internal model of this structure too. During learning of these environmental structures, the structure of change in the body is also included in the internal model. When one encounters a new expression, one finds it strange (like a child) and then internalizes this structure through error correction. So, the internal model slowly evolves through repeated error correction. Now, as the same expression is expressed in the environment with some variation in intensity, so will it be in the internal model, and a new instance of the same category will be encoded in the internal representation. The brain is always confirming its own model against the current sensory event by imposing its own causal explanation, actively inferred from internal perception (or structure). The whole internal model is different if the information at all levels is different. For example, if we take the what and where streams of perception, when the information in both streams differs, one is witnessing a different category in a different space and activity. But if only the spatial information changes, the nature of the what stream does not change; otherwise the same person would belong to a different category in each space. In the same way, emotion awareness and expression can be a category with variation in time, space and actors.

Top-down predictions about the sensory consequences of events shape their perception, the generation of selfhood and the general cognitive frameworks for perceiving and acting within the environment [allen2018cognitivism]. Events here are mental representations, and the sensory consequences are predictions from these mentally represented events. Selfhood and action are likewise generated from the mentally represented event. This mentally represented event is modified in successive loops to accommodate the current context and reduce the prediction error, so that feeling and action can take place for the contextually modified mental event. How far does this successive loop go? When does it break, so that the ANS and motor expression take place? In general, neuronal representations at higher hierarchical levels (that is, more abstract representations) are thought to generate predictions of representations at lower levels. These predictions are subsequently matched to lower-level representations constructed from sensory input, thereby generating a prediction error signal. How are these error signals used? Do they update only the mental model, or do they also act on the end organs, so that the next afferent signal and the current mental representation are brought in line with the change? This mismatch signal travels back up the hierarchy, where it is used to update higher-order representations. This exchange of signals is thought to occur on multiple levels, thereby generating a hierarchically structured explanation of sensory input.

Emotion is projected, not stimulated, because novelty can stimulate emotional and/or non-emotional events in different persons, and at different times in the same person. There might be no emotionally salient event as such, but only a salient event capturing a change or novelty relative to what is predicted by, or represented in, the feedback context. This novelty stimulates some change, which causes a change in the context (at every level of the hierarchy, from abstract concept to lower-level unisensory detail and action). If this concept has an emotional name, it will result in that emotion and the corresponding action and sensation. It should be noted that the similarity or dissimilarity of the response depends on the society and culture which shape this kind of learning. For example, there is no equivalent Hindi word for the German word Waldeinsamkeit, so the feeling related to the context conceptualized by this word cannot be felt exactly or similarly by an individual belonging to the Hindi-language culture. Following this line of reasoning, it can be proposed that there is no emotionally salient event as such. A salient event is simply one that differs from the predicted context and that causes changes all the way up to the conceptual category (if the event is entirely different from the predicted context). Now, this conceptual category, represented as a distributed neural pattern, can be emotional or non-emotional depending on the socio-cultural nurturing of the individual. This concept construction is proposed to be largely unconscious. That means the selection of the next context (in an attempt to achieve context specificity) is largely mediated by feedforward input which at the ... Why do we see many events for the same concept? Because there are multiple events, differing at different levels in the hierarchy, for the same conceptual category. All those associated events, with different levels of detail, are presented, and due to some attentional modulation a particular event from the past gets selected.
How these different options are tried, how attentional modulation influences these choices (if it does), or whether only some general objective such as synaptic homeostasis is being achieved, is a question for further research. If the senses get the opportunity to sense the sensory-level detail of this hierarchically generated pattern, the concept is consciously sensed as emotion. On the other hand, if subconscious synaptic plasticity (in other words, learning) mediates the whole event, the action becomes a subconscious action (in other words, a reflex).

Affective stimuli often have a direct impact on an organism’s homeostasis, either by providing nourishment or inflicting physical distress. But for many situations, the emotional stressor is just perceived or anticipated. For instance, emotional memory can arise from viewing a gruesome picture, or the delivery of bad news. Thus, an emotional experience does not require an overt biologically salient stimulus. And even if the stimulus is overtly affective, its salience is influenced by a subject’s goals and circumstances; a food reward will not be arousing to a satiated subject[headley2013sync].

Allostasis: The brain anticipates needs and provides the organism’s physiological infrastructure to fulfil these needs in order to regulate the organism’s internal milieu[147]. Allostasis differs from homeostasis in that it has an experience-based, variable set point regulating the organism’s behaviour, whereas homeostasis is based on a constant set point and an error mechanism to regain this constant set point[66].
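The contrast between a constant set point and an experience-based set point can be made concrete with a toy regulator. This is a minimal sketch under stated assumptions: the class name `AllostaticRegulator`, the delta-rule update, the learning rate, and the heart-rate-like numbers in the usage example are all hypothetical and serve only to illustrate the definition above.

```python
def homeostatic_step(state, set_point, gain=0.5):
    """Reactive correction: move the state towards a constant set point."""
    return state + gain * (set_point - state)


class AllostaticRegulator:
    """Toy regulator whose set point is learnt per context from experience."""

    def __init__(self, default_set_point, learning_rate=0.2):
        self.default_set_point = default_set_point
        self.learning_rate = learning_rate
        self.learned_set_points = {}  # context label -> learnt set point

    def anticipate(self, context):
        """Set point prepared in advance for the given context."""
        return self.learned_set_points.get(context, self.default_set_point)

    def learn(self, context, observed_demand):
        """Update the learnt set point using the allostatic error."""
        predicted = self.anticipate(context)
        error = observed_demand - predicted
        self.learned_set_points[context] = predicted + self.learning_rate * error
        return error


if __name__ == "__main__":
    # Hypothetical numbers: a resting demand of 70 and a demand of 90 in a
    # recurring, socially demanding context.
    regulator = AllostaticRegulator(default_set_point=70.0)
    for _ in range(50):
        regulator.learn("demanding_context", observed_demand=90.0)
    print(round(regulator.anticipate("demanding_context"), 2))
    print(regulator.anticipate("unfamiliar_context"))
    print(homeostatic_step(state=60.0, set_point=70.0))
```

After repeated exposure the regulator prepares the higher set point before the demand arrives, whereas the homeostatic step only reacts to a deviation that has already occurred.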

Core Affect: Core affects are the behavioural-action tendencies with instinctual-arousal tools of nature rather than constructions of nature. Core affects reflect relatively invisible neurodynamics of ancient brain systems. In the words of Jaak Panksepp "at their core, raw affective experiences appear to be pre-propositional gifts of nature—cognitively impenetrable tools for living that inform us about the states of our body, the sensory aspects of the world that support or detract from our survival, and various distinct types of emotional arousal that can inundate our minds. Affects reflect the heuristic value codes that magnificently assist survival, and give ‘value’ to life."

Predictive coding: An account by[148] in which top-down and bottom-up processing correspond to feedback and feedforward projections, respectively. Feedback projections carry predictions inferred from the past, and feedforward projections carry the prediction error (the predicted signal minus what is actually observed).

Recurrent Circuits or Microcircuits or Neural ensemble: A group of excitatory and inhibitory cells which co-activate in a specific pattern upon receiving a cue and perform specific information processing relevant to a task. Examples include the neural ensembles creating concepts in the hippocampus and the orientation columns in the visual cortex.

Methods

**Methodology Flow Chart and Deep Learning Architecture:** (a) The methodology flow chart reflects the complete analysis. (b) Summary of the deep learning architecture, which has six convolution layers, three max-pooling layers and four dropout layers. Categorization is done using densely connected neural network layers after obtaining the self-learned feature representations at the flatten layer. (c) Flowchart for transfer learning: the weights of the deep learning architecture are preconditioned on different EEG datasets and the architecture is fine-tuned for final testing.

Participants:

DEAP data, which is freely available online, is used[28]. Thirty-two healthy participants at two separate locations, Twente (22 participants) and Geneva (10 participants), took part in the study. All of them signed an informed consent form before starting[149, 28]. The study sample comprised 17 males and 15 females aged between 19 and 37 (mean age $27.19\pm 4.44$; right-handed; undergraduate and postgraduate; normal or corrected-to-normal vision; no history of neurological or psychiatric diseases or substance-related disorders; no significant general medical condition). Before the experiment, each participant also filled out a questionnaire.

Stimulations:

All the experimental stimuli were carefully filtered down from a collection of 120 stimulus videos to the final 40 test video clips, chosen using a web-based emotion assessment interface. Stimuli and rating scales were presented using software by Neurobehavioral Systems on a 17-inch screen (1280x1024, 60Hz) at 800x600 resolution to minimize eye movements. Participants sat approximately 1 meter away from the screen.

Contextual memory paradigms, such as emotional music cues, have experimental advantages in that they avoid the confounding influence of emotion on perception when emotional stimuli are used as retrieval cues, but they assess the retrieval of emotional context rather than emotional content, which might not rely on the same mechanisms [medford2005emotional].

Music videos were used to elicit emotions in the participants during the experiment. Anticipation is the key to understanding, comprehending and feeling emotions while listening to music. Musical anticipation itself can evoke a variety of emotions [150]. Music-evoked emotions rest on three principles[151]. The first is serving a social function, the second is related to musical expectancy and tension (due to the harmonic structure of music), and the third is related to emotional contagion. The social function of emotional music is further classified into functions related to social cognition (understanding the composer’s intention), co-pathy (empathically affected emotional homogeneity), social and emotional regulation through communication, action coordination and group cooperation. The allostasis and predictive coding hypothesis[atzil2018growing] claims that social affiliation is rooted in allostasis. Moreover, social brain processing closely resembles that of domain-general circuits, namely the default mode network and the salience network, which together make up an integrated network involved in allostasis. Music is social; we inherently feel the social value of reaching out to others in music or of moving others in a song across the broad social milieu [152, 153]. This dissection of the effect of music validates music as an emotion stimulus and also supports our consideration of different networks (dmn, salience, social, affect, ToM, working memory [kraus2010music]) for their combined and individual qualitative contributions.

Experimental Protocol:

The raw data and information about funding resources and ethical approval are available at[149, 28]. All the participants were given a set of instructions about the experimental protocol, and the meaning of the different scales used for self-assessment was explained to them. A practice trial was conducted to familiarize each participant with the experiment. After the practice trial, the experimenter left the room and the participant started the experiment with a keypress on a keyboard.

Initially, a 2-minute baseline was recorded while participants looked at a fixation cross. It was followed by the presentation of 40 trial videos in the following paradigm:

1. a 2-second screen displaying the trial number, to inform the participants about the current trial,
2. a baseline recording with the display of a fixation cross for 5 seconds,
3. display of the trial video for 60 seconds,
4. rating scales of valence, arousal, dominance, familiarity, and liking for the self-assessment.

A short break was given to the participants after 20 trials, during which they were offered cookies and non-alcoholic, non-caffeinated beverages. After the break, the remaining trials followed the above-mentioned steps.

Scalp Recording:

The experiment was performed in two laboratory environments with controlled illumination. EEG and peripheral physiological signals were recorded using a Biosemi ActiveTwo System. In experiment [28], 32 active AgCl EEG electrodes (10-20 system) and 8 peripheral physiological channels were used to record brain activity and peripheral physiological signals, respectively, while subjects watched 1-minute emotional video excerpts. Each subject watched 40 excerpts. EEG was recorded at a sampling rate of 512 Hz.

EEG: The EEG measured at a site on the scalp consists of the vector sum of the electrical fields from cortical neurons in a certain volume of tissue under the electrode [von2000different]. This measured potential on the cortical surface is generated by postsynaptic dendritic currents. When several dendrites are arranged in parallel, they generate potentials that can be measured at the scalp sites. Cortical activity and perception are not driven by the external stimulus alone; rather, sensory information has to be integrated with various internal constraints such as expectations, recent memories and planned actions. EEG is one of the tools that can be used to assess such large-scale integration over many remote regions of varying size in certain frequency ranges (which arise from the integration of large-scale brain activity). Because emotional activity is nearly instantaneous and engages several brain regions, a method with high temporal resolution is required to record it. EEG studies of emotion have this advantage over other neuroimaging techniques. The voxel-based connectivity analysis is done after source localization using sLORETA [wagner2004evaluation, 167].

EEG pre-processing to functional connectome

The methodology is depicted in fig 11(a). The unprocessed DEAP data [28] were extracted using the BioSig Matlab toolbox and EEGLAB. Preprocessing followed Makoto’s preprocessing pipeline. EEG signals for the emotion and baseline periods were extracted from the continuous raw data stream. The data, originally recorded at 512Hz, were down-sampled to 128Hz. They were then high-pass filtered at a cut-off frequency of 4.0Hz and low-pass filtered at a cut-off frequency of 45.0Hz. We did not find any bad channels in the data. The data were re-referenced to the average. ICA was applied to detect artefacts in the signal due to eye blinks and major muscle movements (see figure-S1 in the supplementary section).
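The filtering steps of this pipeline can be illustrated with a minimal Python sketch. It is not the authors' Matlab/EEGLAB code: the function name `preprocess_eeg`, the zero-phase Butterworth filters and the filter order are assumptions chosen for illustration, and event extraction and ICA are left out.

```python
import numpy as np
from scipy import signal


def preprocess_eeg(raw, fs_in=512, fs_out=128, hp_hz=4.0, lp_hz=45.0, order=4):
    """Down-sample, band-limit and average-reference an EEG array.

    raw: array of shape (n_channels, n_samples) sampled at fs_in Hz.
    The Butterworth design and its order are illustrative assumptions.
    """
    # Down-sample 512 Hz -> 128 Hz; resample_poly applies an anti-aliasing filter.
    x = signal.resample_poly(raw, up=1, down=fs_in // fs_out, axis=1)
    # High-pass at 4 Hz, then low-pass at 45 Hz, both zero-phase.
    sos_hp = signal.butter(order, hp_hz, btype="highpass", fs=fs_out, output="sos")
    sos_lp = signal.butter(order, lp_hz, btype="lowpass", fs=fs_out, output="sos")
    x = signal.sosfiltfilt(sos_hp, x, axis=1)
    x = signal.sosfiltfilt(sos_lp, x, axis=1)
    # Re-reference every sample to the average over channels.
    return x - x.mean(axis=0, keepdims=True)


# Example on synthetic data: 32 channels, 10 seconds at 512 Hz.
rng = np.random.default_rng(0)
cleaned = preprocess_eeg(rng.standard_normal((32, 512 * 10)))
```

After average re-referencing, the channel mean of the output is zero at every time point, which is a quick sanity check on the last step.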

Narrowband theta oscillations (4-8Hz) were considered for analysis. This band was selected because previous studies found theta-band synchronization during the processing of emotional modalities [154, 155, 156, 157, 158, 159, 160, 161]. Moreover, reduced synchronized activity in the theta band has been reported in emotional disorders [156, 162, 163, 164]. There is also ample evidence that theta phase synchronization takes place during the mental reconstruction and holding of information in working memory.

EEG Geosource Localization:

The process of source localization involves forward and inverse modelling. Calculating the scalp potentials from the current sources in the brain with the help of a physical model is called the forward problem (also referred to as modelling or simulation). Given the electrode potentials recorded at distinct scalp sites, and the geometry and conductivity within the brain, estimating the location and magnitude of the current sources responsible for generating these potentials is the EEG inverse problem. The EEG inverse problem is ill-posed because the number of source voxels far exceeds the number of electrodes, $N_{V}\gg N_{E}$ [35]. This ambiguity is constrained using the number of sources, spatial smoothness, spatial sparsity, combinations of smoothness and sparsity, as well as constraints on the dynamics of the source time courses [165]. The source localization problem has been a focus of the modelling community for decades and has been approached with different solutions: minimum norm, LORETA, sLORETA, eLORETA, MUSIC, FOCUSS, and ICA (see survey [166]).

In this study we have used the standardized low resolution electromagnetic tomography (sLORETA) method [167]. It is a distributed inverse imaging method. The current density estimate is based on the minimum norm ($\ell_{2}$-norm) solution, and localization inference is based on standardized values of the current density estimates. sLORETA is capable of exact (zero-error) localization. The objective function to be minimized to get zero error localization is

$F=\|\Phi-\textbf{K}\textbf{J}-c\textbf{1}\|^{2}+\alpha\|\textbf{J}\|^{2}$

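where $\Phi$ is the vector of scalp potentials, $\textbf{K}$ the lead field matrix, $\textbf{J}$ the current density, $c$ a scalar accounting for the reference electrode, and $\alpha\geq 0$ a regularization parameter. This functional is to be minimized with respect to $\textbf{J}$ and $c$, for given $\textbf{K}$, $\Phi$ and $\alpha$. The explicit solution to this minimization problem is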

$\hat{\textbf{J}}=\textbf{T}\Phi$

where:

$\textbf{T}=\textbf{K}^{T}\textbf{H}[\textbf{H}\textbf{K}\textbf{K}^{T}\textbf{H}+\alpha\textbf{H}]^{+}$

$\textbf{H}=\textbf{I}-\textbf{1}\textbf{1}^{T}/\textbf{1}^{T}\textbf{1}$

with $\textbf{H}\in\mathbb{R}^{N_{E}\times N_{E}}$ denoting the centering matrix, $\textbf{I}\in\mathbb{R}^{N_{E}\times N_{E}}$ the identity matrix, $\textbf{1}\in\mathbb{R}^{N_{E}\times 1}$ a vector of ones, and $[\cdot]^{+}$ the Moore-Penrose pseudoinverse.
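The closed-form operator above can be checked numerically with a small NumPy sketch. It covers only the regularized minimum-norm step (the sLORETA standardization of the estimates is omitted), and the random lead field, the function name `minimum_norm_operator` and the pseudoinverse cut-off are assumptions made purely for illustration.

```python
import numpy as np


def minimum_norm_operator(K, alpha):
    """Return T = K^T H [H K K^T H + alpha H]^+ and the centering matrix H.

    K: lead field of shape (N_E, N_V); alpha: regularization parameter >= 0.
    """
    n_e = K.shape[0]
    ones = np.ones((n_e, 1))
    # Centering matrix H = I - 1 1^T / (1^T 1): removes the electrode mean.
    H = np.eye(n_e) - (ones @ ones.T) / (ones.T @ ones).item()
    M = H @ K @ K.T @ H + alpha * H
    # M is singular along the vector of ones, so the pseudoinverse is required;
    # the explicit cut-off (an assumption) keeps that direction at exactly zero.
    T = K.T @ H @ np.linalg.pinv(M, rcond=1e-10)
    return T, H


# Toy example: 8 electrodes, 20 source voxels, random lead field.
rng = np.random.default_rng(0)
K = rng.standard_normal((8, 20))
T, H = minimum_norm_operator(K, alpha=0.1)
phi = rng.standard_normal(8)   # hypothetical scalp potentials
J_hat = T @ phi                # current density estimate, shape (20,)
```

Because $\textbf{H}$ removes the mean over electrodes, adding a constant offset to `phi` leaves `J_hat` unchanged, which reflects the role of the reference term $c\textbf{1}$ in the functional.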

MNE library [168] is used to perform source localization, and visualization of the activity is done using nilearn python library [169, 170].

Networks Analysis:

The application of network science to the study of the brain connectome is revealing significant functional insights. The anatomical and functional connectome of the human brain has helped in parcellating brain structure at a refined level [171]. Voxel-to-voxel connectivity is calculated using the phase locking value (PLV). Because the data were extensive and time-consuming to process, we implemented the correlation connectivity using PLV values [172] on a GP-GPU. The PLV is calculated after transforming the real signal into the analytic signal using the Hilbert transform. For the correlation-based functional connection analysis, the PLV is calculated with the following formulation.

$PLV_{ij}(t)=\frac{1}{N}\left|\sum_{n=1}^{N}e^{-i(\psi_{i}(t,n)-\psi_{j}(t,n))}\right|$
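As a minimal CPU sketch of this formula (not the authors' GP-GPU implementation), the following NumPy code builds the analytic signal with an FFT, which is the standard construction behind the Hilbert transform, and averages the phase differences over $N$ trials. The function names and the array layout (trials by samples) are assumptions.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal along the last axis (Hilbert-transform construction)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC component once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist component once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights, axis=-1)


def plv_over_trials(x_i, x_j):
    """PLV_ij(t) for two signals of shape (n_trials, n_samples)."""
    phase_i = np.angle(analytic_signal(x_i))
    phase_j = np.angle(analytic_signal(x_j))
    # Mean unit phasor of the phase difference across trials, then its magnitude.
    return np.abs(np.mean(np.exp(-1j * (phase_i - phase_j)), axis=0))


rng = np.random.default_rng(0)
a = rng.standard_normal((20, 256))
b = rng.standard_normal((20, 256))
plv_ab = plv_over_trials(a, b)   # one value in [0, 1] per time point
```

A PLV of 1 means the phase difference is identical in every trial; values near 0 indicate no consistent phase relationship.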

This calculation resulted in a 6239x6239 matrix (all-voxel connectivity). An ANOVA t-test is designed for the task conditions: baseline vs emotion and male vs female. A non-parametric permutation test with FDR correction at an alpha level of 0.0002 is implemented, and the significant connections are taken to form the final network. The weight of a link in the network is determined by the number of links between two regions; a high value indicates more correlated activity between the two brain regions. Only those regions whose weights exceeded the 99.5th percentile of all weights were retained, since many regions were connected by only 1 or 2 connections. Why retain only this top 0.5 percent of connections? Because connectivity between regions through only one or two voxels does not indicate a magnitude of connectivity large enough to contribute significantly to information exchange. Finally, a community detection algorithm appropriate for the mixing value of the graph is used to find the communities in the graph.
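The two thresholding steps described above can be sketched as follows. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up rule used here is an assumption, as are the function names; the p-values and weights in the example are made up for illustration.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.0002):
    """Boolean mask of tests kept under Benjamini-Hochberg FDR control (assumed procedure)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up thresholds alpha * k / m for the k-th smallest p-value.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])
        keep[order[:k + 1]] = True
    return keep


def strongest_links(weights, percentile=99.5):
    """Boolean mask of region-pair weights above the given percentile."""
    w = np.asarray(weights, dtype=float)
    return w > np.percentile(w, percentile)


# Illustrative values only.
p_demo = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_demo, alpha=0.05)
retained = strongest_links(np.arange(1000))
```

In practice the p-values would come from the permutation test on each voxel pair, and the weights from counting significant voxel-level links per region pair.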

Voxel-pairwise functional decoding:

Each voxel pair obtained from our PLV-based network analysis and found significant in the permutation test ($p<0.0002$) is selected. Communities were calculated for the obtained functional networks (explained in the next subheading). These voxel pairs were decoded for their associated functions using the Neurosynth meta-analysis database. This is achieved in two steps: first, building a co-activation network with 0.1 as the activation threshold (the threshold for a study to be included, based on the amount of activation displayed) and 5mm as the radius (the distance cut-off for the inclusion of studies with a percentage of activation of the seed pairs); and second, decoding the functions and their meta-analytic co-activation associated with the activity in the supplied voxel pair. The first step gave us the network-based co-activation map with the provided voxel pair as the seed, and the second step provided a correlation value signifying the association of each function with the probed voxel pair. From the list of functions, we selected only those whose correlation value was greater than 0.3 and discarded the others. Although there is no standard for these thresholds, the threshold was decided based on prior knowledge about whether the voxel pair could possibly be related to the particular set of functions. The functions were decoded in the same way for all the voxel pairs. Only the most frequently occurring functions were finally selected for further analysis.

Finding Communities:

Communities are a property of real networks, characterized by denser intra-group connectivity than inter-group connectivity. Evidence of community structure in the brain signifies its segregation property, whereas the connections between these communities signify the integration of these segregated functionalities. Communities in real networks are validated by comparison with benchmark graphs that have a known number and size of communities. In this study, the LFR benchmark [173] is used to compare five community detection algorithms: infomap [174], leading eigenvector, label propagation, multilevel, and edge betweenness [175]. Among these algorithms, we found that the infomap random walk algorithm performed better than the others. Infomap random walk has also been reported to find communities better than other algorithms [176, 177]. In another comparison, [177] used the LFR benchmark to compare eight community detection algorithms using normalised mutual information and a factor metric. They found that the accuracy of the different algorithms depends on network-related parameters, e.g. mixing value, network size, degree exponent and community exponent.
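Normalised mutual information, the metric mentioned above for scoring a detected partition against a benchmark's planted communities, can be sketched in a few lines of NumPy. The normalisation by the mean of the two entropies is one common convention and is an assumption here, as is the function name.

```python
import numpy as np


def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 I(A;B) / (H(A) + H(B)) between two community assignments."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Contingency table: how many nodes fall in community (i of A, j of B).
    contingency = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(contingency, (a_idx, b_idx), 1)
    joint = contingency / a.size
    pa = joint.sum(axis=1)
    pb = joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    if ha + hb == 0:
        return 1.0  # both partitions place every node in a single community
    return 2.0 * mi / (ha + hb)


planted = [0, 0, 1, 1, 2, 2]    # hypothetical benchmark communities
detected = [5, 5, 7, 7, 9, 9]   # same grouping under different labels
score = normalized_mutual_information(planted, detected)
```

The score depends only on how nodes are grouped, not on the label values, so a relabelled but identical partition scores 1 and statistically independent partitions score 0.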

We did not find a correspondence between all the nodes in one community and any particular network among the known networks (for example, dmn, dorsal and ventral AN, SN, and so on). There may be two reasons for this: first, the contribution of an individual node in a network can be dissociated, and the node can perform task-specific functionality; second, depending on the task at hand, subclusters of nodes from different proposed networks (such as DMN, AM, AN, SN, social and theory of mind) can come together to form a large-scale, task-specific integrated network. The community-wise connectome is shown in supplementary fig-S3.

MVPA Analysis using Deep Learning:

Generalized training on four datasets (mental arithmetic [46], motor movement and imagery[47], grid of characters[48], and SSVEP EEG database[49]) is done with a convolutional neural network (CNN), a deep learning architecture (fig 11(b)), with the following specifications: four 2D convolution layers with 32 kernels and two 2D convolution layers with 64 kernels of size 3x3; ReLU activation function for the convolution layers and soft-plus activation function for the output layer; Adam optimizer with learning rate 0.0001 (zero decay), $\beta_{1}$: 0.9, $\beta_{2}$: 0.999; and categorical cross-entropy as the error between the actual and predicted classes. The Keras Python deep learning library with TensorFlow as the backend was used to create the CNN model with the parameters mentioned above. The model is trained on an Nvidia Tesla-V100-PCIE data centre GPU with 16 GB of memory, with an input tensor of 448x1230x99.

We created 200 batches of training input for each of the considered datasets (a batch here should not be confused with the deep learning batch size, which is typically used during training to manage computation and convergence time). That gives a total of 800 batches of training input with varying numbers of samples. The model trained on one batch was used as the starting point for the next batch, and so on. In this way, the trained model became quite robust at detecting general features of electrophysiological data. The details about feature calculation, input size, and software and hardware specifications are discussed in the methods section.

The input tensor (for the emotion data we analyzed) had 448 samples, distributed as 14 emotion stimuli for each of 32 subjects. Since we used the four above-mentioned datasets to train our model, we created 800 input batches in total (200 input batches per training dataset) with varying numbers of samples (depending on the subjects and outputs of the training data). 1230 is the number of voxels. This number was kept constant across the above-mentioned training datasets, since 615 pairwise connections between regions were obtained from the statistical significance analysis. In the input tensor 448x1230x99, the last dimension represents statistical features (median, standard deviation, mean, maximum, range, minimum, skewness, variance and kurtosis) calculated on 9 segments constructed from the 60-second signal. The same statistical features were calculated for the whole signal, adding 9 extra features to the 90 features calculated from the segments (99 features in total). These statistical features were used as a trade-off between the growing number of weight parameters with architecture complexity and the limited capability of the machine to handle the large input size of 448x32x7680 (had the whole 60-second signal at a 128Hz sampling rate been considered). The trained general EEG model is used for the final testing on the original voxel time-series with the same batch size. The true-positive rate and false-positive rate for the ROC analysis are calculated using the following formulas: $TPR=\frac{TP}{TP+FN}$ and $FPR=\frac{FP}{FP+TN}$, where TP stands for true positive, FN for false negative, FP for false positive, and TN for true negative. To understand how ROC and AUC describe the quality of a classifier in distinguishing different classes, please see fig-S10 in the supplementary section.
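The per-voxel feature vector described above can be sketched in NumPy as follows. The function names are hypothetical, and the default of 10 equal segments is an assumption: 10 segments of 9 statistics give the 90 segment features quoted in the text, whereas the text itself states 9 segments. Skewness and excess kurtosis are computed from central moments, which is also an assumed convention.

```python
import numpy as np


def summary_stats(x):
    """The nine statistics listed in the text, in that order, for a 1-D signal."""
    mean = x.mean()
    std = x.std()
    centred = x - mean
    skew = np.mean(centred ** 3) / std ** 3 if std > 0 else 0.0
    kurt = np.mean(centred ** 4) / std ** 4 - 3.0 if std > 0 else 0.0
    return np.array([
        np.median(x), std, mean, x.max(), x.max() - x.min(),
        x.min(), skew, x.var(), kurt,
    ])


def voxel_features(signal_1d, n_segments=10):
    """Statistics of each segment followed by statistics of the whole signal."""
    segments = np.array_split(signal_1d, n_segments)
    parts = [summary_stats(seg) for seg in segments]
    parts.append(summary_stats(signal_1d))
    return np.concatenate(parts)


# 60 seconds at 128 Hz gives 7680 samples per voxel time series.
rng = np.random.default_rng(0)
features = voxel_features(rng.standard_normal(60 * 128))   # shape (99,)
```

Stacking such vectors over 1230 voxels and 448 samples would yield a tensor with the 448x1230x99 layout described in the text.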

The area under the curve (AUC) quantifies the capability of the classification model to distinguish the different classes in the dataset. With 100% correct classification, the AUC is 1.
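To make the link between the TPR and FPR formulas and the AUC concrete, here is a small NumPy sketch for the binary case. Treating a multi-class problem one class at a time (one-vs-rest) is an assumption, as are the function names and the example scores.

```python
import numpy as np


def tpr_fpr(y_true, y_pred):
    """TPR = TP / (TP + FN) and FPR = FP / (FP + TN) for boolean arrays."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(y_true, scores):
    """Area under the ROC curve by sweeping the decision threshold."""
    scores = np.asarray(scores, dtype=float)
    # Start above every score (nothing predicted positive), then lower the threshold.
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))
    points = [tpr_fpr(y_true, scores >= th) for th in thresholds]
    tpr = np.array([p[0] for p in points])
    fpr = np.array([p[1] for p in points])
    # Trapezoidal rule over the (FPR, TPR) curve.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


labels = [0, 0, 1, 1]
auc_perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])    # classes fully separated
auc_partial = roc_auc(labels, [0.1, 0.4, 0.35, 0.8])   # one positive ranked below a negative
```

When every positive sample scores higher than every negative one the curve passes through the top-left corner and the area is 1; each misordered pair lowers the area.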

The EEG data that is analyzed following the methodology is available at https://doi.org/10.1109/T-AFFC.2011.15

Processed Data and Code Availability

All the code and processed data will be made available upon publication.

Ethics declarations

This study was carried out in compliance with the DEAP data [28] (available online) end user license agreement (available on [178]). All the ethical guidelines provided in the above mentioned license agreement form have been rigorously followed. Consent information for each participant is included in the participant questionnaire file (available on [149]). The authors also declare no competing interests.

Author contributions statement

S.M. and U.S.T. both developed the presented idea and model. S.M. did the coding, whereas S.M. and U.S.T. both discussed and decided the computational procedure. The calculated results were interpreted by both U.S.T. and S.M. The first draft of the manuscript was prepared by S.M. U.S.T. and S.M. both refined the manuscript to the presentation and submission level.

Additional information

The corresponding author is responsible for submitting a competing interests statement on behalf of all authors of the paper.

Bibliography

[1] Darwin, C., Ekman, P. & Prodger, P. The expression of the emotions in man and animals. Electronic Text Center, University of Virginia Library (1872).
[2] Russell, J. A. Core affect and the psychological construction of emotion. Psychological review 110, 145 (2003).
[3] Ekman, P. & Oster, H. Facial expressions of emotion. Annual review of psychology 30, 527–554 (1979).
[4] Panksepp, J. Affective neuroscience: The foundations of human and animal emotions (Oxford university press, 2004).
[5] Lazarus, R. S., Kanner, A. D. & Folkman, S. Emotions: A cognitive–phenomenological analysis. In Theories of emotion, 189–217 (Elsevier, 1980).
[6] Arnold, M. B. Emotion and personality. (1960).
[7] Leventhal, H. Toward a comprehensive theory of emotion. In Advances in experimental social psychology, vol. 13, 139–207 (Elsevier, 1980).
[8] Juslin, P. N. & Västfjäll, D. All emotions are not created equal: Reaching beyond the traditional disputes. Behavioral and Brain Sciences 31, 600–621 (2008).