A Survey of Affective Recommender Systems: Modeling Attitudes, Emotions, and Moods for Personalization
Tonmoy Hasan, Razvan Bunescu

TL;DR
This survey comprehensively reviews affective recommender systems, classifying them based on psychological theories, and discusses current techniques, challenges, and future directions for emotion and mood-based personalization.
Contribution
It introduces a taxonomy grounded in psychology, categorizes existing systems, and highlights key trends, limitations, and open challenges in affective recommender systems.
Findings
Classification scheme based on Scherer's typology
Identification of key affective signal extraction techniques
Highlighting open challenges and future research directions
| Survey Papers | Affective taxo. | Attitude (stable) | Emotion (brief) | Mood (durable) | Serendipity (hybrid) | App. domain |
|---|---|---|---|---|---|---|
| Katarya and Verma (Katarya and Verma, 2016) (2016) | ✓ | ✓ | ✓ | Broad | ||
| Atas et al. (Atas et al., 2021) (2021) | ✓ | Broad | ||||
| Wang and Zhao (Wang and Zhao, 2022) (2022) | ✓ | ✓ | ✓ | Video | ||
| Salazar et al. (Salazar et al., 2021) (2021) | ✓ | ✓ | Education | |||
| Piçarra et al. (Piçarra et al., 2022) (2022) | ✓ | ✓ | ✓ | Movie | ||
| Santamaria-Granados et al. (Santamaria-Granados et al., 2021) (2021) | ✓ | ✓ | ✓ | Tourism | ||
| Al-Ghuribi and Noah (Al-Ghuribi and Noah, 2021) (2021) | ✓ | Broad | ||||
| Kaminskas and Ricci (Kaminskas and Ricci, 2012) (2012) | ✓ | ✓ | Music | |||
| Kotkov et al. (Kotkov et al., 2016) (2016) | ✓ | Broad | ||||
| Ziarani and Ravanmehr (Ziarani and Ravanmehr, 2021b) (2021) | ✓ | Broad | ||||
| Fu et al. (Fu et al., 2023a) (2023) | ✓ | Broad | ||||
| Kaminskas and Bridge (Kaminskas and Bridge, 2016) (2016) | ✓ | Broad | ||||
| Abbas and Niu (Abbas and Niu, 2019) (2019) | ✓ | Broad | ||||
| This survey | ✓ | ✓ | ✓ | ✓ | ✓ | Broad |
| Papers | Data Source | Application | Techniques | Att. Modeling |
|---|---|---|---|---|
| (Li et al., 2016) | Ra, item metadata, interactions | Movie, TV Program | LR, SVMR | Discrete (5) |
| (Cai and Xu, 2019) | Blog posts, interactions, time, likes, comment | Social Media | MF, Cos | Discrete (2) |
| (Zhang, 2015) | Rev, Ra | E-commerce | Lexicon-based approach | Discrete (2) |
| (N. and K.M., 2023) | Tweets, comments, course description | Education | ERNN, Cos, JS | Discrete (3) |
| (Hyun et al., 2018) | Rev, Ra | E-commerce | CNN, DeepCoNN, D-Attn | Continuous |
| (Cai et al., 2022) | Rev, Ra | E-commerce | LDA, DNN, MF, FM, Cos | Discrete (2) |
| (Shi et al., 2022) | Rev, Ra | E-commerce, restaurant | GCN, Co-Attn, FM | Implicit |
| (Wu et al., 2020) | Clicks, news title | News | Transformers | Discrete (2) |
| (Zhang et al., 2021a) | Rev, Ra, User ID, Item ID | E-commerce | BERT, Attn | Discrete (3) |
| (Zhang and Zhang, 2022) | Rev | Movie | BERT, LDA, TF-IDF | Discrete (3) |
| (Zheng et al., 2020) | Rev, interactions | E-commerce, restaurant | LSTM, RNN, Attn | Discrete (3) |
| (Renjith et al., 2021) | Rev, Ra | E-commerce | TF-IDF | Discrete (3) |
| (Chen et al., 2019a) | Rev, item specifications | E-commerce | SentiWordNet | Continuous |
| (Alam et al., 2022) | Clicks, age, gender, date, news metadata | News | BERT, RippleNet, TF-IDF | Continuous |
| (Lin et al., 2021; He et al., 2022) | Rev, Ra | E-commerce | DeepCoNN, MF, NARRE | Discrete (2) |
| (Li et al., 2020a) | Rev, Ra, item metadata | E-commerce | SentiWordNet, LDA, MF | Discrete (3) |
| (Ghosal et al., 2019) | Research papers, peer Rev | Education | CNN, MLP, VADER, USE | Discrete (3) |
| (Saadat et al., 2024; Garg, 2021) | Rev, Ra, drug names | Healthcare | KGCN / LR, NB, DT, RF | Discrete (3) |
| (Park et al., 2022) | Rev, Ra, item metadata | E-commerce | Reinforcement learning | Discrete (3) |
| (Ho et al., 2012) | News titles, URLs, publication dates, location | News | LDA, SVM | Discrete (3) |
| (Hu et al., 2020) | Rev, Ra, user metadata, item specification | E-commerce | fastText | Discrete (5) |
| (Liu and Zhao, 2023) | Rev, Ra | E-commerce | BERT, LDA, MF, RNN | Implicit |
| (Song et al., 2017) | Rev, Ra | Movie, restaurant | Latent Factor Model | Implicit |
| (Wang et al., 2017) | User ID, item ID, location, Rev, item metadata | Tourism | LDA | Discrete (2) |
| (Wang et al., 2022) | Rev, Ra, item metadata | Restaurants | MF | Discrete (2) |
| (Zhang et al., 2014) | Rev, Ra | Restaurants, phones | K-means, MF, rule-based | Discrete (2) |
| (Alatrash and Priyadarshini, 2023) | Rev, Ra, item metadata | Education | Bi-LSTM, Attn, Word2Vec | Discrete (5) |
| (Da’u and Salim, 2019) | Rev, Ra | E-commerce, restaurant | LSTM, FM, Attn | Implicit |
| (Gong et al., 2024) | Check-ins, User comments, POI metadata | POI | LSTM, GCN, Attn | Discrete (2) |
| (Xie et al., 2024) | User ID, item ID, Rev, Ra | E-commerce | LSTM, CNN, TextBlob | Continuous |
| (Rosa et al., 2019) | User posts, user-item metadata | Healthcare | CNN, RNN, LSTM | Implicit |
| (Musto et al., 2019a) | Rev, Ra, item metadata | Movie, book | Rule-based | Discrete (2) |
| (Abbasi-Moud et al., 2021) | User check-in, Rev, Ra, location, time, weather | Tourism | SentiWordNet | Discrete (2) |
| (Asani et al., 2021) | Rev, check-in, location, time | Restaurant | SentiWordNet, Cos | Discrete (2) |
| (Kumar et al., 2020) | Ra, tweets, movie metadata | Movie | VADER | Discrete (3) |
| (Huang et al., 2020) | Rev, Ra, item metadata | E-commerce | SentiWordNet, Cos | Discrete (3) |
| (Padia et al., 2019) | Tweets, time, POI metadata | Tourism | VADER, TextBlob | Discrete (3) |
| (Artemenko et al., 2020) | Rev, location | Tourism | Rule-based | Discrete (5) |
| (Khattak et al., 2020) | Tweets, user metadata | Healthcare | Rule-based | Discrete (3) |
| (Musto et al., 2019b) | Rev, Ra, item metadata | Movie | Stanford CoreNLP, KL-div | Discrete (5) |
| (Liu et al., 2021) | Rev, Ra | E-commerce | Bi-GRU, CNN, Attn | Implicit |
| (Liu and Zhao, 2023) | Rev, Ra | E-commerce | Bi-RNN, BERT, LDA, MF | Implicit |
| (Li et al., 2021) | Rev, Ra | E-commerce | CNN, BERT | Implicit |
| (Bai et al., 2020) | Rev, Ra, item titles | E-commerce | BERT, FM, Transformer | Discrete (3) |
| (Osman et al., 2021) | Rev, Ra | E-commerce | CF, Cos | Discrete (3) |
| (Musto et al., 2017) | Rev, Ra | Hotels, E-commerce | MF, SGD | Implicit |
| (Irfan et al., 2019) | Rev, Ra, location | Tourism | NB, SVM | Discrete (3) |
| (García-Cumbreras et al., 2013) | Rev, Ra, movie metadata | Movie | KNN, SVM, MF | Discrete (2) |
| (Shao et al., 2019) | Rev, place description and images | Tourism | SentiWordNet, LDA | Continuous |
| (Wang et al., 2018) | Tweets, location, time | Location | SVM | Continuous |
| (Yang et al., 2013) | Check-in, Rev, location | Location | MF, SentiWordNet | Continuous |
| (Zanon et al., 2022) | Rev, Ra, interactions, item metadata | Movie | KNN, Stanford CoreNLP | Continuous |
| (Lu et al., 2021) | Past interaction, Rev, movie title & metadata | Movie | Transformer, GNN | Discrete (2) |
| (Darraz et al., 2025) | Ra, Rev, user-item metadata, past interaction | Restaurant, hotel | BERT | Discrete (2) |
| Types | Papers | Data Source | Application | Techniques | Affective Terms |
|---|---|---|---|---|---|
| | (Deng et al., 2015b) | Music title, lyrics, time | Music | BPR, Cos | 21 emotion terms |
| | (Moscati et al., 2024) | Interactions, tags | Music | FM, DeepFM | Tenderness, joyful, nostalgia, wonder, sadness, tension |
| | (Shen et al., 2020) | Interactions, tweets | Music | FM, DNN, DeepFM | More than 20 emotion terms |
| | (Meng et al., 2018) | Ra, user ID | E-commerce | MF | Pos., Neg. |
| | (Breitfuss et al., 2021) | Rev, item metadata | Movie | GraphDB | Apathy, joy, Neu., and Pos. |
| | (Aramanda et al., 2023) | Rev, Ra | E-commerce | CF | 8 emotion terms |
| | (Poirson and Cunha, 2019) | Ra, Rev, item metadata | Movie | Pearson correlation | 8 emotion terms |
| | (Lim and Kim, 2017) | Ra, Rev, item Des | Movie | Rule-based | Implicit |
| | (Zhang et al., 2024b) | Rev, Ra | Movie | GPT, R-GCN, Llama-2 | 9 emotion terms |
| Cat. | (Kuo et al., 2005) | Music metadata | Music | MAF, RW | More than 20 emotion terms |
| | (Akiyama et al., 2017) | Tweets | Social media | SVM | 8 emotion terms |
| | (Wu et al., 2016) | Tweets | Social media | SVM, LR, K-Means | Happiness, surprise, anger, disgust, fear, sadness |
| | (Zhao et al., 2011) | Video, age, gender | Video | Rule-based | Fear, Neu., sadness, surprise, happiness, disgust, anger |
| | (Kim and Kim, 2021) | Music metadata, gender, age, interactions | Music, image | SVM, GA | Happy, sad, angry, surprised, bored, Neu. |
| | (Yun et al., 2023) | News metadata, interactions | News | CNN, Attn | Happy, angry, sad, surprise, fear |
| | (Orellana-Rodriguez et al., 2015) | Comments, user-item metadata | Short film | Cos, TF-IDF, PC | 8 emotion terms |
| | (Olga C. Santos and Rodriguez-Sanchez, 2016) | Physiological sensor data | Education | Rule-based | Implicit |
| | (Guo et al., 2019) | Rev, Ra, item metadata | Movie | Rule-based | 14 emotion terms |
| | (Su et al., 2020) | Ra, video, gender, item images, item metadata | Fashion | RCNN, SVM | Happy, sad, angry, disgust, fear, surprise, Neu. |
| | (Bustos López et al., 2021) | Ra, user-item metadata | Education | CF | 7 emotion terms |
| | (Adru and Johnson, 2024) | Images, music metadata | Music | LGBMClassifier, K-Means | Anger, fear, happy, disgust |
| | (Kim et al., 2021) | Audio, images | Music, image | SVM, GA | Happy, sad, angry, surprise, bored |
| | (Niu et al., 2017) | Video features | Video | SVM | 8 emotion terms |
| | (Wu et al., 2017) | User-metadata, tweets | Social media | SGD, K-means | 6 emotion terms |
| | (Zhang et al., 2024a) | Clicks, browsing, Ra, time | Short video | FM, DeepFM, AFM | Happy, like, fear, sad, angry, hate, surprised, jealousy |
| | (Kim and Lim, 2018) | Item Des, Ra | Education | MLE | Implicit |
| | (Costa and Macedo, 2013) | News text, user metadata | News | SVM, Rule-based | Implicit |
| | (Tkalčič et al., 2010) | Ra, image metadata | Images | AdaBoost, NB, SVM | Implicit |
| | (Matsui and Yamada, 2019) | User feedback | E-commerce | Rule-based | Implicit |
| | (Mizgajski and Morzy, 2019) | Rev, news metadata | News | CF | 8 emotion terms |
| | (Chheda et al., 2023) | Music metadata, audio, image | Music | MobileNetV3, ResNet, EfficientNetB4 | Implicit |
| | (Sun, 2022) | Songs’ tags | Music | MF | Implicit |
| Dim. | (Tkalčič et al., 2012) | Ra, images | Image | AdaBoost, KNN, NB, SVM | Joy, fear, anger, surprise, disgust, sadness |
| | (Santos et al., 2014b) | Ra, item metadata | Education | AdaBoost, NB, RF | 8 emotion terms |
| | (Ayata et al., 2018) | GSR & PPG signals | Music | DT, SVM, RF, K-NN | Implicit |
| | (Santamaria-Granados et al., 2019) | ECG & GSR signals | Tourism | DCNN, SVM, K-NN | 7 emotion terms |
| | (Tao and Alatas, 2024) | User metadata, news text | News | DT, NMF | Pos., Neg., Neu. |
| | (Tkalcic et al., 2013) | Video recording of users’ facial expressions, images | Images | K-NN, SVM, NB, AdaBoost | Joy, fear, anger, surprise, disgust, sadness |
| | (Yoon et al., 2012) | Ra, past interaction | Music | Rule-based | Angry, happy, sad, peaceful |
| | (Han et al., 2024) | HRV & GSR signals | Music | VAE, CNN, Attn | 8 emotion terms |
| | (Ferrato et al., 2022) | Facial video | Tourism | Rule-based | Implicit |
| | (Revathy et al., 2023) | Music lyrics, audio clip | Music | BERT, NB, LR, SVM, RF | Happy, angry, sad, relaxed |
| | (Sasaki et al., 2013) | Image, acoustic features | Music | PCA | Implicit |
| | (Deng et al., 2015a) | Ra, music metadata | Music | SVR, BR | 23 emotion terms |
| | (Caglar-Ozhan et al., 2022) | EEG & GSR signals | Education | ResnetV2, CNN | Happy, Neu., sad, disgust, surprised, afraid, angry |
| Dim. + Cat. | (Zheng et al., 2016) | Ra, time, location, weather, user metadata | Movie | MF | Sad, Neu., happy, scared, surprise, angry, disgusted |
| | (Ishanka and Yukawa, 2017) | Rev, Ra, item metadata | Tourism | CF | 8 emotion terms |
| | (Tripathi et al., 2019) | Video recording of users’ facial expressions, interactions | Video | SARSA, Q-Learning, DBRNN | Joy, sadness, fear |
| | (Kim and Hong, 2024) | Temperature, video | Smart home | GANN, SGD, MLP, MF | Implicit |
| Latent | (Rostami et al., 2024) | Interactions, Ra, comments, ingredients | Food | MF, NeuMF, VAECF | Implicit |
| | (Yin et al., 2024) | Ra, Rev, past interaction | Movie | CF, MF | 14 emotion terms |
| Papers | Data Source | Application | Techniques | Affective Terms |
|---|---|---|---|---|
| (Bontempelli et al., 2022) | User explicit preference, audio | Music | VGG-like, RF, CF | 7 mood terms |
| (Chen et al., 2016) | Song metadata | Music | CF | Implicit |
| (Andjelkovic et al., 2016) | Artist metadata, audio metadata | Music | K-NN | Implicit |
| (Andjelkovic et al., 2019) | Artist metadata, audio metadata | Music | RF, VGG-like | 7 mood terms |
| (Marshall and Wang, 2016) | Tweets | Social media | EM | Implicit |
| (Ueda et al., 2016) | Recipe, rating | Food | EM, CF | Cheerful, exhilarated |
| (Tang et al., 2021) | Past interaction, user metadata | Education | Cos | Pos, Neg, stable |
| Types | Papers | Data Source | Application | Techniques | Affect Modeling / Terms |
|---|---|---|---|---|---|
| | (Wang et al., 2023) | Rev | Airline | EmoLex | Discrete / 8 emotion terms |
| | (Sertkan and Neidhardt, 2022) | Click, interactions, news text | News | BERT | Discrete / 28 emotion terms |
| Emotion + Attitude | (Chen and Tang, 2018) | Lyric | Music | CF, TF-IDF | Implicit / happy, angry, sad, relaxing |
| | (Wang et al., 2024) | Interactions, music metadata | Music | FM, DeepFM, NFM | Implicit |
| | (Gao and Li, 2022) | Rev, Ra, user-item metadata | E-commerce | SVM, Apriori algorithm | Implicit / happy, sad, relax, fear |
| | (Gilda et al., 2017) | Images, item metadata | Music | CNN, ANN, SGD, Cos | Happy, sad, angry, |
| | (Cai et al., 2007) | Lyric, review | Music | LDA, KL div | 40 emotion terms |
| Emotion + Mood | (Zheng et al., 2016) | Ra, time, location, weather, user metadata | Movie | MF | Sad, happy, scared, surprise, angry, disgusted, neutral |
| | (Piazza et al., 2017) | Ra, age, gender, item image | Fashion | FM, SGD | 23 emotion terms |
| | (Polignano et al., 2021) | Rev, audio metadata, lyric, time | Music | Cos, LR | Joy, anger, sadness, surprise |
| | (Liu et al., 2023) | Audio metadata, genre, interactions | Music | LSTM, CNN, KNN | Happy, anger, sad, fear |
| Types | Papers | Data Source | Application | Techniques |
|---|---|---|---|---|
| | (Yang et al., 2017) | Ra | Movie | MF |
| | (Ge et al., 2020) | Interactions, location | Tourism | RW, Word2Vec |
| | (Zhang et al., 2021b) | Interactions, location | Tourism | Transformer |
| | (Ziarani and Ravanmehr, 2021a) | Ra | Movie | CNN |
| | (Boo et al., 2023) | Click, browsing history | E-commerce | GNN |
| | (Afridi, 2018) | Ra | Education | Jaccard similarity |
| | (Fu et al., 2023b) | Rev | Book | Transformer |
| | (Xu et al., 2020) | Ra, interactions | Movie, book | MLP, MF, |
| | (Li et al., 2019) | Ra, movie metadata | Movie | RNN |
| Discovery | (Adamopoulos and Tuzhilin, 2014) | Ra, item tag | Book | MF, KNN |
| | (Fu et al., 2024) | Rev | Movie, book | MF, BERT |
| | (Li et al., 2020c) | Ra, click, interactions | Movie, restaurant | MLP, GRU |
| | (Li and Tuzhilin, 2020) | Ra, Rev, interactions | Tourism | NCF, KNN, FM, AE |
| | (Zhang et al., 2012) | Interactions | Music | LDA |
| | (Onuma et al., 2009) | Ra | Movie | RW |
| | (Chen et al., 2019b) | User survey data | E-commerce | CF |
| | (Zheng et al., 2015) | Ra | Movie | Rule-based |
| | (Wang et al., 2020) | Clicks, interactions, item metadata | E-commerce | Apriori algorithm, LR |
| | (Pandey et al., 2018) | Ra | Movie | NCF, NeuMF, MLP |
| | (Li et al., 2020b) | Ra, interactions | Book, movie | GMM, CapsNets |
| | (Lee, 2020) | Interactions, app description | Mobile app | VAE |
| | (Taramigkou et al., 2013) | Interactions, music tag, artist | Music | LDA, Cos |
| | (Kawamae, 2010) | Item metadata, Ra | Movie, music | MC |
| | (Lu et al., 2012) | Ra | Music, movie | SVD |
| | (Murakami et al., 2007) | Item metadata | TV program | Bayesian network |
| | (Wang and Chen, 2023) | Ra, click, interactions, user-item metadata | E-commerce, Movie | PCA, Spearman’s correlation |
| | (Wang et al., 2019) | Interactions, browsing | E-commerce | KNN, Cos |
| | (Grange et al., 2019) | Rev, Ra, interactions | Restaurant | LR |
| | (Sugiyama and Kan, 2015) | User publication history, citation | Research article | Clustering, CF |
| | (Afridi et al., 2020) | User metadata | Research article | |
| | (Niu and Al-Doulat, 2021) | News topic, Ra | Health news | PMI, Rule-based |
| | (Maake et al., 2019) | Paper title | Education | BisoNet, Log-likelihood Ratio |
| | (Niu, 2018) | News topic | Health news | Probabilistic method |
| | (Jenders et al., 2015) | News title, text, news metadata | News | LDA, Cos |
| Content | (Hasan and Bunescu, 2023) | Ra, Rev, interactions | Book | BLR, AROW |
| | (Li and Tuzhilin, 2024) | Ra, interactions | E-commerce, Movie | FM, NCF, KNN, NMF |
| | (Li, 2020) | Interactions | E-commerce | Rule-based |
| | (Fan and Niu, 2018) | News topic, click | Health news | Rule-based |
| | (Niu et al., 2018) | Click, news topic | Health news | LDA, KL div |
| | (Huang et al., 2018) | Click, item description | Entity | CNN, LDA |
| | (Kotkov et al., 2020) | Ra, serendipity label | Movie | SVD |
| | (Niu and Abbas, 2017) | News topic, user feedback | Health news | Rule-based |
| | (Maccatrozzo et al., 2017) | Item title, genre | TV program | Cos, LR |
| Domain | Dataset Name | Description |
|---|---|---|
| | ReDial Li et al. (Li et al., 2018) | The ReDial dataset contains over 10,000 two-party dialogues with 182,150 utterances covering 51,699 movies. It includes user feedback (“like,” “dislike,” “not say”) for precise evaluation and sentiment-labeled responses, enabling sentiment-aware recommendations. |
| Movie | MovieLens 100K, 1M, 10M, 20M Harper et al. (Harper and Konstan, 2015) | A suite of movie rating datasets containing between 100K and 20M ratings from 943 to 138,493 users on 1,682 to 27,278 movies. |
| | Netflix Bennett et al. (Bennett and Lanning, 2007) | The Netflix Prize dataset consists of over 100 million movie ratings collected from 480,000 anonymous users across 17,770 movie titles between October 1998 and December 2005. |
| Fashion | Amazon Fashion He et al. (He and McAuley, 2016) | A large-scale dataset from Amazon consisting of two sub-datasets: Women’s Clothing & Accessories and Men’s Clothing & Accessories, spanning from March 2003 to July 2014. |
| | Piazza et al. (Piazza et al., 2017) | A novel dataset comprising 337 participants, 64 fashion products, and 10,816 ratings, linking affective states to fashion preferences for personalized recommendations. |
| News | Ho et al. (Ho et al., 2012) | 3,652 articles from 21 sources, incorporating sentiment for event-based recommendations. |
| Healthcare | Serrano-Guerrero et al. (Serrano-Guerrero et al., 2024) | A real-world dataset of 18,952 patient reviews from ten hospitals, categorized into positive, negative, and affective reflections, annotated for sentiment analysis and hospital ranking. |
| E-commerce | Amazon Product McAuley et al. (McAuley et al., 2015) | A large-scale e-commerce dataset sourced from Amazon, comprising 6 million products and over 1 million visual features. |
| | Goodreads Wan et al. (Wan and McAuley, 2018) | A large-scale dataset collected from Goodreads, consisting of 229,207,408 interactions from 8,555,857 users and 2,397,688 books. |
| Book | Goodreads Spoiler Wan et al. (Wan et al., 2019) | A large-scale book review dataset collected from Goodreads, comprising 1,378,033 reviews across 25,475 books and 18,892 users. |
| | BookCrossing Ziegler et al. (Ziegler et al., 2005) | The dataset comprises 278,858 users, 1,157,112 ratings, and 271,379 books, enriched with Amazon.com’s book taxonomy of 13,525 topics and 466,573 descriptors. |
| Travel | Shao et al. (Shao et al., 2019) | The dataset consists of two sources: TripAdvisor and Trip.com. TripAdvisor contributes 459,180 textual comments and 43,964 images from 14,648 tourist documents, while Trip.com contributes 293,847 textual comments and 19,492 images from 6,513 attractions. |
| Music | Weibo Music Emotion Deng et al. (Deng et al., 2015b) | The final dataset includes 1,059,037 records, linking 29,065 users to 59,053 music items, associating user emotions inferred from microblogs with their music preferences. |
| | PEIA Music Shen et al. (Shen et al., 2020) | The dataset includes 171,254 users, 35,993 music tracks, and 18,508,966 user-music interactions. |
| | TROMPA-MER Gómez-Cañón et al. (Gómez-Cañón et al., 2023) | The TROMPA-MER dataset includes 181 users, 691 music tracks, and 4,721 user-item annotations, and captures emotions using Ekman’s 4 basic emotions and 7 emotions from the Geneva Emotion Music Scale. |
A Survey of Affective Recommender Systems: Modeling Attitudes, Emotions, and Moods for Personalization
Tonmoy Hasan and Razvan Bunescu
ORCID: 0000-0003-2919-3566
University of North Carolina at Charlotte, Charlotte, NC, USA
Abstract.
Affective Recommender Systems are an emerging class of intelligent systems that aim to enhance personalization by aligning recommendations with users’ affective states. Reflecting a growing interest, a number of surveys have been published in this area; however, they lack an organizing taxonomy grounded in psychology and often study only specific types of affective states or application domains. This survey addresses these limitations by providing a comprehensive, systematic review of affective recommender systems across diverse domains. Drawing from Scherer’s typology of affective states, we introduce a classification scheme that organizes systems into four main categories: attitude aware, emotion aware, mood aware, and hybrid. We further document affective signal extraction techniques, system architectures, and application areas, highlighting key trends, limitations, and open challenges. As future research directions, we emphasize hybrid models that leverage multiple types of affective states across different modalities, the development of large-scale affect-aware datasets, and the need to replace the folk vocabulary of affective states with a more precise terminology grounded in cognitive and social psychology. Through its systematic review of existing research and challenges, this survey aims to serve as a comprehensive reference and a useful guide for advancing academic research and industry applications in affect-driven personalization.
Keywords: Literature survey, affective recommender systems, affective states, emotion-aware recommendation
1. Introduction and Motivation
Understanding human emotions and their impact on behavior has long been a central focus in psychology. The term “affect” was notably used in psychological studies by Wundt (Wundt, 1897) in the late 19th century to describe the subjective experience of pleasantness or unpleasantness. Since then, the concept of affect has evolved to encompass a wide range of emotional and psychological states, including emotions (short term), moods (longer lasting), and attitudes (stable), each impacting human behavior in unique ways (Ekman, 1992; Scherer et al., 2000). Recognizing the powerful role of affect in human behavior, researchers in psychology and the behavioral sciences have extensively studied its impact on human motivation, decision-making, and social interactions. In 1999, Picard (Picard, 1999) proposed that by recognizing and responding to human emotions, machines could enhance human-computer interaction not only by making it more empathetic, but also by enabling systems to naturally adapt to users, improve communication, and better handle affective information, such as frustration, confusion, interest, and preference. The growing understanding of affect in human behavior laid the groundwork for the field of affective computing (Picard, 2000; Calvo and D’Mello, 2010), which has evolved into a broad discipline with applications to healthcare, gaming, education, and beyond.
One of the most promising applications of affective computing is in Recommender Systems (RS), where understanding and incorporating users’ affective states into recommendation algorithms can lead to significantly improved user satisfaction. Traditionally, recommender systems leverage users’ explicit preferences (e.g., product ratings, stated interests) and behavioral history (e.g., clicks, browsing patterns). However, affective RS go a step further by incorporating users’ emotional and psychological states, henceforth referred to as affective states, into the recommendation process. By integrating affective signals, recommender systems can dynamically adjust recommendations to align with the user’s current mood, emotional state, or long-term dispositions, leading to a more engaging and satisfying user experience (Tkalčič and Chen, 2022). For instance, a user’s mood can directly influence their content choices, like preferring relaxing music after a stressful day or choosing upbeat movies for a boost in mood. Similarly, a person feeling lonely may seek emotionally resonant books or supportive online communities to improve their affective state.
Integrating affective state information has shown promise in a range of recommendation domains. In music and video streaming platforms, for example, affective RS adapt content suggestions to match a user’s mood, enhancing the emotional impact of recommendations (Kaminskas and Ricci, 2012). In education, a study by Santos et al. (Santos et al., 2014a) explored how students’ affective states, such as boredom or engagement, can be detected and used to adapt educational content accordingly. Their research demonstrated that recognizing and responding to students’ emotions can enhance learning experiences and improve learning outcomes. Mizgajski and Morzy (Mizgajski and Morzy, 2019) investigated how emotions influence reading choices in online news, highlighting the role of affective states in content consumption. While their study focused on news, its findings extend to e-commerce, where aligning product recommendations with users’ emotions or sentiments can enhance engagement, satisfaction, and potentially increase purchase intent (Pappas et al., 2017). However, incorporating users’ affective states into recommender systems is not an easy task. For example, emotions and moods are dynamic, context-dependent, and can vary considerably from one user to another, posing challenges in reliably detecting and responding to affective states in real time (Scherer, 2001). Despite these challenges, affective RS have gained popularity because of their potential to significantly enhance recommendations through a nuanced understanding of users’ emotional states and preferences.
Reflecting the growing importance of leveraging user preferences and emotions in recommender systems, the number of publications on affective RS has risen significantly in recent years. The increasing interest calls for a comprehensive review on affective RS aimed at helping researchers and practitioners better understand their strengths, limitations, and best use cases.
1.1. Differences between This Survey and Prior Surveys
A number of surveys have been conducted in various subdomains of affective recommendation systems by focusing on particular types of affective states or on particular application domains. Overall, these surveys have been limited by their narrower scope and the lack of a structured approach to affective states informed by psychological theories of emotion. Furthermore, depending on their publication date, prior surveys do not cover the most recent advancements in affective RS. For instance, Katarya and Verma (Katarya and Verma, 2016) provided a broad overview of affective advancements within recommender systems, but lacked a structured affective taxonomy, and did not distinguish between significantly different types of affective states such as sentiment, emotion, and mood (Scherer, 2005). Furthermore, their work did not consider serendipity-oriented recommender systems, even though serendipity, through its surprise component, is a complex affective state with important consequences for user satisfaction. Some affective RS surveys had a narrower scope by focusing on a specific domain. For example, Wang and Zhao (Wang and Zhao, 2022) examined video applications, Salazar et al. (Salazar et al., 2021) investigated affective adaptation in education, Piçarra et al. (Piçarra et al., 2022) focused on emotion-based movie navigation, and Santamaria-Granados et al. (Santamaria-Granados et al., 2021) reviewed emotion recognition techniques in tourism. Similar to Katarya and Verma (Katarya and Verma, 2016), these works lacked a structured affective taxonomy and did not include serendipity-oriented recommendations. Additionally, some of them ignored long-term affective states such as mood, which play a crucial role in sustained user engagement. Other affective RS surveys focus solely on a particular type of affective state. 
For example, Al-Ghuribi and Noah (Al-Ghuribi and Noah, 2021) targeted sentiment analysis techniques, categorizing works by methods such as lexicon-based or machine learning-based approaches. In the area of context-driven music recommendations, Kaminskas and Ricci (Kaminskas and Ricci, 2012) considered only emotion-aware recommender systems. Several surveys exclusively explored serendipity in recommender systems. For example, Kotkov et al. (Kotkov et al., 2016) and Ziarani and Ravanmehr (Ziarani and Ravanmehr, 2021b) explored techniques to achieve unexpectedness and relevance, categorizing works according to the various methods used to promote serendipity, diversity, and novelty. Works by Fu et al. (Fu et al., 2023a) and Kaminskas and Bridge (Kaminskas and Bridge, 2016) examined deep learning techniques and beyond-accuracy goals in serendipitous recommendations, categorizing studies by technical methods and beyond-accuracy metrics.
Notwithstanding the important contributions of these surveys, they remain fragmented and limited in scope, often focusing on particular types of affective states or on specific application domains. To the best of our knowledge, no survey has explored the full range of affective states, categorized into psychologically grounded types such as emotions, moods, or attitudes. Table 1 presents a comparative analysis of existing surveys, evaluating whether they provide a structured affective taxonomy, comprehensively cover all types of affective states, and consider diverse application domains.
As shown in Table 1, previous surveys are limited in the types of affective states they cover or the range of applications they consider. Our survey addresses these limitations by providing a structured taxonomy that organizes affective RS across multiple application domains, offering a holistic perspective on the role of affective signals in personalized recommendations.
1.2. Literature Search Methodology
To conduct a comprehensive survey of affective RS, we started by systematically gathering relevant literature published between 2015 and 2025. Our approach ensured the inclusion of high-quality, relevant studies that address various types of affective states, such as sentiment, emotion, and mood, across all application domains within recommender systems. We performed an initial search across major academic databases, including SpringerLink, IEEE Xplore, ACM Digital Library, ScienceDirect, Web of Science, and Wiley. The search query was designed to capture studies that explore affective aspects in recommender systems. Specifically, we used Boolean operators to construct the following search string, applied to article titles:
- (recommend OR recommender OR recommendations OR recommendation) AND (sentiment OR unexpectedness OR unexpected OR emotion OR emotional OR affective OR mood OR empathetic OR emotions OR surprise OR surprising OR serendipity OR serendipitous OR psychology).
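To make the logic of the search string concrete, the following minimal sketch applies the same two-group Boolean condition to a title. It assumes whole-word, case-insensitive matching; the databases listed above may tokenize or stem titles differently, and the function name `matches_query` is hypothetical.

```python
import re

# First group of the search string (joined by OR).
RECOMMEND_TERMS = {"recommend", "recommender", "recommendations", "recommendation"}

# Second group of the search string (joined by OR).
AFFECT_TERMS = {
    "sentiment", "unexpectedness", "unexpected", "emotion", "emotional",
    "affective", "mood", "empathetic", "emotions", "surprise", "surprising",
    "serendipity", "serendipitous", "psychology",
}


def matches_query(title: str) -> bool:
    """Return True if the title has at least one term from each group."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return bool(words & RECOMMEND_TERMS) and bool(words & AFFECT_TERMS)


if __name__ == "__main__":
    for t in ["Emotion-aware music recommendation",
              "A survey of recommender systems"]:
        print(t, "->", matches_query(t))
```

A title passes only when both groups are represented, which mirrors the AND between the two parenthesized OR groups.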
Besides journal articles, we ensured the inclusion of papers from peer-reviewed conferences known for publishing impactful work in recommender systems and related fields, such as RecSys, SIGIR, CIKM, AAAI, NeurIPS, WWW, WSDM, KDD, UMAP, EMNLP, ACL, NAACL, and ICML.
The initial search yielded approximately 1,500 articles. We removed duplicate records across the databases and then we screened the remaining papers by examining titles and abstracts to assess their relevance to our study’s focus on affective RS. Factors considered included primary research focus, contribution relevance, and publication venue prominence. After this preliminary screening, we retained 172 articles. To identify additional relevant works outside of the initial target timeframe, we applied a snowball sampling technique, based on the studies cited within the initially selected papers. This approach added 35 more papers that fell outside the designated publication timeframe, selected based on their relevance, citation impact, and the significance of their publication venues. This approach ensured that our database included influential works that align with the study’s focus on affective RS. The final corpus consisted of 207 papers, covering a broad range of topics within sentiment, emotion, mood, and serendipity aware recommender systems.
1.3. Contributions of This Survey
The primary aim of this survey is to provide a comprehensive review of affective RS, covering different types of affective states, including sentiment, emotion, and mood, across diverse application domains. The key contributions of this survey are as follows:
- •
We introduce a novel classification scheme for affective recommender systems based on Scherer’s typology of affective states (Scherer et al., 2000), categorizing publications according to the distinct types of affective states that they target (Section 2).
- •
We systematically review how attitude (Section 4), emotion (Section 5), mood (Section 6), and hybrid affective states (Section 7) are modeled, detected, and integrated into recommendation strategies (Section 3) across different application domains (Section 8).
- •
We explore the current challenges and gaps in affective recommender systems, offering suggestions for future research directions, in particular encouraging the development of datasets and recommender models that target a broader range of affective states (Section 9).
2. Background and Classification Scheme
Affective computing is a multidisciplinary field that uses computational methods to study and develop systems that recognize, interpret, process, and simulate the various affective aspects of human behavior (Picard, 2000; Tao and Tan, 2005). Over the years, numerous studies in psychology, social sciences, and philosophy have tried to define and characterize different types of affective states that people experience, and resolve the inherent fuzziness of the various language categories and folk terms that people use. For example, even though the concept of “emotion” is used very frequently, the question of “what is an emotion” rarely obtains the same answer from different individuals (Scherer et al., 2000). The peripheral theory of William James (James, 1922), for instance, conflates the terms “emotion” and “feeling”. However, according to the component process model of Scherer, the term “feeling” should be reserved only for the subjective experience component of emotion, distinct from facial and vocal expression (motor), action tendencies (motivational), bodily symptoms (neurophysiological), or appraisal (cognitive) components. As part of the effort to clarify what emotions are, Watson and Tellegen (Watson and Tellegen, 1985) focused on classifying moods as affective states distinct from emotions, proposing a model in which affective states are arranged on positive and negative affect dimensions. According to their research, moods are diffuse, longer-lasting states that subtly influence perception and behavior without being directly tied to specific stimuli. Thayer (Thayer, 1990) explored mood as a specific type of affective state that is distinct from emotions and attitudes, providing a model that integrates mood with physiological arousal levels.
Ekman (Ekman, 1992) distinguished basic emotions from other affective states, such as moods and attitudes, by identifying the unique characteristics of emotions (e.g., facial expressions and physiological patterns) that may not manifest in other types of affective states. Russell and Barrett (Russell and Barrett, 1999) introduced core affect as a foundational concept that captures a broad spectrum of affective states beyond discrete emotions, characterizing them using continuous dimensions such as valence (pleasantness vs. unpleasantness) and arousal (activation vs. deactivation). Barrett (Barrett, 2006) discussed how affective states, such as mood and emotion, arise from a core affect and highlighted the continuum of affective experiences that range from general affective feelings to specific emotional responses. Psychologist Scherer (Scherer et al., 2000; Scherer, 2005) distinguished emotion from other types of affective states, emphasizing that affective states vary in their duration, intensity, and the degree to which they are linked to specific stimuli. Scherer’s typology includes affective attitudes as stable evaluative dispositions, which are less intense than emotions, relatively enduring, and reflect affectively colored beliefs towards objects or individuals. Examples of attitudes include preferences, liking, loving, hating, valuing, and desiring. Emotions represent more intense, short-lived responses to specific stimuli, often accompanied by physiological and mental changes. Emotions encompass a wide range of states, such as anger, sadness, joy, fear, shame, pride, elation, desperation, and surprise. Moods are more diffuse and enduring affective states that subtly influence perceptions and behaviors over extended periods, with examples such as cheerful, gloomy, irritable, listless, depressed, and buoyant. 
Informed by Scherer’s typology of affective states, the Geneva Emotion Wheel (Sacharin et al., 2012) was developed as a two dimensional representation of discrete emotion families, in order to enable the elicitation of affective descriptions from users in a standard, systematic way. Figure 1 summarizes some of the foundational psychological theories of affect discussed above.
The diverse nature of affective experiences, ranging from attitudes and moods to emotions, has inspired various computational frameworks for personalization in RS. When designing affective RS, it is important to consider underlying psychological theories of affective states. Building upon these theories, this survey proposes a novel classification scheme that mirrors the psychological categorization of affective states into attitudes, emotions, moods, and hybrid states. Additionally, the survey characterizes how each of these affective dimensions influences user preferences and impacts recommender system outcomes.
2.1. Classification Scheme for Affective Recommender Systems
The classification of affective states has been explored from multiple perspectives in psychology and affective computing, with contributions from scholars such as Watson, Thayer, Ekman, Russell, Barrett, and Scherer. While categorical models (e.g., Ekman’s six basic emotions), dimensional models (e.g., Russell’s circumplex model, Thayer’s arousal-tension model), and core affect theories (e.g., Barrett’s conceptual act theory) provide useful affective frameworks, they primarily focus on emotions and often do not systematically differentiate emotions from other affective states such as attitudes and moods. In comparison, Scherer’s typology offers a more comprehensive framework by distinguishing affective states based on their duration, intensity, and cognitive involvement. Unlike categorical and dimensional models, which primarily emphasize emotions, Scherer’s classification incorporates a broader spectrum of affective phenomena, including stable evaluative dispositions, i.e., attitudes, transient affective states, i.e., emotions, and longer-lasting states, i.e., moods. This distinction is particularly relevant to affective RS, where different types of affective states influence recommendations in different ways: emotions capture immediate user responses, moods shape long-term consumption patterns, and attitudes reflect stable preferences.
Drawing from Scherer’s categorization of affective states, this survey classifies affective RS into four main categories: attitude, emotion, mood, and hybrid. Each category is further subdivided based on methodological approaches, as illustrated in Fig. 2. This classification provides a structured foundation for organizing affect-aware recommendation techniques, enabling a systematic comparison of methodologies across various domains.
- •
Attitude: This category includes sentiment-aware recommender systems that focus on users’ stable evaluative judgments, such as positive or negative sentiments toward items. Notably, all recommender systems inherently incorporate some form of user attitude, either implicitly through historical interaction data (e.g., clicks, purchases, ratings) or explicitly through task definitions that aim to predict whether a user will like an item. However, in this survey, we focus specifically on those RS that go beyond such notions of liking or disliking an item and instead aim to extract and model attitudes from unstructured, user-generated content such as textual reviews, social media posts and comments, blog writing, or forum discussions.
- •
Emotion: This category covers emotion-aware recommender systems, which model transient affective responses arising from specific stimuli. To capture users’ emotional states, these systems typically leverage categorical models (e.g., Ekman’s six basic emotions) and dimensional models (e.g., Russell’s valence-arousal framework, Plutchik’s multidimensional framework).
- •
Mood: This category includes mood-aware recommender systems focusing on longer lasting affective states, such as cheerfulness, melancholy, or relaxation. Unlike emotions, moods persist over extended periods and influence long-term content consumption patterns. These systems leverage behavioral signals, contextual information, and interaction histories to infer users’ mood-driven preferences.
- •
Hybrid: This category refers to recommender systems that integrate multiple types of affective states for improved personalization. Examples include attitude-emotion hybrid approaches, which combine users’ evaluative preferences (e.g., sentiment) with transient affective states (emotion), and emotion-mood hybrid systems, which account for both emotional responses and mood states to enhance content adaptation.
- –
Serendipity: A special subcategory of hybrid systems is represented by serendipity-oriented recommender systems. While serendipity is not itself a basic affective state, it can be characterized as an unexpected and pleasant experience. This characterization reflects a combination of two types of affective components: (i) the emotion of unexpectedness or surprise, and (ii) the positive evaluative attitude toward the outcome.
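The four-way scheme above can be read as a simple decision rule over the set of affective state types that a system models. The sketch below is only an illustration of that reading; the names `AffectiveState` and `categorize` are hypothetical and are not part of the survey's methodology.

```python
from enum import Enum


class AffectiveState(Enum):
    ATTITUDE = "attitude"
    EMOTION = "emotion"
    MOOD = "mood"


def categorize(targeted_states) -> str:
    """Map the set of affective state types a system models to a category."""
    states = set(targeted_states)
    if not states:
        raise ValueError("a system must target at least one affective state")
    if len(states) == 1:
        # A single targeted type maps directly to its own category.
        return next(iter(states)).value
    # Two or more types (e.g., attitude + emotion) fall under hybrid.
    return "hybrid"


if __name__ == "__main__":
    # Serendipity combines surprise (emotion) with a positive attitude.
    serendipity = {AffectiveState.EMOTION, AffectiveState.ATTITUDE}
    print(categorize(serendipity))
```

Under this reading, a serendipity-oriented system lands in the hybrid category because it combines an emotion component with an attitude component.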
3. General Architecture of Affective Recommender Systems
Traditional recommender systems primarily rely on explicit ratings, past interactions, and user metadata to predict preferences (Zhang et al., 2019; Burke, 2002; Bobadilla et al., 2013; Lu et al., 2015). However, these methods overlook the affective dimension of user experience, which plays a crucial role in shaping content preferences. Affective RS bridge this gap by integrating users’ emotions, moods, and attitudes into the recommendation process, ensuring that the system adapts not only to what users like, but also to how they feel during different stages of interaction (Tkalcic et al., 2011), such as at the point of entry, during content consumption, or upon exit. Figure 3 illustrates a general framework for affective RS consisting of three key stages of affective modeling and integration, as described below:
- (1)
User Affective Modeling. Understanding a user’s affective states requires integrating diverse data sources. These include user metadata (e.g., age, location), behavioral signals (e.g., watch duration, clicks, browsing history), past interactions (e.g., purchases, likes, reviews), feedback (e.g., ratings, textual comments), and explicit affective input (e.g., self-reported emotions). Affect-aware systems perform feature extraction to convert these signals into structured affective representations. For instance, in e-commerce, textual affective features can be derived from reviews or social media feedback, capturing emotional reactions to previous purchases. In music and video streaming platforms, audio-visual cues, such as facial expressions, offer deeper insight into affective preferences. Once extracted, these features are used to construct a dynamic model of the user’s affective profile, capturing users’ attitude, emotional states and mood. This modeling can be rule-based, or driven by machine learning and deep learning techniques that continuously adapt based on user feedback. For example, in movie recommendation, the system might suggest nostalgic films when the user feels lonely or action films during heightened excitement.
- (2)
Item Affective Feature Representation. This component focuses on extracting affect-relevant attributes from item content. Inputs may include item metadata (e.g., genre, brand), descriptive information (e.g., specifications), user-generated content (e.g., reviews), and multimedia signals (e.g., trailers, product images, music). These features are processed and structured using feature engineering and embedding methods. For instance, affective information from textual content can be extracted through feature engineering approaches such as part-of-speech tagging, syntactic parsing, and affective lexicon matching. Embedding methods, such as Word2Vec, GloVe, and BERT embedding, are then used to transform this information into numerical representations suitable for downstream modeling. In image- and video-based recommender systems, convolutional neural networks (CNNs) are commonly employed to extract visual-emotional cues, such as color tone, brightness, and facial expressions. These affective features are typically embedded into lower-dimensional spaces for integration with user modeling.
- (3)
Affect-Aware Recommendation Generation. In this final stage, user and item affective feature representations are combined to generate personalized recommendations. Candidate items are first retrieved using standard techniques such as collaborative filtering, content-based filtering, or hybrid methods. Affective signals are then used to re-rank these candidates, prioritizing items that align with the user’s current affective states or long-term tendencies. In addition to affective cues, some systems further incorporate contextual information, such as time of day, location, or weather conditions, to refine personalization, thereby enhancing user engagement, satisfaction, and emotional resonance.
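The retrieve-then-re-rank pattern in the final stage can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any surveyed system: affective profiles are plain numeric vectors, alignment is measured with cosine similarity, and the blending weight `alpha` is a hypothetical parameter.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(candidates, user_affect, alpha=0.3):
    """Re-rank retrieved candidates by blending relevance with affective fit.

    candidates: list of (item_id, base_score, item_affect_vector), where
    base_score comes from any standard retriever (e.g., collaborative filtering).
    alpha: assumed weight of the affective alignment term, in [0, 1].
    """
    scored = [
        (item_id, (1 - alpha) * base + alpha * cosine(user_affect, item_affect))
        for item_id, base, item_affect in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [("a", 0.9, [0.0, 1.0]), ("b", 0.8, [1.0, 0.0])]
    print(rerank(candidates, user_affect=[1.0, 0.0], alpha=0.5))
```

Setting `alpha` to zero recovers the retriever's original ordering, so the affective term acts purely as an adjustment on top of standard candidate generation.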
In the following sections, we survey Affective RS approaches according to the four major categories of our classification scheme: attitude-aware (Section 4), emotion-aware (Section 5), mood-aware (Section 6), and hybrid systems (Section 7).
4. Attitude-Aware Recommender Systems
Attitude-aware recommender systems aim to incorporate users’ evaluative preferences, often inferred from affective expressions of sentiment or opinion, into the recommendation process. These preferences reflect relatively stable user attitudes toward items, attributes, or content categories. For instance, in an e-commerce setting, the following review illustrates how users convey nuanced evaluations that extend beyond simple like or dislike.
Example Review
“I love how lightweight and modern this laptop is, but the battery life is disappointing”
From this review, a system can infer that the user values sleek and portable designs but is dissatisfied with poor battery performance. In future interactions, such cues can guide the system to prioritize laptops with long battery life and filter out bulkier models with limited endurance. This example highlights how even a single review can reveal multiple attitudinal signals, which, when effectively captured, can substantially improve recommendation relevance.
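As a rough illustration of how such attitudinal signals might be extracted, the sketch below scores the example review per aspect. The tiny lexicon, the aspect keywords, and the clause splitting on "but" are all hypothetical simplifications chosen for this example; real systems typically rely on resources such as SentiWordNet and on proper aspect extraction.

```python
import re

# Toy sentiment lexicon (assumed values, for illustration only).
SENTIMENT_LEXICON = {
    "love": 1.0, "lightweight": 0.5, "modern": 0.5, "disappointing": -1.0,
}

# Toy mapping from aspects to trigger words (assumed).
ASPECT_KEYWORDS = {
    "design": {"lightweight", "modern", "sleek"},
    "battery": {"battery"},
}


def aspect_sentiment(review: str) -> dict:
    """Assign each clause's summed polarity to the aspects it mentions."""
    scores = {}
    for clause in re.split(r"\bbut\b|[.;]", review.lower()):
        words = re.findall(r"[a-z]+", clause)
        polarity = sum(SENTIMENT_LEXICON.get(w, 0.0) for w in words)
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if keywords & set(words):
                scores[aspect] = scores.get(aspect, 0.0) + polarity
    return scores


if __name__ == "__main__":
    review = ("I love how lightweight and modern this laptop is, "
              "but the battery life is disappointing")
    print(aspect_sentiment(review))
```

The positive score for the design aspect and the negative score for the battery aspect correspond to the two separate signals that the review conveys.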
Due to sustained progress in sentiment analysis techniques over more than two decades of research, attitude-aware RS approaches vastly outnumber the other types of affective RS, and as such they have received significantly more attention in literature reviews. As a quick-reference resource, Table LABEL:tab:attitude lists representative sentiment-aware recommender system papers along with their targeted sentiment fusion strategies, application domains, attitude modeling approaches, and data sources.
To infer user attitudes, affective RS rely on a variety of data sources, including textual reviews, explicit ratings, behavioral signals such as clicks or watch time, and even demographic information. These systems use affective modeling through feature engineering, machine learning, or deep learning to extract evaluative cues that shape user-item representations, influence item rankings, or adjust predicted ratings. Understanding both the sources and nature of affective data is therefore essential for building effective attitude-aware recommender systems. Affective signals can originate from user behavior (Section 4.1) and item data (Section 4.2), whereas their impact on recommendations can be modulated by contextual conditions (Section 4.3).
4.1. User-Centered Attitude Data
User-centered data is foundational for modeling attitudinal preferences in affective RS. This category includes user-generated content such as reviews, microblogs, comments, past interactions, ratings, browsing history, clicks, and demographic metadata. These user-related signals are frequently leveraged to extract affective information and shape sentiment-aware recommendation models, using methods that range from lexicon-based techniques to deep learning and language-model-based approaches, as summarized below:
- •
Lexicon Based Approaches. These methods rely on predefined dictionaries of sentiment-laden words, such as SentiWordNet, VADER, AFINN, and TextBlob, to incorporate user-related information into the recommendation process. Aspect-based sentiment analysis (ABSA) is often employed with lexicon-based methods to extract sentiment at a feature level. For instance, Musto et al. (Musto et al., 2017) and A2SPR (Huang et al., 2020) employed SentiWordNet for aspect-level sentiment scoring within collaborative filtering and graph-based models, respectively, utilizing user reviews, comments, and check-in history to enhance recommendation quality by aligning with user preferences. Similarly, Serrano-Guerrero et al. (Serrano-Guerrero et al., 2024) applied SenticNet and VADER in a fuzzy linguistic healthcare recommender, using user reviews and comments to improve hospital service rankings through sentiment analysis. Li et al. (Li et al., 2020a) also relied on user reviews and comments, extending ABSA by incorporating temporal dynamics into SentiWordNet-based sentiment scoring to capture users’ evolving preferences over time.
- •
Traditional Machine Learning Approaches. Traditional machine learning methods have been widely employed to classify sentiment and incorporate it into recommendation systems utilizing user information. For instance, Li et al. (Li et al., 2016) combined rule-based sentiment extraction using LingPipe with SVM and logistic regression to mine user sentiment from microblog posts, comments, and movie-watching histories for predicting movie preferences. Building on the idea of leveraging user-generated content, Cai and Xu (Cai and Xu, 2019) utilized social media posts and user interactions (e.g., likes) to extract sentiment features, which were then integrated into a matrix factorization model (SIO-TMF) enhanced with LDA for improved friend recommendation in social networks. Extending sentiment-based modeling to address data sparsity issues, Zhang (Zhang, 2015) proposed a matrix factorization framework that incorporates sentiment-enhanced feature-opinion pairs extracted from user reviews, effectively mitigating cold-start challenges where explicit user ratings are sparse or unavailable.
- •
Deep Learning Approaches. Deep learning introduced greater accuracy in sentiment-aware recommendation systems. N. and K.M. (N. and K.M., 2023) employed Elman Recurrent Neural Networks (ERNN) to perform sentiment classification on user course reviews extracted from tweets and comments, integrating sentiment signals into an e-learning recommender system that outperformed traditional models. Liu et al. (Liu et al., 2021) earlier proposed a multilingual sentiment-aware model that uses bidirectional GRUs with attention mechanisms to perform aspect-based sentiment analysis on multilingual user reviews and ratings, enhancing rating prediction across languages. To improve both scalability and rating prediction accuracy, Hyun et al. (Hyun et al., 2018) introduced SentiRec, a two-step CNN-based recommendation model that encodes user sentiment from reviews into fixed-size vectors for user and item representations.
- •
Language Model Based Approaches. With the emergence of large pre-trained language models such as BERT, sentiment-aware recommender systems have significantly improved their ability to capture contextualized user preferences from user reviews, enabling more accurate and personalized recommendations across various domains. A range of sentiment-aware recommendation systems have leveraged BERT to enrich user modeling, including Zhang et al. (Zhang et al., 2021a), who fused sentiment signals in a review-encoding network to alleviate sparsity and cold-start problems; Wu (Wu, 2024), who combined BERT with Bayesian Personalized Ranking to jointly capture semantic and visual sentiment representations for improved predictive accuracy; Zhang and Zhang (Zhang and Zhang, 2022), who incorporated topic-aware sentiment classification to enhance recommendation diversity; and Alam et al. (Alam et al., 2022), who analyzed affect-laden news content and revealed that user demographics influence affective interpretation.
In summary, user-rooted affective modeling in sentiment-aware recommender systems encompasses a wide spectrum of methods, from lexicon-based sentiment scoring and traditional classifiers to neural architectures and language-model-based techniques. These approaches leverage the richness of user-related data to enhance personalization, interpretability, and the overall effectiveness of recommendation systems.
4.2. Item-Centered Attitude Data
Item-centered data provides important targets of attitude signals, especially when combined with aspect-oriented sentiment expressed in reviews. This category includes structured product metadata (e.g., specifications, categories, and attributes), item descriptions, or item-relevant data from social media. A variety of approaches integrate these signals into item modeling to enhance recommendation quality:
- •
Item Features. Explicit product attributes and descriptive metadata are often aligned with sentiment signals to construct affect-aware item representations. Chen et al. (Chen et al., 2019a) combined product specifications (e.g., price, RAM, battery life) with sentiment scores derived from user reviews using WordNet-based similarity and SentiWordNet polarity. These signals were integrated into a utility-based ranking function to generate interpretable, sentiment-aware recommendations. Similarly, Li et al. (Li et al., 2020a) aligned aspect metadata from item descriptions (e.g., CPU, RAM) with sentiment scores from reviews to build aspect-level sentiment matrices. These were incorporated into a similarity-based matrix factorization model where item-item similarity was influenced by both latent and sentiment-aware aspect signals.
- •
Social Media. Sentiment-oriented text from external sources such as social media can inform item representations. For instance, N. and K.M. (N. and K.M., 2023) constructed item representations for e-learning courses by leveraging unstructured textual data sourced from social networking platforms such as Twitter. A hybrid sentiment analysis framework, comprising ITF-IDF, Word2Vec, Hybrid N-gram features, and an Elman Recurrent Neural Network (ERNN), was employed to extract sentiment polarities from the course-related texts. The resulting sentiment profiles were used to represent each course and subsequently matched with user sentiment vectors using multiple similarity measures to generate personalized recommendations.
- •
Multimodal Data. Multimodal signals enhance item representations and improve recommendation performance. For instance, Wu (Wu, 2024) combined visual sentiment from movie posters with semantic sentiment from user reviews. These signals were embedded into a four-dimensional tensor encompassing users, items, visual sentiment, and semantic sentiment, with tensor decomposition applied to learn latent factors for ranking via Bayesian Personalized Ranking (BPR).
Taken together, these approaches highlight how item-rooted affective signals, ranging from structured specifications and aspect-level sentiment to multimodal inputs and unstructured social discourse, enable nuanced modeling of item affective preferences. Beyond supporting personalized content retrieval, item sentiment representations also contribute to explanation generation and fairness assessment, underscoring their broader utility in enhancing both the performance and transparency of affect-aware recommender systems.
4.3. Contextual Data
Contextual data provides essential situational awareness in affective RS, ensuring that recommendations are not only sentiment-aligned but also feasible and temporally appropriate. This category includes information such as location, time, weather, and other environmental or situational variables that influence the practical relevance of recommendations at the time they are delivered. In the domain of tourism recommendation, Abbasi-Moud et al. (Abbasi-Moud et al., 2021) proposed a sentiment-aware system that integrated contextual information, specifically, location, time, and weather, with user check-ins, ratings, and review texts. User sentiment preferences were then refined through contextual filtering: only attractions located in the user’s current city, operating at the time of recommendation, and appropriate to the prevailing weather conditions (e.g., prioritizing indoor venues during rain or snow) were considered, ensuring that recommended items were both sentiment-aligned and contextually feasible. A similar approach was applied in restaurant recommendation (Asani et al., 2021), using location and operating hours as contextual filters. Extending beyond static context, Cai and Xu (Cai and Xu, 2019) segmented user histories over time and applied matrix factorization within each segment to capture time-based changes in sentiment, enhancing adaptability in friend recommendations.
Overall, by incorporating contextual factors such as location, time, and weather, affective RS can filter out impractical suggestions, ensuring that recommendations are not only attitude-aligned but also usable and contextually appropriate.
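The contextual filtering described in this section can be sketched as a feasibility check applied before sentiment-based ranking. The sketch below is a simplified illustration rather than the method of any cited system: the `Attraction` fields are hypothetical, and outdoor venues are dropped entirely in rain or snow, whereas the tourism system described above prioritizes indoor venues.

```python
from dataclasses import dataclass


@dataclass
class Attraction:
    name: str
    city: str
    open_hour: int          # opening hour, 24-hour clock
    close_hour: int         # closing hour, 24-hour clock
    indoor: bool
    sentiment_score: float  # assumed to come from an upstream sentiment model


def contextual_filter(attractions, city, hour, weather):
    """Keep attractions feasible in the current context, best sentiment first."""
    bad_weather = weather in {"rain", "snow"}
    feasible = [
        a for a in attractions
        if a.city == city
        and a.open_hour <= hour < a.close_hour
        and (a.indoor or not bad_weather)
    ]
    return sorted(feasible, key=lambda a: a.sentiment_score, reverse=True)


if __name__ == "__main__":
    places = [
        Attraction("museum", "Athens", 9, 17, True, 0.7),
        Attraction("park", "Athens", 6, 22, False, 0.9),
        Attraction("gallery", "Rome", 9, 17, True, 0.8),
    ]
    print([a.name for a in contextual_filter(places, "Athens", 10, "rain")])
```

Filtering first and ranking second keeps the sentiment model unchanged while ensuring that only contextually feasible items can be recommended.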
5. Emotion-Aware Recommender Systems
Adapting to users’ emotional states at the various stages of the recommendation process (entry, consumption, and exit) can significantly improve recommendation relevance, user satisfaction, and engagement (Tkalcic et al., 2011; Zheng et al., 2013). For instance, emotion recognition at the entry stage can help recommend calming content after a stressful event, or amusing content during moments of sadness. The importance of modeling emotions lies in how they reflect short-term, immediate user states that traditional approaches may overlook. To represent emotions, two types of models have been widely adopted in emotion-aware recommender systems:
- •
Categorical models, which define emotions as discrete states (Section 5.1).
- •
Dimensional models, which describe emotions along continuous scales such as valence and arousal (Section 5.2).
Figure 4 shows the taxonomy used to organize emotion-aware recommender systems. Table LABEL:tab:emotion provides a concise tabular overview of a comprehensive set of emotion-aware RS papers, specifying emotion modeling approaches, application domains, and data sources. In the rest of this section, we analyze representative work, focusing on their emotion modeling strategies and how these are applied across different domains.
5.1. Categorical Models of Emotion
Categorical models of emotion have their roots in the field of psychology, where researchers have long sought to classify emotions into discrete categories. One of the most influential contributions in this area came from Ekman (Ekman, 1971, 1992), who identified six basic emotions: joy, sadness, anger, fear, disgust, and surprise (Figure 5(a)). These emotions were shown to be universally recognized across cultures, making them a cornerstone of emotion research. The Ortony, Clore, and Collins (OCC) model of Ortony et al. (Ortony et al., 1988) defines 22 discrete emotions and two cognitive states based on appraisals of events, actions, and objects. Izard’s (Izard, 2013) Differential Emotions Theory posits that ten discrete emotions are fundamental, innate components of human experience, each characterized by unique neural and expressive patterns. Apart from the above categorical frameworks, various affective RS approaches have developed their own, ad-hoc set of emotion categories based on their suitability for a particular application domain.
Categorical models of emotion have provided a strong theoretical basis for incorporating emotions into recommender systems. By mapping user behavior and content attributes to discrete emotional states, such as joy or sadness, systems can deliver personalized content that aligns with the user’s current affective state in an interpretable manner. Below we summarize illustrative affective RS approaches from each of the four major classes:
- •
Ekman’s Model. Leung et al. (Leung et al., 2020) classified emotions into happiness, sadness, anger, fear, disgust, surprise, and neutral using the Tweets Affective Classifier (TAC), a deep learning architecture featuring bidirectional LSTM and CNN layers, in order to adapt movie recommendations. Dodd et al. (Dodd et al., 2022) employed facial emotion recognition (FER) to detect happiness, sadness, anger, and other emotions from microexpressions, enabling emotionally adaptive systems. In another notable work, Deng et al. (Deng et al., 2015b) extracted emotional contexts from user microblogs, categorizing them into 2D, 7D, and 21D vectors linked to music preferences within specific time windows. Emotion-aware methods such as User-based Collaborative Filtering with Emotion (UCFE) incorporated emotional similarity into the recommendation process (Guo et al., 2019; Wu et al., 2016).
- •
OCC Model. Moshfeghi et al. (Moshfeghi et al., 2011) applied an OCC-based extractor to movie reviews and plot summaries, creating 22-dimensional binary vectors representing the five most frequent emotions per movie. User vectors were formed by summing emotion vectors of previously rated movies, weighted by rating scores, enabling affective similarity computation in collaborative filtering.
- •
Izard’s Model. Poirson and Cunha (Poirson and Cunha, 2019) collected emotion-oriented data by asking users to rate films on eight emotional attributes adapted from Izard’s Differential Emotions Scale (DES), alongside overall preference ratings. These emotion ratings were then used to compute user-to-user similarity, enabling the system to predict preferences for unrated items based on emotionally similar users.
- •
Ad-Hoc Categorical Models. In film music recommendation, the Music Affinity Graph (MAG) (Kuo et al., 2005) associated musical features like melody and rhythm with predefined emotion categories, including 15 groups such as joy, sadness, fear, anger, and gratitude. Random walk mechanisms refined these associations, ensuring emotionally aligned recommendations by penalizing features tied to unrelated emotions. The MIRROR (eMotIon on Reviews for RecOmmendeRsystems) framework by Meng et al. (Meng et al., 2018) integrated positive and negative emotions from user reviews using various emotion categories, incorporating emotional influence through global weighting and local regularization to enhance personalization. In conversational systems, Zhang et al. (Zhang et al., 2024b) considered nine emotion types (e.g., happy, negative, and surprise), fusing emotion signals with knowledge graphs to enable emotion-aware item recommendation and emotion-aligned response generation.
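The rating-weighted profile construction described above for the OCC-based approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the toy vectors have four dimensions instead of 22, and cosine similarity is an assumed choice of affective similarity measure.

```python
import math

def user_emotion_profile(rated_movies):
    """Sum per-movie binary emotion vectors, each weighted by the user's rating.

    rated_movies: list of (rating, emotion_vector) pairs, where each
    emotion_vector is a list of 0/1 flags of equal length
    (22 entries in the OCC setting; 4 in the toy example below).
    """
    dim = len(rated_movies[0][1])
    profile = [0.0] * dim
    for rating, vec in rated_movies:
        for i, flag in enumerate(vec):
            profile[i] += rating * flag
    return profile

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy usage: two users, each with two rated movies.
alice = user_emotion_profile([(5, [1, 0, 1, 0]), (2, [0, 1, 1, 0])])
bob = user_emotion_profile([(4, [1, 0, 0, 0]), (1, [0, 0, 0, 1])])
affective_similarity = cosine(alice, bob)
```

The resulting similarity could then be used to select neighbors in a user-based collaborative filtering scheme, in place of or alongside rating-based similarity.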
5.2. Dimensional Models of Emotion
Dimensional models offer researchers an alternative framework for understanding affective states by emphasizing continuous representations rather than discrete categories. One of the most widely adopted dimensional theories is Russell’s Circumplex Model (Russell, 1980; Russell and Bullock, 1985), which organizes emotions along two primary dimensions: valence (pleasure-displeasure) and arousal (activation-deactivation) (Figure 5(b)). This bidimensional space effectively captures the hedonic tone and energy levels associated with emotional states, allowing researchers to map a wide range of emotions onto a continuous plane. For instance, joy is characterized by high valence and high arousal, while sadness corresponds to low valence and low arousal. Plutchik (Plutchik, 1980) introduced a psychoevolutionary dimensional framework that organizes emotions in a circumplex structure. This model identifies eight primary bipolar emotions, such as joy versus sadness and anger versus fear, arranged in a three-dimensional space reflecting emotional intensity, polarity, and similarity. Unlike Russell’s model, Plutchik’s structure does not explicitly rely on valence and arousal but instead emphasizes the evolutionary and functional relationships among emotions, including their potential to blend into complex states. Another influential dimensional framework is the Pleasure-Arousal-Dominance (PAD) model by Mehrabian and Russell (Mehrabian and Russell, 1974), developed in the context of environmental psychology. The PAD model characterizes emotional responses along three continuous dimensions: pleasure, arousal, and dominance. Finally, while most dimensional approaches in affective recommender systems draw from established psychological theories, a subset of studies, which we label Ad-Hoc, adopt domain-specific or semantically constructed affective spaces.
Building on this theoretical foundation, dimensional models have found extensive applications in emotion-aware recommender systems, enabling dynamic tracking of users’ affective trajectories. Unlike categorical approaches, which classify emotions into discrete labels, dimensional models facilitate a more fluid and context-sensitive representation of emotional states, which enables recommender systems to model gradual transitions, such as shifts from excitement to relaxation or from calmness to agitation. Below we summarize illustrative affective RS approaches from each of the four major dimensional categories:
- •
Circumplex Model. Revathy et al. (Revathy et al., 2023) applied Russell’s Circumplex framework to map song lyrics into discrete emotion categories (e.g., happy, sad, angry, relaxed) by partitioning valence-arousal values and leveraging fine-tuned BERT embeddings to extract emotionally salient features, thereby improving music recommendation accuracy through a combination of semantic similarity and emotional mapping. Similarly, other recommender systems have adopted the valence-arousal model for real-time emotion detection using physiological signals such as Galvanic Skin Response (GSR) and Electrocardiogram (ECG) (Ayata et al., 2018), with CNN-based models trained on these inputs achieving superior predictive performance over traditional classifiers (Santamaria-Granados et al., 2019). Beyond conventional domains, this dimensional framework has also been applied to museum recommendations, where users’ valence-arousal states, combined with location tracking, were used to dynamically personalize visitor itineraries (Ferrato et al., 2022).
- •
Plutchik’s Model. Plutchik’s emotion theory inspired the Emotion-aware Transformer (EmoTER) for generating robust and fair explanations in recommender systems (Wen et al., 2022). EmoTER embedded emotional features using the NRC Emotion Lexicon and incorporated multi-task learning to align recommendations and explanations with users’ affective states. Orellana-Rodriguez et al. (Orellana-Rodriguez et al., 2015) utilized Plutchik’s eight basic emotions to generate emotion vectors from YouTube comments by mapping words through the NRC (National Research Council of Canada) Emotion Lexicon. These vectors are then integrated into a context-aware recommender system using collaborative ranking to deliver personalized film recommendations based on users’ emotional preferences.
- •
PAD Model. This framework has been widely adapted in affective computing, often using the term Valence-Arousal-Dominance (VAD), where “Valence” replaces “Pleasure.” Several recommender systems incorporate this model to align content with user emotions. In image recommendation, users’ emotions are inferred from facial expressions and mapped into the valence-arousal-dominance (VAD) space to generate affective classes, which are then used to enrich user profiles in a content-based recommender system for more emotionally aligned recommendations (Tkalcic et al., 2013). The Hangul font recommendation system of Kim and Lim (Kim and Lim, 2018) extends the PAD model to PADO (adding an Organized–Free dimension), using crowdsourced evaluations to map fonts and user preferences into emotional space, enabling personalized font selection through distance-based matching.
- •
Ad-Hoc Dimensional Models. Benini et al. (Benini et al., 2011) proposed a movie recommender based on a semantic connotative space with three axes: natural (warm vs. cold), temporal (dynamic vs. slow), and energetic (impactful vs. minimal), derived from bipolar adjective pairs, reflecting stylistic and affective dimensions not explicitly tied to valence or arousal. Canini et al. (Canini et al., 2013) extended this work by predicting scene coordinates using audiovisual and film features from movie scenes via Support Vector Regression (SVR), removing the need for manual annotation.
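To make the partitioning of the valence-arousal plane mentioned under the Circumplex Model concrete, the sketch below assigns a point to one of four quadrant labels. It is an illustrative assumption, not the procedure of any cited system: the function name, the [-1, 1] value range, and the threshold at zero are all hypothetical choices.

```python
def circumplex_quadrant(valence, arousal, threshold=0.0):
    """Map a (valence, arousal) point to a coarse emotion label.

    Assumes both coordinates lie on a scale centered at `threshold`
    (for example [-1, 1] with threshold 0); the four labels follow the
    usual reading of the circumplex quadrants.
    """
    if valence >= threshold:
        return "happy" if arousal >= threshold else "relaxed"
    return "angry" if arousal >= threshold else "sad"

# Toy usage: label a few hypothetical song-level predictions.
songs = {"track_a": (0.8, 0.7), "track_b": (-0.6, -0.5), "track_c": (0.4, -0.3)}
labels = {name: circumplex_quadrant(v, a) for name, (v, a) in songs.items()}
```

A finer partition (for example, splitting each quadrant by intensity) would follow the same pattern with additional thresholds.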
Despite the widespread adoption of dimensional models, not all affective RS approaches accurately characterize their affective representations. For example, Yoon et al. (Yoon et al., 2012) characterized Thayer’s model (Thayer, 1990) as a valence-arousal model. However, while the valence-arousal model is a well-established approach for emotion representation, Thayer’s model instead describes mood states along energy and tension axes, without directly categorizing affect in terms of positive or negative valence or arousal levels. This misrepresentation reflects a broader issue in affective computing, namely the conflation of mood and emotion models, which can lead to conceptual inconsistencies in emotion-aware recommendation design.
Overall, dimensional models offer a powerful foundation for emotion-aware recommender systems, enabling more fluid, context-sensitive, and personalized recommendations across diverse applications. Their adaptability across textual, physiological, and visual modalities provides a promising avenue for future developments in affective computing-driven personalization.
5.3. Categorical + Dimensional Models of Emotion
While categorical and dimensional models are often treated as distinct paradigms in affective computing, several works have explored hybrid approaches in affective RS that integrate the strengths of both, namely the interpretability of discrete emotion categories and the granularity of continuous affective dimensions.
- •
Circumplex + Ekman. EmoWare (Tripathi et al., 2019) integrates both categorical and dimensional emotion models by first positioning emotions within the 2D valence-arousal circumplex, a dimensional framework. From this space, it identifies emotional states like joy, fear, and sadness in different quadrants to define the emotional character of videos. These three discrete emotions, inspired by Ekman’s basic emotion theory, are then used as categorical labels to annotate content and track user responses, enabling a hybrid affective recommendation process.
- •
Geneva Emotion Wheel. The Geneva Emotion Wheel (GEW) (Sacharin et al., 2012) consists of 20 discrete emotion terms arranged around a circle and positioned such that the horizontal dimension indicates valence (negative to positive), while the vertical dimension indicates control or power (low to high). Kim and Hong (Kim and Hong, 2024) leveraged GEW in their Emotion-oriented Recommender System for Indoor Environmental Quality (ERS-IEQ), where the 20 emotion categories defined by GEW, such as joy, anger, fear, and interest, were inferred from multimodal data including facial expressions, voice, and physiological signals. A user’s emotional state is then encoded as a point in polar coordinates (R, θ), combining categorical emotion types (θ) with dimensional intensity levels (R).
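The polar encoding described above can be illustrated with a small sketch in which the angle identifies the emotion category and the radius carries its intensity. The even angular spacing of the 20 categories, the index-to-angle mapping, and the function names are assumptions made for illustration; the actual placement of terms on the wheel is defined by the GEW instrument itself.

```python
import math

def gew_polar(category_index, intensity, n_categories=20):
    """Encode an emotion as (R, theta).

    category_index: integer in [0, n_categories), a hypothetical ordering
    of the wheel's emotion terms, assumed evenly spaced around the circle.
    intensity: non-negative strength of the emotion, used as the radius R.
    """
    theta = 2 * math.pi * category_index / n_categories
    return intensity, theta

def polar_to_cartesian(r, theta):
    """Project (R, theta) onto the wheel's horizontal and vertical axes."""
    return r * math.cos(theta), r * math.sin(theta)

# Toy usage: category 5 of 20 sits a quarter turn around the wheel.
r, theta = gew_polar(5, 0.5)
x, y = polar_to_cartesian(r, theta)
```

Under these assumptions, two states with the same category but different intensities share an angle and differ only in radius, which is what allows the representation to combine a discrete label with a continuous magnitude.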
Other affective recommender systems have similarly adopted hybrid approaches, combining dimensional scores with discrete emotion labels to enrich affective representations of users and items. For instance, in music recommendation, Deng et al. (Deng et al., 2015a) developed a hybrid system using the three-dimensional Resonance-Arousal-Valence (RAV) framework and categorical emotion representations inspired by the OCC model. Acoustic features are processed via SVR and variational Bayesian models to predict RAV vectors, with emotion labels used for training. Gonzalez et al. (Gonzalez et al., 2007) developed a Smart Prediction Assistant (SPA) that recommends educational courses in an e-learning marketplace. SPA embeds emotional context via valence scores and categorical traits (e.g., hopeful, shy) derived from questionnaires, behavior, and demographics, and then pairs each course with a short, attribute-focused text tailored to the user’s dominant emotional traits.
5.4. Latent Representation of Emotion
Recent advances in emotion-aware recommender systems have explored latent representations of emotion, wherein affective cues are inferred from latent user behaviors rather than labeled emotion data. These approaches do not assign explicit emotion categories (e.g., joy or sadness) or map emotions onto predefined dimensions (e.g., valence, arousal). Instead, they extract emotion-inspired signals from patterns such as interaction histories, content preferences, or behavioral traits, and encode them into latent representations used within recommendation models. For example, Yousefian Jazi et al. (Yousefian Jazi et al., 2021) represent a user’s emotion state as a four-dimensional Exponential Moving Average (EMA) vector over keyboard and mouse features: keystroke count, mean key hold time, mouse click count, and mean mouse button hold time. When a user selects a track, the contemporaneous EMA vector is stored with the track identifier together with an implicit one-to-five rating inferred from click, play, and download events, yielding a user-music log that pairs items with the user’s state at interaction time. Recommendations are then produced by comparing the current EMA vector to stored vectors using adjusted cosine similarity and a weighted sum predictor over neighbors, making the EMA vector the sole emotion representation throughout retrieval and ranking. Yin et al. (Yin et al., 2024) introduced Emotion-aware Implicit Matrix Factorization (EIMF), which integrates explicit ratings with latent “emotional” signals derived from implicit behaviors (clicks, views, purchases). Emotion is represented in low-dimensional user and item embeddings learned by the Implicit Matrix Factorization (IMF) component from one-hot behavior data, capturing global association patterns rather than psychologically defined emotions.
The explicit ratings are modeled by the Emotion-aware Matrix Factorization (EMF) component, which maps users and items into high-dimensional embeddings whose inner products represent the explicit user-item evaluation captured from the rating data. These two embedding sets are fused via linear and nonlinear layers to generate final recommendation scores for movies, books, music, and images.
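The EMA-based state tracking and neighbor-weighted prediction described above for keyboard and mouse features can be sketched as follows. This is a simplified illustration under stated assumptions: the smoothing factor `alpha`, the neighborhood size `k`, and the function names are hypothetical, and plain cosine similarity is used here in place of the adjusted cosine similarity reported for the original system.

```python
import math

def ema_update(prev, observation, alpha=0.3):
    """Exponential moving average over a feature vector.

    In the setting above the vector would hold four entries: keystroke
    count, mean key hold time, mouse click count, mean button hold time.
    """
    return [alpha * x + (1 - alpha) * p for p, x in zip(prev, observation)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_rating(current_state, track_log, k=2):
    """Similarity-weighted average of the ratings logged for one track.

    track_log: list of (state_vector, rating) pairs stored when the track
    was previously selected. Only the k most similar states are used.
    """
    scored = sorted(((cosine(current_state, s), r) for s, r in track_log),
                    reverse=True)[:k]
    denom = sum(abs(sim) for sim, _ in scored)
    return sum(sim * r for sim, r in scored) / denom if denom else 0.0

# Toy usage with two-dimensional states for brevity.
state = ema_update([0.0, 0.0], [10.0, 20.0], alpha=0.5)
score = predict_rating(state, [([1.0, 2.0], 5), ([2.0, 1.0], 2)], k=1)
```

Tracks can then be ranked by their predicted rating for the user's current state, so that the state vector drives both retrieval and ranking.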
Overall, latent emotion representations offer a practical alternative for developing emotion-aware recommender systems in real-world settings where explicit emotion labels are scarce.
6. Mood-Aware Recommender Systems
Moods are prolonged affective states that influence user preferences and decision-making over time (Scherer, 2005). Unlike emotions, which are short-lived and linked to specific stimuli, moods persist for longer durations and can exert a more sustained influence on content consumption behaviors. Mood-aware recommender systems aim to enhance personalization by aligning recommendations with a user’s ongoing affective context extracted from signals such as behavioral cues (e.g., changes in browsing duration), physiological indicators (e.g., sustained elevation in skin conductance), and contextual information (e.g., extended periods of rainy weather). Below we list representative mood-aware RS approaches from three major recommendation domains: music, education, and food.
- •
Mood Modeling in Music. Bontempelli et al. (Bontempelli et al., 2022) introduced Flow Moods, a mood-aware music recommendation system by Deezer that personalizes playlists based on six user-selected mood categories. Mood prediction, derived from curator-labeled data and audio embeddings, guided mood-filtered recommendations alongside collaborative filtering. Similarly, Chen et al. (Chen et al., 2016) proposed MoMusic, a hybrid system that infers mood from tempo and situation from lyrics using expert labels. Songs are matched to user preferences, collected through structured questionnaires, via rule-based filtering on features such as tempo and vocal style. Andjelkovic et al. (Andjelkovic et al., 2016) proposed MoodPlay, an interactive recommender that uses mood-aware filtering, audio similarity, and 2D mood-space visualization. Artist mood vectors are derived from Rovi metadata and organized using the Geneva Emotional Music Scale (GEMS) model. Users are represented by affective centroids, and recommendations are generated through profile- or trail-based exploration. In a follow-up study, Andjelkovic et al. (Andjelkovic et al., 2019) extended the system by introducing four interface variants with varying levels of affective control and experimentally showed that moderate interaction complexity best balances cognitive load, user satisfaction, and perceived recommendation quality.
- •
Mood Modeling in Education. In education, Tang et al. (Tang et al., 2021) developed a mood-adaptive recommender for online learners. Users are represented by feature vectors (e.g., degree, skills, preferences), and mood, categorized as positive, stable, or negative, guides recommendation strategy, such as suggesting challenging material for positive moods and simpler content for negative ones. However, the mood detection mechanism is unspecified.
- •
Mood Modeling in Recipe Recommendation. Ueda et al. (Ueda et al., 2016) proposed a recipe recommendation system that models user mood along six dimensions, body, mental, taste, time, price, and modification, based on a lexicon of 1,758 mood-related words. Recipes are annotated on a [-5, +5] scale for each dimension, and users specify their current mood using sliders. Recommendations are generated by ranking recipes based on similarity to the user’s mood profile, with missing annotations inferred via cosine similarity from labeled recipes.
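The mood-profile matching described above for recipe recommendation can be sketched as a similarity ranking over six-dimensional score vectors. The sketch is illustrative only: the recipe names and scores are invented, the function names are hypothetical, and cosine similarity is an assumed choice for the ranking step.

```python
import math

# The six mood dimensions named in the text; scores are assumed to lie in [-5, +5].
DIMENSIONS = ("body", "mental", "taste", "time", "price", "modification")

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_recipes(user_mood, recipes):
    """Return recipe names ordered from most to least similar to the mood.

    user_mood: six slider values, one per entry of DIMENSIONS.
    recipes: dict mapping recipe name to its six annotated scores.
    """
    return sorted(recipes, key=lambda name: cosine(user_mood, recipes[name]),
                  reverse=True)

# Toy usage with made-up annotations.
mood = (4, -2, 3, 0, 1, 0)
catalog = {
    "hearty stew": (5, -1, 2, 0, 0, 0),
    "light salad": (-4, 2, -1, 3, 0, 0),
}
ranking = rank_recipes(mood, catalog)
```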
Table LABEL:tab:mood lists representative mood-aware recommender system studies, organized according to their application domains, mood modeling strategies, and data sources.
7. Hybrid Affective Recommender Systems
Hybrid affective recommender systems integrate multiple types of affective states, such as sentiment, emotion, and mood, to leverage their strengths and mitigate the limitations of using a single type alone. By combining these different affective dimensions, such systems aim to enhance personalization, improve predictive accuracy, and increase overall user satisfaction. We group hybrid approaches into two main categories based on their methodological integration of affective state types: attitude + emotion, which combines long-term evaluative judgments with short-term affective reactions, and emotion + mood, which integrates emotional responses with mood to capture both momentary and enduring aspects of user preferences. Figure 6 represents the taxonomy for hybrid affective recommender systems. Table LABEL:tab:hybrid provides a structured list of representative papers in the two main categories, highlighting their targeted affective states, fusion strategies, application domains, and data sources.
7.1. Attitude + Emotion Informed Recommender Systems
Several studies combine attitude and emotion to model human affective states in recommender systems in order to capture nuanced affective interactions. For instance, Wang et al. (Wang et al., 2023) analyzed airline reviews in order to assess the impact of sentiment and emotion on recommendation intention. Positive sentiment, joy, and trust were found to increase the likelihood of recommendation, while negative sentiment, anger, and disgust had the opposite effect. Sertkan and Neidhardt (Sertkan and Neidhardt, 2022) proposed a news recommender system that integrates semantic and emotional signals from both articles and user behavior, based on sentiment and Ekman’s emotion taxonomy. Chen and Tang (Chen and Tang, 2018) introduced a music recommender that extracts sentiment polarity and arousal-valence (AV) features from lyrics using a Chinese sentiment lexicon. Songs are represented in affective space using emotion point matrices and matched to mood-specific AV regions for recommendation.
A notable subclass of emotion-attitude hybrid recommender systems targets serendipity, a well-established concept in RS literature that is not always explicitly framed in affective terms. Serendipity is commonly described as an unexpected yet pleasant experience. This characterization reveals its underlying affective structure: the unexpectedness component corresponds to the emotion of surprise, while pleasantness reflects a positive evaluative attitude toward the outcome. From this perspective, serendipity can be understood as a hybrid affective experience arising from the interplay between emotional arousal and attitudinal appraisal.
7.1.1. Cognitive and Affective Foundations of Serendipity
Modern perspectives emphasize that serendipity arises from a confluence of cognitive and emotional processes. Andel (Andel, 1994) argued that unsought findings occur when a well-prepared mind encounters an unforeseen stimulus and, through careful interpretation, transforms it into something valuable. Marg (Marg, 1995) demonstrated how emotional responses serve as somatic markers that guide decision-making and flag surprising events. In a similar vein, Isen (Isen, 2001) provided empirical evidence that positive affect not only enhances cognitive flexibility and fosters creative associations but also increases our sensitivity to unexpected stimuli. Foster and Ford (Foster and Ford, 2003) further supported the cognitive-affective perspective on serendipity in the information-seeking context by demonstrating that exploratory search behaviors can yield useful, unexpected information. More recently, Busch (Busch, 2024) highlighted that conditions such as agency, surprise, and value are essential for serendipitous outcomes, suggesting that individuals with rich prior experience and an emotionally engaged mindset are better equipped to encounter and leverage the unexpected. Taken together, these perspectives suggest that serendipity emerges from the dynamic interplay between a prepared, flexible mind and the affective responses triggered by unexpected stimuli, with positive emotions facilitating creative insight and novel discoveries.
7.1.2. Serendipity in Information Retrieval and RS
The concept of serendipity in recommender systems originated from early work in information retrieval, where researchers observed that users occasionally encountered unexpectedly useful information while navigating large, unstructured datasets (Toms, 2000). Such studies highlighted the paradox of designing systems that deliberately facilitate unexpected discoveries rather than merely optimizing for expected information needs: aiming to produce unexpected discoveries makes them, in a sense, expected (Toms, 2000; Foster and Ford, 2003; McBirnie, 2008).
Herlocker et al. (Herlocker et al., 2004) informally define a serendipitous recommendation as one that helps the user find a “surprisingly interesting item” that they might not have otherwise discovered, emphasizing surprise in delivering unexpected items. Building on this, subsequent studies have refined the definition in various ways. For instance, Iaquinta et al. (Iaquinta et al., 2008) proposed that an item might be considered serendipitous if a classifier is uncertain about its relevance, while Adamopoulos and Tuzhilin (Adamopoulos and Tuzhilin, 2014) argued that an item is serendipitous if it markedly deviates from the user’s established profile. Yet, despite these advances, there is still no consensus on the definition of serendipity in recommender systems. While many agree that serendipity should incorporate surprise, novelty, and utility, the relative importance of these components varies. Some researchers argue that for a recommendation to be serendipitous, it must be entirely novel, implying that the user is completely unaware of the item, whereas others contend that it is sufficient for the recommendation to be unexpected relative to the user’s current preferences, even if the user might eventually have discovered it independently (Kaminskas and Bridge, 2014).
Two distinct forms of serendipity have been notably observed in the literature, based on the source from which the serendipitous experience emerges:
- •
In discovery-level serendipity, users stumble upon entirely unfamiliar items, such as new genres, categories, or creators.
- •
In contrast, content-level serendipity arises not from how the item was found, but from unexpected features or characteristics encountered during its consumption. These features may relate to topics, style, or any other aspect of the item whose engaging or valuable nature becomes apparent only through direct experience.
Table LABEL:tab:serendipity lists representative papers on serendipity-aware recommender systems, highlighting their data sources, application domains, and techniques.
7.1.3. Discovery-Level Serendipity in RS
Discovery-level serendipity refers to user experiences where the recommended item is one that the user would not have actively sought. This form of serendipity emphasizes the stumbling-upon effect, where users are exposed to items that lie outside their established preference boundaries yet turn out to be rewarding or useful. This conceptualization of serendipity is reflected in early foundational work on recommender systems. For example, Herlocker et al. (Herlocker et al., 2004) consider that “A serendipitous recommendation helps the user find a surprisingly interesting item he might not have otherwise discovered.” This framing highlights the role of unintentional discovery and positions serendipity as orthogonal to pure accuracy, underscoring the value of systems that promote exploration beyond familiar content. A similar conceptualization can be observed in the question below, taken from a post-study questionnaire administered by Taramigkou et al. (Taramigkou et al., 2013) for their guided music-exploration system:
Post-study Question
“Did you find artists you wouldn’t have found easily on your own and which you would like to listen to from now on?”
In more recent work, Fu et al. (Fu et al., 2023b) constructed a ground-truth serendipity dataset by identifying reviews in which users explicitly mention stumbling upon an item. These reviews were collected through a crowd-sourced annotation process and reflect real-world accounts of unexpectedly discovering and enjoying an item. One example review from their dataset is shown below:
Example Review
“I stumbled on this book by accident, it was on an iPad I had borrowed and was checking out the features… but it grabbed my attention from the first page and I could not put it down. I recommended it to friends who loved it just as much. It is not my usual genre, but I loved it. It had everything—tragedy, suspense, romance, and an interesting backdrop for a story.”
This review illustrates the defining characteristics of discovery-level serendipity: the item lies outside the user’s routine preferences, the discovery is incidental, and the emotional outcome is strongly positive. Building on this dataset, subsequent work explored serendipity in a cross-domain recommendation setting, introducing a deep learning model designed to facilitate unexpected discoveries across the book and movie domains (Fu et al., 2024). In addition, several studies have explored discovery-level serendipity using only coarse signals like item IDs, user-item interactions, and popularity, without analyzing textual, visual, or other rich content features (de Gemmis et al., 2015; Li et al., 2020c, b; Lu et al., 2012; Afridi, 2018; Adamopoulos and Tuzhilin, 2014).
Collectively, these studies demonstrate that discovery-level serendipity, which stems from exposure to unfamiliar yet rewarding content, can be achieved through recommendation strategies that prioritize item or category level unfamiliarity.
7.1.4. Content-Level Serendipity in RS
Content-level serendipity refers to cases where users are recommended items that have internal features that are both unexpected and useful. Unlike discovery-level serendipity, which emphasizes stumbling upon unfamiliar items, content-level serendipity emerges from engaging with an item more deeply and uncovering surprising aspects that are not obvious before consuming the item, as illustrated in the user review below:
Example Review
“It seemed like just a basic travel camera, something light to carry around. But the moment I used the touch-to-focus and silent shutter mode, which blew me away, it felt like I was holding something far more premium. I didn’t expect something this compact to feel so refined and professional. I’m loving it more and more each day.”
In this example, the user’s initial expectations were determined by the product’s lightweight design and intended use case. These expectations were confounded upon using the item, when serendipity arose from unexpectedly discovering high-end features through direct interaction. This highlights how content-level serendipity is driven by internal characteristics that positively surprise the user during or after item consumption.
Hasan and Bunescu (Hasan and Bunescu, 2023) introduced a formal definition that equates content-based serendipity with the product of a user’s rating of an item and the Bayesian surprise (Itti and Baldi, 2005), quantified as the Kullback-Leibler (KL) divergence between the prior (before consuming the item) and the posterior (after consuming the item) distributions over the user’s topic-level preferences. To recommend items with high potential for serendipity, the system first uses a collaborative filtering approach to identify users who, at some prior point in their reading history, had similar topic-level preferences to the target user. Items with high serendipity scores from the most similar users are then recommended to the target user. Other RS approaches to content-level serendipity, such as (Niu et al., 2018; Jenders et al., 2015), quantify surprise more directly as a semantic dissimilarity between the recommended item and a user’s historical preference profile. When topic modeling is used to model content, dissimilarity can be quantified as the KL divergence between the candidate item and the user’s historical topic distributions (Huang et al., 2018). Apart from these works, the user study of Kotkov et al. (Kotkov et al., 2018) explicitly evaluates both content-level and discovery-level serendipity, asking participants separate questions about whether the recommended item was something they would not normally discover and whether its style, genre, or topic differed markedly from their typical choices.
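The rating-times-surprise definition above can be illustrated with a short sketch over discrete topic distributions. Two points are assumptions rather than details taken from the text: the direction of the divergence, taken here as KL(posterior || prior) following the usual Bayesian-surprise convention, and the small constant `eps` added for numerical safety. The function names and toy distributions are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def serendipity_score(rating, prior, posterior):
    """Product of the user's rating and the Bayesian surprise.

    prior / posterior: the user's topic-preference distributions before
    and after consuming the item (each summing to 1).
    """
    return rating * kl_divergence(posterior, prior)

# Toy usage: a highly rated item that shifts the user's topic preferences.
prior = [0.5, 0.5]
posterior = [0.9, 0.1]
score = serendipity_score(4, prior, posterior)
```

Under this formulation an item scores zero if it leaves the topic preferences unchanged, regardless of rating, and also scores low if it shifts preferences but is rated poorly, which matches the intuition that serendipity requires both surprise and value.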
Together, these approaches underscore that content-level serendipity arises not merely from unfamiliarity, but from meaningful deviations within an item’s content or features that defy user expectations in rewarding ways, which highlights the importance of modeling rich item information.
7.2. Emotion + Mood Informed Recommender Systems
Integrating information about transient states (emotions) with more sustained states (moods) allows recommender systems to adapt to both short-term affective contexts and more persistent preference patterns. An example in this category is the MusicSense framework (Cai et al., 2007), which aims to match the emotions and moods expressed by music with those identified in the web pages that the user is reading. Using a generative probabilistic model inspired by LDA, both music and textual content are represented as mixtures of emotions and moods. The similarity between music and web content is assessed using KL divergence, yielding recommendations that match emotional experiences and mood tendencies. In a related study, Piazza et al. (Piazza et al., 2017) leveraged moods measured via the Positive and Negative Affect Schedule (PANAS) alongside emotions captured through the Pleasure-Arousal-Dominance (PAD) model within a factorization machines framework. Their findings indicated that mood features significantly enhanced predictive accuracy, especially in cold-start scenarios, whereas emotions introduced noise due to their transient nature, diminishing their predictive utility for product evaluation. Moscato et al. (Moscato et al., 2021) developed an emotion-aware music recommender that leverages song-derived emotions to continuously update the user’s mood, represented within a reduced PAD space, while item recommendation is performed by identifying the nearest neighbors to the user’s current mood. Dhahri et al. (Dhahri et al., 2018) proposed a mood-aware music recommender that infers user mood (positive, negative, or neutral) from social media cues and combines it with adaptive song embeddings in a 2D latent space to deliver personalized recommendations without requiring explicit input or listening history. Mood-specific song relevance is computed via cosine similarity between user and song emotion vectors, refined through reinforcement learning, and updated using change-point detection.
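To make the distribution-matching idea concrete, the following minimal sketch ranks songs by the KL divergence between a page's affect mixture and each song's affect mixture. It is an illustration only, not the MusicSense implementation: the affect categories, the mixture values, the direction of the divergence, and the function names `kl_divergence` and `rank_songs_by_kl` are all assumptions made here.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same affect categories."""
    # eps guards against log(0) when q assigns zero mass to a category.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def rank_songs_by_kl(page_mix, song_mixes):
    """Return song ids ordered from the closest to the farthest affect mixture."""
    return sorted(song_mixes, key=lambda sid: kl_divergence(page_mix, song_mixes[sid]))

# Hypothetical mixtures over four illustrative categories: (joyful, calm, sad, tense).
page = [0.6, 0.3, 0.05, 0.05]
songs = {
    "upbeat_pop": [0.7, 0.2, 0.05, 0.05],
    "ambient": [0.2, 0.7, 0.05, 0.05],
    "ballad": [0.05, 0.15, 0.7, 0.1],
}
ranking = rank_songs_by_kl(page, songs)
print(ranking)
```

Because KL divergence is asymmetric, a real system would need to choose the direction (or a symmetrized variant) deliberately; the sketch fixes one direction purely for illustration.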
8. Datasets and Applications
Affective recommender systems leverage a wide variety of datasets and have been applied across multiple domains to enhance user experience. Unlike traditional RS, which primarily rely on explicit and implicit user feedback (e.g., ratings, clicks, purchase history), affective RS incorporate emotion, mood, and sentiment-related information from various data sources. These include physiological signals (e.g., EEG, ECG, GSR), text, multimodal interactions, and behavioral cues that help infer users’ affective states. This section explores the key datasets used in affective RS and provides an overview of their applications across different domains.
8.1. Datasets for Affective Recommender Systems
Affective RS datasets originate from various sources, including social media interactions, movie reviews, music streaming histories, fashion preferences, and conversational dialogues. Some datasets are explicitly designed for affective recommendations, while others are adapted from general recommender system datasets with additional affective annotations. Table 7 provides an overview of key datasets for affective RS, covering multiple domains such as short video recommendations, social media, e-commerce, fashion, music, healthcare, and news.
These datasets illustrate the growing use of affective information across recommendation models in various domains. Nevertheless, the subjective nature of affective states introduces annotation challenges, often leading to inconsistent emotion labels (Poria et al., 2019). Additionally, most existing datasets contain only single-modality signals, which constrains multimodal affective modeling. Consequently, directions for future research include the development of large-scale multimodal datasets that integrate diverse sources such as textual reviews, contextual data, vision, audio, and physiological signals. Leveraging self-supervised learning and weakly supervised annotation techniques can further facilitate the creation of comprehensive affect-rich datasets. Tackling these data challenges can help affective RS adapt more accurately to users’ dynamic affective states, providing more personalized and ultimately more satisfying recommendations.
8.2. Applications of Affective Recommender Systems
Affective recommender systems are increasingly being adopted across many domains. Fig. 7 presents the primary domains and their respective sub-domains where affective RS have been notably applied.
- E-commerce. In e-commerce, affective RS analyze user sentiment from product reviews, emotional feedback, and purchase behavior to personalize recommendations in areas such as electronics, fashion, and books (Aramanda et al., 2023; Lin et al., 2021; Cai et al., 2022; Meng et al., 2018).
- Education. Educational platforms integrate affective signals from social media and micro-blogs to dynamically personalize learning materials, including course recommendations, research articles, and educational books, helping students stay engaged and motivated based on their emotional states (N. and K.M., 2023; Olga C. Santos and Rodriguez-Sanchez, 2016; M et al., 2023; Caglar-Ozhan et al., 2022).
- Healthcare. In the healthcare sector, affective RS play a vital role in tailoring health-related suggestions, such as personalized diet plans, hospital and drug recommendations, and useful medical articles. By recognizing patterns of emotional distress, these systems support improved health outcomes and emotional well-being (Santamaria-Granados et al., 2019; Serrano-Guerrero et al., 2024; Shi et al., 2022).
- Music. Music streaming services employ emotion-aware algorithms to adjust playlists and recommend songs, artists, or concerts that resonate with a user’s current mood, enriching their listening experience (Sasaki et al., 2013; Kuo et al., 2005; Deng et al., 2015a; Revathy et al., 2023; Han et al., 2024).
- Tourism. Tourism recommendation systems incorporate affective insights from user reviews and multimodal interactions to suggest personalized travel experiences, including restaurant choices, hotel accommodations, airline services, and travel destinations (Artemenko et al., 2020; Yang et al., 2013; Ho et al., 2012; Ferrato et al., 2022; Asani et al., 2021).
- Video. In video streaming services, affective RS adapt movie, TV show, and short video recommendations based on users’ emotional preferences and past viewing behavior, ensuring a more engaging and immersive content selection process (Zhang et al., 2024a; Zhao et al., 2011; Shepstone et al., 2014).
- Social Media. Social media platforms utilize affective engagement patterns, such as sentiment in posts, reactions, and interactions, to recommend potential friends, communities, and interest-based groups (Wu et al., 2016; Akiyama et al., 2017; Dwivedi-Yu et al., 2022).
- News. News recommender systems dynamically adapt content delivery based on users’ emotional reactions to political, financial, and global events, ensuring engagement while mitigating content fatigue or emotional overload (Yun et al., 2023; Tao and Alatas, 2024; Mizgajski and Morzy, 2019).
9. Future Research Directions and Open Issues
Although significant progress has been made in sentiment-aware, emotion-aware, and mood-aware recommender systems, several open issues hinder the development of more robust and comprehensive affective models. The primary challenges include the lack of hybrid models that integrate multiple affective states, the scarcity of high-quality datasets for affective recommendations, the conflation of sentiment, emotion, and mood in computational models, and the need for multimodal affective learning.
9.1. Towards Hybrid Affective Recommender Systems
One of the most significant gaps in affective RS is the lack of hybrid models that effectively integrate multiple affective states. Existing work focuses primarily on individual affective dimensions, such as sentiment, emotion, or mood, without exploring the potential synergies between them. Sentiment analysis captures stable, long-term attitudes; emotion detection focuses on transient and intense responses; and mood modeling accounts for more prolonged but diffuse affective states. Although these distinct types of affective states interact dynamically in human decision-making, current recommender systems have largely ignored these interactions, limiting their ability to provide a well-rounded affect-aware recommendation experience.
- Unified Affective Modeling. A promising direction for future research is to develop hybrid affective models that encode sentiment, emotion, and mood into latent spaces while capturing their interactions within a unified framework. Deep learning architectures, such as hierarchical attention networks, graph neural networks (GNNs), and transformer-based models, could be leveraged to capture the sometimes subtle relationships between different affective states. Multi-task learning could also be explored, where a single model is trained to predict multiple affective states simultaneously, improving the robustness of affective representations. Furthermore, designing adaptive recommender systems that dynamically adjust their reliance on sentiment, emotion, or mood based on the users’ behavioral patterns and contextual factors could lead to more nuanced and personalized recommendations.
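One simple way to picture the adaptive reliance described above is a softmax gate over per-state relevance scores. The sketch below is an assumption-laden illustration rather than a method from the surveyed literature: the `reliabilities` logits stand in for whatever a learned model would infer from behavioral and contextual signals, and the names `softmax` and `fused_score` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fused_score(scores, reliabilities):
    """Combine per-state relevance scores using softmax weights over reliability logits."""
    names = sorted(scores)
    weights = softmax([reliabilities[n] for n in names])
    return sum(w * scores[n] for w, n in zip(weights, names))

# Hypothetical relevance of one item under each affective state.
scores = {"sentiment": 0.8, "emotion": 0.2, "mood": 0.9}

# Equal reliability reduces to a plain average of the three scores.
uniform = fused_score(scores, {"sentiment": 0.0, "emotion": 0.0, "mood": 0.0})

# A context where mood is judged far more reliable pushes the score toward the mood score.
mood_heavy = fused_score(scores, {"sentiment": 0.0, "emotion": 0.0, "mood": 10.0})
```

In a trained system the gate logits would be produced by a network conditioned on user and context features; here they are hand-set so that the weighting behavior is easy to inspect.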
9.2. Towards Broad-Coverage Affective Datasets
The development of affective RS is severely constrained by the lack of comprehensive, large-scale datasets that include explicit affective labels. While traditional recommender systems benefit from extensive datasets containing user ratings, clicks, and purchase histories, datasets that incorporate sentiment, emotion, and mood annotations remain scarce. Many existing studies rely on manually labeled affective datasets, which are limited in scale and fail to capture the full spectrum of real-world affective states. Additionally, most affective datasets focus on only one type of affective state, making it difficult to develop hybrid models that integrate sentiment, emotion, and mood.
To address these limitations, future research should prioritize the creation and annotation of large-scale datasets that capture a wide range of affective states. This includes data from user-generated content (e.g., reviews, tweets, blogs), physiological signals (e.g., heart rate, EEG, facial expressions), multimodal inputs (e.g., facial expressions, posture, gaze tracking, object and scene affective analysis), and contextual metadata (e.g., time of day, activity levels, and social interactions). The following techniques offer promising directions for affective dataset development:
- Weak, Distant, and Self-Supervised Learning. These techniques help to mitigate data scarcity by generating approximate labels from unstructured data. Weak supervision can incorporate limited high-quality labeled data alongside noisy heuristic-based annotations. Distant supervision can infer affective states using external sources such as hashtags, sentiment lexicons, or emotion-tagged multimedia. Self-supervised learning can extract structured representations from unlabeled data, such as affective embeddings from video and audio interactions, which can be fine-tuned using small labeled datasets.
- Transfer Learning and Domain Adaptation. Pre-trained models can be repurposed across modalities to enhance affective modeling. For instance, facial expression recognition models trained on large video datasets can be fine-tuned for emotion-aware movie recommendations. Similarly, sentiment embeddings derived from textual reviews can be adapted to personalize image-based content, enabling seamless multimodal integration. For example, if a user’s reviews indicate a preference for uplifting, positive content, these sentiment signals could guide a recommender to suggest photos, posters, or other visual media with similar affective characteristics.
- Crowdsourced Annotations. Leveraging human annotators through platforms like Amazon Mechanical Turk or Prolific enables the scalable collection of affective labels across diverse demographics. This approach supports the creation of more representative and contextually rich datasets by incorporating subjective human judgment at scale.
- Synthetic Data Generation. Large language models and generative models (e.g., GANs) can be used to produce synthetic text, speech, or images enriched with affective content. These synthetic datasets can augment underrepresented affective categories and facilitate balanced model training.
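As an illustration of the distant supervision idea in the list above, the sketch below derives noisy emotion labels from hashtags. The hashtag-to-label mapping, the label set, and the function name `distant_label` are hypothetical choices made for this example; a practical pipeline would use a curated cue list and validate the resulting labels against a small hand-annotated sample.

```python
import re

# Hypothetical mapping from cue hashtags to emotion labels.
HASHTAG_TO_LABEL = {
    "#happy": "joy",
    "#joy": "joy",
    "#sad": "sadness",
    "#angry": "anger",
    "#furious": "anger",
}

def distant_label(post):
    """Return (text_without_hashtags, label), or None if cues are absent or conflicting."""
    tags = [t.lower() for t in re.findall(r"#\w+", post)]
    labels = {HASHTAG_TO_LABEL[t] for t in tags if t in HASHTAG_TO_LABEL}
    if len(labels) != 1:
        return None  # skip posts with no cue or with contradictory cues
    label = labels.pop()
    # Strip all hashtags so a downstream model cannot simply memorize the cue.
    text = " ".join(re.sub(r"#\w+", "", post).split())
    return text, label

posts = [
    "Best concert ever #happy #live",
    "Missed my flight again #furious",
    "Mixed feelings today #sad #happy",
    "Just a regular Tuesday",
]
labeled = [pair for pair in map(distant_label, posts) if pair is not None]
```

Discarding posts with conflicting cues trades coverage for label precision, which is one common way to limit the noise that distant supervision introduces.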
9.3. Addressing the Conflation of Sentiment, Emotion, and Mood
A persistent theoretical challenge in affective RS is the conflation of sentiment, emotion, and mood. While these affective states have well-defined distinctions in psychology, many computational models fail to differentiate them adequately, leading to conceptual inconsistencies and, in turn, suboptimal performance. Sentiment reflects long-term evaluative attitudes, emotions are short-lived and intense responses to specific stimuli, and moods are more diffuse affective states that persist over extended periods. The following examples illustrate how existing studies blur these distinctions.
- Sentiment-Emotion Conflation. Wu (Wu, 2024) labeled a diverse set of emotions (which are short-term affective states), such as awe, anger, amusement, sadness, fear, disgust, excitement, and contentment, as sentiment (which represents a stable attitude).
- Emotion-Mood Conflation. Yoon et al. (Yoon et al., 2012) misattributed Thayer’s mood model as a variant of the valence-arousal framework, mapping emotions such as anger, happiness, sadness, and peacefulness to musical features based on valence and arousal dimensions. In reality, Thayer’s model conceptualizes mood using orthogonal dimensions of energy and tension, which differ fundamentally from valence-arousal axes.
We recommend anchoring descriptions in a more precise vocabulary of affective terms, such as Scherer’s typology of affective states (Scherer, 2005) augmented with the Geneva Wheel of Emotions (Sacharin et al., 2012).
9.4. Multimodal Affective Learning
Human affective experiences are inherently multimodal, encompassing text, speech, facial expressions, physiological responses, and behavioral cues. However, most existing affective RS predominantly rely on text-based sentiment analysis or explicit user feedback, overlooking the wealth of multimodal signals that can provide a more nuanced and comprehensive understanding of user affect. This reliance on a single modality limits the accuracy and adaptability of affect-aware recommendations, as textual signals alone may not fully capture the depth and dynamics of users’ emotional and mood states. To move beyond this limitation, we highlight two key research directions:
- Building Multimodal Affective RS Frameworks. Developing new affective recommender systems that integrate multimodal signals poses several challenges, including the fusion of heterogeneous data sources, robustness to incomplete or noisy inputs, and the need for scalable architectures capable of processing high-dimensional multimodal data. Future work should focus on constructing frameworks that combine textual, visual, auditory, and physiological inputs to improve both affective signal recognition and recommendation quality.
- Using Existing Multimodal Tools and Models. Recent advances in multimodal fusion, such as attention-based fusion networks (Vaswani et al., 2017), cross-modal transformers (Yan et al., 2023), and contrastive learning methods (Khosla et al., 2020), offer practical tools for affect modeling. Additionally, large-scale pretrained models like CLIP (Radford et al., 2021) and diffusion models (Ho et al., 2020) enable the learning of joint representations of affective states across multiple modalities. Leveraging these existing tools can enhance the robustness, adaptability, and emotional awareness of affective recommender systems beyond text-based sentiment analysis.
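The fusion and missing-input challenges raised in the two directions above can be sketched with a small attention-style fusion over per-modality embeddings. This is a toy illustration under stated assumptions: the two-dimensional embeddings, the fixed `query` vector (which a real model would learn), and the function name `attention_fuse` are all hypothetical, and absent modalities are represented as `None`.

```python
import math

def attention_fuse(embeddings, query):
    """Fuse the available modality embeddings with scaled dot-product attention weights."""
    # Drop modalities that are missing for this interaction.
    available = {m: v for m, v in embeddings.items() if v is not None}
    if not available:
        raise ValueError("at least one modality is required")
    scale = math.sqrt(len(query))
    logits = {m: sum(q * x for q, x in zip(query, v)) / scale for m, v in available.items()}
    peak = max(logits.values())
    exps = {m: math.exp(l - peak) for m, l in logits.items()}
    total = sum(exps.values())
    weights = {m: e / total for m, e in exps.items()}
    # The fused vector is the attention-weighted sum of the available embeddings.
    fused = [sum(weights[m] * available[m][i] for m in available) for i in range(len(query))]
    return fused, weights

# Hypothetical embeddings; the facial-expression channel is unavailable here.
embeddings = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "face": None}
fused, weights = attention_fuse(embeddings, query=[1.0, 0.0])
```

Renormalizing the weights over only the available modalities is one simple way to stay robust to incomplete inputs; learned fusion networks handle this in more sophisticated ways.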
10. Conclusion
Affective recommender systems represent a significant advancement in personalization by incorporating affective states such as sentiment, emotion, and mood. This survey presents a comprehensive and structured review of the field, grounded in a social psychology perspective of emotion, particularly Scherer’s typology of affective states. Accordingly, the survey organizes affective RS publications into four main categories: attitude-aware, emotion-aware, mood-aware, and hybrid systems. Through the analysis of over 200 studies across various application domains, we have synthesized methodological trends in affective signal extraction and integration strategies, offering a unified perspective on how affect can enhance personalization in recommender systems. In doing so, we have identified three major research areas that are important for further progress: (1) modeling and combining different types of affective states in affective RS; (2) the creation of large-scale datasets annotated with rich affective information; and (3) the development of robust hybrid models that account for the interplay between multiple affective dimensions. Addressing these challenges can lead to more robust, adaptive, and human-aligned recommender systems that not only improve accuracy and engagement, but also foster emotionally intelligent and empathetic user experiences. This survey aims to provide a comprehensive resource for researchers and practitioners interested in understanding the current state of knowledge as well as in driving future innovations in affective recommender systems.
References
- Abbas and Niu (2019) Fakhri Abbas and Xi Niu. 2019. Computational Serendipitous Recommender System Frameworks: A Literature Survey. In 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA). 1–8. https://doi.org/10.1109/AICCSA47632.2019.9035339
- Abbasi-Moud et al. (2021) Zahra Abbasi-Moud, Hamed Vahdat-Nejad, and Javad Sadri. 2021. Tourism recommendation system based on semantic clustering and sentiment analysis. Expert Systems with Applications 167 (2021), 114324. https://doi.org/10.1016/j.eswa.2020.114324
- Adamopoulos and Tuzhilin (2014) Panagiotis Adamopoulos and Alexander Tuzhilin. 2014. On Unexpectedness in Recommender Systems: Or How to Better Expect the Unexpected. ACM Trans. Intell. Syst. Technol. 5, 4, Article 54 (Dec. 2014), 32 pages. https://doi.org/10.1145/2559952
- Aditya (2025) Ramit Aditya. 2025. Engineering Serendipity through Recommendations of Items with Atypical Aspects. Master’s thesis. The University of North Carolina at Charlotte.
- Adru and Johnson (2024) Sri Likhita Adru and Sandra Johnson. 2024. Harmonizing Emotions: A Fusion of Facial Emotion Recognition and Music Recommendation System. In 2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence). 464–469. https://doi.org/10.1109/Confluence60223.2024.10463390
- Afridi (2018) Ahmad Hassan Afridi. 2018. User Control and Serendipitous Recommendations in Learning Environments. Procedia Computer Science 130 (2018), 214–221. https://doi.org/10.1016/j.procs.2018.04.032
- Afridi et al. (2020) Ahmad Hassan Afridi, Ansar Yasar, and Elhadi M Shakshuki. 2020. Facilitating research through serendipity of recommendations. Journal of Ambient Intelligence and Humanized Computing 11 (2020), 2263–2275.
