How much do you perceive this? An analysis on perceptions of geometric features, personalities and emotions in virtual humans (Extended Version)

Victor Araujo; Rodolfo Migon Favaretto; Paulo Knob; Soraia Raupp Musse; Felipe Vilanova; Angelo Brandelli Costa

arXiv:1904.11084·cs.GR·April 26, 2019


TL;DR

This study investigates how people perceive geometric features, personalities, and emotions in virtual humans, showing that participants can accurately perceive these attributes without prior explanation, based on a dataset of pedestrian videos.

Contribution

The paper introduces an analysis method for perception of geometric, personality, and emotion features in virtual humans using a dataset with ground truth annotations.

Findings

1. Participants perceived personality and emotion accurately without prior explanation.

2. Geometric features like distances and speeds were reliably perceived.

3. Perception aligned with ground truth in most cases.


Tables

Table 2. Videos of the Cultural Crowds (Favaretto et al., 2016b) dataset.

Video	Country	N. Pedestrians	Density
AE-01	United Arab Emirates	12	Low
AT-03	Austria	10	Low
BR-01	Brazil	16	Low
BR-15	Brazil	15	Low
BR-25	Brazil	25	Medium
BR-34	Brazil	34	High

Equations

$$Animation=\begin{cases}\textbf{Idle},&\text{when } s_{i}=0;\\ \textbf{Walk},&\text{when } 0<s_{i}<\frac{0.08m}{f};\\ \textbf{Run},&\text{when } s_{i}\geq\frac{0.08m}{f}.\end{cases}$$


Taxonomy

Topics: Video Surveillance and Tracking Methods · Color perception and design · Visual Attention and Saliency Detection

Full text

How much do you perceive this? An analysis on perceptions of geometric features, personalities and emotions in virtual humans (Extended Version)

Victor Araujo, Rodolfo Migon Favaretto, Paulo Knob, Soraia Raupp Musse

Graduate Program in Computer Science

Pontifical Catholic University of Rio Grande do Sul, Porto Alegre - RS, Brazil

Felipe Vilanova and Angelo Brandelli Costa

Graduate Program in Psychology

Pontifical Catholic University of Rio Grande do Sul, Porto Alegre - RS, Brazil

Abstract.

This work evaluates people’s perception of geometric features, personalities and emotions in virtual humans. As a basis, we use a dataset containing the tracking files of pedestrians captured from spontaneous videos, and we visualize these pedestrians as identical virtual humans. The goal is for viewers to focus on the pedestrians’ behavior without being distracted by other features. In addition to tracking files containing their positions, the dataset also contains pedestrian emotions and personalities detected using Computer Vision and Pattern Recognition techniques. Our analysis aims to answer whether subjects can perceive geometric features such as distances and speeds, as well as emotions and personalities, in video sequences where pedestrians are represented by virtual humans. A total of 73 people volunteered for the experiment. The analysis was divided into two parts: i) evaluation of the perception of geometric characteristics, such as density, angular variation, distances and speeds, and ii) evaluation of personality and emotion perceptions. Results indicate that, even without explaining to the participants the concepts of each personality or emotion and how they were calculated (considering geometric characteristics), in most cases participants perceived the personality and emotion expressed by the virtual agents in accordance with the available ground truth.

User perception, geometric features, personalities, emotion

Conference: Intelligent Virtual Agents; July 02–05, 2019; Paris, France

1. Introduction

The study of human behavior is a subject of great scientific interest and probably an inexhaustible source of research (Jacques Junior et al., 2010). Due to its importance in many applications, the automatic analysis of human behavior has been a popular research topic in recent decades (Alameda-Pineda et al., 2018). In the literature, there is some work involving the visualization and analysis of cultural characteristics, such as analysis of the impact of groups on crowds through human perceptions (Yang et al., 2018), simulation of crowds through behaviors based on personality and emotion traits (Durupınar et al., 2016), visualization of interactions between virtual agents in crowd simulation and pedestrians in real video sequences (Knob et al., [n. d.]), visualization of personality traits through social media (Gou, [n. d.]), visualization and understanding of personal emotional style (Zhao et al., 2014), and visualization of personal records (Plaisant et al., 1996), among others. Typically, these approaches deal with Natural Language Processing (NLP) and the extraction of social media data (sentiment analysis), criminal and medical records, or any other record extracted from textual data.

Recently, studies have used geometric features to analyze cultural aspects in crowds. Favaretto et al. (Favaretto et al., 2016b) used group behaviors to detect cultural aspects according to Hofstede (Hofstede, 2001). In other work, Favaretto et al. investigated cultural aspects using controlled experiment videos (related to the Fundamental Diagram (Chattaraj et al., 2009)) and spontaneous videos from various countries, using geometrical features (Favaretto et al., 2016a), the Big-Five personality model (Favaretto et al., 2017) and the OCC emotion model (Favaretto et al., 2018). However, there are not many methods in the literature that investigate people’s perceptions regarding geometric information (Yang et al., 2018). In this sense, the objective of this work is to investigate how people perceive the geometric characteristics (for example, density, distances and velocities) and non-geometric characteristics (for example, cultural characteristics such as personality traits and emotions) calculated from the geometric features of pedestrians in videos of crowds. For this, we use the videos of the Cultural Crowds dataset (Favaretto et al., 2016b) (available at: http://rmfavaretto.pro.br/vhlab/), which contains videos of crowds from different countries, with pedestrians walking in different scenarios. The dataset contains the tracking files with the pedestrian positions and also provides personality and emotion information for these pedestrians, which was obtained through Computer Vision and Pattern Recognition techniques.

For the experiment, we use the tracked positions in a simulated environment where agents are visualized as identical virtual humans. The goal is for viewers to focus on the agents’ behavior without being distracted by other features. In our analysis the participants were asked to answer questions to identify whether they can perceive geometric features such as distances and speeds, as well as emotions and personalities, in video sequences where pedestrians are represented by virtual humans. In particular, it is important to understand that our focus is always on the perception of information related to space and geometry: even when we talk about emotion and personality, we are interested in their purely geometric manifestations (such as distances among agents, speeds and densities). The main motivation is to evaluate the area of personality and emotion detection in video sequences, i.e. we want to know whether people qualitatively perceive what can be detected in video sequences.

2. Related Work

This section discusses some work related to pedestrian and crowd behavioral analysis, focusing on personality traits, emotion and perception. Knob et al. (Knob et al., [n. d.]) presented work on the visualization of interactions between pedestrians in video sequences and virtual agents in crowd simulations. Interactions are given by factors based on the OCEAN of each pedestrian and agent. OCEAN (Digman, 1990; John, 1990), also referred to as the Big-Five, is the personality trait model most commonly used for this type of analysis: Openness to experience (“the active seeking and appreciation of new experiences”); Conscientiousness (“degree of organization, persistence, control and motivation in goal directed behavior”); Extraversion (“quantity and intensity of energy directed outwards in the social world”); Agreeableness (“the kinds of interaction an individual prefers from compassion to tough mindedness”); Neuroticism (“how much prone to psychological distress the individual is”) (Lord, 2007). Durupinar et al. (Durupınar et al., 2016) also used OCEAN to visually represent personality traits. The agents’ visual representation is given in several ways; for example, the animations of the agents are based on these two cultural characteristics (OCEAN and emotion). If an agent is sad, his/her animation will represent that emotion. Yang et al. (Yang et al., 2018) conducted a perception study to determine the impact of groups at various densities, using two points of view: top view and first-person view. In addition, they examined which camera position (top view or first-person view) is better for the perception of density. The work of Ardeshir and Borji (Ardeshir and Borji, 2018) presents experiments and graphs relating two points of view (first-person and top-view cameras), which informed the integration and use of the camera types in the present work.

Regarding the detection of personalities, emotions and cultural aspects of pedestrians in crowds, Favaretto et al. (Favaretto et al., 2016b) proposed a method to identify groups and characterize them in order to assess cultural differences through a mapping to Hofstede’s dimensions (Hofstede, 2011). A similar idea, although using computer simulation rather than computer vision, is proposed by Lala et al. (Lala et al., 2011), who use Hofstede’s dimensions to create a simulated crowd from a cultural perspective. Gorbova and collaborators (Gorbova et al., 2017) present a system for automatic personality screening from video presentations, which decides whether a person should be invited to a job interview based on visual, audio and lexical cues. The work proposed by Favaretto et al. (Favaretto et al., 2017) presents a model to detect personality aspects based on the Big-Five personality model, using individual behaviors automatically detected in video sequences.

Several models have been developed to explain and quantify basic emotions in humans. One of the most cited was proposed by Paul Ekman (Ekman and Friesen, 1971) and considers the existence of 6 universal emotions based on cross-cultural facial expressions (anger, disgust, fear, happiness, sadness and surprise). In (Favaretto et al., 2018), the authors proposed a way to detect pedestrian emotions in videos based on the OCC emotion model. To detect the emotions of each pedestrian, the authors used OCEAN as input, as proposed by Saifi et al. (Saifi et al., 2016). In our approach, we analyze whether subjects can perceive geometric features such as distances and speeds, as well as emotions and personalities, in video sequences where pedestrians are represented by virtual humans. The next section presents how we performed the analysis.

3. Methodology

The main goal of this work is to analyze people’s perceptions of geometric data (speed, distance, density and angular variation), personality and emotions. The data were extracted from the Cultural Crowds dataset (Favaretto et al., 2016b). The geometric data are calculated from the pedestrian trajectories. Personality and emotion traits are in turn calculated from the geometric data, through psychological hypotheses. The next sections detail these processes.

3.1. Features extraction

Based on the tracking input file, Favaretto et al. (Favaretto et al., 2017) compute the following information for each pedestrian $i$ at each timestep: i) 2D position $x_{i}$ (meters); ii) speed $s_{i}$ (meters/frame); iii) angular variation $\alpha_{i}$ (degrees) w.r.t. a reference vector $\vec{r}=(1,0)$ ; iv) isolation level $\varphi_{i}$ ; v) socialization level $\vartheta_{i}$ ; and vi) collectivity $\phi_{i}$ . To compute the collectivity affecting individual $i$ from all $n$ individuals, they computed $\phi_{i}=\sum_{j=0}^{n-1}\gamma e^{(-\beta\varpi(i,j)^{2})}$ , where the collectivity between two individuals is a decay function of $\varpi(i,j)=s(s_{i},s_{j})\cdot w_{1}+o(\alpha_{i},\alpha_{j})\cdot w_{2}$ . Here $s$ and $o$ are, respectively, the speed and orientation differences between two people $i$ and $j$ , and $w_{1}$ and $w_{2}$ are constants that regulate the offset in meters and radians.
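As an illustration only, the collectivity formula above can be sketched in Python. The text does not give the values of $\gamma$, $\beta$, $w_{1}$ and $w_{2}$, nor the exact form of the difference functions $s$ and $o$, so the defaults below (all constants set to 1, absolute differences, angles converted from degrees to radians) are assumptions rather than the authors' implementation. The sum follows the formula as written, so it includes the term $j=i$.

```python
import math

def pairwise_term(s_i, s_j, a_i, a_j, w1=1.0, w2=1.0):
    """varpi(i, j) = s(s_i, s_j) * w1 + o(alpha_i, alpha_j) * w2."""
    # Assumed form of s(.,.): absolute speed difference (meters/frame).
    speed_diff = abs(s_i - s_j)
    # Assumed form of o(.,.): smallest absolute angle difference, in radians.
    d = math.radians(abs(a_i - a_j)) % (2 * math.pi)
    orient_diff = min(d, 2 * math.pi - d)
    return speed_diff * w1 + orient_diff * w2

def collectivity(i, speeds, angles, gamma=1.0, beta=1.0, w1=1.0, w2=1.0):
    """phi_i = sum over j of gamma * exp(-beta * varpi(i, j)^2)."""
    total = 0.0
    for j in range(len(speeds)):
        varpi = pairwise_term(speeds[i], speeds[j], angles[i], angles[j], w1, w2)
        total += gamma * math.exp(-beta * varpi ** 2)
    return total

# Two pedestrians moving identically versus two moving differently.
same = collectivity(0, [0.05, 0.05], [0.0, 0.0])
different = collectivity(0, [0.05, 0.08], [0.0, 90.0])
```

With these assumed constants, pedestrians with similar speed and orientation contribute terms close to $\gamma$, while dissimilar ones contribute terms that decay towards zero, so `same` is larger than `different`.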

To compute the socialization level $\vartheta$ , Favaretto et al. (Favaretto et al., 2016a) use an artificial neural network (ANN), trained with the Scaled Conjugate Gradient (SCG) algorithm, to calculate the socialization level $\vartheta_{i}$ for each individual $i$ . The ANN has 3 inputs: the collectivity $\phi_{i}$ of person $i$ , the mean Euclidean distance from person $i$ to others $\bar{d_{i,j}}$ , and the number of people $n_{i}$ in the Social Space around the person according to Hall’s proxemics (C. S. Hall and Campbell, 1998); the social space is related to $3.6$ meters (C. S. Hall and Campbell, 1998). The isolation level corresponds to its inverse, $\varphi_{i}=1-\vartheta_{i}$ . For more details about how these features are obtained, please refer to (Favaretto et al., 2017; Favaretto et al., 2016a). For each individual $i$ in a video, we compute the average over all frames and generate a vector $\vec{V_{i}}$ of extracted data, where $\vec{V_{i}}=\left[x_{i},s_{i},\alpha_{i},\varphi_{i},\vartheta_{i},\phi_{i}\right]$ . In the next section we describe how these features are mapped into personality and emotion traits.
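The per-pedestrian averaging described above can be sketched as follows. The per-frame record layout (dictionary keys such as `"socialization"` and `"collectivity"`) is hypothetical, and the socialization values are taken as given since the trained ANN is not reproduced here. The angular variation is averaged arithmetically, which is a simplification.

```python
from statistics import fmean

def feature_vector(frames):
    """Average per-frame features of one pedestrian into V_i.

    `frames` is a list of dicts with the (hypothetical) keys
    x, y, s, alpha, socialization, collectivity.
    """
    def mean(key):
        return fmean([frame[key] for frame in frames])

    socialization = mean("socialization")
    return {
        "x": (mean("x"), mean("y")),        # mean 2D position (meters)
        "s": mean("s"),                     # mean speed (meters/frame)
        "alpha": mean("alpha"),             # mean angular variation (degrees)
        "isolation": 1.0 - socialization,   # varphi_i = 1 - vartheta_i
        "socialization": socialization,     # vartheta_i
        "collectivity": mean("collectivity"),
    }

frames = [
    {"x": 1.0, "y": 2.0, "s": 0.04, "alpha": 10.0, "socialization": 0.25, "collectivity": 1.5},
    {"x": 3.0, "y": 2.0, "s": 0.06, "alpha": 20.0, "socialization": 0.75, "collectivity": 2.5},
]
v = feature_vector(frames)
```

Because isolation is a linear function of socialization, averaging socialization first and then taking $1-\vartheta_{i}$ gives the same value as averaging per-frame isolation.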

3.2. Personality and emotion detection

To detect the five dimensions of OCEAN for each pedestrian, Favaretto et al. (Favaretto et al., 2017) used the NEO PI-R (Costa and McCrae, 1992), the standard questionnaire measure of the Five Factor Model. They first selected NEO PI-R items related to individual-level crowd characteristics and the corresponding OCEAN factor. For example, “Like being part of crowd at sporting event” corresponds to the factor “Extroversion”. As described in detail in (Favaretto et al., 2017), they proposed a series of empirically defined equations to map pedestrian features to OCEAN dimensions. They selected 25 of the 240 items in the NEO PI-R inventory that had a direct relationship with crowd behavior. In order to answer the items with data coming from real video sequences, they proposed equations that represent each of the 25 items with features extracted from videos. For example, to represent the item “1 - Have clear goals, work to them in orderly way”, Favaretto and his colleagues consider that individual $i$ should have a high velocity $s$ and a low angular variation $\alpha$ to answer in concordance with this item. So the equation for this item was $Q_{1}=s_{i}+\frac{1}{\alpha_{i}}$ . In this way, they empirically proposed equations for all 25 items, as presented in (Favaretto et al., 2017).
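A minimal sketch of the example item $Q_{1}=s_{i}+\frac{1}{\alpha_{i}}$ is shown below. The function name and the small `eps` guard against division by zero are assumptions added for illustration. The text does not say how a zero angular variation is handled, nor how item scores are normalized or combined into OCEAN factors.

```python
def q1_clear_goals(speed, angular_variation, eps=1e-6):
    """Q_1 = s_i + 1 / alpha_i: high speed and low angular variation score higher.

    `eps` is an assumed guard so that alpha_i = 0 does not divide by zero.
    """
    return speed + 1.0 / max(angular_variation, eps)

# Same speed, different angular variation (degrees).
steady = q1_clear_goals(0.05, 5.0)
wandering = q1_clear_goals(0.05, 20.0)
```

Under this sketch, a pedestrian who keeps a steadier direction (`steady`) receives a higher score than one who changes direction more often (`wandering`) at the same speed.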

In the work presented by (Favaretto et al., 2018), the authors proposed a way to map the OCEAN dimensions of each pedestrian to the OCC Emotion model, regarding four emotions: Anger, Fear, Happiness and Sadness. This mapping is described in Table 1. In Table 1, the plus/minus signs along each factor represent the positive/negative value of each one. For example, concerning Openness, O+ stands for positive values (i.e. O $\geq$ 0.5) and O- stands for negative values (i.e. O $<$ 0.5). A positive value for a given factor (i.e. 1) means that the stronger the OCEAN trait is, the stronger the emotion is too. A negative value (i.e. -1) does the opposite: the stronger the factor’s value, the weaker the given emotion. A zero value means that a given emotion is not affected at all by the given factor.

3.3. Features visualization

The viewer was developed using the Unity3D engine (available at https://unity3d.com/), with the $C\#$ programming language. The viewer allows users to rewind, accelerate and stop the simulated video through a time controller, so that a user can observe something that he/she finds interesting several times, at any time. Figure 1 shows the main window of the viewer. As identified in Figure 1, the viewer is divided into five parts, as follows: 1) the time controller, where it is possible to start, stop and continue simulation playback; 2) the buttons ChangeScene and RestartCamPos to, respectively, load the data file of another video and restart the camera position for first-person viewing; 3) a window that shows the top view of the environment; 4) the first-person view of a previously selected agent (this agent is highlighted in area 3); and 5) the features panel, where users can activate the visualization of data related to the emotion, socialization and collectivity of agents.

This viewer has three modes of visualization: (i) first-person view, (ii) top view, and (iii) oblique view. Figure 2 shows an example of each type of camera point of view in a video available in the Cultural Crowds dataset. In addition to these different points of view, it is possible to observe all the pedestrians present in each frame $f$ . Pedestrians can be represented by a humanoid or cylinder type avatar. Each pedestrian $i$ present in frame $f$ has a position ( $X_{i},Y_{i}$ ) (already converted from image coordinates to world coordinates). In addition to the positions, it is also possible to know whether the pedestrian is walking, running or stopped in frame $f$ through the current speed $s_{i}$ . If in this frame the current speed is greater than or equal to $\frac{0.08m}{f}$ , which is roughly equivalent to $\frac{2m}{s}$ considering $\frac{24f}{s}$ , then the avatar is running. This threshold was defined based on the Preferred Transition Speed, PTS (Alexander, 1992). The values of the transitions can be seen in Equation 1, considering the current speed of the agent $s_{i}$ .

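$$Animation=\begin{cases}\textbf{Idle},&\text{when } s_{i}=0;\\ \textbf{Walk},&\text{when } 0<s_{i}<\frac{0.08m}{f};\\ \textbf{Run},&\text{when } s_{i}\geq\frac{0.08m}{f}.\end{cases}\qquad(1)$$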

Also, for the humanoid avatar type, each speed transition is accompanied by an animation transition. For example, if the current speed is $s_{i}=0$ , the avatar plays the idle animation (remaining stationary); if its speed is $0<s_{i}<\frac{0.08m}{f}$ , the animation changes to walking; and if $s_{i}\geq\frac{0.08m}{f}$ , the animation changes to running. The next section presents the results obtained.
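The speed thresholds described above can be sketched as a small Python function. The viewer itself was written in C# for Unity3D, so this is only an illustrative sketch, and the function and constant names are hypothetical. The conversion helper shows that $0.08$ m/frame at $24$ frames/s is $0.08\times 24=1.92$ m/s, i.e. approximately the $2$ m/s quoted in the text.

```python
FPS = 24                 # frames per second, as stated in the text
RUN_THRESHOLD = 0.08     # meters per frame, the walk/run transition speed

def animation_state(speed):
    """Map a per-frame speed (meters/frame) to Idle, Walk or Run."""
    if speed < 0:
        raise ValueError("speed is a magnitude and must be non-negative")
    if speed == 0:
        return "Idle"
    if speed < RUN_THRESHOLD:
        return "Walk"
    return "Run"

def to_meters_per_second(speed_per_frame, fps=FPS):
    """Convert meters/frame to meters/second."""
    return speed_per_frame * fps

states = [animation_state(s) for s in (0.0, 0.05, 0.08, 0.12)]
threshold_mps = to_meters_per_second(RUN_THRESHOLD)
```

Note that a speed exactly equal to the threshold is classified as running, matching the $\geq$ in the run condition.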

4. Results

This section presents the results of people’s perceptions of geometric information (density, speed, distance between pedestrians and angular variation), personalities and emotions. We used the simulation environment to generate short sequences of pedestrian videos, which were presented together with a questionnaire. Afterwards, the participants’ responses were analyzed. This section is organized into three parts: Section 4.1 presents information about the videos from the dataset that were used in the experiment, Section 4.2 discusses the results of the perceptions of the geometric characteristics of pedestrians, and Section 4.3 presents the results of the perceptions of personalities and emotions.

4.1. Video characteristics

We generated video sequences with data extracted from the Cultural Crowds dataset. Table 2 lists all videos from the dataset that were used in the experiment, with information about the country where the video was recorded, the number of pedestrians and the density level (low, medium, or high). The data of each chosen video was input to a simulated environment containing virtual agents, represented by cylinder or humanoid type avatars, which can be seen, respectively, in Figure 2(b) and (c). We also used the three camera points of view (top view illustrated in Figure 2(a)). Regarding the participants, a total of 73 people volunteered for the experiment: 45 males (61.6%) and 28 females (38.4%), and 47.9% have some undergraduate degree. In the next section we discuss the results obtained in the analysis of geometric feature perception.

4.2. Geometric features perception

In this section, we present an analysis of subjects’ perception regarding density, velocity, direction variation of pedestrians and distance among them, using three camera points of view (first-person, oblique and top view) and two types of avatars (cylinder and humanoid). The first part of the questionnaire contains six questions, all of which asked about the same aspect: “In which video do you perceive the higher density?”. Before each question, two or three short videos described in Table 2 were presented. Figure 3 shows the questions and the percentage of answers.

The first question (D1) aimed to evaluate whether the participants could perceive density variation, given that we did not include any explanation of the concept. Scenes from videos with low, medium and high crowd density were presented, and we checked whether the subjects could correctly select the high density one. 89% of participants responded according to the ground truth, i.e. they correctly identified the high density video. The other 11% answered “I do not know”, “I did not notice density difference”, or chose the low or medium density options.

In D2 and D3 we presented videos with the same density displayed from different points of view; D2 used humanoids and D3 used cylinders. We asked the subjects to select the video where the higher density was observed. Our goal was to check whether the subjects perceived the same density or whether density perception changes with the camera point of view or the way the agents are displayed. In question D2, 70% chose one of the videos, while 29% of the participants marked the option “I did not notice density difference”, so for this smaller group the camera does not seem to change the perception. Details are presented in Figure 3, and results indicate that the camera point of view can disturb density perception. Regarding the point of view, the oblique camera received the highest percentage of answers. In question D3, 69% chose one of the videos, while 31% of the participants marked the option “I did not notice density difference”, indicating that visualization with cylinders or humanoids also changes the final result. In question D4, we showed two videos with the same density and the same point of view but different avatar types. 25% of people selected the option “I did not notice density difference”, while 72% chose one of the avatar types, with 41% of the participants choosing humanoids.

In questions D5 and D6, we added walls surrounding the agents (see Figure 2(c)) to the same videos analyzed before. The goal was to check whether the walls change density perception with the first-person camera. In this case, 66% of subjects answered that one of the videos presented higher density in comparison to a same-density video without walls.

Regarding speed perception, the questionnaire also contains six questions, all of them related to the low-density videos described in Table 2. The goal of these questions is to evaluate the perception of the running and walking speed levels, as presented in Equation 1, through the top and oblique cameras and with the two types of avatars: cylinder and humanoid.

For these videos we did not analyze perception with the first-person camera, since we observed that such videos did not allow a good view of the scene. As in the density analysis, we asked the same question, “In which video did you observe the higher velocity”, and varied the parameters we wanted to measure. Question S1 presented two videos with velocity=running and cameras=oblique and top. As shown in Figure 4, 32% of subjects did not perceive any difference in velocity, while 64% chose one of the videos. Question S2 followed the same process with velocity=walking: 26% did not perceive a difference, while 74% chose one of the videos. Questions S3 and S4 presented the same velocity with the oblique and top-view cameras, respectively. For S3, 14% of subjects did not perceive velocity changes, while 85% selected only one of the cameras. In S4, 28% did not perceive velocity changes, while 71% selected only one of the cameras. Finally, questions S5 and S6 presented two videos containing the two different avatars, with the oblique and top cameras respectively, for velocity=walking. Results were very similar: 17% and 19% of people, respectively, did not perceive a difference, against 82% and 81% who chose one of the videos. So, our results indicate that the camera point of view and the type of avatar affect velocity perception.

Regarding the perception of angular variation, the questionnaire contains two questions with comparisons between the three types of cameras and two types of avatars. All angular variation questions used scenes from the BR-34 (high density) video shown in Table 2. Again, we asked the same question, “In which video do you observe more angular variation performed by the agents?”, and the videos varied the measured parameters. Question A1 presented three videos with humanoids viewed from 3 different camera positions. As shown in Figure 5, only 14% of subjects did not perceive a difference in angular variation, while 83% chose one of the videos, and the top view camera was selected most often. Question A2 followed a similar process with cylinder avatars: 18% of subjects did not perceive a difference, while 79% selected one of the videos. Most of the people who selected one video chose the one with humanoids.

Regarding the perception of distance between the avatars, the questionnaire contains two questions, both with videos containing high density. The videos used in these questions were the same as in the questions about the perception of angular variation, i.e. with the three types of cameras and the two types of avatars, and the question was: “In which video do you observe the largest distance among agents?”. Results were very similar in questions E1 and E2. As shown in Figure 6, in E1 we displayed humanoids with the three cameras and 22% of subjects did not perceive differences, while in E2 we displayed cylinders and 24% did not perceive changes. On the other hand, 77% and 73% of subjects, respectively, selected one of the videos, in an approximately uniformly distributed way.

So, in this section we analyzed the subjects’ perception of density, speed, angular variation and distances among agents displayed using two types of avatars and three different camera points of view. Results indicate that changing the way avatars are displayed and the camera position also changes the subjects’ perception. In particular, the top-view and oblique cameras seem to provide better information to detect the parameters, while humanoids were preferred to indicate the higher values of all evaluated parameters.

4.3. Personality and emotion perceptions

In this section we present the part of this study focused on the perception of personality and emotion traits in crowd videos. As explained before, we used the simulation environment to generate short sequences of pedestrian views in low density crowds (due to the data present in the dataset). In each video sequence we highlighted two individuals with different colors (red and yellow) and asked the subjects about them. Table 3 shows the questions with the possible answers, where the correct answer to each question is highlighted in bold. We use as ground truth the results obtained by the approach proposed by Favaretto et al. (Favaretto et al., 2018).

Figure 7 shows the initial and final frames from the video $P01$ , where it is possible to see a group of pedestrians in the right part of the video. The pedestrian highlighted in yellow is part of this group and the pedestrian highlighted in red walks through the group with a higher speed. In questions $Q1$ and $Q2$ (related to the video $P01$ ) we asked which pedestrian (yellow or red) was, respectively, neurotic and angry. Figure 8 shows the answers given by the participants. It was interesting to see that a little more than half of the participants (57% in $Q1$ and 59% in $Q2$ ) answered according to the ground truth. The pedestrian highlighted in red was the most neurotic and angry, according to Favaretto’s approach. Only a few participants answered that the pedestrian highlighted in yellow was neurotic and scared (12% in question $Q1$ and 9% in question $Q2$ ) and 18% answered that “neither of them” was neurotic. As proposed by (Favaretto et al., 2018), geometrically, a neurotic person remains isolated and has low collectivity. So, subjects who thought that no agent was neurotic were likely reasoning from a psychological point of view, while our analysis is based on spatial relationships. In video $P01$ , the pedestrian highlighted in red has these characteristics. The pedestrian highlighted in red is angry: isolated, with low angular variation, low speed, low socialization and low collectivity.

Following the analysis, video $P02$ (illustrated in Figure 9) has a pedestrian highlighted in yellow interacting with a group of individuals and a pedestrian highlighted in red who is alone and not interacting with anyone. Questions $Q3$ and $Q4$ , which were related to this video, asked participants which highlighted pedestrian was, respectively, open to experiences and afraid. Figure 10 shows the answers to these questions. The results plotted in Figure 10 show that most of the participants perceived the same personality (in the case of question $Q3$ ) and the same emotion (question $Q4$ ) as the ground truth, i.e. 60% of the participants correctly chose the pedestrian in yellow as the most open to experiences in question $Q3$ and 59% correctly chose the pedestrian in red as having fear. In the model of (Favaretto et al., 2018), a pedestrian open to new experiences is related to a high value of the angular variation feature. Geometrically speaking, according to what has been proposed in our model, a person who allows himself/herself to change objectives (direction) while walking is more open to new experiences. Fear, in turn, is linked to the person being isolated from others and walking at lower speeds.

Finally, for video $P03$ we proposed questions $Q5$, $Q6$ and $Q7$, asking, respectively, about happiness, extraversion and sociability. Video $P03$ (illustrated in Figure 11) contains a pedestrian highlighted in yellow walking with a group of people and a pedestrian highlighted in red walking alone, in the opposite direction to all other pedestrians. Regarding question $Q5$ (plotted on the left side of Figure 12), 40% of participants answered according to the ground truth. Geometrically, a happy person is not isolated and can present high levels of collectivity and socialization. The pedestrian highlighted in yellow presented these characteristics and was correctly identified by these participants.

Questions $Q6$ and $Q7$ analyze, respectively, extraversion and sociability. In question $Q6$, although the largest share of participants (33%) correctly answered that the pedestrian highlighted in yellow is the most extroverted, the participants did not seem very sure about perceiving this characteristic: 25% answered that none of the pedestrians was extroverted, 19% replied that the most extroverted pedestrian was the one highlighted in red, 14% did not know and 9% believed that both pedestrians were extroverted. We believe that question $Q6$ produced a greater variety of perceptions because we did not explain any concept when asking the questions, nor did we mention that the perceptions should be given from the geometric point of view, considering the position of the pedestrians in space. Many participants, when asked about extroversion, may have been influenced by the movements and appearance of the humanoids rather than by the geometric features. For this reason, in question $Q7$, instead of asking which pedestrian was more extroverted, we asked which pedestrian appeared to be more sociable. Here most participants (57%) seemed more convinced that the pedestrian highlighted in yellow is the most sociable, in accordance with the model proposed by (Favaretto et al., 2018).
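As a reading aid, the agreement rates reported for questions $Q1$ to $Q7$ can be tabulated in a few lines. The percentages below are copied from the text; the variable names and the 50% threshold are only an illustrative choice.

```python
# Share of participants (in %) whose answer matched the ground truth,
# as reported in the text for each question.
agreement = {"Q1": 57, "Q2": 59, "Q3": 60, "Q4": 59, "Q5": 40, "Q6": 33, "Q7": 57}

# Questions where more than half of the participants matched the ground truth.
majority = [q for q, pct in agreement.items() if pct > 50]
# Questions where the matching answer was chosen by half or fewer.
below_half = [q for q, pct in agreement.items() if pct <= 50]

# Answer distribution reported for Q6 (extraversion), in %.
q6_answers = {"yellow": 33, "none": 25, "red": 19, "did not know": 14, "both": 9}
q6_top = max(q6_answers, key=q6_answers.get)

print("majority agreement:", majority)
print("half or fewer:", below_half)
print("most frequent Q6 answer:", q6_top)
```

The $Q6$ shares sum to 100%, and the ground-truth answer (yellow) is the most frequent one even though it was chosen by fewer than half of the participants.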

5. Final Considerations

This work evaluated people’s perceptions of geometric features such as density, speed, angular variation and distances among pedestrians. We also evaluated subjects’ perception of subtler parameters, namely personality traits and emotions in crowds. We proposed and implemented a survey, answered by 73 participants, in the form of a questionnaire that featured visualizations of scenes taken from videos of the Cultural Crowds dataset (Favaretto et al., 2016b) and posed questions about variations in visualization parameters.

Regarding people’s perceptions of the geometric data, the general analysis of the cameras showed that the way agents are displayed and the camera point of view interfere with the perception of the parameters. In particular, the greater the distance from the camera to the environment (oblique and top cameras), the better the perception of density, speed and angular variation seems to be. With respect to density, perception was more accurate in the first-person view when the environment contained walls around the agents. Concerning the speed parameter, subjects perceived the speed variation of running avatars better through the oblique camera than through the top camera. In the general analysis of avatar types, there was a more accurate perception of density when agents were visualized as humanoids in the first-person view, a better perception of angular variation with humanoids in all the cameras, and a more accurate perception of distances when avatars were displayed as cylinders in the top and oblique cameras. We also performed an experiment to evaluate whether people can perceive different personalities and emotions exhibited by pedestrians in crowds. It was interesting to see that, even without explaining to the participants the concepts of each personality or emotion and how they were calculated in our approach (considering the geometric characteristics), in all the cases, more than half of the participants perceived the personality and emotion that the agent was expressing in the video, in accordance with our approach. Of course, this last aspect is much more intangible, and the lack of an explanation that we were interested in spatial manifestation, rather than in trying to “figure out” whether a person is social or open from a psychological point of view, is certainly one aspect we want to address in future work.

Bibliography

Alameda-Pineda et al. (2018) X. Alameda-Pineda, E. Ricci, and N. Sebe. 2018. Multimodal Behavior Analysis in the Wild: Advances and Challenges. Elsevier Science, London, UK.
Alexander (1992) R McN Alexander. 1992. A model of bipedal locomotion on compliant legs. Phil. Trans. R. Soc. Lond. B 338, 1284 (1992), 189–198.
Ardeshir and Borji (2018) Shervin Ardeshir and Ali Borji. 2018. Egocentric Meets Top-view. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018).
C. S. Hall and Campbell (1998) G. Lindzy C. S. Hall and J. B. Campbell. 1998. Theories Of Personality (fourth ed.). John Wiley & Sons, New Jersey.
Chattaraj et al. (2009) Ujjal Chattaraj, Armin Seyfried, and Partha Chakroborty. 2009. Comparison of pedestrian fundamental diagram across cultures. Advances in Complex Systems 12, 03 (2009), 393–405.
Costa and McCrae (1992) P.T. Costa and R.R. McCrae. 1992. Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor Inventory (NEO-FFI). PAR. https://books.google.co.in/books?id=mp3zNwAACAAJ
Digman (1990) J. M. Digman. 1990. Personality Structure: Emergence of the Five-Factor Model. Annual Review of Psychology 41 (1990), 417–440.