TL;DR
This paper introduces a GAN-based data-driven crowd simulation method that learns from observed pedestrian trajectories to generate realistic, real-time crowd behaviors with statistical fidelity and interactive capabilities.
Contribution
It presents a novel GAN-based approach for learning and generating pedestrian trajectories that mimic real-world crowd patterns in real time.
Findings
Simulated trajectories preserve statistical properties of real data
The system enables real-time crowd simulation with user interaction
Allows insertion of extra agents and integration with other methods
Data-Driven Crowd Simulation with Generative Adversarial Networks
Javad Amirian
Univ Rennes, Inria, CNRS, IRISA, Rennes, France
Wouter van Toll
Univ Rennes, Inria, CNRS, IRISA, Rennes, France
Jean-Bernard Hayet
Centro de Investigación en Matemáticas, Guanajuato, Mexico
Julien Pettré
Univ Rennes, Inria, CNRS, IRISA, Rennes, France
(2019)
Abstract.
This paper presents a novel data-driven crowd simulation method that can mimic the observed traffic of pedestrians in a given environment. Given a set of observed trajectories, we use a recent form of neural networks, Generative Adversarial Networks (GANs), to learn the properties of this set and generate new trajectories with similar properties. We define a way for simulated pedestrians (agents) to follow such a trajectory while handling local collision avoidance. As such, the system can generate a crowd that behaves similarly to observations, while still enabling real-time interactions between agents. Via experiments with real-world data, we show that our simulated trajectories preserve the statistical properties of their input. Our method simulates crowds in real time that resemble existing crowds, while also allowing insertion of extra agents, combination with other simulation methods, and user interaction.
Keywords: crowd simulation, content generation, machine learning, intelligent agents, generative adversarial networks
Published in: Computer Animation and Social Agents (CASA ’19), July 1–3, 2019, Paris, France. DOI: 10.1145/3328756.3328769. ISBN: 978-1-4503-7159-9/19/07.
CCS concepts: Computing methodologies → Intelligent agents; Neural networks; Motion path planning; Real-time simulation.
1. Introduction
The realistic simulation of human crowd motion is a vast research topic that includes aspects of artificial intelligence, computer animation, motion planning, psychology, and more. Generally, the goal of a crowd simulation algorithm is to populate a virtual scene with a crowd that exhibits visually convincing behavior. The simulation should run in real time to be usable for interactive applications such as games, training software, and virtual-reality experiences. Many simulations are agent-based: they model each pedestrian as a separate intelligent agent with individual properties and goals.
To simulate complex behavior, data-driven crowd simulation methods use real-world input data (such as camera footage) to generate matching crowd motion. Usually, these methods cannot easily generate new behavior that is not literally part of the input. Also, they are often difficult to use for applications in which agents need to adjust their motion in real time, e.g. because the user is part of the crowd.
In this paper, we present a new data-driven crowd simulation method that largely avoids these limitations. Our system enables the real-time simulation of agents that behave similarly to observations, while allowing them to deviate from their trajectories when needed. More specifically, our method:
- (1) learns the overall properties of input trajectories, and can generate new trajectories with similar properties;
- (2) embeds these trajectories in a crowd simulation, in which agents follow a trajectory while allowing for local interactions.
For item 1, we use Generative Adversarial Networks (GANs) (Goodfellow2014-GAN, ), a novel technique in machine learning for generating new content based on existing data. For item 2, we extend the concept of ‘route following’ (Jaklin2013-MIRAN, ) to trajectories with a temporal aspect, prescribing a speed that may change over time.
Using a real-world dataset as an example, we will show that our method generates new trajectories with matching styles. Our system can (for example) reproduce an existing scenario with additional agents, and it can easily be combined with other crowd simulation methods.
2. Related Work
Agent-based crowd simulation algorithms model pedestrians as individual intelligent agents. In this paradigm, many researchers focus on the local interactions between pedestrians (e.g. collision avoidance) using microscopic algorithms (Helbing1995-SocialForces, ; vandenBerg2011-ORCA, ). In environments with obstacles, these need to be combined with global path planning into an overall framework (vanToll2015-Framework, ). A growing research topic lies in measuring the ‘realism’ of a simulation, by measuring the similarity between two fragments of (real or simulated) crowd motion (Wang2016-PathPatterns, ).
Complex real-life behavior can hardly be described with simple local rules. This motivates data-driven simulation methods, which base the crowd motion directly on real-world trajectories, typically obtained from video footage. One category of such methods stores the input trajectories in a database, and then pastes the best-matching fragments into the simulation at run-time (Lerner2007-CrowdsByExample, ; Lee2007-GroupBehavior, ). Another technique is to create pre-computed patches with periodically repeating crowd motion, which can be copied and pasted throughout an environment (Yersin2009-CrowdPatches, ). Such simulations are computationally cheap, but difficult to adapt to interactive situations.
Researchers have also used input trajectories to train the parameters of (microscopic) simulation models (Wolinski2014-ParameterEstimation, ), so as to adapt the agents’ local behavior parameters to match the input data. However, this cannot capture any complex (social) rules that are not part of the used simulation model.
To replicate how agents move through an environment at a higher level, some algorithms subdivide the environment into cells and learn how pedestrians move between them (Pellegrini2012-DestinationFlow, ; Zhong2016-BehaviorLearning, ). Our goal is similar (reproducing pedestrian motion at the full trajectory level), but our approach is different: we learn the spatial and temporal properties of complete trajectories, generate new trajectories with similar properties, and let agents flexibly follow these trajectories.
Our work uses Generative Adversarial Networks (GANs) (Goodfellow2014-GAN, ), a recent AI development for generating new data. GANs have been successful at generating creative content such as faces (Di2018-FaceSynthesis, ). Recently, researchers have started to adopt GANs for short-term prediction of pedestrian motion (Amirian2019-SocialWays, ). To our knowledge, our work is the first to apply GANs in crowd simulation at the full trajectory level.
3. Generating Trajectories
In this section, we describe our GAN-based method for generating trajectories that are similar to the examples in our training data.
As in most crowd-simulation research, we assume a planar environment and we model agents as disks. We define a trajectory as a mapping $\pi: [0, T] \to \mathbb{R}^2$ that describes how an agent moves through an environment during a time period of $T$ seconds. Note that a trajectory encodes speed information: our system should capture when agents speed up, slow down, or stand still.
In practice, we will represent a trajectory $\pi$ by a sequence of $n+1$ points $[p_0, \ldots, p_n]$ separated by a fixed time interval $\Delta t$; that is, each $p_i$ has a corresponding timestamp $t_i = i \cdot \Delta t$. In our experiments, we set $\Delta t$ (in seconds) to the sampling interval of our input data. We will use the notation $\pi_{j:k}$ to denote a sub-trajectory from $p_j$ to $p_k$.
Given a dataset of trajectories $\Pi$, our generator should learn to produce new trajectories with properties similar to those in $\Pi$. We assume that all trajectories start and end on the boundary of a region of interest $R$, which can have any shape and can be different for each environment.
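The sampled representation can be illustrated with a minimal Python sketch. The function names, the example points, and the use of linear interpolation between samples are assumptions made for illustration; the text only fixes the sequence of points, the time interval between them, and the sub-trajectory notation.

```python
def end_time(points, dt):
    """Duration T of a trajectory whose samples are dt seconds apart."""
    return (len(points) - 1) * dt


def position_at(points, dt, t):
    """Evaluate pi(t), clamping t to [0, T].

    Assumption: positions between two samples are linearly interpolated.
    """
    t = min(max(t, 0.0), end_time(points, dt))
    i = min(int(t // dt), len(points) - 2)  # index of the segment containing t
    a = (t - i * dt) / dt                   # fraction travelled along that segment
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


def sub_trajectory(points, j, k):
    """Return pi_{j:k}: the samples p_j through p_k, both inclusive."""
    return points[j:k + 1]


# Hypothetical example: three samples, half a second apart.
pi = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
dt = 0.5
print(position_at(pi, dt, 0.75))   # halfway along the second segment
print(sub_trajectory(pi, 1, 2))
```

Storing only the points and the interval keeps the timestamps implicit, which is why a change in spacing between consecutive points directly encodes a change in speed.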
Overview of GANs. A Generative Adversarial Network (Goodfellow2014-GAN, ) consists of two components: a generator $G$ that creates new samples and a discriminator $D$ that judges whether a sample is real or generated. The training phase of a GAN is a two-player game in which $G$ learns to ‘fool’ $D$, until (ideally) the generated samples are so convincing that $D$ does not outperform blind guessing.
Internally, both $G$ and $D$ are artificial neural networks; let $\theta_G$ and $\theta_D$ be their respective weights. $G$ acts as a function that converts an $m$-dimensional noise vector $z$ to a fake sample $G(z)$. $D$ acts as a function that converts a (real or fake) sample $x$ to a value $D(x) \in [0, 1]$ indicating the probability of $x$ being real. Training a GAN represents the following optimization problem:
$$\min_{\theta_G} \max_{\theta_D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$
where $V$ is known as the loss function. Its first term denotes the expected output of $D$ for a random real sample. This is higher when $D$ correctly classifies more input samples as real. Conversely, the second term is higher when $D$ classifies more generated samples as fake. Here, $p_{\mathrm{data}}$ and $p_z$ are the probability distributions of (respectively) the real data and the noise vectors sent to $G$.
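To make the two terms of the standard GAN objective concrete, the following sketch estimates the value function from a handful of discriminator outputs. The function name and the sample scores are hypothetical; only the formula itself comes from the standard GAN formulation (Goodfellow2014-GAN, ).

```python
import math


def gan_value(scores_on_real, scores_on_fake):
    """Sample estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    Both arguments are lists of discriminator outputs strictly between 0 and 1.
    """
    real_term = sum(math.log(s) for s in scores_on_real) / len(scores_on_real)
    fake_term = sum(math.log(1.0 - s) for s in scores_on_fake) / len(scores_on_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives the value -2 * log(2).
blind = gan_value([0.5] * 4, [0.5] * 4)

# A discriminator that separates the two sets well obtains a higher value.
confident = gan_value([0.9] * 4, [0.1] * 4)

print(blind, confident)
```

The discriminator tries to push this value up, while the generator tries to push it back down toward the blind-guessing level.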
Overview of Our System. Figure 1 displays an overview of our GAN system. The generator $G$ and discriminator $D$ both have two tasks: generating or evaluating the entry points of a trajectory (i.e. the first two points $p_0$ and $p_1$), and generating or evaluating the continuation of a trajectory (i.e. the next point $p_{k+1}$ after a sub-trajectory $\pi_{0:k}$). For the continuation tasks, we use concepts from so-called ‘conditional GANs’ because the generator and discriminator take extra data as input. We will now describe the system in more detail. Parameter settings will be mentioned in Section 5.
Generator. To generate entry points, the generator feeds a random vector $z$ to a fully connected (FC) block of neurons. Its output is a 4D vector that contains the coordinates of $p_0$ and $p_1$.
To generate the continuation of a trajectory $\pi_{0:k}$, the generator feeds $\pi_{0:k}$ and a noise vector $z$ to a Long Short-Term Memory (LSTM) layer that should encode the relevant trajectory dynamics. LSTMs are common recurrent neural networks used for handling sequential data. The output of this LSTM block is sent to an FC block, which finally produces a 2D vector with the coordinates of $p_{k+1}$. Let $G_C(\pi_{0:k}, z)$ denote this result. Ideally, this point will be taken from the (unknown) distribution of likely follow-ups for $\pi_{0:k}$.
The continuation step is repeated iteratively. If the newly generated point $p_{k+1}$ lies outside of the region of interest $R$, then the trajectory is considered to be finished. Otherwise, the process is repeated with inputs $\pi_{0:k+1}$ and a new noise vector.
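The iterative generation procedure can be sketched as a plain loop. In this sketch the two generator functions are toy stand-ins (not trained networks), and the rectangular region, the function names, and the choice to keep the final outside point in the result are assumptions. The noise dimension and the cap of 40 points follow values mentioned in Section 5.

```python
import random


def inside(region, p):
    """True if point p lies in the axis-aligned box (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def toy_entry_points(z):
    """Stand-in for the entry-point FC block: returns the pair (p0, p1)."""
    y = 10.0 * z[0]
    return (0.0, y), (0.5, y)


def toy_continuation(history, z):
    """Stand-in for the LSTM+FC block: repeat the last step, plus some noise."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (x1 + (x1 - x0) + 0.1 * (z[0] - 0.5),
            y1 + (y1 - y0) + 0.1 * (z[1] - 0.5))


def generate_trajectory(entry_fn, cont_fn, region, rng, noise_dim=3, max_points=40):
    """Generate entry points, then add points until one leaves the region."""
    def noise():
        return [rng.random() for _ in range(noise_dim)]  # fresh uniform noise vector

    p0, p1 = entry_fn(noise())
    points = [p0, p1]
    while len(points) < max_points:
        p_next = cont_fn(points, noise())
        points.append(p_next)  # assumption: the exiting point is kept as the last sample
        if not inside(region, p_next):
            break
    return points


REGION = (0.0, 0.0, 10.0, 10.0)  # hypothetical region of interest
trajectory = generate_trajectory(toy_entry_points, toy_continuation,
                                 REGION, random.Random(0))
print(len(trajectory), trajectory[:3])
```

Swapping the two stand-ins for trained networks leaves the loop unchanged, which is the main point of separating the entry-point task from the continuation task.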
Discriminator. The discriminator takes an entire (real or fake) trajectory $\pi$ as input. It splits the discrimination into two tasks with a similar structure as in $G$. For the entry point part, an FC block evaluates $\pi_{0:1}$ to a scalar in $[0, 1]$, which we denote by $D_{EP}(\pi_{0:1})$. For the continuation part, an LSTM+FC block separately evaluates each point $p_k$ (for $2 \le k \le n$) given the sub-trajectory $\pi_{0:k-1}$. We denote the result for the $k$th point by $D_C(\pi_{0:k})$.
So, for a full trajectory of $n+1$ points, the discriminator computes $n$ scalars that together indicate the likelihood of $\pi$ being real. The training phase uses these numbers in its loss function.
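Training. Each training iteration lets the generator $G$ generate a set of trajectories for different (sequences of) noise vectors. We then let the discriminator $D$ classify all trajectories (both real and fake). The loss function of our GAN is the sum of two components:
- the success rate for discriminating all entry points:
$$L_{EP}(G, D) = \mathbb{E}_{\pi \sim p_{\mathrm{data}}}\left[\log D_{EP}(\pi_{0:1})\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D_{EP}(G_{EP}(z))\right)\right]$$
where $D_{EP}$ is the entry-point output of the discriminator and $G_{EP}(z)$ denotes the entry points generated from the noise vector $z$;
- the success rate for discriminating all other points:
$$L_C(G, D) = \mathbb{E}_{\pi \sim p_{\mathrm{data}}}\left[\sum_{k=2}^{n} \log D_C(\pi_{0:k})\right] + \mathbb{E}_{\hat{\pi} \sim p_G}\left[\sum_{k=2}^{\hat{n}} \log\left(1 - D_C(\hat{\pi}_{0:k})\right)\right]$$
where $D_C$ is the continuation output of the discriminator and $\hat{\pi}$ is a generated trajectory with points $\hat{p}_0, \ldots, \hat{p}_{\hat{n}}$.
To let our GAN train faster, we add a third component. For each real trajectory $\pi$, we take all valid sub-trajectories $\pi_{j:k}$ of a fixed length and let $G$ generate its own version of the last point $p_k$ given the preceding points $\pi_{j:k-1}$. We add to our loss function:
$$L_{L2}(G) = \sum_{\pi} \sum_{(j,k)} \left\lVert p_k - G_C(\pi_{j:k-1}, z) \right\rVert$$
i.e. we sum up the Euclidean distances between real and generated points. We observed that this additional component leads to much faster convergence and better back-propagation.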
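The three loss components can be sketched numerically from lists of discriminator scores and point pairs. All function names and sample values below are hypothetical, and the sketch deliberately leaves the components separate because the text does not state how they are weighted when combined.

```python
import math


def entry_point_term(real_scores, fake_scores):
    """Adversarial term on entry points: one score per trajectory."""
    real = sum(math.log(s) for s in real_scores) / len(real_scores)
    fake = sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)
    return real + fake


def continuation_term(real_per_traj, fake_per_traj):
    """Adversarial term on all other points.

    Each argument is a list of trajectories; each trajectory is the list of
    per-point continuation scores produced by the discriminator.
    """
    real = sum(sum(math.log(s) for s in scores) for scores in real_per_traj)
    fake = sum(sum(math.log(1.0 - s) for s in scores) for scores in fake_per_traj)
    return real / len(real_per_traj) + fake / len(fake_per_traj)


def l2_term(real_points, generated_points):
    """Sum of Euclidean distances between real points and generated versions."""
    return sum(math.hypot(rx - gx, ry - gy)
               for (rx, ry), (gx, gy) in zip(real_points, generated_points))


# Hypothetical scores and points.
print(entry_point_term([0.8, 0.7], [0.3, 0.2]))
print(continuation_term([[0.9, 0.8, 0.7]], [[0.2, 0.4]]))
print(l2_term([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))
```

Unlike the two adversarial terms, the distance term gives the generator a direct, per-point training signal, which is consistent with the faster convergence reported above.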
To reduce the chance of ‘mode collapse’ (i.e. convergence to a limited range of samples), we use an ‘unrolled’ GAN (Metz2017-UnrolledGAN, ). This is an extended GAN where each optimization step for the generator $G$ uses an improved version of the discriminator $D$ that is $K$ steps further ahead (where $K$ is a parameter).
4. Crowd Simulation
Recall that our goal is to use our trajectories in a real-time interactive crowd simulation, where agents should be free to deviate from their trajectories if needed. This section describes how we combine our trajectory generator with a crowd simulator.
Our approach fits in the paradigm of multi-level crowd simulation (vanToll2015-Framework, ), in which global planning (i.e. computing trajectories) is detached from the simulation loop. This loop consists of discrete timesteps. In each step, new agents might be added, and each agent tries to follow its trajectory while avoiding collisions.
Adding Agents. To determine when a new agent should be added to the simulation, we use an exponential distribution whose parameter denotes the average time between two insertions. This parameter can be obtained from an input dataset (to produce similar crowdedness), but one may also choose another value deliberately. Each added agent follows its own trajectory produced by our GAN.
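A minimal sketch of this insertion scheme follows, assuming the gaps between insertions are drawn independently. The function names are hypothetical. Note that Python's `random.expovariate` takes the rate, i.e. the reciprocal of the average time between insertions.

```python
import random


def mean_gap_from_start_times(start_times):
    """Estimate the average time between insertions from observed start times."""
    ordered = sorted(start_times)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def insertion_times(mean_gap, duration, rng):
    """Sample agent insertion times in [0, duration] with exponential gaps."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mean_gap)  # rate = 1 / average gap
        if t > duration:
            return times
        times.append(t)


# Hypothetical observed start times (in seconds) from a dataset.
mean_gap = mean_gap_from_start_times([0.0, 1.0, 3.0, 6.0])
times = insertion_times(mean_gap, duration=10000.0, rng=random.Random(1))
print(mean_gap, len(times))
```

Choosing a smaller average gap than the one measured from the data produces a denser variant of the same scenario.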
Trajectory Following. In each frame of the simulation loop, each agent should try to proceed along its trajectory $\pi$ while avoiding collisions. The main difference with classical ‘route following’ (Jaklin2013-MIRAN, ) is that our trajectories have a temporal component: they prescribe at what speed an agent should move, and this speed may change over time. Therefore, we present a way to let an agent flexibly follow $\pi$ while respecting its spatial and temporal data. Our algorithm computes a preferred velocity $v_{pref}$ that would send the agent farther along $\pi$. This can then be fed to any existing collision-avoidance algorithm, to compute a velocity that is close to $v_{pref}$ while avoiding collisions with other agents.
Two parameters define how an agent follows $\pi$: the time window $\tau$ and the maximum speed $s_{max}$. An agent always tries to move to a point that lies $\tau$ seconds ahead along $\pi$, taking $s_{max}$ into account. During the simulation, let $t$ be the time that has passed since the agent’s insertion. Ideally, the agent should have reached $\pi(t)$ by now. Our algorithm consists of the following steps:
- (1) Compute the attraction point $p_{att} = \pi(t_{att})$, where $t_{att} = \min(t + \tau, T)$ and $T$ is the end time of $\pi$. Thus, $p_{att}$ is the point that lies $\tau$ seconds ahead of $\pi(t)$, clamped to the end of $\pi$ if needed.
- (2) Compute the preferred velocity $v_{pref}$ as $(p_{att} - p) / (t_{att} - t)$, where $p$ is the agent’s current position. Thus, $v_{pref}$ is the velocity that will send the agent to $p_{att}$, with a speed based on the difference between $t_{att}$ and $t$.
- (3) If $\lVert v_{pref} \rVert > s_{max}$, scale $v_{pref}$ so that $\lVert v_{pref} \rVert = s_{max}$. This prevents the agent from receiving a very high speed after it has been blocked for a long time.
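The three steps can be sketched as follows. This is an illustrative reading of the algorithm and not the authors' code: the function names, the linear interpolation of the trajectory, the division by the look-ahead time, the small guard against a zero look-ahead, and the parameter values in the example are all assumptions.

```python
import math


def position_at(points, dt, t):
    """Evaluate pi(t) by linear interpolation (assumption), clamped to [0, T]."""
    t = min(max(t, 0.0), (len(points) - 1) * dt)
    i = min(int(t // dt), len(points) - 2)
    a = (t - i * dt) / dt
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


def preferred_velocity(points, dt, t, position, tau, s_max):
    """Preferred velocity of an agent at `position`, t seconds after insertion."""
    end = (len(points) - 1) * dt
    # Step 1: attraction point tau seconds ahead, clamped to the trajectory end.
    t_att = min(t + tau, end)
    p_att = position_at(points, dt, t_att)
    # Step 2: velocity that reaches p_att in the remaining look-ahead time.
    horizon = max(t_att - t, 1e-6)  # assumption: avoid dividing by zero at the end
    vx = (p_att[0] - position[0]) / horizon
    vy = (p_att[1] - position[1]) / horizon
    # Step 3: cap the speed at s_max.
    speed = math.hypot(vx, vy)
    if speed > s_max:
        vx, vy = vx * s_max / speed, vy * s_max / speed
    return (vx, vy)


# Hypothetical straight trajectory walked at 1 m/s, sampled every second.
pi = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
on_schedule = preferred_velocity(pi, 1.0, t=1.0, position=(1.0, 0.0), tau=1.0, s_max=2.0)
blocked = preferred_velocity(pi, 1.0, t=3.0, position=(0.0, 0.0), tau=1.0, s_max=2.0)
print(on_schedule, blocked)
```

In the example, the agent that is on schedule keeps the speed encoded in the trajectory, while the agent that was blocked at the start tries to catch up but is limited to the maximum speed. The returned velocity would then be passed to a collision-avoidance routine.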
Collision Avoidance. The preferred velocity computed by our algorithm can be used as input for any collision-avoidance routine. In our implementation, we use the popular ORCA method (vandenBerg2011-ORCA, ). In preliminary tests, other methods such as social forces (Helbing1995-SocialForces, ) proved to be less suitable for our purpose.
5. Experiments and Results
Set-up. We have implemented our GAN using the PyTorch library (https://pytorch.org/). The input noise vectors are 3-dimensional and drawn from a uniform distribution. In both the generator and the discriminator, the entry-point FC blocks consist of 3 layers with 128, 64, and 32 hidden neurons, respectively. For the continuation part, the LSTM blocks consist of 62 cells, and the FC blocks contain 2 layers of 64 and 32 hidden neurons. To save time and memory, the LSTM blocks only consider the last 4 samples of a sub-trajectory.
For training the GAN, all FC layers use Leaky-ReLU activation functions (with a nonzero negative slope), so that the gradient can always back-propagate, which avoids vanishing-gradient issues. We train the GAN for a fixed number of iterations, using a fixed unrolling parameter.
In the crowd simulation, we model agents as disks with a fixed radius (in meters), and we use a simulation loop with a fixed frame length (in seconds). In each frame, all agents perform our route-following algorithm (with fixed values for the time window and the maximum speed), followed by the ORCA algorithm (vandenBerg2011-ORCA, ) as implemented by the original authors. We remove an agent when it reaches the end of its trajectory.
We test our method on the ETH dataset (Pellegrini2009-SocialTracking, ) that contains recorded trajectories around the entrance of a university building. We have defined the region of interest $R$ as an axis-aligned bounding box, and we use only the 241 trajectories that both enter and exit $R$.
Result 1: Entry Points. To show the performance of our GAN in learning the distribution of entry points, we computed 500 (fake) entry points in the ETH scene, and we calculated the distribution of the samples over the boundary of the region of interest $R$. We also compared these results against two other generative methods: a Gaussian Mixture Model (GMM) with 3 components, and a ‘vanilla’ GAN variant that does not use the unrolling mechanism. As shown in Fig. 2, the entry points of the unrolled GAN (right) are closer to the real data than those of the other two methods.
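One possible way to compute such a distribution over the boundary of an axis-aligned box is sketched below. The text does not specify the procedure, so the perimeter parameterization, the histogram binning, the L1 distance used for comparison, and all names and sample points are assumptions.

```python
def perimeter_coordinate(box, p):
    """Arc length along the box boundary of the boundary point nearest to p.

    Measured counter-clockwise from the corner (xmin, ymin).
    """
    xmin, ymin, xmax, ymax = box
    w, h = xmax - xmin, ymax - ymin
    x = min(max(p[0], xmin), xmax)
    y = min(max(p[1], ymin), ymax)
    dist = {"bottom": y - ymin, "right": xmax - x, "top": ymax - y, "left": x - xmin}
    edge = min(dist, key=dist.get)  # nearest edge
    if edge == "bottom":
        return x - xmin
    if edge == "right":
        return w + (y - ymin)
    if edge == "top":
        return w + h + (xmax - x)
    return 2 * w + h + (ymax - y)


def boundary_histogram(box, points, n_bins):
    """Normalized histogram of entry points along the box perimeter."""
    xmin, ymin, xmax, ymax = box
    perimeter = 2 * ((xmax - xmin) + (ymax - ymin))
    counts = [0] * n_bins
    for p in points:
        s = perimeter_coordinate(box, p)
        counts[min(int(s / perimeter * n_bins), n_bins - 1)] += 1
    return [c / len(points) for c in counts]


def l1_distance(hist_a, hist_b):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


BOX = (0.0, 0.0, 4.0, 2.0)  # hypothetical region of interest
real = boundary_histogram(BOX, [(1.0, 0.0), (4.0, 1.0), (3.0, 2.0), (0.0, 1.0)], 4)
fake = boundary_histogram(BOX, [(1.0, 0.0), (2.0, 0.0), (3.0, 2.0), (0.0, 1.0)], 4)
print(real, fake, l1_distance(real, fake))
```

A smaller distance between the histograms of real and generated entry points would indicate a closer match, which gives a numerical counterpart to the visual comparison in Fig. 2.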
Result 2: Trajectories. Next, we used our system to generate 352 new trajectories, and we used them to simulate a crowd. The first two heatmaps in Fig. 3 show that generated trajectories (middle) are similarly distributed over the environment as the real data (left).
The third heatmap shows the final motion of the simulated agents with route following and collision avoidance. In this scenario, agents are well capable of following their given trajectories.
Computation time. We used CUDA to run our GAN on an NVIDIA Quadro M1200 GPU with 4GB of GDDR5 memory. With this set-up, generating a batch of 1024 trajectories (with a maximum length of 40 points) took a time measured in milliseconds, so the average generation time per trajectory was roughly a thousandth of the batch time. Thus, after training, the system is sufficiently fast for real-time insertions of trajectories into a crowd.
6. Conclusions & Future Work
We have presented a data-driven crowd simulation method that uses GANs to learn the properties of input trajectories and then generate new trajectories with similar properties. Combined with flexible route following that takes temporal information into account, the trajectories can be used in a real-time crowd simulation. Our system can be used, for example, to create variants of a scenario with different densities. It can easily be combined with other simulation methods, and it allows interactive applications.
In the future, we will perform a thorough analysis of the trajectories produced by our system, and compare them to other algorithms. We will also investigate the exact requirements for reliable training. Furthermore, our system generates trajectories for individuals, assuming that agents do not influence each other’s choices. As such, it cannot yet model group behavior, and it performs worse in high-density scenarios where agents cannot act independently. We would like to handle these limitations in future work.
Acknowledgements.
This project was partly funded by EU project CROWDBOT (H2020-ICT-2017-779942).
References
- (1) Amirian, J., Hayet, J.-B., and Pettré, J. Social ways: Learning multi-modal distributions of pedestrian trajectories with GANs. In CVPR Workshops (2019).
- (2) van den Berg, J., Guy, S., Lin, M., and Manocha, D. Reciprocal n-body collision avoidance. In Proc. Int. Symp. Robotics Research (2011), pp. 3–19.
- (3) Di, X., and Patel, V. Face synthesis from visual attributes via sketch using conditional VAEs and GANs. CoRR abs/1801.00077 (2018).
- (4) Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Proc. Int. Conf. Neural Information Processing Systems (2014), vol. 2, pp. 2672–2680.
- (5) Helbing, D., and Molnár, P. Social force model for pedestrian dynamics. Physical Review E 51, 5 (1995), 4282–4286.
- (6) Jaklin, N., Cook IV, A., and Geraerts, R. Real-time path planning in heterogeneous environments. Computer Animation and Virtual Worlds 24, 3 (2013), 285–295.
- (7) Lee, K., Choi, M., Hong, Q., and Lee, J. Group behavior from video: A data-driven approach to crowd simulation. In Proc. ACM SIGGRAPH/Eurographics Symp. Computer Animation (2007), pp. 109–118.
- (8) Lerner, A., Chrysanthou, Y., and Lischinski, D. Crowds by example. Computer Graphics Forum 26, 3 (2007), 655–664.
