TL;DR
This paper introduces a reinforcement learning method for autonomous vehicle merging in dense traffic that models driver cooperation levels, leading to fewer deadlocks compared to traditional planning methods.
Contribution
It proposes a novel reinforcement learning approach that maintains a belief over driver cooperation levels to improve decision making in dense traffic merging scenarios.
Findings
Reduces deadlocks in dense merging scenarios.
Learns to interact effectively with drivers of varying cooperation levels.
Outperforms online planning methods in dense traffic maneuvers.
Abstract
Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system only relying on predefined road priorities and considering other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhanced the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns how to navigate a dense merging scenario with less deadlocks…
| Hyperparameter | Value |
|---|---|
| Neural network architecture | 2 dense layers of 64 and 32 nodes |
| Training steps | |
| Activation functions | Rectified linear units |
| Replay buffer size | experience samples |
| Target network update frequency | episodes |
| Discount factor | |
| Optimizer | Adam |
| Learning rate | |
| Prioritized replay [16] | , |
| Exploration strategy | greedy |
| Exploration fraction | |
| Final |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Cooperation-Aware Reinforcement Learning
for Merging in Dense Traffic
Maxime Bouton,1 Alireza Nakhaei,2 Kikuo Fujimura,2 and Mykel J. Kochenderfer1 *This work was supported by the Honda Research Institute.1 Maxime Bouton and Mykel J. Kochenderfer are with the Department of Aeronautics and Astronautics, Stanford University, Stanford CA 94305, USA, {boutonm,mykel}@stanford.edu.2 Alireza Nakhaei and Kikuo Fujimura are with the Honda Research Institute, 375 Ravendale Dr., Mountain View, CA 94043, USA, anakhaei,[email protected].
Abstract
Decision making in dense traffic can be challenging for autonomous vehicles. An autonomous system only relying on predefined road priorities and considering other drivers as moving objects will cause the vehicle to freeze and fail the maneuver. Human drivers leverage the cooperation of other drivers to avoid such deadlock situations and convince others to change their behavior. Decision making algorithms must reason about the interaction with other drivers and anticipate a broad range of driver behaviors. In this work, we present a reinforcement learning approach to learn how to interact with drivers with different cooperation levels. We enhanced the performance of traditional reinforcement learning algorithms by maintaining a belief over the level of cooperation of other drivers. We show that our agent successfully learns how to navigate a dense merging scenario with less deadlocks than with online planning methods.
I Introduction
Merging into very dense traffic situation is a challenging task for autonomous vehicles. Some decision making algorithms are overly conservative and sometimes fail to create a gap in the traffic. Performing a merge maneuver requires reasoning about the reaction of traffic participants to the merging vehicle in a short time. When no gap is present in the traffic, the autonomous vehicle must be proactive in distinguishing drivers that are willing to slow down and yield to the merging vehicle.
It has been shown that planning algorithms must consider interaction and joint collision avoidance models to avoid deadlock situations also known as the freezing robot problem [1]. Previous research addressed the issue of interaction-aware planning by combining a probabilistic interaction model with an online planner [2, 3, 4, 5]. Online planners simulate the environment up to a certain horizon and take actions that maximize the expected reward of the corresponding simulated trajectories. These approaches can scale to large environments and continuous state spaces, but they still suffer from the curse of dimensionality. The computational complexity associated with dense traffic scenarios limits the planning to short time horizons [3, 6] or limit the number of vehicles considered [4, 5].
The performance of the resulting policies is greatly influenced by the underlying model used to represent the environment [7]. [7] showed that a significant improvement in performance can be gained when the planning algorithm has access to information about the driver internal state in lane changing scenarios. Previous work address the problem of modeling interactions between traffic participants using data-driven approaches, probabilistic models, inverse reinforcement learning, rule-based methods, or game theoretic frameworks [8, 9, 4, 2, 5]. Inverse reinforcement learning techniques and game theoretic frameworks are generally too computationally expensive to be used in an online planning algorithm considering more than two traffic participants [9, 5]. [4] demonstrated a data-driven approach to learn the interaction model on a traffic weaving scenario involving two agents [4]. They leveraged parallelization to use this model efficiently for online planning. Such an approach is promising but is not suitable for dense traffic scenarios where more than two traffic participants are interacting.
Instead of relying on online planning methods, we propose to learn an efficient navigation strategy offline, using reinforcement learning (RL). RL provides a flexibility in the choice of the interaction model. RL has been applied to a variety of driving scenarios such as lane changing [10], or intersection navigation [11, 12].
In this work, we analyze the ability of an RL agent to benefit from interaction between traffic participants in dense merging scenarios. We show that deep reinforcement learning policies can capture interaction patterns when trained in a variety of different scenarios, even if information about the driver behavior is not available. This approach is then combined with a belief updater that explicitly maintains a probability distribution over the driver cooperation levels. Simulation results shows that an RL agent using belief states as input yields better performance than standard RL techniques as well as an online planning solver. In addition, we propose a simple rule-based behavior parameterized by a cooperation level to model the reaction of vehicles on the main lane to the merging vehicle.
The scenario of interest is illustrated in fig. 1. Vehicles on the main lane have priority over the merging car (in blue). We focus on dense traffic situation where cars drive slowly (around ) and very close to each other (the gaps can be below ). The ego vehicle on the merging ramp must merge into traffic at a given merge point. When the gaps between vehicles is not sufficient, the merging vehicle can only merge if vehicles on the main lane agree to yield.
II Background
In this section we introduce background material on partially observable Markov decision processes and reinforcement learning.
II-A Partially Observable Markov decision processes
A partially observable Markov decision process (POMDP) is defined by the tuple where is the state space, the action space, the observation space, the transition model, the reward function, the observation model, and the discount factor. An agent taking action in state transitions to a next state with probability and receives a reward . In a POMDP, the agent does not know the state, instead, it maintains an internal knowledge of the state through a belief such that represents the probability of being in state . The belief is updated at each time step after receiving an observation . The observation is related to the state through the observation model as follows: .
In a POMDP, actions are chosen according to a policy mapping beliefs to action. Policies are associated to value functions representing the expected accumulated reward when taking action in belief state and then following policy . The agent seeks to maximize the expected accumulated discounted reward. Searching for the optimal value function is often intractable but many algorithms can compute reasonable approximation of the optimal value function [13].
II-B Reinforcement Learning
In this work, we consider reinforcement learning (RL) as an approximate planning technique to solve POMDPs. RL restricts the search of policies to functions mapping observations to actions instead of mapping beliefs to actions. So-called memoryless policies are often a competitive alternative to belief state policies for solving POMDPs [14]. Standard RL makes the underlying assumption that the POMDP is an MDP of state space . In this MDP, the optimal value function satisfies the Bellman equation:
[TABLE]
In problems with continuous observations, the value function must be approximated. In this work we use a neural network to represent the value function. Such procedure is referred to as deep Q-learning (DQN) [15]. The solution to eq. 1 can be approximated by the network minimizing the following loss function:
[TABLE]
where represents the parameters of the network. Given an experience sample , the weights are updated as follows:
[TABLE]
where is the learning rate, a hyperparameter of the algorithm. In practice, experience samples are stored in a replay buffer after each interaction. The loss function is optimized over mini-batches sampled from the buffer regularly during the training. Mini-batches are sampled using experience replay [16].
III Proposed Approach
This section describes the modeling of the merging scenario as a POMDP. The proposed model is then used as a simulator for training a reinforcement learning agent using Deep Q learning.
III-A Merging Scenario POMDP
We model the merging scenario as a POMDP with the following definitions of the states, observations, actions, reward and transition model.
III-A1 States
The state of a vehicle corresponds to its physical state as well as its internal state characterizing its behavior. The physical state of a vehicle corresponds to its distance to the merge point, its longitudinal velocity, and its acceleration. In this work, the behavior is characterized by a single parameter , the cooperation level, detailed in section III-B. We let be the state of vehicle at time . The state of the ego vehicle (the controlled agent), , does not contain the behavior parameter. The complete state of the environment consists of the collection of the individual states of each vehicle present: where is the number of vehicles present at time . Our formulation does not restrain the number of vehicle considered in the problem.
III-A2 Observations
The ego vehicle has limited sensing capabilities. It can only sense the vehicles within a certain range, and cannot measure internal states of other vehicles. In this work we do not consider sensor noise. Instead, we focus on the partial observability introduced by the internal state governing the behavior of other drivers. To simplify the observation space, we restrict the observation to the longitudinal position and velocity of four neighbor cars: the front neighbor of the ego vehicle, the vehicle right behind the merge point, the rear neighbor and front neighbor of the projection of the ego vehicle on the main lane. To project the ego vehicle on the main lane, we place it at the same distance to the merge point than its actual position on the merge lane. The ego vehicle can observe the longitudinal position and velocity of these four vehicles perfectly if they are in the field of view. The longitudinal position, speed, and acceleration of the ego car are observed. In addition, we compare the cases where the internal state is directly observable by the ego vehicle or not leading to two or three dimensions per neighbor cars. The total dimension of the observation space is when the internal states are observed and otherwise. We will refer to those two cases as RL with full observation (FO), that can observe the internal state, and standard RL that only observes positions and velocities. Figure 2 illustrates the vehicles observed by the ego car. In addition, it can observe its own state (three dimensions). This observation vector is used as input to our RL agent.
III-A3 Actions
The ego vehicle controls its motion by applying a change in acceleration. At each time step, the acceleration is updated as follows:
[TABLE]
where is the action chosen by the agent in the set -1\text{,}\mathrm{m}\text{/}{\mathrm{s}}^{2}-0.5\text{,}\mathrm{m}\text{/}{\mathrm{s}}^{2}0\text{,}\mathrm{m}\text{/}{\mathrm{s}}^{2}0.5\text{,}\mathrm{m}\text{/}{\mathrm{s}}^{2}1\text{,}\mathrm{m}\text{/}{\mathrm{s}}^{2}. The agent can also apply a hard braking action and releasing action which instantaneously set the acceleration to and respectively. Hence, the action space is discrete with possible actions at each time step. The design of the action space is inspired by the work of [17] [17].
III-A4 Reward
The reward function is designed such that the optimal policy maximizes safety and efficiency. The agent receives a penalty of for colliding with other traffic participants and receives a bonuses of for reaching a goal positions defined after the merge point. The time minimization is incentivized by the discount factor, the sooner the goal is reached, the less the bonus is discounted. We used a discount factor of .
III-A5 Transition
Each vehicle follows a one dimensional point mass dynamics:
[TABLE]
where is the position of the vehicle at time , its velocity and its acceleration. Other drivers in the environment follows an interactive driver model described in the next section. In addition, when a vehicle passes the end of the main lane it is spawned at the beginning of the lane again. This technique allows to maintain a dense traffic in simulation.
III-B Driver Model
To model the behavior of drivers on the main lane, we propose an extension of the Intelligent Driver Model (IDM) [18]. Our model controls the longitudinal acceleration of the vehicle on the main lane while taking into account merging vehicles. In addition to the IDM parameters, we introduce a cooperation level which is a scalar parameter controlling the reaction to the merging vehicle state. represents a driver who slows down to yield to the merging vehicle if she predicts that the merging vehicle will arrive ahead of time. represents a driver who completely ignores the merging vehicle until it traverses the merge point and follows standard IDM. We will refer to this model as Cooperative Intelligent Driver Model (C-IDM).
C-IDM relies on estimating the time to reach the merge point () for the car on the main lane () and the car on the merge lane () to decide whether the merging vehicles should be considered or not. Once the time to merge for both vehicles is estimated, three cases are considered:
- •
If , the vehicle on the main lane follows IDM by considering the projection of the merging vehicle on the main lane as its front car.
- •
If , the vehicle on the main lane follows standard IDM.
- •
In the absence of a merging vehicle or far from the merging vehicle, the C-IDM driver follows the standard IDM.
Although this model might not represent precisely how human drivers behave, it provides us with a broad range of behaviors by adjusting the cooperation level. In this work we used a simple constant velocity model to estimate the time to merge. A more sophisticated prediction model can be used to have more realistic estimates of the . Given the cooperation level, the driver has a deterministic behavior. Figure 3 illustrates the situation where a cooperative vehicle takes into account the merging vehicle.
III-C Inferring the Cooperation Level
Since the cooperation level cannot be directly measured, the ego car maintains a belief on the cooperation level of the observed drivers. At each time step, the belief is updated using a recursive Bayesian filter given the current observation of the environment.
In this problem, the position and velocity of other drivers is assumed to be fully observable whereas the cooperation level is not always observed. This assumption of mixed observability can help us reduce the computational cost of the belief update. The agent only maintains a distribution over the cooperation level of observed drivers instead of estimating the full state of the environment. Our simple belief updater, is acting as if the cooperation level was binary although it can take a continuous value. The belief at time is composed of the fully observable part of the state, , and for , where is the number of observed drivers. represents the probability that vehicle has a cooperation level of . At time , the ego vehicle observes and updates its belief on the cooperation level of vehicle as follows:
[TABLE]
Equation 7 consists of simulating forward the previous state with the two possible hypothesis: and and comparing the outcome with the current observation. The probability of transitioning from to given the cooperation level, is computed by propagating the state forward using the proposed transition model and assuming a Gaussian distribution centered around the predicted value with a standard deviation of for the positions and for velocities. Without this addition of noise in the transition model, the belief would converge after one observation to or since the two models would be perfectly distinguishable.
The focus of this work is not to develop an efficient driver state predictor but rather discuss how this information can be used by an RL agent. Future work might consider more complex filtering techniques such as multi-hypothesis filters, interacting multiple models, or data-driven approaches to estimate the driver cooperation level from observation [19]. An additional extension is to consider continuous values of the cooperation level since it is supported by the C-IDM model.
III-D Belief State Reinforcement Learning
Standard reinforcement learning techniques assume that the underlying environment is an MDP. Latent states such as driver behavior characteristics are not explicitly inferred during training. Although memoryless policies can be efficient, reasoning about latent states can often lead to a significant improvement [7]. In the merging scenario, knowing which drivers are more cooperative can help the agent take better decisions. It is expected that the ego vehicle will only try to merge in front of cooperative drivers.
In order to learn such a behavior through reinforcement learning, we propose to use the belief state as input to the reinforcement learning policy. The resulting algorithm is very similar to the standard DQN algorithm applied to the belief MDP. A transition in the belief state MDP can be described as follows:
- •
At time step , the agent has a belief and receives an observation
- •
The new belief is computed using eq. 7
- •
The agent takes action
The rest of the algorithm is identical to standard DQN.
The input to the network is a vector of dimension :
[TABLE]
where is the number of observed vehicle, and is the fully observable part of the state (information on position and velocity), and is the probability of driver being cooperative. By feeding in the probability on the internal state, the agent can reason in the belief space rather than in the observation space. The resulting Q network is combined with a belief updater at test time to result in a policy robust to partial observability.
IV Implementation
This section highlights some important aspects of our implementation: the RL training procedure and the distribution of scenarios used during training. Further details can be found in our code base111https://github.com/sisl/AutonomousMerging.jl
IV-A Reinforcement Learning Training Procedure
We used a curriculum learning approach to train the agent by gradually increasing the traffic density. When training an RL agent in dense traffic directly, the policy converges to a suboptimal solution which is to stay still in the merge lane and does not leverage the cooperativeness of other drivers. Such a policy avoids collisions but fails at achieving the maneuver. To encourage exploratory actions, we first train an agent in an environment with sparser traffic ( to cars). An alternative to curriculum learning is to design the reward function to incentivize the learning agent to move forward at each time step. However, a more complex reward function often requires a lot of parameter tuning . We found the curriculum learning approach more practical in this work.
The parameters of the DQN algorithm are summarized in table I. It was implemented using the Flux.jl library [20]. Training one policy took around minutes on three million examples.
IV-A1 Initial State Distribution
To populate the initial state of the merging scenarios, we used a two step procedure. The first step consists of sampling the number of vehicles present from a desired range. We considered two ranges corresponding to different traffic conditions:
- •
Mixed traffic: between and cars on the main lane. The agent will experience both sparse traffic scenarios and dense traffic scenarios.
- •
Dense traffic: between and cars on the main lane.
The main lane has a length of and the vehicles have a length of . In dense traffic situations, the gap between vehicles varies from less than to larger distances. Once the number of cars is decided, vehicles are randomly positioned on the main lane. The initial velocity of each vehicle is drawn from a Gaussian distribution of mean and a standard deviation of Finally, the desired velocity of each vehicle is drawn uniformly from the set: 4\text{,}\mathrm{m}\text{/}\mathrm{s}5\text{,}\mathrm{m}\text{/}\mathrm{s}6\text{,}\mathrm{m}\text{/}\mathrm{s} and their cooperation level is drawn uniformly in the interval . The desired velocity is used to parameterize the IDM part of C-IDM.
The second step of the initialization consists of simulating the initial state for a burn-in time randomly chosen between and . During this burn-in time, the ego vehicle is not present and the vehicles on the main lane follow the C-IDM. The burn-in time allows the initial state to converge to a more realistic situation. This approach is loosely inspired by the work of [21] [21].
Such procedure ensures that the learning agent experiences a variety of situations during training. The design of the training scenarios is critical to the performance of the RL agent and will determine the ability of the policy to generalize. The more variety it experiences during training, the more the policy can generalize.
V Experiments
To evaluate the ability of our method to use the cooperativeness of other drivers, we compared the performance of different policies in the merging scenario with dense traffic (between and cars on the main lane). Each policy is evaluated in scenarios with initial states sampled as described in section IV-A1. We measured the percentage of scenarios that ended in collisions (collision rate), the average time to reach the goal position located after the merge point, as well as the number of scenarios ending in time-out failure. A time-out failure is declared if the ego vehicle has not reached the goal position after . Such failure cases are representative of the robot freezing problem [1].
We compare the following methods:
V-1 RL without cooperation level information
The first policy consists of using a standard reinforcement learning algorithm which can observe position and velocity of the other vehicle but not their cooperation level. It is referred to as RL.
V-2 RL with cooperation level information
This policy is trained using standard reinforcement learning but has access to the cooperation level at training and test time. It is referred to as RL (FO).
V-3 belief state RL
This policy is trained in the belief MDP. The input of the policy is the prediction given by the belief updater. It is referred to as belief RL.
V-4 MCTS without cooperation level information
This policy uses the same algorithm as the previous policy but does not have access to information about the cooperation level. Instead, it makes an assumption on the driver cooperation level. We evaluated three different assumptions:
- •
: assumes drivers are always cooperative.
- •
: assumes a cooperation level of , drivers on the main lane will react to the merging vehicle only if it will reach the merge point in twice less time. This behavior is a middle ground between non cooperative and cooperative drivers.
- •
: assumes drivers are never cooperative. It is equivalent to assuming that the drivers on the main lane follows IDM and are blind to the merging vehicle.
V-5 MCTS with cooperation level information
This policy is using Monte Carlo tree search. To handle the continuous state space we used double progressive widening [22]. We used a computation budget matching the decision frequency of . The exact model of the environment is used to produce the tree, and the root node contains all the information about the cooperation levels. It is referred to as MCTS (FO). We refer the reader to [13] for background on the MCTS algorithm.
The first three approaches are offline deep reinforcement learning algorithms where as the last two are online planning methods.
VI Results and Discussion
An example of behavior learned through RL is illustrated in fig. 4. The ego vehicle first slows down as it approaches a dense traffic. Once a cooperative driver is detected (13\text{,}\mathrm{s}\text{/}) the vehicle slowly merges. At $t=$18\text{\,}\mathrm{s}\text{/} the ego vehicle has merged and follows the traffic on the main lane.
Figure 5 illustrates the performance of the evaluate policy on a dense traffic scenario. We can see from the percentage of collisions that MCTS (FO) is the safest policy, followed by the belief RL approach. The three other MCTS policies had a lot of collisions (much larger than ). The RL policy without access to information on the cooperation level had collisions at test time and the two other RL policies performed similarly with around . Previous works has shown that only relying on deep RL is not sufficient to achieve safety [12]. The deployment of those policies would require the addition of a safety mechanism. It is important to notice that even though they did not have access to the full state, the RL and belief RL policy have much better safety performance than the MCTS approaches that did not have access to this information either.
Regarding the number of time steps to cross, we notice that the MCTS policy with is the most efficient. The policy is biased towards taking more aggressive actions since it is assuming that every driver is cooperative. On the contrary, the MCTS policy assuming that no driver is cooperative () has a very conservative behavior: it takes the longest time and has the largest number of time-out failures. As expected MCTS () has a performance in between the previous two. The RL policies are more efficient than the MCTS (FO) policy and have an average time to cross close to the most aggressive MCTS policy. In addition, we can see that the RL policies have much fewer time-out failures than the MCTS policies. This last fact illustrates that they were able to successfully infer and leverage the information on the cooperation level.
One can notice that the gap in performance between MCTS (FO) and the other MCTS is much larger than the gap between RL and RL (FO). A possible explanation is that the neural network approximation is able to capture the cooperation level inference task in the hidden layers. However, this implicit state estimation is less efficient than the explicit state estimation provided by combining the belief updater with the RL policy during training and execution. The belief RL policy has a similar safety level and takes, on average, a similar number of steps than the RL policy with perfect observation.
MCTS with full observation still presents a significant number of time-out failures. We believe that relaxing the computation constraint would have lead to better performance. Computation is generally the major bottleneck of online planning algorithms. Our experiments show that the performance can be closely matched using offline trained policies which take a very short time to execute online. However, the MCTS policy with full observation is safer than the deep RL equivalent. A direction for future work is to use the RL policy as a value estimator to guide the search in MCTS.
VII Conclusion
We presented a reinforcement learning approach to the problem of autonomously merging in dense traffic. Our study confirms that an autonomous agent can benefit from reasoning about the interaction with other drivers. We have shown that an agent trained using deep reinforcement learning can outperform online planning methods when being exposed to a broad range of driver behaviors during training. In addition, we presented a belief state RL procedure that explicitly tries to estimate the internal state of other drivers. Finally, we proposed, C-IDM, a simple parametric model capturing a variety of cooperative behaviors.
Future work involves using more sophisticated techniques to estimate driver behavior. Other algorithms to learn belief state policies could be considered, as well as a direct comparison with online POMDP solvers [3]. Although our RL agent learned more efficient policies, an online planner may provide greater robustness. Using deep reinforcement learning policies to guide the search of a classical planner may be a promising direction.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Peter Trautman and Andreas Krause “Unfreezing the robot: Navigation in dense, interacting crowds” In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , 2010, pp. 797–803 DOI: 10.1109/IROS.2010.5654369 · doi ↗
- 2[2] Erik Ward, Niclas Evestedt, Daniel Axehill and John Folkesson “Probabilistic Model for Interaction Aware Planning in Merge Scenarios” In IEEE Transactions on Intelligent Vehicles 2.2 , 2017, pp. 133–146 DOI: 10.1109/TIV.2017.2730588 · doi ↗
- 3[3] Constantin Hubmann et al. “A Belief State Planner for Interactive Merge Maneuvers in Congested Traffic” In IEEE International Conference on Intelligent Transportation Systems (ITSC) , 2018, pp. 1617–1624 DOI: 10.1109/ITSC.2018.8569729 · doi ↗
- 4[4] Edward Schmerling, Karen Leung, Wolf Vollprecht and Marco Pavone “Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction” In IEEE International Conference on Robotics and Automation (ICRA) , 2018, pp. 1–9 DOI: 10.1109/ICRA.2018.8460766 · doi ↗
- 5[5] Jaime F. Fisac et al. “Hierarchical Game-Theoretic Planning for Autonomous Vehicles” In IEEE International Conference on Robotics and Automation (ICRA) , 2019
- 6[6] Shray Bansal, Akansel Cosgun, Alireza Nakhaei and Kikuo Fujimura “Collaborative Planning for Mixed-Autonomy Lane Merging” In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , 2018, pp. 4449–4455 IEEE
- 7[7] Zachary N. Sunberg, Christopher J. Ho and Mykel J. Kochenderfer “The value of inferring the internal state of traffic participants for autonomous freeway driving” In American Control Conference (ACC) , 2017, pp. 3004–3010 DOI: 10.23919/ACC.2017.7963408 · doi ↗
- 8[8] Chiyu Dong, John M. Dolan and Bakhtiar Litkouhi “Intention estimation for ramp merging control in autonomous driving (in review)” In IEEE Intelligent Vehicles Symposium (IV) , 2017, pp. 1584–1589 DOI: 10.1109/IVS.2017.7995935 · doi ↗
