The AI Driving Olympics at NeurIPS 2018
Julian Zilly, Jacopo Tani, Breandan Considine, Bhairav Mehta, Andrea, F. Daniele, Manfred Diaz, Gianmarco Bernasconi, Claudio Ruch, Jan Hakenberg,, Florian Golemo, A. Kirsten Bowser, Matthew R. Walter, Ruslan Hristov, Sunil, Mallya, Emilio Frazzoli, Andrea Censi, Liam Paull

TL;DR
The paper introduces the AI Driving Olympics, a robotics competition designed to evaluate machine learning approaches for autonomous driving tasks, highlighting current challenges and the need for better benchmarks and simulation-to-reality transfer methods.
Contribution
It presents a new competition framework for evaluating AI in robotics, including tools, tasks, and evaluation methods, to advance research in autonomous driving and navigation.
Findings
Need for improved benchmarks in robotics
Challenges in sim-to-real transfer
Initial results show gaps in current approaches
Abstract
Despite recent breakthroughs, the ability of deep learning and reinforcement learning to outperform traditional approaches to control physically embodied robotic agents remains largely unproven. To help bridge this gap, we created the 'AI Driving Olympics' (AI-DO), a competition with the objective of evaluating the state of the art in machine learning and artificial intelligence for mobile robotics. Based on the simple and well specified autonomous driving and navigation environment called 'Duckietown', AI-DO includes a series of tasks of increasing complexity -- from simple lane-following to fleet management. For each task, we provide tools for competitors to use in the form of simulators, logs, code templates, baseline implementations and low-cost access to robotic hardware. We evaluate submissions in simulation online, on standardized hardware environments, and finally at the…
| DARPA Buehler et al. (2007) | KITTI Geiger et al. (2012) | Robocup Kitano et al. (1997) | RL comp. Kidziński et al. (2018) | HuroCup Baltes et al. (2017) | AI-DO | |
|---|---|---|---|---|---|---|
| Accessibility | - | ✓ | ✓ | ✓ | ✓ | ✓ |
| Resource constraints | ✓ | - | ✓ | - | ✓ | |
| Modularity | - | - | - | - | - | |
| Simulation/Realism | ✓ | ✓ | ✓ | - | ✓ | |
| ML compatibility | - | ✓ | - | ✓ | - | ✓ |
| Embodiment | ✓ | - | ✓ | - | ✓ | ✓ |
| Diverse metrics | ✓ | - | ✓ | - | ✓ | |
| Reproducible experiments | - | ✓ | - | ✓ | - | ✓ |
| Automatic experiments | - | ✓ | - | ✓ | - |
| Characteristic | Description and Implementation in AI-DO |
|---|---|
| Accessibility | No up front costs other than the option of assembling a “Duckiebot” are required. The Duckiebot comes with step-by-step instructions including online tutorials, a forum, and detailed descriptions within the open-source Duckiebook Project (2019a). |
| Resource constraints | Constraints on power, computation, memory, and actuator limits play a vital role within robotics. In particular, we seek to compare contestants with access to different computational substrates such as a Raspberry PI 3 or a Movidius stick with more or less computational speed. Following this agenda, we want to explore which method is best given access to a fast computer, which is best using a slower computer, or which is best with more or less memory, etc. |
| Modularity | More often than not, robotics pipelines can be decomposed into several modules. The optimal division of a pipeline into modules however is undecided. In future editions of AI-DO we will make such modules explicit and test them according to performance and resource consumption such as computational load and memory. |
| Simulation/realism | The AI-DO is an ideal setup to test sim-to-real transfer wince both simulation and real environment are readily available for testing and the software infrastructure enables toggling between the two. |
| ML compatibility | Logged data from hours of Duckiebot operations are available to allow for training of ML algorithms. We also provide an OpenAI Gym Brockman et al. (2016) interface for both the simulator and real robot, which enables easy development of reinforcement learning algorithms. |
| Embodiment | The AI-DO finals are run in a real robot environment (Duckietown). |
| Diverse metrics | Each challenge of AI-DO employs specific objective metrics to quantify success, such as: traffic law compliance metrics to penalize illegal behaviors (e.g., not respecting the right-of-way); performance metrics such as the average speed, to penalize inefficient solutions and comfort metrics to penalize unnatural solutions that would not be comfortable to a passenger. These metrics are based on measurable quantities such as speed, timing and detecting traffic violations. Consequently, there may be multiple winners in each competition (e.g., a very conservative solution, a more adventurous one, etc.). |
| Reproducible experiments | The overall objective is to design experiments that can easily be reproduced, similar to recent trends in RL Henderson et al. (2018). We facilitate reproducibility by running available code on standard hardware and containerization of software. |
| Automatic experiments | Embodiment makes the problem of standardized evaluations particularly challenging. The ability to automatically conduct controlled experiments and perform evaluation will help to address these challenges. To that end, we are building intelligent Duckietowns called Robotariums Pickem et al. (2017) to automate the deployment and evaluation of autonomy algorithms on standardized hardware. |
| Number | Task | Component | Description |
|---|---|---|---|
| 1 | Sensing | Fish-eye Camera | FOV (front-facing), @30Hz |
| 2 | Computation | Raspberry Pi 3B+ | Quad Core 1.4 GHz, 64 bit CPU and 1 GB of RAM |
| 3 | Actuation | DC Motors | Independent, in differential drive configuration |
| 4 | Communication | Addressable LEDs | Front, Back for inter-bot communication (blinking) |
| 5 | Power | Power Bank | 10000 mAh battery (h operation) |
| Contestant | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 | Cumulative |
|---|---|---|---|---|---|---|
| Wei Gao | 3.3 / 1 | 3.9 / 1 | 2.0 / 1 | 23.0 / 12 | 5.0 / 3 | 37 / 18 |
| SAIC Moscow | 6.0 / 3 | 2.0 / 1 | 2.0 / 1 | 3.0 / 1 | 2.0 / 1 | 15 / 7 |
| Team JetBrains | 16.0 / 2 | 1.0 / 0 | 4.0 / 1 | 0.0 / 0 | 8.0 / 1 | 29 / 4 |
| Jon Plante | 18.0 / 2 | 1.0 / 0 | 7.0 / 3 | 3.0 / 1 | 5.0 / 3 | 34 / 9 |
| Vincent Mai | 2.0 / 1 | 1.0 / 0 | 3.0 / 2 | 14.0 / 1 | 3.0 / 2 | 23/ 9 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
11institutetext: 1ETH Zürich, Zürich, Switzerland
2Mila, Université de Montréal, Montréal, Canada
3nuTonomy, an Aptiv company, Boston, USA
4Toyota Technological Institute at Chicago, Chicago, IL, USA
5Duckietown Foundation, Boston, MA, USA
6Amazon Web Services, San Francisco, USA
The AI Driving Olympics at NeurIPS 2018
Julian Zilly1
Jacopo Tani1
Breandan Considine2
Bhairav Mehta2
Andrea F. Daniele4
Manfred Diaz2
Gianmarco Bernasconi1
Claudio Ruch1
Jan Hakenberg1
Florian Golemo2
A. Kirsten Bowser5
Matthew R. Walter4
Ruslan Hristov3
Sunil Mallya6
Emilio Frazzoli1,3
Andrea Censi1,3
Liam Paull2
Abstract
Despite recent breakthroughs, the ability of deep learning and reinforcement learning to outperform traditional approaches to control physically embodied robotic agents remains largely unproven. To help bridge this gap, we created the “AI Driving Olympics” (AI-DO), a competition with the objective of evaluating the state of the art in machine learning and artificial intelligence for mobile robotics. Based on the simple and well specified autonomous driving and navigation environment called “Duckietown,” AI-DO includes a series of tasks of increasing complexity – from simple lane-following to fleet management. For each task, we provide tools for competitors to use in the form of simulators, logs, code templates, baseline implementations and low-cost access to robotic hardware. We evaluate submissions in simulation online, on standardized hardware environments, and finally at the competition event. The first AI-DO, AI-DO 1, occurred at the Neural Information Processing Systems (NeurIPS) conference in December 2018. The results of AI-DO 1 highlight the need for better benchmarks, which are lacking in robotics, as well as improved mechanisms to bridge the gap between simulation and reality.
Keywords Competition, robotics, safety-critical AI, self-driving cars, autonomous mobility on demand, Duckietown.
1 Introduction
Competitions provide an effective means to advance robotics research by making solutions comparable and results reproducible Behnke (2006) as without a clear benchmark of performance, progress is difficult to measure. The overwhelming majority of benchmarks and competitions in machine learning (ML) do not involve physical robots. Yet, the interactive embodied setting is thought to be an essential scenario to study intelligence Pfeifer and Scheier (2001); Floreano et al. (2004). Robots are generally built as a composition of blocks (perception, estimation, planning, control, etc. ), and an understanding of the interaction between these different components is necessary to make robots achieve tasks. However, it is currently unclear whether these abstractions are necessary within ML methods, or whether it is possible to learn them in an end-to-end fashion.
In order to evaluate ML solutions on physically embodied systems, we developed a competition using the robotics platform Duckietown - a miniature town on which autonomous robots navigate Paull et al. (2017). This competition is aimed as a stepping stone to understand the role of AI in robotics in general, and in self-driving cars in particular. We call this competition the “AI Driving Olympics” (AI-DO) because it comprises a set of different trials that correspond to progressively sophisticated vehicle behaviors, from the reactive task of lane-following to more complex and “cognitive” behaviors, such as dynamic obstacle avoidance, to finally coordinating a vehicle fleet while adhering to a set of “rules of the road.” The first AI-DO event (AI-DO 1) took place as a live competition at the 2018 Neural Information Processing Systems (NeurIPS) conference in Montréal, Canada. AI-DO 2 will take place at the 2019 International Conference on Robotics and Automation (ICRA), again in Montréal, Canada.
Here, we describe our overall design objectives for the AI-DO (Sec. 2). Subsequently, we detail AI-DO 1, describe the approaches taken by the top competitors (Sec. 4) and then conduct a critical analysis of which aspects of AI-DO 1 worked well and which did not (Sec. 5). We then focus our attention on the planned improvements for AI-DO 2 (Sec. 6) and provide some general conclusions (Sec. 7).
2 The Purpose of AI-DO
Robotics is often cited as a potential target application in ML literature (e.g., Chaplot et al. (2018); Higgins et al. (2017) and many others). The vast majority of these works evaluate their results in simulation Kidziński et al. (2018) and on very simple (usually grid-like) environments Sutton and Barto (2018). However, simulations are by design based on what we already know and lack some of the surprises the real world has to offer. Our experience thus far indicates that many of the implicit assumptions made in the ML community may not be valid for real-time physically embodied systems. For instance, considerations related to resource consumption, latency, and system engineering are rarely considered but are crucially important for fielding real robots. Likewise, relying on a single metric to judge behavior (cf. reinforcement learning) is not realistic in more complex tasks. Training agents in simulation has become very popular, but it is largely unknown how to assess whether knowledge learned in simulation will transfer to a real robot.
To some extent, ML practitioners can be forgiven for not testing their algorithms on real robot hardware since it is time-consuming and often requires specific expertise.
Furthermore, running experiments in an embodied setting generally limits: the amount of data that can be gathered, the ease of reproducing previous results, and the level of automation that can be expected of the experiments. To this end, we designed AI-DO as a benchmark that is both trainable efficiently in simulation and testable on standardized robot hardware without any robot-specific knowledge.
What makes the AI-DO unique is that, of the many existing competitions in the field of robotics, none possess the essential characteristics that help facilitate the development of learning from data to deployment. A comparative analysis of these characteristics is given in Table 1 with the details for AI-DO given in Table 2.
Ideally, the ML community would redirect its efforts towards physical agents acting in the real world and help elucidate the unique characteristics of embodied learning in the context of robotics. Likewise, the robotics community should devote more effort to the use of ML techniques where applicable. The guaranteed impact is the establishment of a standardized baseline for comparison of learning-based and classical robotics approaches.
3 AI-DO 1: Competition Description
In the following, we describe the first version of the AI Driving Olympics (AI-DO). In particular we detail:
- •
The different competition challenges and their associated scoring metrics;
- •
The physical Duckietown platform;
- •
The software infrastructure including simulators and logs, the Docker containerization infrastructure, and various baseline implementations.
3.1 The Challenges
AI-DO 1 comprised three challenges with increasing order of difficulty:
- LF
Lane following on a continuous closed course, without obstacles. The robot was placed on a conforming closed track (with no intersections) and was required to follow the track in the right-hand lane. 2. LFV
Lane following on a continuous closed course as above, but with static obstacles (e.g., vehicles) sharing the road. 3. AMOD
Autonomous mobility on demand:111AMOD competition website https://www.amodeus.science/ Participants were required to implement a centralized dispatcher that provided goals to individual robots in a fleet of autonomous vehicles in order to best serve customers requesting rides. Low-level control of the individual vehicles was assumed.
3.2 Metrics and Judging
The rules for AI-DO 1 are described below and can be found in more detail online.222The performance rules of AI-DO http://docs.duckietown.org/DT18/AIDO/out/aido_rules.html
Lane following with and without obstacles (LF/LFV)
While not treated separately in AI-DO 1 for the lane following (LF) and lane following with obstacles (LFV) task, we considered the following metrics, each evaluated over five runs to reduce variance.
Traveled distance: The median distance traveled along a discretized lane. Going in circles or traveling in the opposite lane does not affect the distance.
Survival time: The median duration of an episode, which terminates when the car navigates off the road or collides with an obstacle.
Lateral deviation: The median lateral deviation from the center line.
Major infractions: The median of the time spent outside of the driveable zones. For example, this penalizes driving in the wrong lane.
The winner was determined as the user with the longest traveled distance. In case of a tie, the above metrics acted as a tie-breaker such that a longer survival time as well as a low lateral deviation and few infractions were rewarded.
Autonomous mobility on demand (AMOD)
The autonomous mobility on demand challenge assumed that vehicles were already following the rules-of-the-road and were driving reasonably. The focus here was on deciding where and when to send each vehicle to pick up and deliver customers. To this end, we used three different metrics to evaluate performance.
Service quality: In this arrangement, we penalized large waiting times for customers as well as large distances driven around with empty vehicles. We weighted the two components such that waiting times dominated the metric.
Efficiency: Again, we penalized large waiting times for customers as well as large distances driven around with empty vehicles. In this case, we weighted the two components such that the distance traveled by empty vehicles dominated the metric.
Fleet size: This scenario considered the true/false case of whether a certain service level could be upheld with a chosen fleet size or not. The smaller the fleet the better, however only as long as service requirements were fulfilled.
3.3 The Physical Competition Infrastructure
The physical Duckietown platform is comprised of intelligent vehicles (Duckiebots) and a customizable model of urban environment (Duckietown).
The robot - Duckiebot:
Table 3 lists the main components of the Duckiebot robot, which are visualized in Fig. 3.
We carefully chose these components in order to provide a robot capable of advanced single- and multi-agent behaviors, while maintaining an accessible cost. We refer the reader to the Duckiebook Project (2019a) for further information about the robot hardware.
The environment - Duckietown:
Duckietowns are modular, structured environments built on road and signal layers (Fig. 4) that are designed to ensure a reproducible driving experience.
A town consists of six well defined road segments: straight, left and right turns, a 3-way intersection, a 4-way intersection, and an empty tile. Each segment is built on individual interlocking tiles, that can be reconfigured to customize the size and topology of a city. Specifications govern the color and size of the lane markings as well as the road geometry.
The signal layer is comprised of street signs and traffic lights. In the baseline implementation, street signs are paired with AprilTags Olson (2011) to facilitate global localization and interpretation of intersection topologies by Duckiebots. The appearance specifications detail their size, height and location in the city. Traffic lights provide a centralized solution for intersection coordination, encoding signals through different frequencies of blinking LED lights.
These specifications are meant to allow for a reliable, reproducible experimental setup that would be less repeatable on regular streets. For more details about the components and the rules of a Duckietown, we refer the reader to the Duckietown operation manual Project (2019a).
3.4 The Software Infrastructure
The overarching design principles underlying our software architecture are the following:
- •
Minimize dependencies on specific hardware or operating systems for development, which we achieved via containerization with Docker;
- •
Standardize interfaces for ease of use, specifically to allow a participant to easily switch between developing driving solutions in simulation and deploying them on physical robots (either their own or in a Robotarium). This same interface is used to evaluate submissions;
- •
Provide interfaces to commonly used frameworks in both the ML (e.g., Tensorflow, PyTorch, and OpenAI Gym) and robotics (e.g., ROS) communities;
- •
Provide baseline implementations of algorithms commonly used in both ML (e.g., reinforcement learning, imitation learning) and robotics (see Paull et al. (2017) for details) communities;
- •
Provide tools to leverage cloud resources such as GPUs.
We use Docker containerization to standardize the components as well as inter-component communication. This has the important advantage that it allows us to deploy the same code in the simulator as well as on the real Duckiebot using an identical interface. Sec. 3.4 contains more details about our different container types and the interactions between them, while Fig. 5 depicts the evaluation architecture.
As mentioned above, in order to facilitate fast training with modern machine learning algorithms, we developed an OpenGL-based simulator “gym-duckietown” Chevalier-Boisvert et al. (2018). As the name suggests, the simulator provides an OpenAI gym-compatible interface Brockman et al. (2016) that enables the use of available implementations of state-of-the-art reinforcement learning algorithms. Sec. 3.4 discusses the simulation and its features in more detail.
Our architecture is unique in that it requires that all submissions be containerized. Rather than directly connecting to the simulator, each participant launches and connects to a Docker container that runs the simulation. In order to facilitate this interaction, we provide boilerplate code for making submissions and for launching the simulator. We also provide baseline autonomous driving implementations based upon imitation learning, reinforcement learning, as well as a classic ROS-based lane detection pipeline. Each of these three implementations provides basic driving functionality out-of-the-box, but also includes pointers to various improvements to the source code consistent with the goal that AI-DO be educational.
Once a user submits an agent container, our cluster of evaluation servers downloads and scores the submission which is elaborated in Sec. 3.4. During this process, the servers create various forms of feedback for the user. These include various numerical evaluation scores as discussed earlier (e.g., survival time, traffic infractions, etc.), plots of several runs overlaid on a map, animated GIFs that show the robot’s view at any given time, and log files that record the state of the system during these runs. The evaluation server uploads this data to a leaderboard server.
We provided the code and documentation for running the submissions on the physical Duckiebot. Originally, we intended for this to happen autonomously in the Robotarium. Since this was not completed before the first AI-DO live challenge, we instead “manually” ran the containers associated with the best simulation results on the physical systems. Specifically, we ran a Docker container on our Duckiebots that provided low-level control and perception with low latency while exposing to the local WiFi network the same ROS interface provided in the simulator. From a PC connected to the same local network, we downloaded the agent submissions (Docker images) and run it so that the agent container was able to control the Duckiebot over the network.
The Development Pipeline
The AI-DO development process was designed with two primary goals in mind. First, it should be easy for competitors to install and run prepared “baselines” (Sec. 3.4) for each of the challenges. Secondly, it should be easy to test and evaluate each submission in a controlled and reproducible manner. To do so, we implemented a command line utility for building and running solutions, and a set of containerization tools (Sec. 3.4), enabling users to quickly fetch the necessary dependencies and evaluate their submissions locally.
For lane-following, we provided the following baseline implementations: Classic ROS, reinforcement learning (RL) with PyTorch, and imitation learning (IL) with Tensorflow. All provided baselines ran unmodified in the simulator, and some (e.g., IL) were trained using logs recorded from human drivers. We also provided docker images for producing trained RL and IL models. Following internal testing and evaluation, we released baselines to the public during the competition period.
We encouraged competitors to use the following protocol for submission of their challenge solutions (Fig. 6). First, they should build and evaluate a baseline inside the simulator on their local development machine to ensure it was working properly. This evaluation results in a performance score and data logs that are useful for debugging purposes. If the resulting trajectory appeared satisfactory, competitors could then submit the Docker image to the official AI-DO evaluation server. The server assigned all valid submissions a numerical score and placed them on a public leaderboard.
While the submission of competition code via containerization was straightforward, one issue we encountered during internal development was related to updates to the Docker-based infrastructure (see Sec. 3.4). Originally, Docker images would be automatically rebuilt upon (1) changes to the mainline development branch and (2) changes to base layers, resulting in a cascade of changes to downstream Docker images. This was primarily due to two issues: (1) images were not version-pinned and (2) the lack of acceptance testing. Since users were automatically updated to the latest release, this would cause a significant disruption to the development workflow, where cascading failures were observed on a daily basis. To address this issue, we disabled auto builds and deployed manually, however a more complete solution would involve rolling out builds incrementally following a standard acceptance testing process. Due to a premature automation of the build pipeline, we were unable to utilize auto-builds to their full capabilities.
Simulation
While Duckietown’s software architecture allows for fast deployment on real robots, it is often easier and more efficient to test new ideas within simulated environments. Fast simulators are also important when developing and training reinforcement learning or imitation learning algorithms, which often require large amounts of data. Duckietown ships with a fast OpenGL-based simulator Chevalier-Boisvert et al. (2018) (Fig. 7) that incorporates realistic dynamic models and simulates various Duckietown maps within a purely Python framework. The simulator runs at hundreds of frames per second, and is fully customizable in all aspects that control it, including dynamics, environment, and visual fidelity.
The simulator provides several features that help model operation in the real Duckietown environment. These include the simulation of other agents including Duckiebots and pedestrian Duckies, as well as many of the obstacles found in physical Duckietowns, such as traffic cones and cement dividers. The simulator ships with multiple maps and a random map generator, enabling an easy way for participants to ensure that their agents do not overfit to various details of the environment. The simulator provides high-fidelity simulated camera images that model realistic fish-eye lens warping.
However, it is often the case that algorithms trained purely in simulation fall prey to the simulation-reality gap Jakobi et al. (1995) and fail to transfer to the real Duckiebot. This gap is particularly problematic for methods that reason over visual observations, which often differ significantly between simulation and the real environment. An increasingly common means of bridging the simulation-to-reality gap is to employ domain randomization Tobin et al. (2017), which randomizes various aspects of the simulator, such as colors, lighting, action frequency, and various physical and dynamical parameters, in the hope of preventing learned policies from overfitting to simulation-specific details that will differ after transfer to reality. The Duckietown simulator provides the user with hooks to dozens of parameters that control domain randomization, all with configurable ranges and settings through a JSON API.
Simulators are integral to the development process, particularly for learning-based methods that require access to large datasets. In order to ensure fairness in the development process, we also provide wrappers that allow traditional (non-learning-based) robotic architectures to benefit from the use of simulation. Participants are encouraged to try combinations of various approaches, and are able to quickly run any embodied task such as lane following (LF) and lane following with obstacles (LFV) within the simulator by changing only a few arguments. For the autonomous mobility on demand (AMOD) task, we provide a separate city-based fleet-level simulator Ruch et al. (2018).
Containerization
One of the challenges of distributed software development across heterogeneous platforms is the problem of variability. With the increasing pace of software development comes the added burden of software maintenance. As hardware and software stacks evolve, so too must source code be updated to build and run correctly. Maintaining a stable and well documented codebase can be a considerable challenge, especially in a robotics setting where contributors are frequently joining and leaving the project. Together, these challenges present significant obstacles to experimental reproducibility and scientific collaboration.
In order to address the issue of software reproducibility, we developed a set of tools and development workflows that draw on best practices in software engineering. These tools are primarily built around containerization, a widely adopted virtualization technology in the software industry. In order to lower the barrier of entry for participants and minimize variability across platforms (e.g. simulators, robotariums, Duckiebots), we provide a state-of-the-art container infrastructure based on Docker, a popular container engine. Docker allows us to construct versioned deployment artifacts that represent the entire filesystem and to manage resource constraints via a sandboxed runtime environment.
The Duckietown platform supports two primary instruction set architectures: x86 and ARM. To ensure the runtime compatibility of Duckietown packages, we cross-build using hardware virtualization to ensure build artifacts can be run on all target architectures. Runtime emulation of foreign artifacts is also possible, using the same technique.333For more information, this technique is described in further depth at the following URL: https://www.balena.io/blog/building-arm-containers-on-any-x86-machine-even-dockerhub/. For performance and simplicity, we only build ARM artifacts and use emulation where necessary (e.g., on x86 devices). On ARM-native, the base operating system is HypriotOS, a lightweight Debian distribution with built-in support for Docker. For both x86 and ARM-native, Docker is the underlying container platform upon which all user applications are run, inside a container.
Docker containers are sandboxed runtime environments that are portable, reproducible and version controlled. Each environment contains all the software dependencies necessary to run the packaged application(s), but remains isolated from the host OS and file system. Docker provides a mechanism to control the resources each container is permitted to access, and a separate Linux namespace for each container, isolating the network, users, and file system mounts. Unlike virtual machines, container-based virtualization like Docker only requires a lightweight kernel, and can support running many simultaneous containers with close to zero overhead. A single Raspberry Pi is capable of supporting hundreds of running containers.
While containerization considerably simplifies the process of building and deploying applications, it also introduces some additional complexity to the software development lifecycle. Docker, like most container platforms, uses a layered filesystem. This enables Docker to take an existing “image” and change it by installing new dependencies or modifying its functionality. Images may be based on a number of lower layers, which must periodically be updated. Care must be taken when designing the development pipeline to ensure that such updates do not silently break a subsequent layer as described earlier in Sec. 3.4.
One issue encountered is the matter of whether to package source code directly inside the container, or to store it separately. If source code is stored separately, a developer can use a shared volume on the host OS for build purposes. In this case, while submissions may be reproducible, they are not easily modified or inspected. The second method is to ship code directly inside the container, where any changes to the source code will trigger a subsequent rebuild, effectively tying the sources and the build artifacts together. Including source code alongside build artifacts also has the benefit of reproducibility and diagnostics. If a competitor requires assistance, troubleshooting becomes much easier when source code is directly accessible. However doing so adds some friction during development, which has caused competitors to struggle with environment setup. One solution is to store all sources on the local development environment and rebuild the Docker environment periodically, copying sources into the image.
Baselines
A important aspect of AI-DO is that participants should be able to quickly join the competition, without the significant up-front cost that comes with implementing methods from scratch. To facilitate this, we provided baseline “strawman” solutions, based both on “classical” and learning-based approaches. These solutions are fully functional (i.e., they will get a score if submitted) and contain all the necessary components, but should be easily beatable.
ROS baseline: We provided a simplified implementation of the classical Duckietown ROS-based baseline (Fig. 9 (left)), which consists of a simple lane detection algorithm that maps to wheel velocities. It is provided in order to show participants how to quickly use combinations of existing Duckietown software with their own enhancements. The ability to mix classical methods into a software pipeline enables participants to overcome various issues that often plague purely learning-based methods. For example, when training an end-to-end reinforcement learning policy in simulation, the policy may overfit to various visual simulation details, which upon transfer, will no longer be present. In contrast, running a learning algorithm on the top of the output of a classical module, such as a lane filter, will transfer more smoothly to the real world.
Imitation Learning from Simulation or Logs: Learning complex behaviors can be a daunting task if one has to start from scratch. Available solutions such as the above ROS-based baseline help to mitigate this. As one way to make use of this prior information, we are offering simple imitation learning baselines (behavioral cloning) that utilize on driving behavior datasets collected either in simulation or from recorded logs444Duckietown logs database: http://logs.duckietown.org/ as depicted on the right in Fig. 9. In our experience, policies learned using imitation learning within Duckietown are frequently very skilled at driving as long as sufficient and representative data is selected to train them.
Reinforcement Learning Baseline: For participants who want to explore reinforcement learning approaches within AI-DO, we supply a lane following baseline using the Deep Deterministic Policy Gradients (DDPG) method Lillicrap et al. (2015) trained in the Duckietown simulation environment. DDPG is an actor-critic method that can deal with continuous output spaces based on the Deep Q-learning architecture Mnih et al. (2015) originally developed for discrete action spaces. To aid in improving the baseline, we also provide a list of possible changes that participants can try to improve the baseline. The distinguishing factor of such a reinforcement learning solution is that, at least in principle, it has the chance to find unexpected solutions that may not easily be found through engineering insights (ROS baseline) or copying of existing behaviors (imitation learning baseline).
4 AI-DO 1: Details of Top Submissions
The first edition of the AI Driving Olympics included an initial online phase in which submission were evaluated in simulation. We received our first submissions for AI-DO 1 in October 2018. Participants competed for a spot in the final competition up until the day before the final competition on 9 December at NeurIPS 2018 in Montréal, Canada.
4.1 Qualifications
We received a total of around 2000 submissions from around 110 unique users, with the majority focused on the lane following (LF) task. Only a small number of participants considered the lane following with obstacles (LFV) and autonomous mobility on demand (AMOD) tasks. As a consequence, we limited the final competition to the lane following task and included the top five participants from the lane following simulation phase. These will be described in the following pages. We were interested in understanding the different approaches that the participants took and the various design decisions that they made to improve performance. Afterwards we will share our experiences at the live finals, where we saw the code of the finalists in action on real Duckiebots.
To put the competitor submissions into context, we ask the questions of what constitutes bad driving and what good driving is? As an example of challenges already present when driving in simulation refer to Fig. 10. Depicted is a bird’s-eye-view visualization of the driving path of six different submissions. We include this figure to demonstrate the spread of behaviors between the submissions of the finalists and another submission (left-top) that did not advance to the finals.
Contribution from Participant 1: Gao Wei (Team Panasonic R&D Center, Singapore & NUS)
Inspiration: This technique was inspired by human driving. A person may reason about the lane and potential curvature of the road, plan a trajectory and then execute the plan. This is the approach taken here with the agent first projecting the Duckiebot’s first-person view to a local ego-centric reconstruction of the lanes, objects and stop lines. Based upon the cumulative observations the method then predicts the curvature of the lane. Finally, an adaptive pure pursuit algorithm is run on the reconstructed local world and prediction.
Perception: The perception module processes images from the forward-facing camera to detect lane, objects and moving cars. To simplify the communication pipeline and the difficulty of detection, a semantic mask was designed as the API to transfer to prediction.
Prediction: The prediction module reconstructs the local 3D environment in real-time from the semantic mask given by the perception module. It transfers detections in the first-person view to a bird’s-eye view of the environment. The module uses a dynamics model of the Duckiebot to update the environment model as the robot navigates.
Control: The control module reasons over the predicted environment “map” to generate control inputs using a pure pursuit control algorithm.
Competition experience and lessons: In order to facilitate the development process, the participant modified the open-source simulator to access additional information beneficial for debugging, including the ground-truth center line location (Fig.12). Additionally, they created a custom tool to visualize the inferred egocentric map of the robot’s local environment.
Contribution from Participant 2: Jon Plante
Inspiration: This technique was developed to emulate the way in which a human might center a vehicle in a lane using the left and right lane boundaries as guides. More specifically, the driver may steer the vehicle towards the point at which the boundary lines interesect at the horizon. Alg. 1 provides an algorithmic description of this approach.
Image preprocessing: The pipeline (Fig. 13) begins by filtering input images to find the white and yellow lane markings by applying several steps detailed in Alg. 1. Following several preprocessing steps, the lanes are detected and projected into the horizon. Their point of intersection is used as the target direction and used to calculate an angle of deviation that is later used to control the robot.
Control: The desired steering angle and forward speed are read from a lookup table based on the current heading deviations. Generally, a smaller angle corresponded to greater speed, since vehicles need to slow down in curves. When no yellow line is detected on the left side of the image, the robot assumes that it is not in the right lane and thus switches to another controller that pushes the Duckiebot to realign itself by going gradually to the right.
Competition experience and lessons: The detailed preprocessing likely helped the Duckiebot drive around even in conditions that it was not originally tested on. However, during the competition, the approach might not have been flexible enough given the case-based controller. Likewise, lane markings were detected using averaging thresholds for white and yellow lines that are likely sensitive to the lighting conditions.
Contribution from Participant 3: Vincent Mai
Inspiration: The rationale of this approach is to tune a tried and tested solution without reinventing the wheel. The participant started with the ROS-based lane following baseline Paull et al. (2017). This method detects lane markings in the first-person images and uses these detections to estimate the Duckiebot’s distance to the centerline of the lane. This error is fed to a PID controller. The participant identified several weaknesses of this pipeline, and modified several parameters to improve performance.
Lane filter: The image processing pipeline is prone to erroneous detections, labeling yellow lines as white lines (near the end of tiles) and green grass as yellow lane markings. The participant addressed this by manually tuning the HSV color space parameters. The method also reduced the original extent of the image crop to provide observations of lane markings that were useful for estimating deviation from the center line.
Lane controller: The original setpoint of the PID controller positioned the Duckiebot too far to the right of the center line. The participant tuned this parameter to ensure a steady-state position closer to the middle of the right lane. In order to increase the Duckiebot’s speed, the participant used greater gains on the commands to the motors. Further, the participant tuned the controller’s derivative gain to reduce oscillations in the Duckiebot’s heading.
Competition experience and lessons: Starting with an existing baseline was helpful but also limited the design freedom of the method. Having access to a physical Duckiebot supported the tuning and development of this pipeline. However, the participants found that several tuned parameters were sensitive to the specific Duckiebot on which the pipeline was run as well as the environment. For example, the lighting conditions during the competition, which were different from those of the testing environment, introduced errors in lane detection. Additionally, the revised motor gains that were used to increase speed did not directly transfer to the Duckiebot used in the competition. The participant also found that control parameters tuned to optimize performance in simulation resulted in poor performance on the real robot.
Contribution from Participant 4: Team JetBrains —
Mikita Sazanovich, Oleg Svidchenko, Aleksandra Malysheva, Kirill Krinkin, Maksim Kuzmin, Aleksandr Semenov, Vadim Volodin, and Aleksei Shpilman
Inspiration: The team was interested in applying deep reinforcement learning to solve for a lane-following policy. Further, they wanted to use this as an opportunity to explore the feasibility of transferring policies trained in simulation to the real world.
Learning approach: The team used their framework for parallel deep reinforcement learning555Accessible online at https://github.com/iasawseen/MultiServerRL. The neural network consisted of five convolutional layers (the first layer contained 32 filters, and the remaining four contained 32 filters), followed by two fully connected layers (with 768 and 48 neurons, respectively). Input to the network consisted of the the last four frames downsampled to 120 by 160 pixels and filtered for white and yellow color as shown in Fig. 16. The network was trained using the Deep Deterministic Policy Gradient algorithm Lillicrap et al. (2015). The training was conducted in three stages: First, on a full track, then on the most problematic regions (identified by the largest cross-track error), and again on a full track. The reward had the following form: . Angles were extracted in the simulator during training, but are not necessary for inference since the method only relies on preprocessed camera frames.
Competition experience and lessons: As most finalist teams, Team JetBrain found sim-to-real transfer to be difficult. It is not clear whether or not reinforcement learning techniques are more susceptible to this problem, however there are heuristic control algorithms that can be calibrated for real-world Duckietown and perform quite well. The team plans to address this problem in future work, perhaps with a hybrid approach that uses reinforcement learning and machine learning not to devise the exact action, but to either preprocess environmental data or fine-tune parameters for standard control algorithms.
Contribution from Participant 5: Team SAIC Moscow—Vladimir Aliev, Anton Mashikhin, Stepan Troeshestov and Sergey Nikolenko
Inspiration: From the start of the challenge, the team decided to stick with an end-to-end learnable approach. The participants started with a simple Deep Deterministic Policy Gradient (DDPG) Lillicrap et al. (2015) baseline that predicted steering angle and velocity from the input images. However, they found that this approach gave rather poor results with the Duckiebot only moving forward and backward. The team experimented with exploration strategies, reinforcement learning algorithms, and training parameters. They converged on an original exploration scheme and the distributed Twin Delayed DDPG Fujimoto et al. (2018) learning algorithm.
Image preprocessing: Input images were resized to and converted to grayscale. Three consecutive frames were stacked along the channel dimension to provide temporal information.
Network Architecture: The method used a convolutional neural network with five convolutional layers followed by two fully connected layers. The feature sizes for the convolutional layers gradually increased from 8 to 64, where the first layers had stride 2. Fully connected layers had a hidden size of 300. The network inputs three-channel images and outputs the dot product between Duckiebot heading and center of the lane.
Exploration and learning: A key contribution was to implement guided exploration, as it was hard for the agent to come up with a meaningful strategy during random exploration for this task. The participants decided to add guidance using the default proportional controller provided by AI-DO. To train either proportional controller or trained network were randomly selected to act. This technique greatly reduced the time required to train the agent and significantly improved the final policy quality. The team implemented distributed off-policy training where agents interact with the simulator and collect experience independently and afterwards send it to shared storage. The training workers executed the Twin Delayed DDPG algorithm Fujimoto et al. (2018). After a number of iterations, parameters of the network were broadcasted to the workers.
Controller: The participants used a modified proportional controller to compute the motor commands. The controller took as input the dot product between the lane and robot heading as predicted by the network, and used a proportional term to drive the dot product close to one while moving at maximum speed. Although hardcoding the maximum speed may seem controversial, it worked well in simulation.
Competition experience and lessons: The team found the baseline algorithms that were provided to help in speeding up the development process. However, they felt that access to physical hardware is critical to developing a navigation algorithm, particularly with learning-based algorithms that require additional work to transfer from simulation. The team implemented simple augmentations and used domain randomization in the simulator, but it proved to be insufficient to transfer the trained policy to real-world inputs. The domain transfer requires additional future work to allow deploying end-to-end learning-based models on real devices.
4.2 Finals
As the peak of the competition, the finals let the best five contestants in simulation compete on real Duckiebots at NeurIPS 2018. The Duckiebots were placed in five starting positions spread out across the competition Duckietown. A judge assessed the performance of each robot in terms of the duration of time that it remained in the driveable area and the number of tiles that it traversed (Tab. 4). The best submission of each finalist was uploaded onto our competition computer, which streamed images from the Duckiebot via WiFi. The algorithm then computed the desired actions and sent them back to the Duckiebot.
Outcomes: Thanks to an exceptionally long run in round four, contestant Wei Gao won the AI-DO 1 live finals. Most runs however did not travel more than three tiles—likely due to differences between simulation and real Duckietown, such as lighting and the cheering crowd in the background during the competition. Additionally, as may be observed in Fig. 10, some of the finalist submissions appear to have overfit to the simulation environment. We conclude that methods developed in simulation may have considerable difficulty in reaching the performance of methods developed on physical hardware.
Awards: The finalists were awarded $3000 worth of Amazon web services (AWS) credits and a trip to the nearest nuTonomy (Aptiv) location for a ride in one of their self-driving cars.
5 AI-DO 1: Lessons Learned
Duckiebots were not the only entities that went through a learning process during AI-DO 1. Below we identify which aspects were beneficial and should be continued as part of subsequent competitions, and which should be revised. We separate the discussion into two categories: technical and non-technical.
5.1 Technical
The following reflects upon what we learned making and running AI-DO 1 from a technical point of view.
Successes
Software infrastructure
The use of Docker allowed us to run submission code on many different computational platforms and operating systems. A central and beneficial feature of Docker is that any submission is reproducible, it can still be downloaded, tried and tested today. Likewise, it was helpful that all submissions were evaluated on a server yet could also be run locally. This allowed participants to rapidly prototype solutions locally in an environment that was an exact functional match of the evaluation environment. As a result, competitors could do many local evaluations and only submit an entry when they were confident that they would surpass previous results, which alleviated stress on our server infrastructure.
Simulation
The custom simulator was used for initial benchmarking. Since it was developed in lightweight Python/OpenGL, it enabled rapid training (up to 1k frames/sec) for quick prototyping of ML-based solutions. Online evaluations and comprehensive visualizations666Any submission is visualized on https://challenges.duckietown.org/v3/ by clicking its submission number. were especially helpful in assessing performance and better understanding behavior. The fact that the Duckietown environment is so structured and well-specified, yet simple, allowed for easy generation of conforming map configurations in simulation with relatively little effort.
Baselines and templates
The containerized baselines and templates provided entry points that made it easy for competitors from different communities to begin working with the platform, essentially with just a few lines of code, as described in Sec. 3.4. These baselines made the competition accessible to roboticists and the ML communities alike, by providing interfaces to the standard tools of each. These learning resources were reinforced further through Facebook live events and a question forum that explained the AI-DO software in more detail.
To Be Improved
Software infrastructure
A large amount of effort was dedicated to the use of containerization via Docker. While containerization provides several benefits including reproducibility, the Docker infrastructure still poses challenges. For example, Docker changed versions in the middle of the competition wreaking temporary havoc to our dependencies.
An awareness of, and strategy for, dealing with resource constraints are essential to robotic systems. However, these constraints still have to be implemented using Docker. This gave an advantage to approaches that were computationally more expensive, when the inverse was desired. For example, a submission that ran on a Raspberry Pi should have been preferred over a submission that ran off-board on an expensive GPU.
Submission evaluation
All submissions were evaluated on one local server, which made the submission infrastructure fragile. There were instances when participants were unable to make submissions because the server was down or we had too many submissions entered at the same time. In the future, we will aim to move evaluations to the cloud to offset these issues.
Enhanced debugging and evaluation tools would have made it considerably easier to diagnose issues during the local evaluation phase. Users could be shown a richer set of diagnostics and logs from the simulator directly after their submission was evaluated, or from recorded playback logs of an on-Duckiebot trial.
All submissions were publicly visible. Competitors were able to download, view, and re-submit other images using the public hash code. While we are not aware of any participants accessing another team’s submissions during the competition, we intend to change the submission infrastructure to make the process more secure.
Furthermore, evaluations were not statistically significant since the simulations considered 20 episodes in only 1 map. Subsequent competitions will include more maps and more variability in the simulated environments as well as more evaluation episodes.
Bringing submissions tested in simulation to testing on real robots is a major component of AI-DO. While we provided ample opportunity for testing in simulation in AI-DO 1, there was insufficient opportunity for competitors to test on the hardware, which was apparent in the quality of the runs in the live event.
Simulation
While the finals of AI-DO 1 proved to be a good demonstration that the simulator-reality gap is large, we did not have a metric for quantifying it. The Duckietown codebase has had a working lane following solution since 2016 Paull et al. (2017), so we know that a robust solution to the LF challenge was feasible. However, in simulation this robust solution did not perform well. In contrast, solutions that did perform well in simulation did not work on the real hardware. Since most competitors did the majority of their development in the simulation environment, this resulted in sub-optimal performance at the live event.
We must provide better tools for competitors to be able to bridge the reality gap more easily. These tools should include: better API to modify and/or randomize parameters in the simulator, easier access to real robot logs, and easier access to real robot infrastructure either remotely (through robotariums) or locally.
Baselines and templates
Duplication in the boilerplate code resulted in diversity in the interface among various templates and baselines. We plan to reduce this duplication to make it easier to change protocols between different challenges.
Logs
Logs represent valuable data from which to train algorithms, yet we are unaware of any competitor that made use of the 16 hours of logs that were made available. Our conclusion was that the log database was not sufficiently advertised and that the workflow for using the logs in a learning context was not explicit.
5.2 Non-Technical
The following discusses the non-technical aspects of AI-DO 1.
Successes
Competition logistics
The Duckietown platform Paull et al. (2017) has proven to be an ideal setup for benchmarking algorithms as it is simple, highly structured and formally specified, yet modular and easily customised. It is also accessible to competitors as the robot and city hardware is readily available and inexpensive.
Available resources
The Duckietown platform contains a rich assortment of learning resources in the Duckiebook Project (2019a) and on the Duckietown website Project (2019b), as well as a large body of open-source software for running Duckiebots within Duckietown. The wealth of existing resources reduced the technical support demand on the organizers.
Engagement
Especially positive was the public’s emotional reception at the AI-DO 1 finals, as shown in Fig. 19-right. Both the Duckie-theme and competition were well-received. Instead of clapping between rounds the audience squeezed yellow rubber duckies.
To Be Improved
Similar to many larger projects, AI-DO 1 fell victim to the planning fallacy Buehler et al. (2019) wherein we were more optimistic in our timeline than our general experience should have suggested.
Competition logistics
The amount of setup required for the 1.5 hour event was significant. We required access to the competition space for the entire day and needed to move a significant amount of materials into the competition venue (see Fig. 19-left). Making this process smoother and simpler would be an important step towards making the competition easily repeatable.
From a broader perspective there was little community involvement in the development of the AI-DO infrastructure.
Rules
As the competition infrastructure and our understanding of the tasks evolved, the rules evaluating competitors changed too frequently and too close to the end of the competition. This may have made it harder to participate easily and improve consistently. It was surprisingly difficult to define the evaluation metrics in a way that promoted the kind of behavior we sought. Reinforcement learning algorithms are well-known for finding these types of loopholes. As one example, the evaluation of the “Traveled Distance” did not consider progress down the lane but only linear speed at first. As a result, it was possible to get a high score by spinning in circles as fast as possible (Fig. 20).
6 Next steps: AI-DO 2 at ICRA
Armed with the experience of the first version of the competition, we have set out to improve some of the key difficulties and to introduce some innovations.
Rules and evaluation
Early on in the preparation for AI-DO 2, we will fix the rules and how participants will be evaluated. Special care will be placed on using a simple evaluation procedure that nevertheless captures the essence of the approach.
A big change that we are in dire need of is the introduction of hidden tests that are a standard best practice in ML. There will thus be a test set that is not visible to any participant. Additionally, even in the visible evaluations, there will be more variability in submissions, such as different environments, lighting conditions or other types of randomization.
To facilitate the development of better driving approaches, we will make it easier to use logs of past driving behavior and to take logs and make them available.
Finally, we will investigate running submission on demand in the cloud which should increase our flexibility and, while we did not have any serious issues, make the AI-DO 2 submission procedure reliable.
Software infrastructure
Another step in improving the competition is to make use of the functionality of Docker to place resource constraints on submissions such as available computational speed and memory.
An innovative new aspect we will introduce are modules. Modules are functions that compute part of a larger pipeline and may be used in a plug-and-play fashion. We envision that these modules are themselves a “micro-competition” in the sense that better modules may lead to better submissions to the AI-DO tasks. Again, these modules would be benchmarked with respect to computational and memory requirements.
Simulation
The simulator will continue to be an integral part of the competition. Making it more realistic by providing better and more maps, more realistic physics and visualization and domain randomization will be decisive steps towards AI-DO 2.
As part of including resource constraints into the workflow, we plan to include two types of evaluation.
- •
Fixed-length time steps between actions independent of computational load
- •
Variable-length time steps as long as it takes to make a decision
Evaluation on actual hardware
The largest improvement for experimental evaluation will be the introduction of Robotariums. These are Duckietowns with automatic localization of Duckiebots and subsequent automatic scoring of their behavior. Having access to Robotariums is the key to giving participants valuable feedback on how their solutions work on real hardware.
Outreach and engagement
We hope to attract many more competitors. The guiding philosophy should be to limit the necessary work of participants to follow their own ideas while giving them the opportunity to venture beyond by contributing back to the Duckietown and AI-DO ecosystem.
7 Conclusion
We have presented an overview of the AI Driving Olympics. The competition seeks to establish a benchmark for evaluating machine learning and “classical” algorithms on real physical robots. The competition leverages the Duckietown platform: a low-cost and accessible environment for autonomous driving research and education. We have described the software infrastructure, tools, and baseline algorithm implementations that make it easy for robotics engineers and machine learning experts alike to compete.
The first instance of the AI-DO, AI-DO 1, was held in conjunction with NeurIPS in Dec. 2018. We have outlined some of the successes and weaknesses of this first experience. We will leverage this information to improve future competitions, starting with the AI-DO 2, which will be held at ICRA in 2019.
For science to advance we need a reproducible testbed to understand which approaches work and what their trade-offs are. We hope that the AI Driving Olympics can serve as a step in this direction in the domain of self-driving robotics. Above all we want to highlight the unique design trade-offs of embodied intelligence due to resource constraints, non-scalar cost functions and reproducible robotic algorithm testing.
Acknowledgements
We would like to thank NeurIPS and in particular Sergio Escalera and Ralf Herbrich for giving us the opportunity to share the AI Driving Olympics with the machine learning community. We are grateful to Amazon AWS and Aptiv for their sponsorship and hands-on help that went into this competition. We are also grateful to the many students in Montréal, Zurich, Taiwan, Boston, Chicago, and many others that have shaped Duckietown and AI-DO into what they are today.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Baltes et al. [2017] Jacky Baltes, Kuo-Yang Tu, Soroush Sadeghnejad, and John Anderson. Huro Cup: competition for multi-event humanoid robot athletes. The Knowledge Engineering Review , 32, 2017.
- 2Behnke [2006] Sven Behnke. Robot competitions-ideal benchmarks for robotics research. In Proc. of IROS-2006 Workshop on Benchmarks in Robotics Research . Institute of Electrical and Electronics Engineers (IEEE), 2006.
- 3Brockman et al. [2016] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym, 2016.
- 4Buehler et al. [2007] Martin Buehler, Karl Iagnemma, and Sanjiv Singh. The 2005 DARPA grand challenge: the great robot race , volume 36. Springer, 2007.
- 5Buehler et al. [2019] Roger Buehler, Dale Griffin, and Michael Ross. Inside the planning fallacy: The causes and consequences of optimistic time predictions. In Gilovich, Griffin, and Kahneman , 02 2019. doi: 10.1017/CBO 9780511808098.016 .
- 6Chaplot et al. [2018] Devendra Singh Chaplot, Emilio Parisotto, and Ruslan Salakhutdinov. Active Neural Localization. In International Conference on Learning Representations , 2018. URL https://openreview.net/forum?id=ry 6-G_66b .
- 7Chevalier-Boisvert et al. [2018] Maxime Chevalier-Boisvert, Florian Golemo, Yanjun Cao, Bhairav Mehta, and Liam Paull. Duckietown environments for openai gym. https://github.com/duckietown/gym-duckietown , 2018.
- 8Floreano et al. [2004] Dario Floreano, Francesco Mondada, Andres Perez-Uribe, and Daniel Roggen. Evolution of embodied intelligence. In Embodied artificial intelligence , pages 293–311. Springer, 2004.
