TL;DR
This paper reviews methods for communicating robot motion intent in human-robot interaction, proposing a unifying model to understand and classify existing approaches and guide future research.
Contribution
It introduces an intent communication model categorizing intent type, information, and location, unifying diverse research efforts in robot motion intent communication.
Findings
Proposes a comprehensive intent communication model.
Classifies existing research along the model.
Identifies key patterns and future directions.
| Rank | Term | TF | IDF | TF-IDF | Rank | Term | TF | IDF | TF-IDF |
|---|---|---|---|---|---|---|---|---|---|
| 1 | human | 6,547 | 0.92 | 6,052.89 | 7 | interaction | 3,383 | 1.33 | 4,515.61 |
| 2 | control | 6,769 | 0.87 | 5,902.24 | 15 | movement | 1,920 | 1.88 | 3,606.34 |
| 3 | system | 7,612 | 0.69 | 5,218.61 | 61 | communicat | 1,059 | 2.32 | 2,455.03 |
| 4 | motion | 3,640 | 1.42 | 5,154.59 | 140 | feedback | 665 | 2.74 | 1,820.08 |
| 5 | model | 3,978 | 1.24 | 4,938.74 | 143 | visual | 674 | 2.67 | 1,802.90 |
| Category | Subcategory | Number of Papers (%) | Number of Intents (%) | References |
|---|---|---|---|---|
| Motion | Robot Self-Actions | 38 (49.35%) | 75 (43.60%) | (Dragan and Srinivasa, 2013; Andersen et al., 2016; Bolano et al., 2021, 2018; Cakmak et al., 2011; Capelli et al., 2019; Chadalavada et al., 2020; Chakraborti et al., 2018; Che et al., 2020; Cleaver et al., 2021; Coovert et al., 2014; Correa et al., 2010; Domonkos et al., 2020; Dragan et al., 2015; Fischer et al., 2016; Frederiksen and Stoey, 2019; Gielniak and Thomaz, 2011; Gruenefeld et al., 2020; Gu et al., 2021; He et al., 2020; Hetherington et al., 2021; LeMasurier et al., 2021; Matsumaru, 2006, 2007; Matsumaru et al., 2005; Matsumaru et al., 2006; Mikawa et al., 2018; Rosen et al., 2019; Ruffaldi et al., 2016; Szafir et al., 2015; Tsamis et al., 2021; Walker et al., 2018; Watanabe et al., 2015; Wengefeld et al., 2020; Zahray et al., 2020; Zolotas and Demiris, 2019; Bodden et al., 2018; Johannsen, 2002) |
| Motion | World Actions | 15 (19.48%) | 18 (10.47%) | (Andersen et al., 2016; Aubert et al., 2018; Chakraborti et al., 2018; Chen et al., 2011; Faria et al., 2021; Faria et al., 2017; Han et al., 2020; Holladay et al., 2014; Kebüde et al., 2018; Kirchner et al., 2011; Lee Chang et al., 2018; Moon et al., 2014; Newbury et al., [n.d.]; Palinko et al., 2020; Psarakis et al., 2022) |
| Attention | Robot-Focused Attention | 6 (7.79%) | 8 (4.65%) | (Aubert et al., 2018; Bolano et al., 2018; Cha and Mataric, 2016; Che et al., 2018; Furuhashi et al., 2015; Koay et al., 2013) |
| Attention | World-Focused Attention | 4 (5.19%) | 5 (2.91%) | (Levillain et al., 2019; Mutlu et al., 2009; Song and Yamada, 2018b; Staudte and Crocker, 2009) |
| State | Robot Self-Perception | 23 (29.87%) | 27 (15.70%) | (Andersen et al., 2016; Baraka et al., 2016; Cauchard et al., 2016; Collins et al., 2015; Correa et al., 2010; Duncan et al., 2018; Fletcher et al., 2021; Gu et al., 2021; Levillain et al., 2019; Matsumaru, 2006, 2007; Novitzky et al., 2012; Sharma et al., 2013; Song and Yamada, 2018c; Szafir et al., 2014; Takayama et al., 2011; Tang et al., 2019; Walker et al., 2018; Wengefeld et al., 2020; Zolotas and Demiris, 2019; Johannsen, 2002; Zhou et al., 2017; Bacula et al., 2020) |
| State | Robot World Perception | 8 (10.39%) | 12 (6.98%) | (Andersen et al., 2016; Chakraborti et al., 2018; Coovert et al., 2014; Correa et al., 2010; Han et al., 2020; Ruffaldi et al., 2016; Wengefeld et al., 2020; Zolotas and Demiris, 2019) |
| Instruction | Robot-Centered Instructions | 10 (12.99%) | 16 (9.30%) | (Baraka et al., 2016; Cha and Mataric, 2016; Faria et al., 2016; Furuhashi et al., 2015; Glas et al., 2007; Koay et al., 2013; Levillain et al., 2019; Mullen et al., 2021; Song and Yamada, 2018a; Tang et al., 2019) |
| Instruction | World-Centered Instructions | 9 (11.69%) | 11 (6.40%) | (Andersen et al., 2016; Baraka et al., 2016; Bolano et al., 2021; Cakmak et al., 2011; Chakraborti et al., 2018; Chandan et al., [n.d.]; Moon et al., 2014; Psarakis et al., 2022; Wengefeld et al., 2020) |
| Category | Subcategory | On-Human: Head-Attached | On-Human: Hand-Held | On-World | On-Robot: Robot-Only | On-Robot: Robot-Attached |
|---|---|---|---|---|---|---|
| (Spatial) Registered | Local | 35 (Gruenefeld et al., 2020; Rosen et al., 2019; Walker et al., 2018) | 3 (Correa et al., 2010; Watanabe et al., 2015) | 4 (Aubert et al., 2018; Bolano et al., 2018; Cleaver et al., 2021) | 22 (Dragan et al., 2015; Bodden et al., 2018; Cakmak et al., 2011) | 10 (Coovert et al., 2014; Hetherington et al., 2021; Wengefeld et al., 2020) |
| (Spatial) Registered | Directional | 3 (Gu et al., 2021; Ruffaldi et al., 2016; Walker et al., 2018) | 0 | 0 | 14 (Holladay et al., 2014; Mikawa et al., 2018; Moon et al., 2014) | 14 (Chadalavada et al., 2020; Hetherington et al., 2021; Matsumaru, 2007) |
| (Spatial) Unregistered | Description | 0 | 1 (Correa et al., 2010) | 1 (Bolano et al., 2018) | 0 | 9 (Matsumaru, 2006; Staudte and Crocker, 2009; Wengefeld et al., 2020) |
| (Spatial) Unregistered | Symbol | 5 (Walker et al., 2018; Zolotas and Demiris, 2019) | 0 | 1 (Chandan et al., [n.d.]) | 14 (Glas et al., 2007; Koay et al., 2013; LeMasurier et al., 2021) | 5 (Andersen et al., 2016; Bacula et al., 2020; Song and Yamada, 2018a) |
| (Spatial) Unregistered | Signal | 0 | 3 (Che et al., 2020, 2018; Mullen et al., 2021) | 2 (Aubert et al., 2018; Bolano et al., 2018) | 0 | 26 (Domonkos et al., 2020; Szafir et al., 2015; Tang et al., 2019) |
| Total | | 43 (25.00%) | 7 (4.07%) | 8 (4.65%) | 50 (29.07%) | 64 (37.21%) |
| (Temporal) Discrete | | 15 (Gu et al., 2021; Newbury et al., [n.d.]; Psarakis et al., 2022) | 4 (Che et al., 2018, 2020; Mullen et al., 2021) | 5 (Aubert et al., 2018; Bolano et al., 2018) | 19 (LeMasurier et al., 2021; Furuhashi et al., 2015; Gielniak and Thomaz, 2011) | 45 (Cha and Mataric, 2016; Faria et al., 2016; Zahray et al., 2020) |
| (Temporal) Continuous | | 28 (Chakraborti et al., 2018; Tsamis et al., 2021; Zolotas and Demiris, 2019) | 3 (Correa et al., 2010; Watanabe et al., 2015) | 3 (Bolano et al., 2018; Chandan et al., [n.d.]; Cleaver et al., 2021) | 31 (Dragan et al., 2015; Capelli et al., 2019; Cauchard et al., 2016) | 19 (Collins et al., 2015; Han et al., 2020; Matsumaru et al., 2005) |
How to Communicate Robot Motion Intent: A Scoping Review
Max Pascher
Westphalian University of Applied Sciences, Gelsenkirchen, Germany
University of Duisburg-Essen, Essen, Germany
Uwe Gruenefeld
University of Duisburg-Essen, Essen, Germany
Stefan Schneegass
University of Duisburg-Essen, Essen, Germany
Jens Gerken
Westphalian University of Applied Sciences, Gelsenkirchen, Germany
(2023)
Abstract.
Robots are becoming increasingly omnipresent in our daily lives, supporting us and carrying out autonomous tasks. In Human-Robot Interaction, human actors benefit from understanding the robot’s motion intent to avoid task failures and foster collaboration. Finding effective ways to communicate this intent to users has recently received increased research interest. However, no common language has been established to systematize robot motion intent. This work presents a scoping review aimed at unifying existing knowledge. Based on our analysis, we present an intent communication model that depicts the relationship between robot and human through different intent dimensions (intent type, intent information, intent location). We discuss these different intent dimensions and their interrelationships with different kinds of robots and human roles. Throughout our analysis, we classify the existing research literature along our intent communication model, allowing us to identify key patterns and possible directions for future research.
Keywords: intent, motion, robot, cobot, drone, survey
Published in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23), April 23–28, 2023, Hamburg, Germany. DOI: 10.1145/3544548.3580857. ISBN: 978-1-4503-9421-5/23/04.
CCS Concepts: General and reference → Surveys and overviews; Human-centered computing; Computer systems organization → Robotics.
1. Introduction
The field of Human-Computer Interaction (HCI) has moved beyond traditional user interfaces and interaction technologies. The omnipresence of Artificial Intelligence (AI) research and development requires our field to scrutinize the applicability of established design practices (Amershi et al., 2019; Shneiderman, 2022). Human interaction with AI is evolving away from being like operating a tool to being more like interacting with a partner, which is particularly interesting concerning Human-Robot Interaction (HRI) (Grudin, 2017). The area of HRI has been studied for a long time in HCI and, in particular, the CHI community (Noguchi and Tanaka, 2020; Arevalo Arboleda et al., 2021; Kim et al., 2020; Liu et al., 2011; Villanueva et al., 2021). For example, Arevalo Arboleda et al. (Arevalo Arboleda et al., 2021) and Villanueva et al. (Villanueva et al., 2021) investigated combining robots and Augmented Reality (AR) technology to enable intuitive teleoperation, while others have explored on-site control of robot swarms (Kim et al., 2020) and home robots (Liu et al., 2011) as well as communication of emotions and intentions to the human (Noguchi and Tanaka, 2020).
Robots are versatile: they can assist us in our workplaces, support us at home, and accompany us in public spaces (Bauer et al., 2008; Ajoudani et al., 2017; Mahdi et al., 2022). The applications of robots are manifold, significantly increasing human capabilities and efficiency (Galin and Meshcheryakov, 2020). While robots come in many forms, robotic arms in particular have been shown to be suitable for and adaptable to different use cases, such as production lines (Bragança et al., 2019) and domestic care (Pascher et al., 2021). Here, they are known as cobots that support their users in Activities of Daily Living (ADLs), such as eating and drinking, grooming, or activities associated with leisure time.
As robots have a physical form, they tend to move and operate in the same space as humans. With advances in the degree of autonomy allowing for effective close-contact interaction, there is a need for a shared understanding between humans and robots. While robotic research tackles this from a sensory and path planning perspective (e.g., human-aware navigation (Kruse et al., 2013)), the field of HCI (and HRI in particular) has been concerned with how humans may better understand robot behavior (Rosen et al., 2019; Walker et al., 2018; Bodden et al., 2018). The subtleties of human communication are usually lost in this context, and robotic behavior needs to be understood from its own frame of reference. Robots are not a monolithic entity; with the many different types come just as many unique ways of conveying information, which could lead to erroneous interpretations by their human counterpart. An added complication is the increasing number of close-contact situations that allow little time to recognize and correct errors. This has led to numerous research efforts in recent years to find ways for robots to effectively communicate their intentions to their users (Kragic et al., 2018). This includes the direct communication of planned movements in space (Gruenefeld et al., 2020), but also less obvious means, such as drawing a user’s attention to the robot (Koay et al., 2013), communicating the robot’s movement activity state (e.g., active or inactive due to failure) (Song and Yamada, 2018c), and facilitating human oversight by communicating their external perception of the world (Han et al., 2020).
While all of these examples are concerned with communicating robot motion intent, they differ tremendously in their methods and goals. Other researchers, such as Suzuki et al., have subsequently identified robot motion intent as an essential research area (Suzuki et al., 2022). But beyond further solution approaches, the field needs a common understanding of the concept of robot motion intent (i.e., what do we actually mean by intent, what are relevant intent dimensions, and how does the communication of robot motion intent influence the relationship between robot and human).
To this end, we conducted a scoping review of current approaches to communicate robot motion intent in the literature. Based on our findings, we introduce an intent communication model of motion intent, which depicts the relationship between robot and human by means of different intent dimensions (intent type, intent information, and intent location; see Figure 1d). We further discuss these different intent dimensions and their interrelationships with different kinds of robots and human roles. Throughout our analysis, we classify the existing research literature along our intent communication model to form a design space for communicating robot motion intent. Practitioners and researchers alike may further benefit from this work for the design and selection of specific mechanisms to communicate motion intent. We identify future research directions and current gaps, which are further highlighted in an interactive website that lists the papers and allows comparisons based on user-selected categories (see footnote 1).
Footnote 1: Interactive Data Visualization of the Paper Corpus (Chakraborti et al., 2023). https://rmi.robot-research.de, last retrieved February 29, 2024.
Our contribution is two-fold: 1) a survey contribution that includes our analysis and classification of previous literature as well as directions for future research (cf. contribution from Wobbrock and Kientz (Wobbrock and Kientz, 2016)), and 2) a theoretical contribution that introduces an intent communication model and describes the relationship of its entities.
2. Background
In this section, we will illustrate the need for communicating robot motion intent and discuss the current understanding of the term, which provides the foundation for our scoping review.
Robot is an umbrella term that describes a miscellaneous collection of (semi-)automated devices with various capabilities, technologies, and appearances (Goodrich and Schultz, 2008). These cyber-physical systems are often differentiated by their Degrees-of-Freedom (DoF) or ability to move and manipulate their environment. In industrial assembly lines, robotic arms manipulate and weld heavy parts (Wang et al., 2019), often in restricted areas (Hentout et al., 2019). Enabled by lightweight materials and safety sensors, robots have started to adapt to their users: today, they shut down when humans get too close or when resistance to the robot’s movement is detected. This has led to the development of cell-less HRI (Bauer et al., 2016), which has also paved the way for further scenarios, such as supporting people with disabilities in their daily lives (Pascher et al., 2019). In their review paper, Ajoudani et al. trace several approaches to HRI, how the field evolved, and how it has grown over the last two decades (Ajoudani et al., 2017). They conclude that the success of HRI comes from combining human cognitive skills (i.e., intelligence, flexibility, and ability to act in case of unexpected events) with the robot’s high precision and ability to perform repetitive tasks.
Matheson et al. proposed different types of such cell-less HRI, defined by their closeness of interaction (Matheson et al., 2019). They include coexistence (separation in space but not in time), synchronized (no separation in space but in time), cooperation (no separation in space or in time, but still not working on the same task), and collaboration (human and robot work on a task together, where the action of one has immediate consequences for the other). These works indicate that communication and interaction between robots and humans are critical to successful HRI. While research in human-aware navigation aims to make the robot smart enough to understand human behavior and react to it (Kruse et al., 2013), supporting humans in understanding robot behavior is equally important (Kragic et al., 2018). As the work by Matheson et al. highlights, humans and robots increasingly share the same physical space in HRI, which makes communicating robot motion intent a particularly relevant aspect for safe and effective collaboration and a prerequisite for explainable robotics (Matheson et al., 2019).
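The four interaction types above can be read as a small decision rule over separation in space, separation in time, and task sharing. The following is a minimal sketch of that reading, not code from the reviewed work; the function name `classify_interaction` and its parameters are hypothetical, and treating collaboration as requiring no separation in space or time is an assumption drawn from the description rather than stated explicitly.

```python
from enum import Enum


class InteractionType(Enum):
    COEXISTENCE = "coexistence"
    SYNCHRONIZED = "synchronized"
    COOPERATION = "cooperation"
    COLLABORATION = "collaboration"


def classify_interaction(
    separated_in_space: bool, separated_in_time: bool, shared_task: bool
) -> InteractionType:
    """Map the separation properties described in the text to an interaction type."""
    if separated_in_space and not separated_in_time:
        return InteractionType.COEXISTENCE
    if not separated_in_space and separated_in_time:
        return InteractionType.SYNCHRONIZED
    if not separated_in_space and not separated_in_time:
        # Assumption: working on a task together implies sharing space and time.
        if shared_task:
            return InteractionType.COLLABORATION
        return InteractionType.COOPERATION
    # Separation in both space and time is not covered by the four types.
    raise ValueError("no cell-less interaction type matches full separation")


# Example: human and robot share the workspace at the same time on separate tasks.
example = classify_interaction(
    separated_in_space=False, separated_in_time=False, shared_task=False
)
```

In this sketch, `example` evaluates to `InteractionType.COOPERATION`; switching `shared_task` to `True` yields collaboration.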
However, robot motion intent is a rather vague term and lacks a clear definition. Further, it is not consistently used by researchers in the field. Instead, similar underlying concepts have been investigated under terms such as situational awareness (Levillain et al., 2019), forthcoming operation (Matsumaru, 2007), or robot signaling system (Tang et al., 2019). Suzuki et al., as part of their extensive literature review covering the relationship between AR and robotics, emphasize the potential of AR-based visualizations for communicating movement trajectories or the internal state of the robot (Suzuki et al., 2022). However, as their literature review extends beyond intent communication, they do not further discuss or define different types of intent, nor do they provide a deeper understanding of intent properties.
Our work presents a systematic overview of the field and addresses the current issues by conducting a scoping review. Such a review or survey contribution helps to organize the published research of the field and enables reflection on previous findings after the field has reached a level of maturity (Wobbrock and Kientz, 2016). The goal of our review is to provide a clear understanding and definition of robot motion intent, its properties, and its relationships within HRI. Furthermore, our work provides a first discussion relating our HRI findings to the growing domain of external Human-Machine Interfaces (eHMIs) for Automated Vehicles (AVs), which has identified similar research and design challenges (Bazilinskyy et al., 2019; Dey et al., 2020; Rouchitsas and Alm, 2019; Colley et al., 2021; Currano et al., 2021).
3. Method
Scoping reviews provide an overview of the extent, range, and nature of evolving research areas. They help to summarize research findings and identify research opportunities (von Elm et al., 2019; Arksey and O'Malley, 2005). Our approach is in line with previous work by Ghafurian et al. (Ghafurian et al., 2021), Muñoz et al. (Muñoz and Dautenhahn, 2021), and Wallkötter et al. (Wallkötter et al., 2021). We applied the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al., 2021), focusing on the PRISMA Extension for Scoping Reviews (PRISMA-ScR) (Tricco et al., 2018).
For an overview of each step in our paper selection process, please refer to Figure 2. We will discuss specific details of the individual steps in the following subsections. (1) Based on an initial screening of relevant literature, potential search terms were identified to perform a systematic query using three primary databases in the field of HRI (ACM Digital Library, IEEE Xplore, and ScienceDirect; see Section 3.1). (2) A filtering step was applied based on an algorithmic analysis of the total corpus to identify the most relevant terms related to the topic (see Section 3.2). (3) The resulting set of 822 papers was manually screened in a two-step process, and eventually, additional sources were found through a cross-check of the references in selected papers (see Section 3.3). The final corpus consists of 77 papers.
3.1. Initial Query
We explored a variety of query terms and their combinations because, as discussed, the field currently lacks a coherent and established terminology. In addition, we found several terms to be used in ambiguous ways, in particular terms such as communication and motion. Therefore, we decided on a broad search in this first step to increase recall and reduce the risk of overlooking relevant literature. We aimed to encompass a variety of different robot technologies while still focusing on the concept of intent, even though the word may be used in a variety of circumstances. We searched the titles, abstracts, and keywords of the databases’ full-text collections with the combined terms shown below. ScienceDirect does not support the wildcard “*” but uses stemming and lemmatization techniques; to obtain results equivalent to the wildcard-based search, we modified the combined term for this database to: (robot OR cobot OR drone) AND (intent OR intention OR intend OR intended).
[TABLE]
3.2. Algorithmic Filtering
Due to our initial search being quite broad, further filtering was required to identify relevant papers. The initial set allowed us to apply an algorithmic approach similar to that of previous research done by O’Mara-Eves et al. (O’Mara-Eves et al., 2015). Specifically, we applied the Term Frequency-Inverse Document Frequency (TF-IDF) (Salton and Buckley, 1988) method to identify frequently used terminology within our corpus. TF-IDF has been shown to be suitable for information retrieval in literature reviews (Surian et al., 2021; Lerner et al., 2019). First, we preprocessed the entries by a) combining each paper’s title, keywords, and abstract into one field, b) fixing encoding issues such as & (and), ° (degree), and — (emdash), and c) converting the strings to lowercase as well as removing punctuation, numbers, symbols, and standard English stop-words from the corpus and replacing tokens with their lemmatizations (Manning et al., 2008). For the creation of the TF-IDF-weighted document-term matrix, we calculated the Term Frequency (TF) for each term of our corpus, taking the static Inverse Document Frequency (IDF) into account, and computed the TF-IDF for each term over all documents. The resulting TF-IDF-weighted document-term matrix is shown in Table 1.
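The preprocessing and weighting steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it sums term frequency over the whole corpus, uses the natural-log form ln(N / df) for IDF, omits lemmatization, and relies on a tiny hypothetical stop-word list; the function names `preprocess` and `corpus_tf_idf` are invented for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "our"}


def preprocess(title: str, keywords: str, abstract: str) -> list[str]:
    """Merge the three fields, lowercase, and keep alphabetic non-stop-word tokens."""
    text = " ".join([title, keywords, abstract]).lower()
    tokens = re.findall(r"[a-z]+", text)  # drops punctuation, numbers, symbols
    return [token for token in tokens if token not in STOP_WORDS]


def corpus_tf_idf(documents: list[list[str]]) -> dict[str, tuple[int, float, float]]:
    """Return {term: (TF, IDF, TF-IDF)} with TF summed over all documents."""
    n_docs = len(documents)
    term_frequency: Counter[str] = Counter()
    document_frequency: Counter[str] = Counter()
    for tokens in documents:
        term_frequency.update(tokens)
        document_frequency.update(set(tokens))
    scores = {}
    for term, tf in term_frequency.items():
        idf = math.log(n_docs / document_frequency[term])  # assumed IDF variant
        scores[term] = (tf, idf, tf * idf)
    return scores


docs = [
    preprocess("Robot motion intent", "", ""),
    preprocess("Robot communicates motion", "", ""),
    preprocess("The robot moves", "", ""),
]
scores = corpus_tf_idf(docs)
ranked = sorted(scores, key=lambda term: scores[term][2], reverse=True)
```

A term that occurs in every document (here `robot`) receives an IDF of zero under this variant, which is why very common words drop to the bottom of `ranked` even when their raw frequency is high.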
From the first 150 entries of the TF-IDF sorted list of tokens, three researchers independently identified terms related to communication and motion, the two terms we had decided to leave out of the initial broad query due to word ambiguity. During the following consensus process, we excluded related terms that were too general and ambiguous (e.g., “show” is frequently used in “Our results show[…],” “present” in “In this work we present[…],” “demonstrate” in “We demonstrate in our results[…],” or “perform” in “We performed a study[…]”). All identified terms were then used in the filtering step by applying the following logic to the title, keywords, or abstract of each paper in our corpus:
[TABLE]
For a paper to be accepted, a term from the cluster “communication” and another from “motion” (both OR operation) had to appear in the title, keywords, or abstract (AND operation). As a result, 822 papers remained in our corpus.
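The acceptance rule can be expressed as a conjunction of two disjunctions. The sketch below assumes simple substring matching on the lowercased text; the term lists are placeholders picked from Table 1 for illustration, since the actual term clusters are not reproduced in this text, and the name `passes_filter` is hypothetical.

```python
# Placeholder term clusters; the real clusters used in the review are not shown here.
COMMUNICATION_TERMS = ("communicat", "feedback", "visual")
MOTION_TERMS = ("motion", "movement")


def passes_filter(title: str, keywords: str, abstract: str) -> bool:
    """Accept a paper if any communication term AND any motion term occurs."""
    text = " ".join([title, keywords, abstract]).lower()
    has_communication = any(term in text for term in COMMUNICATION_TERMS)  # OR
    has_motion = any(term in text for term in MOTION_TERMS)  # OR
    return has_communication and has_motion  # AND


accepted = passes_filter("Communicating robot motion", "", "")
rejected = passes_filter("Robot motion planning", "", "")
```

Matching on stems such as `communicat` lets one entry cover several word forms, at the cost of occasional false positives that the later manual screening would have to catch.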
3.3. Manual Screening
The final phase of our paper selection process required manual screening, following an approach similar to that of Doherty and Doherty (Doherty and Doherty, 2018). The process involved abstract screening, full-text screening, and reference screening. During the screening of all abstracts, we identified 706 out of 822 papers as not fitting into the scope of this review. The full-text analysis of the remaining 116 papers reduced the set to 48 papers. In addition, we screened the references cited by the set of 116 papers that were assessed for full-text screening. We identified 29 further relevant references, which we then included. This led to a final set of 77 papers, which are examined in the following. During the abstract and full-text screening, we pre-excluded 36 papers in unfitting paper formats still in the corpus, such as proceedings front matter, workshop calls, survey papers, or semi-duplicates (two papers that essentially presented the same contribution, with one being a work in progress and the other a full paper). We also excluded 305 papers that aimed to convey the human’s intent (to the robot) but not the robot’s intent (e.g., Kurylo and Wilson (Kurylo and Wilson, 2019)). Similarly, we removed another 210 papers where the research did not focus on the intention of robot motion (no robot intent), for example, work on 1:1 teleoperated devices (e.g., van Waveren et al. (van Waveren et al., 2019)) or work focusing on AVs and eHMIs. We excluded another 220 system design papers that focused on aspects such as aesthetics, mathematical models of motion planning, or definitions (e.g., Girard et al. (Girard et al., 2015)). Finally, we removed four papers where no approach or prototype was developed and reported (e.g., Thellman and Ziemke (Thellman and Ziemke, 2021)).
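The stage-by-stage counts of the screening process can be checked with simple arithmetic. The sketch below only encodes the four figures stated in the text (822, 706, 48, and 29) and derives the rest; the class name `ScreeningFlow` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningFlow:
    after_algorithmic_filtering: int
    excluded_at_abstract_screening: int
    included_after_full_text: int
    added_from_references: int

    @property
    def assessed_in_full_text(self) -> int:
        """Papers that survived abstract screening."""
        return self.after_algorithmic_filtering - self.excluded_at_abstract_screening

    @property
    def excluded_at_full_text(self) -> int:
        """Papers dropped during full-text analysis."""
        return self.assessed_in_full_text - self.included_after_full_text

    @property
    def final_corpus(self) -> int:
        """Papers kept after full-text analysis plus those found via references."""
        return self.included_after_full_text + self.added_from_references


flow = ScreeningFlow(
    after_algorithmic_filtering=822,
    excluded_at_abstract_screening=706,
    included_after_full_text=48,
    added_from_references=29,
)
```

With these inputs, `flow.assessed_in_full_text` is 116 and `flow.final_corpus` is 77, matching the numbers reported above.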
4. Intent Communication Model
Through our literature review, we aim to improve understanding of the communication of robot motion intent by analyzing previous research. To that end, each author analyzed our literature corpus (n=77) in a multi-step process. It was discovered that several papers presented, combined, or empirically compared multiple intents (on average, more than two per paper). Therefore, we first systematically extracted all individual intents, resulting in a total of 172 intents. By screening these intents, we identified the primary entities (robot, intent, and human) as well as a communication flow between these entities that parallels that of the HCI model from Schomaker (Schomaker, 1995). However, in contrast to the HCI model, we focus solely on the communication of intent from robot to human, as previous research has already covered the inverse (Jain and Argall, 2019). Furthermore, we identified a top-level entity, goal, which describes the motivation to communicate intent, as well as a low-level entity, context, which describes the situation in which the intent is communicated. Reflecting on all entities, we analyzed the intents by asking 1) why they were communicated (goal), 2) who communicated them (robot), 3) what they communicated (intent), 4) to whom they were communicated (human), and 5) in which circumstances they were communicated (context). Dimensions, categories, and properties emerged from the data through an open coding process of the extracted answers; specifically, we identified kind of robot, location, type of intent, information of intent, and role of human as our dimensions. The resulting intent communication model is shown in Figure 3. In the following, we present our findings for the three primary entities (robot, intent, and human), which we define and support by giving examples. We also discuss the context of communicating robot motion intent.
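One way to picture the coding scheme is as a record with one field per question (why, who, what, to whom, in which circumstances), where the "what" carries the three intent dimensions. The following sketch is an illustrative data structure only; the class `CommunicatedIntent`, its field names, and the example values are assumptions for demonstration and do not reproduce the authors' actual coding of any paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommunicatedIntent:
    goal: str                # 1) why it is communicated
    kind_of_robot: str       # 2) who communicates it
    intent_type: str         # 3) what is communicated: type of intent
    intent_information: str  # 3) what is communicated: information of intent
    intent_location: str     # 3) what is communicated: location
    role_of_human: str       # 4) to whom it is communicated
    context: str             # 5) in which circumstances


# Hypothetical example record; values are an illustrative reading, not extracted data.
example_intent = CommunicatedIntent(
    goal="supporting coexistence",
    kind_of_robot="mobile robot",
    intent_type="motion",
    intent_information="directional",
    intent_location="on-robot",
    role_of_human="bystander",
    context="shared corridor",
)

# Average number of extracted intents per paper, from the totals stated in the text.
TOTAL_INTENTS = 172
TOTAL_PAPERS = 77
intents_per_paper = TOTAL_INTENTS / TOTAL_PAPERS
```

The ratio `intents_per_paper` comes out at roughly 2.23, consistent with the statement that papers cover more than two intents on average.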
4.1. Human
In HRI, we can distinguish between different scenarios based on how involved a human is in the task performed by the robot. For the entity human, we utilize these levels of closeness between robot and human to define the different roles of human. All four roles of human are illustrated in Figure 4.
4.1.1. Definition
The human has a crucial role during HRI, which strongly impacts which intents need to be communicated. From the analyzed intents of our corpus, we derived four different roles of human (collaborator, observer, coworker, and bystander). The roles are ordered by the degree of human collaboration and involvement with the robot, starting with the most involved (see Figure 4). These roles are also closely connected to the overarching goal of the HRI. Here, we found supporting collaboration, oversight, and coexistence to be of primary importance. In the following, we define the different roles, discuss their relationships to overarching goals, and support them with examples.
Collaborator
When in the role of a collaborator, a human works with a robot on a shared task in the same space and at the same time (Matheson et al., 2019). Thus, communication of robot motion intent in this context is for supporting collaboration. It aims to foster the coordination of robot and human actions regarding space and time to allow them to work together on a shared task (e.g., a human-robot assembly team in a manufacturing scenario (Andersen et al., 2016)). The action of one of the two (i.e., robot or human) has immediate consequences for the other. For example, consider the scenario of a robot handing an object to a human (Newbury et al., [n.d.]; Dragan et al., 2015). Here, the human has to precisely anticipate and coordinate with the time and place the object will be positioned to enable efficient handover. To that end, Dragan et al. propose a robotic arm that applies so-called legible motion, allowing the human to infer the goal of motion quickly and with certainty (Dragan et al., 2015). The role of a collaborator represents the closest degree of HRI, as they form a team in which both depend on each other. In our literature corpus, a collaborator is described in 18 papers and is the recipient of 37 different intents.
Observer
A human functions as an observer when their main job is to supervise the task that is being carried out by the robot. Although they mostly just watch, an observer must be ready to intervene and take control of the robot. In this context, communication of robot motion intent is for the goal of supporting oversight. Here, the robot has to provide information to the human to allow effective intervention when needed. Fundamentally, supporting oversight refers to the ability of a human to judge and evaluate if a robot is operating within its intended parameters. For example, in work by Hetherington et al., the robot communicates its movement paths to an observer, which enables the observer to foresee and prevent potential collisions of the robot with obstacles (Hetherington et al., 2021). Others communicate the inner state of the robot, allowing an observer to anticipate potential task failures that may occur due to problems with the robot itself, e.g., faulty sensor information (Baraka et al., 2016; Han et al., 2020). An observer is described in 47 papers and is the recipient of 94 intents.
Coworker
In the coworker role, the human works next to the robot but handles their own task. While these tasks may be part of a shared overarching effort or entirely disconnected, they take place in the same shared workspace (e.g., a robotic arm that picks up one out of two objects and leaves the other one for the human (Lee Chang et al., 2018)). In the coworker context, communication of robot motion intent is for the purpose of supporting coexistence. Here, the human needs to understand the robot’s motion to avoid safety-critical situations (e.g., colliding with the robot). In Aubert et al., a robot and human pick up objects from a shared bin for their individual tasks (Aubert et al., 2018). Here, communication of robot motion intent can help the human to coordinate their actions and avoid collisions with the robot. Chadalavada et al. showed that communication of motion intent through Spatial Augmented Reality (SAR) can improve perceived safety with mobile robots (Chadalavada et al., 2020). In their study, it meant that participants could choose safer walking paths and get closer to the robot without subsequent safety shutdowns. In our literature corpus, a coworker is described in six papers and is the recipient of 18 intents.
Bystander
The human is a bystander when they do not share the same task or the same task goal with the robot but still occupy an area overlapping the robot’s physical workspace. Like the coworker role, the bystander role involves communication of robot motion intent to support coexistence. A bystander needs motion information to avoid collision and feel safe. For example, imagine a human and a robot encountering each other in a corridor. To allow the human to choose a walking path that avoids collision, the robot can move to one side and communicate its intended movement path in advance (Mikawa et al., 2018; Watanabe et al., 2015). A bystander is described in 17 papers and is the recipient of 23 intents.
4.2. Intent
We identified four different types of intent that the robot can communicate to the human to express its intentions, contributing to increased transparency. We consider these types to be the main dimension for classifying intent in the following text. In addition, we identified the dimensions location and information, as shown in Figure 3, which help to further classify and describe intent. Given their great importance, they are discussed separately in Section 5.
4.2.1. Definition
As our literature review focused on communicating robot motion intent, a majority of the corpus (69% of all papers; 54% of all unique intents) deals with motion intent. Nevertheless, we identified additional intent types that are related to motion intent and of equal importance (i.e., attention, state, and instruction). All types of intent are described below and the relationship of each to motion is explained. Furthermore, we found that for each type of intent, we can further distinguish between an intent that is related to the robot and one that is related to the world (more details can be found in the individual paragraphs below). An overview of all types of intent and associated papers can be found in Table 2.
Motion
These intents are the main type of intent. Motion intent is concerned with explicitly communicating future motions (i.e., actions that the robot will perform). As our survey is focused on robot motion intent, it encompasses more than 50% of the identified unique intents in our corpus. Most of the described intents deal with robot self-actions, aiming to indicate future robot movement. Thereby, users may be able to improve the coordination of their actions in concert with the robot’s behavior to avoid collisions and improve safety. For example, Chadalavada et al. employed SAR to communicate future movement direction as well as the specific path the robot will take, which helped bystanders feel safe around a robotic forklift (Chadalavada et al., 2020). World actions are activities that manipulate the world around the robot. Again, this may help the bystander to coordinate their activities, but it also helps the observer to understand when to take over control from the robot. Psarakis et al. applied this concept of world actions in a VR simulation to visually augment the nearby objects that the robot planned to grasp (Psarakis et al., 2022).
Attention
Intents that communicate the need for attention are a supportive element. They precede a motion intent to shift human attention toward the robot or process, especially when the human’s attention is not guaranteed (e.g., because they focus on their own tasks). For example, Bolano et al. used acoustic feedback to alert the human and shift their attention toward the robot whenever it detected a possible collision (Bolano et al., 2018). An example of robot-focused attention was presented by Furuhashi et al., who designed an assistive robot based on the commercial Roomba device as a hearing dog that can notify deaf users of important events (Furuhashi et al., 2015). Here, the system uses physical touch to gain the human’s attention by gently bumping into their body. As an example of world-focused attention, Mutlu et al. had a humanoid robot quickly look at an object of interest. They studied whether collaborators were able to understand the robot’s gaze cues and correctly identify the object (among several others) that the robot had chosen as its object of interest (Mutlu et al., 2009).
State
A robot communicating its state allows a human to deduce potential future motions and identify conflicts before they occur. For example, a robot could collide with nearby objects due to errors in its sensor system. However, robot communication of the detected objects enables a human to take over control and mitigate the issue. For state intents, we distinguish between robot self-perception, meaning the state the robot communicates about itself (e.g., simple text feedback presented on a display that indicates states such as “stop” or “moving” (Matsumaru, 2007)), and robot world perception, meaning the communication of the perceived state of the world (e.g., visually highlighting objects in the environment that the sensor system has successfully detected, allowing the user to predict and understand subsequent robot movement (Han et al., 2020)).
Instruction
In several papers, we identified instruction intents that accompany robot motion. For example, if a robot is blocked by an obstacle, it can instruct a human to remove the obstacle so it can continue its motion. Instructions can be robot-centered instructions when they stand in relation to the robot itself (e.g., Moon et al. applied head gaze cues to communicate instructions to the user to complete the handover of an object from the robot’s gripper (Moon et al., 2014)). Or, in contrast, instructions can be world-centered instructions when they stand in relation to the world (e.g., a robot instructing a human to push a button on a wall to open an elevator so that it can continue its movement (Wengefeld et al., 2020)).
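The four intent types and their robot/world distinction can be read as a small two-level taxonomy. The following is a minimal sketch of that structure, not code from the reviewed work; the names `IntentType`, `Focus`, `SUBTYPE_LABELS`, and `Intent` are hypothetical, and the labels simply restate the terms used in the paragraphs above.

```python
from dataclasses import dataclass
from enum import Enum


class IntentType(Enum):
    MOTION = "motion"
    ATTENTION = "attention"
    STATE = "state"
    INSTRUCTION = "instruction"


class Focus(Enum):
    ROBOT = "robot"
    WORLD = "world"


# Terms used in the text for each (intent type, focus) pair.
SUBTYPE_LABELS = {
    (IntentType.MOTION, Focus.ROBOT): "robot self-action",
    (IntentType.MOTION, Focus.WORLD): "world action",
    (IntentType.ATTENTION, Focus.ROBOT): "robot-focused attention",
    (IntentType.ATTENTION, Focus.WORLD): "world-focused attention",
    (IntentType.STATE, Focus.ROBOT): "robot self-perception",
    (IntentType.STATE, Focus.WORLD): "robot world perception",
    (IntentType.INSTRUCTION, Focus.ROBOT): "robot-centered instruction",
    (IntentType.INSTRUCTION, Focus.WORLD): "world-centered instruction",
}


@dataclass(frozen=True)
class Intent:
    """One communicated intent, classified by type and focus (hypothetical record)."""
    intent_type: IntentType
    focus: Focus
    example: str

    @property
    def label(self) -> str:
        return SUBTYPE_LABELS[(self.intent_type, self.focus)]


# Example taken from the text: a robot asks a human to push an elevator button.
elevator = Intent(IntentType.INSTRUCTION, Focus.WORLD,
                  "instruct a human to push a button to open an elevator")
print(elevator.label)
```

Keeping type and focus as separate fields mirrors the observation that every intent type can relate either to the robot or to the world.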
4.2.2. Relationship to Human
Communicating a robot’s intended motion to the human helps to improve the perception and understanding of the robot’s behavior. However, humans that are, for example, not involved in the robot’s task – perhaps because they are focusing on their own tasks (coworker) or are just uninvolved in general (bystander) – often need an additional cue to be able to read robot motion intent, which makes the intent type attention necessary (e.g., by an acoustic prompt (Aubert et al., 2018)). State intents enable a human to see not only the next motion but also the internal state and planning, enabling them to understand actions ahead of time. Such intents also support observers in their task of supervising the robot. Finally, collaboration means a constant shifting of who is in charge when humans and robots work together on a shared task. Therefore, motion, state, attention, and instructions are all necessary intents for providing a baseline for collaboration (collaborator).
4.3. Robot
In our corpus, we identified three different kinds of robot, which together form the robot entity.
4.3.1. Definition
We identified three main kinds of robots: robotic arm, humanoid, and mobile robot. These, in order, represent a spectrum of increasing mobility and flexibility based on the area of deployment, starting with stationary robots (still with many DoF) and ending with robots that are inherently mobile (which also includes mobile arms with many DoF on a platform). Based on different robots, researchers have investigated different intents with varying frequencies. In the following, we illustrate each kind of robot with examples from our literature corpus.
Robotic Arm
Robotic arms can be described as a chain of axis links. They are typically fixed to one place and can have a manipulator (Gautam et al., 2017). Nowadays, they are the industry standard in production lines of factories (Bragança et al., 2019) and work alongside humans in HRI environments (Domonkos et al., 2020). Robotic arms are described in 13 papers and send 22 intents.
Humanoid
Humanoids have two robotic arms with manipulators, a torso, a head, eyes, and, often, basic facial expressions. Due to the two robotic arms, humanoids have more DoF than single robotic arms. Still, humanoids are often fixed to one place and lack mobility. Nonetheless, they are an important part of HRI when working with humans in a shared workspace (LeMasurier et al., 2021; Rosen et al., 2019). In rare cases, they can move in space, imitating human movement. Here, anthropomorphic features of the robots – such as gaze or certain gestures – can decrease the time required to predict the robot’s intent (Gielniak and Thomaz, 2011). Humanoids are described in 11 papers and send 21 intents.
Mobile Robot
With the addition of mobility comes increased flexibility. Mobile robots can be deployed in the air, on the ground, or in water. For this kind of robot, we have actively chosen to define them more broadly to include robots that appear only once in the corpus. For mobile robots (also referred to as drones), we distinguish between ground drones without a manipulator that move between locations, ground drones with a manipulator that can also manipulate the world, flying drones that maneuver through the air, and water drones that operate on water or underwater. Communicating motion intent helps ground drones without a manipulator to, for example, lead or follow a human to a specific place (Glas et al., 2007). It can help ground drones with a manipulator to, for example, communicate which object they intend to pick up (Chakraborti et al., 2018). Flying drones or water drones, on the other hand, can communicate their motion intent by flying or driving in a pattern (Szafir et al., 2014; Novitzky et al., 2012). All kinds of drones can appear alone (Cleaver et al., 2021) or as a swarm of drones (Capelli et al., 2019). Mobile robots are described in 53 papers and send 129 intents.
4.3.2. Relationship to Intent
As mobile robots move around more freely, they frequently encounter human bystanders who cross their paths. Consequently, mobile robots often have to first shift the human’s attention toward the robot’s display, preparing them for the communication of the robot’s intended motion. For example, a projection in front of the robot can catch the attention of a bystander while simultaneously informing about the direction of driving (Matsumaru, 2006). At the same time, mobile robots need to communicate their state and planning of actions ahead of time, either the inner state (e.g., what is the current mission status (Levillain et al., 2019)) or the perceived world state (e.g., which objects are detected (Correa et al., 2010)). Humanoids and robotic arms, on the other hand, are often deployed in collaborative scenarios, teaming up with humans. Here, robots need to communicate their intended motion to coordinate their actions with a human collaborator (e.g., which items the robot intends to pick next from a shared bin (Aubert et al., 2018) or when objects are to be handed over to the collaborator (Newbury et al., [n.d.])).
4.4. Context
The context describes the setting of the HRI scenario. While the location is an essential part of the context, there is more: for example, the social environment (Schmidt et al., 1999). Nonetheless, we consider the location helpful to define HRI scenarios. In our analysis, we found various types of locations, including workplace, domestic, and outdoor. In workplace settings, the robot is frequently part of an assembly line or, more generically, a manufacturing process (e.g., collaborating with a human worker (Tang et al., 2019)). However, workplace locations also include industrial settings, offices, or generic work rooms. In total, 42 papers took place at a workplace location. In domestic environments, robots support a task at home (e.g., by picking cups up off a kitchen table (Dragan et al., 2015)). Here, we found five relevant papers. Finally, in two papers the robot could move freely outside (e.g., fulfilling a mission and communicating its status (Duncan et al., 2018)). Apart from these, 28 papers had no particular location specified. Instead, the authors of these papers investigate more generic scenarios of robot motion intent (e.g., by stating that a robot moves between two locations but without fine details of these locations (Matsumaru, 2007)). For these scenarios, it is unclear which locations are most relevant.
5. Analysis of Intent Information and Location
In addition to the different types of intent discussed in the previous section, two other dimensions of intent emerged from the data: Intent information (which refers to the data communicated by the robot) and intent location (which describes from where the intent is communicated to the human). In this section, we define these dimensions, illustrate their application with examples, and present a summary of empirical findings concerning their usage.
5.1. Intent Information
Based on our analysis of how the intent is communicated as well as what is communicated, we derived two main properties for categorizing intent information: spatial and temporal.
5.1.1. Spatial Property
The primary approach to convey spatial information is to embed it directly into the environment, i.e., have it registered in space. We identified 105 matching intents. We can further classify such intents as conveying local information (74 intents) or directional information (31 intents). Local information aims to precisely relate the information to the surrounding space by showing an exact position that naturally may contain orientation information as well. Han et al., as an example, convey local information by using SAR polygon visualizations to frame and highlight detected objects on a table, allowing a human observer to supervise the robot’s intended movement and manipulation actions (Han et al., 2020). In contrast, directional information aims to communicate the explicit direction of movement (e.g., an arrow pointing in the direction of movement (Chadalavada et al., 2020) or toward an object or person of interest (Holladay et al., 2014)).
Information that is unregistered in space, however, employs an abstract encoding of the spatial property. In total, we identified 67 matching intents. This category includes the following types of information: description, symbol, and signal. Description (11 intents) applies to scenarios in which textual or verbal information is used (e.g., the robot informs the human verbally before initiating a movement to perform a touch (Chen et al., 2011)). Symbol (25 intents) applies to cases in which a symbolic representation is used to form the intent communication (e.g., a mobile robot that nods its head to request that a human follow before moving toward its destination (Faria et al., 2016)). Signal (31 intents) applies when components are turned on/off to indicate a change (e.g., an acoustic prompt is turned on to gain attention for the upcoming communication of motion intent (Aubert et al., 2018)). Mini maps provide an abstract but geographical encoding that includes the relationships among different objects in the environment (Chandan et al., [n.d.]; Walker et al., 2018; Zolotas and Demiris, 2019).
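The counts reported for the spatial property can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given above; the names `SPATIAL_COUNTS` and `category_totals` are hypothetical, and mini maps are left out because no separate count is reported for them.

```python
# Intent counts per spatial subcategory, as reported in the text.
SPATIAL_COUNTS = {
    "registered": {"local": 74, "directional": 31},
    "unregistered": {"description": 11, "symbol": 25, "signal": 31},
}


def category_totals(counts):
    """Sum the subcategory counts for each top-level spatial category."""
    return {category: sum(sub.values()) for category, sub in counts.items()}


totals = category_totals(SPATIAL_COUNTS)  # registered: 105, unregistered: 67
grand_total = sum(totals.values())        # 172 intents overall

for category, total in totals.items():
    print(f"{category}: {total} intents ({total / grand_total:.0%} of all intents)")
```

The subcategory counts add up to the stated category totals (105 and 67), so roughly three in five intents in the corpus are registered in space.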
Empirical Implications. While information registered in space provides a direct link between real-world objects and the displayed information, information unregistered in space lacks this connection and requires an additional mental step to create this link. Consequently, information unregistered in space may be less intuitive, and thus researchers have explored different combinations of information to mitigate that. Andersen et al. as well as Wengefeld et al. showed that combining multiple types of intent information that are unregistered in space (e.g., text description and symbol icons) helps to effectively communicate motion intent to the user (Andersen et al., 2016; Wengefeld et al., 2020). On the other hand, Staudte and Crocker found that combining both categories (registered & unregistered), which in their case involved a robot gazing at a specific object while a verbal description of the object played, leads to successful perception and understanding by the user (Staudte and Crocker, 2009). Similarly, Bolano et al. later showed that a verbal description of the target can be combined with visual feedback of the motion endpoint to achieve the same improvement (Bolano et al., 2018).
5.1.2. Temporal Property
The temporal property of intent information is about the distinction between having a discrete or continuous information flow. Discrete information has a fixed, distinct appearance in time and is beneficial for communicating robot motion intent because it enables the human to detect a change (i.e., the information appears) and it signals at which point the information loses its relevance (i.e., it disappears). For example, Aubert et al. equip their humanoid robot with a display that shows the number of the next bin it will approach, thereby allowing a human to avoid conflict with the robot (Aubert et al., 2018). Overall, we identified 89 intents that communicate discrete information. Continuous information, as has been provided in 83 intents, is available throughout the whole task or over several task phases (i.e., it is visible independent of its relevance to the current task). It enables the human to observe the robot, compare it with the world, and evaluate the correct task execution. Tsamis et al., for example, implemented AR visualizations for a Head-Mounted Display (HMD) to continuously communicate the intended movement space of a robotic arm by placing a semitransparent red sphere around the robotic arm (Tsamis et al., 2021).
Empirical Implications. Faria et al. showed that both discrete and continuous information are effective for communicating a follow me intent with spherical robots (Faria et al., 2016). Koay et al. also evaluated both temporal properties using a robot dog that guides people living with hearing loss. However, they found that a motion-based approach (continuous), in which the robot’s head movements request users to follow, is more successful than using a flashing Light-Emitting Diode (LED) stripe (discrete). They attribute this to the fact that head movements are more straightforward to interpret (Koay et al., 2013). The findings of Aubert et al. suggest that combining discrete and continuous information is the most effective method. They showed that the combination of a motion-based approach (continuous) and a display approach (discrete) to communicate the robot motion end-point outperformed both uni-modal intent communication conditions (Aubert et al., 2018).
5.1.3. Cross Relations
Inherently, the information of every intent has spatial and temporal properties. In the following, we describe the relationships between these properties of intent information.
For unregistered in space, the temporal property is almost evenly distributed between discrete and continuous information. Here, signal is an exception, as discrete (23 intents; e.g., having flashing lights attached to a mobile robot to indicate a discrete change of movement direction, similar to a car (Hetherington et al., 2021)) is used more often than continuous (eight intents; e.g., an LED stripe attached to the robot to continuously communicate the remaining distance to the target position through a color-coded progress bar (Baraka et al., 2016)). Signals are primarily used to communicate sudden changes. Accordingly, such discrete events are naturally communicated as discrete intent information.
For registered in space, we see an uneven distribution for both subcategories. Intent information classified as local is mostly communicated as continuous information (50 intents; e.g., using SAR to continuously highlight an area in a workplace where the robot will be active during its movements and action (Andersen et al., 2016)) instead of discrete (24 intents; e.g., using SAR to highlight a button on a wall that must be pushed by a human for the robot to continue its movement (Wengefeld et al., 2020)). We think that robot motion likely relates to a continuous event because it is meant to happen over time and takes place continuously. Intent information classified as directional is mostly communicated as discrete information (23 intents; e.g., a display is attached to the top of a mobile robot, communicating the intended movement direction with an arrow (Matsumaru, 2007)) and only seldom as continuous (8 intents; e.g., a drone is visualized as an eye in AR, constantly looking in the direction of movement (Walker et al., 2018)). The reason is that directions are primarily used to communicate an updated movement direction to the human; therefore, it makes sense that they are most often given as discrete information.
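The cross relations above can be laid out as a small contingency table. The sketch below uses only counts stated in Sections 5.1.2 and 5.1.3; the combined description-plus-symbol row is not reported directly and is derived here as a remainder, under the assumption that the totals of 89 discrete and 83 continuous intents cover the same set of intents. All variable and function names are hypothetical.

```python
# (discrete, continuous) counts stated in the text for three subcategories.
STATED = {
    "local": (24, 50),
    "directional": (23, 8),
    "signal": (23, 8),
}
TOTAL_DISCRETE = 89    # from Section 5.1.2
TOTAL_CONTINUOUS = 83  # from Section 5.1.2

stated_discrete = sum(d for d, _ in STATED.values())
stated_continuous = sum(c for _, c in STATED.values())

# Assumption: whatever is left belongs to description + symbol combined.
rest_discrete = TOTAL_DISCRETE - stated_discrete        # 19
rest_continuous = TOTAL_CONTINUOUS - stated_continuous  # 17


def discrete_share(discrete, continuous):
    """Fraction of intents in a row that are communicated discretely."""
    return discrete / (discrete + continuous)


rows = dict(STATED)
rows["description + symbol (derived)"] = (rest_discrete, rest_continuous)

for name, (d, c) in rows.items():
    print(f"{name}: {d} discrete, {c} continuous "
          f"({discrete_share(d, c):.0%} discrete)")
```

Under that assumption, the derived remainder of 19 discrete and 17 continuous intents matches the 36 description and symbol intents reported in Section 5.1.1, and it is consistent with the statement that, apart from signal, unregistered information is almost evenly split.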
5.2. Intent Location
Various technologies can enable the communication of robot motion intent. We found that, in particular, the placement of these technologies (on-robot, on-world, and on-human) can help to classify the different approaches in the literature, as there is often a relationship between the placement and specific types of technology.
On-Robot can be further divided into robot-only technology or additional robot-attached devices. We identified 114 intents communicated through on-robot technology. As an example for the subcategory robot-only, Moon et al. utilize the head orientation of the robot, mimicking a gaze cue, to communicate mid-air locations for its intended movement as an instruction to the user (Moon et al., 2014). Nearly half of all categorized intents that utilize on-robot technology fall into that subcategory, which is of particular interest because it limits the need for additional technology and often involves imitation of human-to-human behavior. The robot-attached subcategory requires some additional hardware to be mounted to the robot (e.g., SAR, LED, or displays). For example, Wengefeld et al. attach a laser projection system to the robot and thereby communicate various types of intents, including state, motion, and instruction (Wengefeld et al., 2020).
On-World has received relatively little attention in the literature. It includes, for example, small displays attached to the workspace at object bins (Aubert et al., 2018), or a desktop display (to visualize motion intent) with speakers (to gain attention) next to the robot’s workspace (Bolano et al., 2018). While the inability to change the environment may be less desirable from a generalizability perspective, for some technology, it adds significant benefits. In particular, SAR would be easier to realize with a fixed projector position on-world and it would allow for larger projection areas. We identified eight different intents on-world.
On-Human includes head-attached technologies, which primarily refers to HMD devices, which allow more complex visualizations. Gruenefeld et al., for example, experimented with different spatial visualizations, such as visualizing the intended movement path, previewing future locations of the robot arm, or visualizing the activity area as a whole (Gruenefeld et al., 2020). In addition, some approaches rely on hand-held technologies. Correa et al., for example, used a tablet device displaying various types of information (map, live view, next steps) to support oversight and communicate motion intent (Correa et al., 2010). We identified 50 intents on-human.
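The three locations and the subcategories named above form a simple lookup structure. The following sketch is illustrative only; `LOCATION_TAXONOMY`, `LOCATION_COUNTS`, and `location_of` are hypothetical names, and on-world is given no subcategories because none are named in the text.

```python
# Subcategories named in the text for each intent location.
LOCATION_TAXONOMY = {
    "on-robot": ("robot-only", "robot-attached"),
    "on-world": (),
    "on-human": ("head-attached", "hand-held"),
}

# Intent counts per location, as reported in the text.
LOCATION_COUNTS = {"on-robot": 114, "on-world": 8, "on-human": 50}


def location_of(subcategory: str) -> str:
    """Return the location that a named subcategory belongs to."""
    for location, subcategories in LOCATION_TAXONOMY.items():
        if subcategory in subcategories:
            return location
    raise ValueError(f"unknown subcategory: {subcategory!r}")


total = sum(LOCATION_COUNTS.values())  # 172 intents overall
for location, count in LOCATION_COUNTS.items():
    print(f"{location}: {count} intents ({count / total:.0%})")

# Examples from the text: a tablet is hand-held, a laser projector is robot-attached.
print(location_of("hand-held"))
print(location_of("robot-attached"))
```

The location counts sum to the same 172 intents as the spatial and temporal classifications, with on-robot accounting for about two thirds of them.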
Empirical Implications. For the intent location, it is generally better to output information closer to the target. For example, LeMasurier et al. compared several motion-based and light-based approaches for humanoids to communicate an intended start of movement at an assembly workplace. They saw that an LED bracelet located closest to the workspace was the most noticeable and least confusing (LeMasurier et al., 2021). Furthermore, researchers found evidence that humans may prioritize on-human technology over on-robot technology. For example, Che et al. were able to show that the use of a vibrotactile bracelet worn by the user led to a better expression of the robot’s motion intent, reduced users’ effort, and increased users’ trust in the robot during a collision-avoidance movement when compared to a solely robot-based approach using legible motion (Che et al., 2020). Finally, combining multiple output technologies can further increase performance. For example, Mullen et al. investigated a multi-modal approach for communicating robot interference in a sorting scenario that combined an AR-HMD visualization and active feedback via a vibrotactile bracelet. They found that combining both feedback types outperformed the single modality baselines. It allowed the human to more efficiently teach the robot and decreased the required interaction time (Mullen et al., 2021).
5.3. Relation between Location and Information
In the following, we provide insights into the relationship between intent location and intent information (cf. Table 3).
5.3.1. Registered in Space
To communicate location information registered in space, most researchers rely on head-attached technologies, such as AR-HMDs (on-human). For example, Tsamis et al. implemented AR visualizations to communicate an intended movement trajectory of a robotic arm (Tsamis et al., 2021). They placed small spheres along a defined path in 3D space from the robot’s end-manipulator to a specific destination. They found that using their system improved task completion and robot idle times, with fewer interruptions to the overall workflow. In addition, users reported increased feelings of safety and trust toward the robot. In contrast, Correa et al. proposed a tablet visualization that showed a live camera feed of the mobile robot highlighting recognized objects in its environment via a wireframe in the visualization (Correa et al., 2010). In addition to intents displayed on-human, robots are often used to convey information directly through specific movements or pointing (on-robot). For example, Holladay et al. used a robotic arm and its end-effector to communicate a directional cue by pointing toward an object placed on a table (Holladay et al., 2014). The resulting pointing configurations were reported to make it easier for novice users to infer the target object. Another example for displaying information on-robot is provided by Hetherington et al. They used SAR to project an arrow in the intended movement direction of the mobile robot on the floor (Hetherington et al., 2021). Their results show that projected arrows were more socially acceptable and more understandable than flashing lights. Finally, information registered in space can be outputted on-world. For example, Cleaver et al. used their web-based environment (Cleaver et al., 2020) to compare four different conditions of visualizing the intended movement trajectory of a mobile robot on a world-located display (Cleaver et al., 2021). In contrast, Aubert et al. placed small displays on three bins and used bin numbers and progress bars to indicate from which bin the robot coworker would next withdraw an item. However, the display-based approach could not significantly reduce the number of physical conflicts (Aubert et al., 2018).
5.3.2. Unregistered in Space.
Interestingly, a relatively large amount of symbol information is communicated through the robot itself (on-robot). Here, we found many approaches where the robot performs specific movement patterns that the human has to decode appropriately. A symbolic approach is shown by LeMasurier et al. (LeMasurier et al., 2021). They slightly move the robot’s manipulator to the left and right to communicate an intended movement start. This approach received relatively high ratings on several measures; however, the authors recommend that the addition of light signals near the workspace and the origin of motion (like an LED bracelet) may provide a benefit to HRI in shared spaces. Song and Yamada provide an example of the type symbol by using different static and dynamic light patterns on a robot-attached colored LED stripe to illustrate different states of the robot (Song and Yamada, 2018a). Communication of signal information is mainly achieved through robot-attached technology, such as LED or audio speakers. Wearable technologies can also show spatially unregistered information (on-human). Che et al. propose a vibrotactile bracelet worn by the user to communicate an initiated collision-avoidance movement of a mobile robot (Che et al., 2020). This approach led to a better expression of the robot’s motion intent, reduced users’ effort, and increased users’ trust in the robot. Furthermore, Walker et al. implemented a radar-like mini-map in the corner of an AR visualization to illustrate the relative position of the user to a drone (Walker et al., 2018). Although the radar provides the user with the means to rapidly locate the robot relative to their own position, some participants mentioned that they did not need to use the radar much because they always faced the drone. Finally, unregistered information can also be presented on-world. Bolano et al. propose verbally describing the updated destination of the robot’s end-manipulator via a speaker in addition to the screens placed in the shared workspace (Bolano et al., 2018). They found that users better understood the robot’s intended motion, including when the robot had to reroute itself to avoid collision.
5.3.3. Discrete.
Discrete information is usually presented directly on-robot. As an example of robot-attached technology, Domonkos et al. attached a colored LED stripe to the base of a robotic arm to communicate the intended direction of movement to a human coworker (Domonkos et al., 2020). In contrast, Glas et al. proposed a mobile robot that performs head gestures to initiate either a follow-me or lead-me request to the human (Glas et al., 2007), relying on the robot itself as in robot-only. Gu et al. evaluated visual feedback displayed through an AR-HMD (on-human), indicating the planned movement direction of the robot via an arrow visualization (Gu et al., 2021). They found that the visualization improved perceived safety and task efficiency. Instead of relying on the visual modality, Mullen et al. proposed discrete feedback through a vibrotactile bracelet that is activated to communicate robot interference, triggering the human to move in order to allow the robot to continue its movement (Mullen et al., 2021). Their findings show that vibrational feedback can reduce the time required to notice and respond to an intent. Aubert et al. equipped bins (from which items could be chosen) in the environment with speakers to emit discrete auditory information on-world (Aubert et al., 2018). They recommend not solely relying on auditory information, but using it in a multi-modal approach, which is further supported by Bolano et al. (Bolano et al., 2018).
5.3.4. Continuous.
Like discrete information, continuous information is primarily displayed on-robot. Matsumaru et al. attached an omnidirectional display on-robot, projecting an eyeball-like visualization that effectively communicates the direction of movement to a human (Matsumaru et al., 2005). In contrast, Dragan et al. propose performing legible motions with a robotic arm itself to communicate the next object it will grasp (Dragan et al., 2015), which they found enabled fluent collaboration. As an example of communicating intents on-human, Walker et al. display a symbolic representation of a focusing eye lens in an AR-HMD, encoding the relative distance to the next target (Walker et al., 2018). Their results show a significant improvement in users’ understanding of robot motion intent. Watanabe et al. proposed presenting continuous visual feedback via a tablet to inform a wheelchair passenger of a robot’s intended motion path (Watanabe et al., 2015). Lastly, continuous information can be displayed on-world. Chandan et al. proposed a map visualization for a stationary tablet display that continuously shows the locations of three mobile robots and other objects of interest (Chandan et al., [n.d.]). They found this approach significantly improved the participants’ ability to observe and assist the robot. Similarly, albeit only studied in a web-based experiment, Cleaver et al. proposed a 3D visualization displayed on a 2D screen to continuously communicate the intended path of a mobile robot (Cleaver et al., 2021).
6. Discussion and Future Research
In the following, we discuss key findings of our literature survey and formulate future research directions as takeaway messages for the HCI community. The organization of the section follows the three entities human, intent, and robot from our intent communication model and concludes with a discussion of the overall model.
Human
From the analyzed intents of our corpus, we derived four different human roles (collaborator, observer, coworker, and bystander). In our analysis, we found that the human role is strongly related to the overarching goals of communicating motion intent: a specific goal can be directly derived given a specific human role. For example, if the HRI scenario involves the human taking the role of an observer, the motion intent needs to help with fostering oversight. This indicates that practitioners and researchers should explicitly define the role and, thereby, the involved human stakeholders before settling on the robot or specific intents they may want to communicate. The human roles we found in a bottom-up process through our analysis align well with the previous work of Onnasch and Roesler (Onnasch and Roesler, 2020). In contrast to Onnasch and Roesler, the role of the operator did not show up in our analysis. We suggest this is because robots are not manually operated by humans in our corpus, as this would not require the robot to communicate any intent (Grudin, 2017).
Future Research: Our analysis showed that nearly all papers a) investigate individual human roles, i.e., they (often implicitly) pick one and focus on that, and b) design and study only for a 1:1 relationship between human and robot. The only exceptions to this are Faria et al., Kirchner et al., and Palinko et al., who investigate the legibility of robot movement for a group of humans (Faria et al., 2017) or explore the use of gaze cues to allow the robot to choose its human collaboration partner from a group of humans (Kirchner et al., 2011; Palinko et al., 2020). This limited involvement of multi-user groups is, of course, to be expected in an emerging field that first needs to establish certain ground truths. Involving multiple persons or even multiple robots and persons complicates HRI tremendously, yet we think this is the next step research must take. In particular, it would be interesting to reflect on the suitability of specific technologies (e.g., SAR will likely be better suited to multi-user scenarios than HMD technology).
Intent Types
Through our scoping review of robot motion intent, we observed that communication of motion often requires additional intents that serve as pre- or post-cursors to the communicated motion intent. Furthermore, we found that robot motion can also be indirectly communicated: for example, by communicating only the robot’s state (e.g., (Baraka et al., 2016)) or by instructing a human to open a door so the robot can continue on its path (e.g., (Watanabe et al., 2015)). These various types of intent demonstrate the different facets of robot motion intent, which represent both actual intended movement trajectories and related communication. We see this as a key finding, distinguishing our work from previous research that focuses primarily on the communication of motion intent itself (Rosen et al., 2019; Suzuki et al., 2022; Walker et al., 2018). With our survey, we are confident that other researchers will start to adopt a more holistic and precise use of the term robot motion intent and, for example, start highlighting the need for related intents, as we found in our analysis.
Future Research: Researchers should investigate how the different types of intent may best be combined to achieve specific intent communication goals. Currently, there is little empirical knowledge about, for example, when and to what extent a robot may need to first communicate attention before effectively being able to communicate motion intent. Further research should also challenge our classification of types of intent and potentially extend them.
Intent Information and Location
We derived two main properties that categorize our identified intent information related to space: registered in space (61.05%) and unregistered in space (38.95%). This distribution reveals that a lot of relevant research focuses not only on information that aims to convey local or directional information (e.g., a resulting trajectory (Cleaver et al., 2021)), but also on more abstract representations, namely description, symbol, and signal. These are often much less complex and indicate that robot motion intent can be communicated without visual 3D representations of future movement. This shows that there are viable alternatives to wearing special on-body technology, resulting in lower system costs and a decreased setup time. One alternative is the intent location on-robot. In previous work, researchers have equipped robots with anthropomorphic elements, such as eye-like features or certain movement gestures, to communicate motion intent. Our literature review identified 15 such instances, specifically applying eye- or head-gaze (e.g., looking at an object to indicate a handover between human and robot (Moon et al., 2014)). While anthropomorphic elements may not be as precise as digital representations through technological means (e.g., visualizations in AR), they share the same baselines as in Human-Human Collaboration (HHC). The general assumption is that, in turn, they can be easily understood by users and can mostly be integrated into the actual HRI. A possible combination with a verbal description provides a multi-modal output to the user, resulting in faster recognition of the specific object (Staudte and Crocker, 2009).
Future Research: While previous research has explored combinations of spatially registered and unregistered information (Staudte and Crocker, 2009), we are unaware of research that has contrasted their effectiveness. Therefore, current design decisions may be based more on the availability of particular technology and less on the intended outcome. Future research should explore this further so that practitioners can more accurately judge the potential trade-offs between simple or complex information and related technology use. Regarding the use of anthropomorphic features, the integration of such communication cues has been explored regarding their legibility and effectiveness in communicating robot motion intent. However, their implicit consequences (e.g., causing the human to ascribe human-like behavior to the robot) have yet to be fully explored. The means and cues of communication have significant consequences for the trust relationship between humans and robots (Hamacher et al., 2016).
Robot
When looking at the three kinds of robots and their usage in research, we can see that the physical properties of a robot have a large impact on the means of communication, in particular on the on-robot location for intent communication. Some robots come with pre-installed displays, while others have anthropomorphic features built in. Flying drones, in contrast, require some kind of remote communication tool (often in the form of HMDs) to communicate over a larger distance. Robots are also an area of much technical experimentation, i.e., many researchers are building or customizing their own robots. For example, one may add anthropomorphic features to a robotic arm. As a result, researchers tend to use these built-in or customized features to communicate intent. They may often have only a particular kind of robot available; thus, they are limited to a certain way of communicating robot motion intent. Of course, this limits the generalizability of current findings, as each robot has unique features that can impact HRI.
Future Research: These findings show that many research endeavors explore only certain kinds of robots. A more systematic approach is called for to investigate the various kinds of robots and their impacts on communicating robot motion intent. We also found that more and more research applies simulation environments in Virtual Reality (VR) to explore HRI. Nevertheless, we need more studies to validate such findings and provide a broader foundation for their generalizability.
Context
Comparing our findings with previous research on AVs (Colley et al., 2021; Currano et al., 2021) and eHMIs (Dey et al., 2020), we can identify several similarities, despite the substantial differences in the context of use and robot technology. Colley et al. found that visualizing internal information processed by an autonomous vehicle (AV) could calibrate trust by enabling the perception of the vehicle’s detection capabilities (and its failures) while only inducing a low cognitive load (Colley et al., 2021). Currano et al. explored the interaction between the complexity of head-up displays, driving style, and situation awareness (Currano et al., 2021). In the area of eHMIs, researchers have been able to distinguish between different natures of message (e.g., danger and safety zones) (Dey et al., 2020). These correspond to our identified types of intent, highlighting different meanings of the provided intent for the user. In the context of AVs, the information used to formulate the actual intent is primarily unregistered in space. It uses text, symbols, and audio prompts. The intent primarily describes the vehicle’s state (e.g., automated/manual, cruising, yielding) or advice/instructions to the pedestrian (e.g., to allow safe road crossing). The large differences between the fields of research result primarily from the standardizations in automotive research, such as roads, road signs, markings, and restrictions. Nevertheless, there are potential overlaps.
Future Research: The two fields have, from our perspective, not yet shared many cross-activities among researchers, which could lead, for example, to transferring those motion intent techniques that have been shown to be effective in one field to the other. We imagine that future research could benefit both sides if a more holistic perspective is applied. In particular, the research for eHMIs in AVs could benefit from more exploratory technological approaches in HRI, such as making use of AR-HMDs and applying more advanced visualization to communicate motion intent. While this may not be relevant for the near future, as such devices are not yet consumer-ready, this may change over the coming years.
The Model
The overall model is an abstract characterization of the current literature on robot motion intent. It may be seen as a summary of the current understanding of the design space for robot intent communication, illustrating all components and highlighting their interconnection. Future researchers and practitioners should benefit from the model by using it as guidance and as a checklist throughout the design phase of such Human-Robot scenarios; i.e., being guided to carefully think about and decide upon different types of intents or whether intent information should be encoded spatially or temporally. In addition, the model can help to unify the language of robot motion intent and thereby support researchers and practitioners in finding related work and identifying research gaps.
Future Research: We invite researchers to actively challenge the model and thereby help to develop the field even further. They should scrutinize whether the design space is sufficiently classified or how it can and needs to be extended to cover future work. As our model was derived from the analysis of our literature corpus, it is fitted to the gathered research. Nonetheless, one can utilize novel research contributions that will be published in the future to revisit and evaluate the model (i.e., to investigate whether novel contributions can still be described by our model). Moreover, we imagine that the model may benefit from a more thorough discussion in the context of eHMIs, as well as from incorporating other lines of research that are concerned with communicating intent, such as Sodhi et al. or Müller et al. (Sodhi et al., 2012; Müller et al., 2020).
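To illustrate the checklist use described above, the following is a minimal sketch of how the model's dimensions could be written down as a record whose undecided entries are reported back to the designer. It is not part of the reviewed work: all class, field, and function names are hypothetical, the split of intent information into a spatial and a temporal property is our reading of the text, and the example entry is a rough classification of the LED stripe by Domonkos et al. (2020), with the motion intent type assumed for illustration.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Optional


class HumanRole(Enum):
    COLLABORATOR = "collaborator"
    OBSERVER = "observer"
    COWORKER = "coworker"
    BYSTANDER = "bystander"


class IntentType(Enum):
    MOTION = "motion"
    ATTENTION = "attention"
    STATE = "state"
    INSTRUCTION = "instruction"


class SpatialProperty(Enum):
    REGISTERED = "registered in space"
    UNREGISTERED = "unregistered in space"


class TemporalProperty(Enum):
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


class IntentLocation(Enum):
    ON_ROBOT = "on-robot"
    ON_HUMAN = "on-human"
    ON_WORLD = "on-world"


@dataclass
class IntentDesign:
    """One intent communication design (hypothetical record layout)."""
    label: str
    human_role: Optional[HumanRole] = None
    intent_type: Optional[IntentType] = None
    spatial: Optional[SpatialProperty] = None
    temporal: Optional[TemporalProperty] = None
    location: Optional[IntentLocation] = None


def open_decisions(design: IntentDesign) -> List[str]:
    """Return the names of all dimensions that are still undecided (None)."""
    return [f.name for f in fields(design) if getattr(design, f.name) is None]


# Assumed classification, for illustration only: a discrete on-robot cue
# for a human coworker; the spatial property is deliberately left open.
led_stripe = IntentDesign(
    label="LED stripe on the base of a robotic arm",
    human_role=HumanRole.COWORKER,
    intent_type=IntentType.MOTION,
    temporal=TemporalProperty.DISCRETE,
    location=IntentLocation.ON_ROBOT,
)

print(open_decisions(led_stripe))  # ['spatial']
```

Leaving a dimension as `None` rather than forcing a default keeps undecided design choices visible, which matches the intended use of the model as a checklist rather than as a fixed recipe.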
7. Conclusion
This paper provides two main contributions: 1) a survey contribution that includes an analysis and classification of previous literature as well as future research directions, and 2) a theoretical contribution that introduces an intent communication model and describes the relationships of its entities, dimensions, and underlying properties. In particular, our work highlights that robot motion intent requires a broader perspective on robot intent and that it includes intent types that may seem, at first glance, unrelated to motion. However, in our analysis, we found that attention, state, and instruction are important and often necessary pre- or post-cursors to communicate explicit motion intent. We also found that only a few papers explicitly discuss or present the type of intent they aim to communicate, and that many lack clear descriptions of intent information or location. Our work aims to help researchers in the future to better align their work with the suggested dimensions, making it easier to assess and compare different studies. In doing so, we aim to provide a foundation for a unified language regarding robot intent, even beyond motion. From a practical perspective, the classification of the existing research literature along our intent communication model helps researchers and practitioners alike to understand the design space for communicating robot motion intent. As it is an emerging field, much work has focused on finding novel approaches and solutions to communicate robot motion intent in one way or another. We have identified multiple areas in need of future research. However, we would like to emphasize once more that, above all, the field needs more systematic analysis and comparison of different approaches to improve understanding of the influences of different intent dimensions and properties. We believe that the presented intent communication model provides an empirically derived foundation to inspire and guide such work.
References
- Ajoudani et al. (2017) Arash Ajoudani, Andrea Maria Zanchettin, Serena Ivaldi, Alin Albu-Schäffer, Kazuhiro Kosuge, and Oussama Khatib. 2017. Progress and prospects of the human–robot collaboration. Autonomous Robots 42, 5 (Oct. 2017), 957–975. https://doi.org/10.1007/s10514-017-9677-2
- Amershi et al. (2019) Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Glasgow, Scotland UK) (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3290605.3300233
- Andersen et al. (2016) • Rasmus S. Andersen, Ole Madsen, Thomas B. Moeslund, and Heni Ben Amor. 2016. Projecting robot intentions into human environments. In 2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 294–301. https://doi.org/10.1109/ROMAN.2016.7745145
- Arevalo Arboleda et al. (2021) Stephanie Arevalo Arboleda, Franziska Rücker, Tim Dierks, and Jens Gerken. 2021. Assisting Manipulation and Grasping in Robot Teleoperation with Augmented Reality Visual Cues. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (Yokohama, Japan) (CHI ’21). Association for Computing Machinery, New York, NY, USA, Article 728, 14 pages. https://doi.org/10.1145/3411764.3445398
- Arksey and O'Malley (2005) Hilary Arksey and Lisa O'Malley. 2005. Scoping studies: towards a methodological framework. International Journal of Social Research Methodology 8, 1 (Feb. 2005), 19–32. https://doi.org/10.1080/1364557032000119616
- Aubert et al. (2018) • Miles C. Aubert, Hayden Bader, and Kris Hauser. 2018. Designing Multimodal Intent Communication Strategies for Conflict Avoidance in Industrial Human-Robot Teams. In 2018 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 1018–1025. https://doi.org/10.1109/ROMAN.2018.8525557
- Bacula et al. (2020) • Alexandra Bacula, Jason Mercer, and Heather Knight. 2020. Legible Light Communications for Factory Robots. In Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, Tony Belpaeme, James Young, Hatice Gunes, and Laurel Riek (Eds.). ACM, New York, NY, USA, 119–121. https://doi.org/10.1145/3371382.3378305
