TV-watching partner robot: Analysis of User's Experience

Donghuo Zeng; Jianming Wu; Gen Hattori; Yasuhiro Takishima

arXiv:2302.14472·cs.MM·March 6, 2023


TL;DR

This paper introduces a TV-watching companion robot with open-domain chat capabilities, designed to enhance communication and enjoyment during TV viewing, and demonstrates positive user engagement in initial experiments.

Contribution

The paper presents a novel TV-watching robot with dual modes and topic-based dialogue management, improving interactive entertainment and communication during TV watching.

Findings

01

Participants enjoyed talking with the robot.

02

Users rated their interest in the robot highly (6.5/7).

03

People preferred talking to robots for more enjoyable and relaxed interactions.

Abstract

Watching TV not only provides news and information but also gives different generations an opportunity to communicate. With the proliferation of smartphones, PCs, and the Internet, opportunities for communication in front of the television are likely to diminish. This has led to problems associated with reduced face-to-face interaction, such as a lack of self-control and insufficient development of communication skills. This paper proposes a TV-watching companion robot with open-domain chat ability. The robot has two modes: TV-watching mode and conversation mode. In TV-watching mode, the robot first extracts keywords from the TV program and then generates disclosure utterances based on the extracted keywords, as if it were enjoying the TV program. In conversation mode, the robot generates question utterances with keywords in the same way and then employs a topics-based dialog…

Tables

Table 1. The attributes of participants

Participants	Gender,age	Relation	Talks	Robot
G1(no robot)	Female,30s	Family	673	-
G1(with robot)	Female,30s	Family	555	37%
G2(no robot)	Female,20s	Friends	859	-
G2(with robot)	Female,20s	Friends	1446	12%
G3(no robot)	Male,20s	Friends	1337	-
G3(with robot)	Male,20s	Friends	960	35%

Table 3. A conversation sample

Speaker	Utterance content	dialog engine
S:	Have you seen the Internet?	TV Program
U:	Yes, I have!	-
S:	Oh, I see. Actually, I have, too. Wouldn’t you like to see it sometime?	TV Program
U:	Oh, I have seen it too?	-
S:	Well, it is the Internet after all. I wonder what it’s like?	Daily life
U:	It must be incredible, is not it, the Internet?	-
S:	According to CNET Japan, NVIDIA is close to an agreement to buy Arm from Softbank.”	News/SNS
U:	Oh, really? What’s that all about?	-
S:	Softbank G wants to sell ARM to NVIDIA for 4.2 trillion yen. The deal is expected to close in 18 months.	News/SNS
U:	That’s a lot of time!	-
S:	TikTok will be a big hole Oracle acquisition.	News/SNS
U:	What about TikTok?	-
S:	NVIDIA announced an agreement with Softbank to buy ARM for $40B.	News/SNS
U:	$40B is cheap!	-
S:	I know, right?	Response generation

Table 4. The conversation turns with the robot

The number of turns in the conversation between the subject and robot
Average	4.71	5.26	6.88
Maximum	10	11	14

Table 5. The outline of the experiment

Subject	64 people (2 people a group, 32 groups) Age: 10~70s Gender: male (9 people), female (55 people)
Experiment period	September~November, 2021 (Each group spent a day)
Experiment time	Morning (10:00-12:00) 1) Explanation and confirmation of consent for the experiment consent form 2) Watching TV without/with robot
	Afternoon (13: 00-18: 00) 1) Watching TV without/with robot 2) Post-questionnaire

Table 6. The conducted time and the conditions

	The control group	The intervention Group
Morning (10:00-11:00)	Explanation and confirmation of consent for the experiment consent form
Morning (11:00-12:00)	Watching TV without robot	Watching TV without robot
Afternoon (13:00-17:00)	Watching TV without robot	Watching TV with robot
Afternoon (17:00-18:00) After the experiment	I	I, II

Table 7. Questionnaire I

No.	Question
I-1	Would you like to have a conversation robot at home?

Table 9. The answers to I-1: “Would you like to have a talking robot at home?”

	The control group	The intervention Group
I think so	6	14
I don’t think so	26	18
Percentage of acceptance	18.75%	43.75%

Table 10. The answer results of questionnaire questions II-1 to II-9

Questionnaire Questions	Percentage of ”Yes”	Average of score	SD
II-1: The conversational robot increased the communication between the two?	-	4.78	1.38
II-2: The conversational robot made the place more relaxed.	-	5.06	1.29
II-3: The conversational robot turned my attention to television.	-	3.16	1.43
II-4: The conversational robot made it more interesting than usual.	-	4.5	1.39
II-5: Do you think your daily life would be better if you had a conversational robot in your home?	75%	-	-
II-6: How did you feel when the conversational robot was mumbling to itself?	72%	-	-
II-7: How did you feel when you interacted with the conversational robot?	76%	-	-
II-8: Do you want the conversational robot at home if we make it better?	72%	-	-
II-9-1: Improvement expected: More and more talk related to the TV programs	-	5.03	1.23
II-9-2: Improvement expected: The conversational robot can know/empathize with human emotions	-	5.22	1.61

Table 11. The statistics of the conversation turns

Turns	Total	9/1	9/8	9/14	9/18	9/23	9/30	10/5	10/13	10/16	10/19	10/20	10/23	10/27	10/28	11/9	11/12	11/18	11/19
Mean	5.54	4.92	4.15	7.00	6.32	5.96	5.45	5.38	6.45	5.12	5.67	6.83	5.91	5.40	6.21	5.88	6.52	4.09	4.61
Max	16	8	8	10	11	8	10	9	10	12	8	13	16	9	9	9	11	7	9


Taxonomy

Topics: Robotics and Automated Systems · Multi-Agent Systems and Negotiation · Speech and dialogue systems

Full text

TV-watching partner robot: Analysis of User’s Experience

Donghuo Zeng, Jianming Wu, Gen Hattori, and Yasuhiro Takishima

KDDI Research, Inc., 2-1-15 Ohara, Fujimino, Saitama 356-8502, Japan

(2022)

Abstract.

Watching TV not only provides news and information but also gives different generations an opportunity to communicate. With the proliferation of smartphones, PCs, and the Internet, opportunities for communication in front of the television are likely to diminish. This has led to problems associated with reduced face-to-face interaction, such as a lack of self-control and insufficient development of communication skills. This paper proposes a TV-watching companion robot with open-domain chat ability covering a range of 50 daily-life topics. The robot has two modes: “TV-watching mode” and “conversation mode”. In TV-watching mode, the robot first extracts keywords from the TV program and then generates disclosure utterances based on the extracted keywords, as if it were enjoying the TV program. In conversation mode, the robot generates question utterances with keywords in the same way and then employs a topic-based dialog management method consisting of multiple dialog engines for rich conversations related to the TV program. We conducted initial experiments, and the results show that all participants from the three groups enjoyed talking with the robot; the question about their interest in the robot was rated 6.5 on a 7-level scale. This indicates that the proposed conversational features of the TV-watching companion robot have the potential to make our daily lives more enjoyable. Based on the analysis of the initial experiments, we conducted further experiments with more participants, dividing them into two groups: a control group without a robot and an intervention group with a robot. The results show that people prefer to talk to the robot because it makes the experience more “enjoyable”, “relaxed”, and “interesting”.

TV-watching companion robot, KACTUS, Topic-based dialog management, increase communication

AMX ’22: ACM Interactive Media Experiences, June 22–24, 2022, Aveiro, Portugal

1. Introduction

Televisions (TVs) have become a ubiquitous part of daily life, and most homes have one in the living room. Watching TV in the living room not only lets viewers get news, enjoy entertainment programs, and gain knowledge about daily life, but also improves communication among friends or family members from different generations. Viewers have the opportunity to enter into conversations and share joy or sorrow through topics related to the programs they watch. On the other hand, with the proliferation of smartphones, personal computers, and the Internet, it is now possible to obtain information from sources other than television, so there may be fewer opportunities to communicate in front of the television. This reduction in communication results in fewer opportunities to interact with others, leading to problems observed today such as health risks (House et al., 1988), inadequate development of self-control, and poor communication skills (Sok et al., 2019).

With the development of AI technology, social robots have become widespread in recent years, and they are expected to become further involved in our lives. Currently, however, robots cannot communicate fluently with humans. One reason is that a robot cannot behave or respond based on context that is not explicit in a conversation, whereas people can converse based on the context behind it. When we watch TV with family or friends, we usually enjoy it by sparking conversations and sharing our emotions through the content of the programs and the topics raised by others. Therefore, we believe that information from TV programs can serve as the shared context for a robot to communicate with people. In addition, it may be possible to stimulate communication while watching TV. In particular, a conversational robot or chatbot is expected to help the elderly improve the effectiveness of their interactions and reduce their feelings of loneliness and isolation. This could slow the rate of disability in the elderly and reduce social costs and the burden on caregivers (Su et al., 2017).

Since previous research (Yamamoto et al., 2009) has demonstrated the acceptability of a conversational robot for a single person living alone at home, we focus on the conversational features of a TV-watching robot that accompanies families or friends, in order to assess the acceptability of the future user experience of watching TV with a robot. Our work is therefore guided by the following two research questions.

• Research question 1: Will the user experience be acceptable when family or friends watch TV with the conversational robot?

• Research question 2: Do the robot’s conversational features improve the well-being and relationships of families or friends?

To address these questions, we propose a TV-watching companion robot based on the conversational analysis of personal conversation behaviors when watching TV (Hoshi et al., 2020; Hagio et al., 2021). We deploy the open-domain chatbot “KACTUS” (Wu et al., 2020a; wu2021tv) on the robot and expect it to create even more excitement when watching TV with friends and to encourage communication. Related work is presented in Section 2. The system design of our robot prototype is presented in Section 3, and Section 4 presents the results of the TV-watching experiment using the robot prototype. The study is summarized in Section 5.

2. Related works

2.1. TV robots

Although virtual agents that interact with people can take the form of computer-generated (CG) displays, robots in real space are reportedly better suited than a virtual agent on a display to dialog concerning an object that actually exists (Kidd and Breazeal, 2004). Therefore, there have been attempts to use robots in TV-watching environments.

Ogawa et al. proposed a method for operating IoT devices, including robots, by distributing metadata in tandem with TV programs (Ogawa et al., 2018). The advantage of this approach is that the robots can be operated in concert with the TV program being watched. However, the cost of producing TV programs could increase for broadcasters owing to the creation of metadata for operating robots. Nishimura et al. proposed a robot that uses comments posted on social media such as Twitter to chat with users (Shogo Nishimura and Hagita, 2017). This approach does not require the broadcaster to create metadata for operating robots. However, when users are watching a TV program with low ratings, the number of robot utterances decreases because there are fewer comments on social media. Minami et al. proposed a TV-watching chat robot that uses social networking service (SNS) comments to incorporate humor so that users do not tire of the robot and discontinue its use (Hidekazu Minami and Hagita, 2016). Muto et al. proposed a robot system architecture and algorithm that provide TV programs and related information based on the user’s interests (Muto et al., 2006). However, none of these robot systems is based on an open-domain chatbot.

On the other hand, head-mounted display (HMD) methods have been proposed that provide additional information to an existing TV program (Saeghe et al., 2019; Mathis et al., 2020). However, such devices must be attached to the head or body, which is not only a burden but also disrupts conversations between people. Moreover, as noted above, robots in real space are reportedly better at conversing in real space than a virtual agent on a display (Kidd and Breazeal, 2004).

2.2. Personal conversation behaviors when watching TV

To understand the structure of natural dialog between people watching TV, we recorded and analyzed the dialog of groups of two well-acquainted people freely watching their favorite programs (Hoshi et al., 2020). We classified the types of conversations that people have among themselves when watching TV in order to characterize their behavior. We found that the percentage of “disclosure” utterances (in which an individual states their feelings or thoughts) was high while watching TV, and that the percentage of “question” utterances was high during conversation.

2.3. Human-robot dialog with elderly people

Developing a human-robot dialog system for elderly people is a matter of urgency in countries where a high percentage of the population is aged 65 or over, such as Japan (on the Ageing Society: 2020, Summary), to help those who live alone increase their opportunities for conversation in daily life. The Pearl (Pollack et al., 2002) reminds elderly people to perform daily activities, such as going to the bathroom every three hours and taking medication. In addition, the caregiver of an elderly person can enter that person’s daily activities in advance, and the Pearl will remind the person based on the schedule. The first trial system (Minami et al., 2016) was designed to support a user chatting with a robot while watching a TV program; it achieved natural timing in terms of simultaneous responses and supported interesting discourse derived from social media related to the TV program the user was watching. One of the barriers in human-robot dialog system development is that speech recognition frequently fails. To overcome this problem, a proposed question–answer–response dialog model (Iio et al., 2020) enables the robot to actively ask the user various questions and makes it possible for two robots to participate in the dialog. Another study (Jokinen et al., 2019) aimed at dialog modelling to enhance the communication capability between users and robots and thereby improve the care services provided to elderly people. Furthermore, they introduced a dialog system architecture that can communicate with users based on care-giving tasks.

Whereas past conversational robots mostly conducted directed or limited-topic conversations, we propose an open-domain conversational robot that can talk freely about a richer range of topics and behave like a human.

3. System design

Our conversational features are designed according to the following principles, which are shown in Fig. 1. The robot works by switching between two modes, “TV-watching mode” and “conversation mode”, to simulate human conversation behaviors when watching TV (Hoshi et al., 2020). The robot prototype can switch repeatedly between the two modes.

1) “TV-watching mode”: In this mode, the robot comments on the content of the TV program and voices its impressions as if it were enjoying the program. We expect that the robot can initiate a conversation topic and thereby create communication opportunities between family members or friends when they are watching TV. We implement the following two features.

• Feature A: Extracts keywords from the TV program

• Feature B: Generates disclosure utterances based on the extracted keywords

2) “Conversation mode”: In this mode, the robot generates question utterances with keywords in the same way and then offers open-domain chat related to the TV program. The conversation topic starts with the TV program and then expands to daily life and the latest news/SNS. We expect that the robot can provide more communication opportunities that improve the well-being and relationships of families or friends. We implement the following two features.

• Feature C: Generates question utterances with keywords from the TV program

• Feature D: Employs a topic-based dialog management method consisting of multiple dialog engines for rich conversations related to the TV program

Our robot prototype, equipped with the proposed open-domain conversational features, is an extension of the development robot CommU (University, 2015) and is shown in Fig. 1. We built a prototype camera-microphone array and installed it as an additional device at the bottom of the CommU (Fig. 1, bottom right). Using the camera-microphone array, the robot could detect nearby people within a range of approximately 200 degrees and distinguish a human voice from TV sound by means of HARK’s sound localization and separation method (Nakadai et al., 2017). The separated human voice was converted into text by Microsoft speech recognition (Azure, 2021). Although this paper presents each function in English, the robot actually operated in Japanese.

3.1. TV-watching mode (Features A and B)

Feature A: First, the robot obtained the video, audio, and caption data from the TV program to extract keywords for utterance generation. The sentences in the caption data were segmented into words using MeCab (Kudo, 2006) with mecab-ipadic-NEologd (Sato et al., 2017). If the TV program does not contain caption data, we extract keywords from the program content by running a video object detection method (Shaoqing et al., 2017; Lin et al., 2014b) on AWS (Service, 2021); the detector is trained on MS COCO (Lin et al., 2014a) and uses the recognition API provided by (Azure, 2021).

The classification results of the detected objects are used as the extracted keywords. An example of keyword extraction using object detection on a local computer is shown in the upper right of Fig. 1. The green rectangle indicates the detected object and the red area indicates the region with high estimated saliency. In this example, “elephant” was extracted as the keyword. We used Microsoft Azure (Azure, 2021) and Amazon Web Services (Service, 2021) as the cloud services.
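The caption-first, detection-as-fallback logic of Feature A can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `extract_keywords`, the pre-tokenized `(surface, part_of_speech)` input that stands in for morphological analyzer output, the noun-only filter, and the `min_confidence` cutoff are all hypothetical.

```python
def extract_keywords(caption_tokens, detections, min_confidence=0.5):
    """Return candidate keywords for utterance generation.

    caption_tokens: list of (surface, part_of_speech) pairs standing in for
        the output of a morphological analyzer; empty or None when the
        program has no caption data.
    detections: list of (label, confidence) pairs from an object detector.
    min_confidence: hypothetical cutoff, not a value from the paper.
    """
    if caption_tokens:
        # Assumption: only nouns are kept as keywords.
        candidates = [surface for surface, pos in caption_tokens if pos == "noun"]
    else:
        # No captions available: fall back to the detected object classes.
        candidates = [label for label, conf in detections if conf >= min_confidence]
    # Remove duplicates while preserving first-seen order.
    return list(dict.fromkeys(candidates))


if __name__ == "__main__":
    # A frame without captions in which an elephant is detected confidently.
    print(extract_keywords(None, [("elephant", 0.92), ("tree", 0.30)]))
```

In a real pipeline the token list would come from the tokenizer and the detections from the cloud recognition service; both are replaced by plain Python lists here so the sketch stays self-contained.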

Feature B: Next, the robot’s disclosure utterances, which express the robot’s feelings, are generated using the extracted keywords. We used the captions of the TV programs on channel 10 over the past seven years as the template sentence corpus for disclosure utterance generation. These template sentences have fewer than 20 characters, because 79% of all utterances on TV programs are shorter than 20 characters (Hoshi et al., 2020). The template sentences from past captions were used to learn distributed word representations with Word2Vec (Mikolov et al., 2013). The robot selects the template sentence with the highest cosine similarity to the extracted keywords. Then, an utterance is generated by inserting the new keyword into the template sentence. In the above example, the disclosure utterance “I like elephants” is generated. Some template samples are as follows.

• Disclosure: The generated disclosure utterances use template sentences containing emotional expressions (e.g., “I want to eat ***”, “I want to go ***”, where *** denotes the keyword).

• Question: The generated question utterances use template sentences containing questions (e.g., “Do you want to eat ***”, “Do you like ***”, where *** denotes the keyword).
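The template selection step described above can be illustrated with a small sketch. It assumes hand-written three-dimensional toy vectors in place of trained Word2Vec embeddings, and it represents each template by a single anchor word; `TOY_VECTORS`, `TEMPLATES`, and `generate_utterance` are hypothetical names introduced only for this example.

```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "elephant": [0.9, 0.1, 0.0],
    "sushi": [0.1, 0.95, 0.05],
    "like": [0.8, 0.2, 0.1],
    "eat": [0.1, 0.9, 0.0],
    "go": [0.0, 0.2, 0.9],
}

# Hypothetical templates: (sentence with a *** slot, anchor word whose vector
# represents the template when computing similarity).
TEMPLATES = [
    ("I like ***", "like"),
    ("I want to eat ***", "eat"),
    ("I want to go to ***", "go"),
]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate_utterance(keyword, templates=TEMPLATES, vectors=TOY_VECTORS):
    """Pick the template most similar to the keyword and fill in its slot."""
    keyword_vector = vectors[keyword]
    best_sentence, _ = max(
        templates,
        key=lambda item: cosine_similarity(keyword_vector, vectors[item[1]]),
    )
    return best_sentence.replace("***", keyword)


if __name__ == "__main__":
    print(generate_utterance("elephant"))
    print(generate_utterance("sushi"))
```

Question utterances could be produced the same way by swapping in question templates such as “Do you like ***”. Handling plurals or Japanese particles after insertion is outside the scope of this sketch.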

Next, the robot determines whether to switch to the conversation mode. The proportion of disclosure and question utterances was set in advance as a parameter, and one of the two types was randomly selected according to this proportion. The utterance interval could also be adjusted to avoid situations where the robot speaks constantly: the utterance frequency was set in advance, and each utterance interval was determined using a Poisson distribution. Furthermore, a keyword used once in utterance generation was not reused for a while, so that a variety of utterances was generated.
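A minimal sketch of this scheduling logic is given below. It makes one interpretive assumption: utterances are modeled as a Poisson process, so the gaps between them are drawn from an exponential distribution with the preset mean. The 3:1 ratio and the 80-second mean used in the demo are the settings reported in the experimental setup; the function names, the `KeywordCooldown` class, and its window size are hypothetical.

```python
import random
from collections import deque


def next_interval(mean_seconds, rng):
    """Draw the waiting time until the next utterance.

    Assumption: utterances follow a Poisson process, so the gaps between
    them are exponentially distributed with the given mean.
    """
    return rng.expovariate(1.0 / mean_seconds)


def choose_utterance_type(disclosure_weight, question_weight, rng):
    """Pick "disclosure" or "question" at random according to the preset ratio."""
    return rng.choices(
        ["disclosure", "question"],
        weights=[disclosure_weight, question_weight],
    )[0]


class KeywordCooldown:
    """Blocks reuse of the most recently used keywords.

    The window size is a hypothetical parameter; the paper only says that a
    used keyword is not reused "for a while".
    """

    def __init__(self, window):
        self._recent = deque(maxlen=window)

    def is_available(self, keyword):
        return keyword not in self._recent

    def mark_used(self, keyword):
        self._recent.append(keyword)


if __name__ == "__main__":
    rng = random.Random(0)
    cooldown = KeywordCooldown(window=3)
    for keyword in ["elephant", "zoo", "elephant", "banana"]:
        if not cooldown.is_available(keyword):
            continue  # skip keywords that were used too recently
        wait = next_interval(80.0, rng)
        kind = choose_utterance_type(3, 1, rng)
        cooldown.mark_used(keyword)
        print(f"after {wait:.1f}s: {kind} utterance about {keyword}")
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which is convenient when tuning the ratio or the mean interval.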

Finally, the robot turns to the TV and transitions to the conversation mode, saying the generated disclosure utterance through the CommU TTS API (University, 2015). In parallel with the above processing, the robot facing the TV continuously repeats random motions such as “blinking”, “nodding”, “moving the neck”, and “moving the upper body”, acting as though it is watching TV.

3.2. Conversation mode (Features C and D)

Feature C: In this mode, the question utterances are generated in a manner similar to the disclosure utterances described above. We used the same template sentence corpus for the generation of question utterances. The robot selects which template sentence to use by calculating its cosine similarity to the extracted keywords. In the example in Fig. 1, the question utterance “Do you like elephants?” is generated.

As in TV-watching mode, the proportion of disclosure and question utterances was set in advance as a parameter, and one of the two types was randomly selected according to this proportion. The utterance interval could also be adjusted to avoid situations where the robot speaks constantly. Furthermore, a keyword used once in utterance generation was not reused for a while, so that a variety of utterances was generated.

Feature D: At this point, the robot turns to the viewers and says the generated question utterance. If a human response is obtained from the microphone array, the next response utterance is generated from the speech recognition result by the open-domain chatbot “KACTUS”. To improve the well-being and relationships of families or friends, the chatbot not only provides enjoyable conversation directly related to the TV program but also facilitates chatting with the robot about a wide range of related topics through multiple dialog engines. The following four conversation engines were combined for response generation.

• TV program: Conversation about the TV program, including responses that follow the robot’s question utterance.

• Daily life: Broad conversation on generic content.

• News/SNS: Conversation using relevant Internet news and Twitter comments, which change over time.

• Response generation: A transformer-based model (Vaswani et al., 2017) that uses large-scale dialog-pair datasets.

The chatbot’s strategy for topic-linked conversation management is shown in Figure 2. First, the robot and user utterance pair is converted to a distributed representation. Next, from the candidate sentences of the available conversation engines, the robot selects the sentence most similar to this representation as measured by Word Mover’s Distance (WMD) (Kusner et al., 2015) and uses it as its utterance. The first turn uses only the TV program engine; the second turn uses the TV program and Daily life engines; and from the third turn onward, the TV program, Daily life, and News/SNS engines are all used. That is, as the conversation progresses, the topic is expected to gradually move away from the TV program to wider topics such as daily life and the latest news/SNS. In these cases, the robot continues the conversation by switching among the three conversation engines mentioned above. When the WMD-based similarity between the user’s utterance and the candidates from these conversation engines falls below the threshold, the robot generates a response utterance using the transformer-based response generation engine to avoid interruption. Finally, the conversation ends if either of the following conditions is met, and the robot then enters TV-watching mode.

• It is determined from the user’s utterances that he or she wants to end the conversation.

• It is determined that the user has not answered more than twice.

When the conversation continues, the robot waits once again for voice input. When the conversation ends, the robot turns toward the TV and the current state transitions to Status 1: TV watching.
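The turn-dependent engine selection, the threshold fallback, and the end conditions described above can be sketched as follows. This is an illustration under stated assumptions rather than the KACTUS implementation: each engine is assumed to return one candidate sentence with a similarity score in which higher means closer (for example, a score derived from WMD), and the names `engines_for_turn`, `select_response`, `should_end`, and the threshold values are hypothetical.

```python
# Topic engines that may answer at each 1-based turn; from the third turn
# onward all three topic engines are available.
TOPIC_ENGINES_BY_TURN = {
    1: ("tv_program",),
    2: ("tv_program", "daily_life"),
}
ALL_TOPIC_ENGINES = ("tv_program", "daily_life", "news_sns")


def engines_for_turn(turn):
    """Return the topic engines that may answer at the given turn."""
    return TOPIC_ENGINES_BY_TURN.get(turn, ALL_TOPIC_ENGINES)


def select_response(turn, candidates, threshold, generate_fallback):
    """Pick the robot's next utterance.

    candidates: dict mapping an engine name to (sentence, similarity), where a
        higher similarity means a closer match to the dialog context.
    threshold: hypothetical minimum similarity for accepting a candidate.
    generate_fallback: callable standing in for the response generation engine.
    """
    allowed = [name for name in engines_for_turn(turn) if name in candidates]
    if allowed:
        best = max(allowed, key=lambda name: candidates[name][1])
        sentence, similarity = candidates[best]
        if similarity >= threshold:
            return best, sentence
    # No topic engine is similar enough: avoid an interruption by generating.
    return "response_generation", generate_fallback()


def should_end(user_wants_to_stop, unanswered_count, max_unanswered=2):
    """End when the user wants to stop or has not answered more than
    max_unanswered times (one reading of "more than twice")."""
    return user_wants_to_stop or unanswered_count > max_unanswered


if __name__ == "__main__":
    candidates = {
        "tv_program": ("Do you like elephants?", 0.4),
        "daily_life": ("I went to the zoo last week.", 0.7),
        "news_sns": ("A zoo opened a new elephant house.", 0.9),
    }
    for turn in (1, 2, 3):
        print(turn, select_response(turn, candidates, 0.3, lambda: "I see."))
```

Raising `threshold` makes the fallback engine fire more often, which mirrors the adjustment described later for keeping the robot from drifting to topics unrelated to the TV program.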

4. Experimental results

To test user acceptance of the proposed conversation features designed for families or friends, we conducted the experiment in two steps. Although the same conversation functions of the robot are used in both, the first experiment focuses on “Research Question 1”: can the robot’s conversation features help promote communication between families or friends when they are watching TV? The second, extended experiment then focuses on “Research Question 2”: do the robot’s conversation features improve the well-being and relationships of families or friends?

4.1. Initial Experimental Results

4.1.1. Experimental setup

The participants were assigned to one of three groups, with two individuals in each group, as shown in Table 1. We assume that the initial use case is participants enjoying weekend TV programs, so the experiment lasted approximately six hours, from 10:00 to 16:00, including a one-hour lunch break. Participants were required to watch for the entire time but could choose any TV program from the past seven years. At the end of the experiment, we conducted an interview and a 7-level questionnaire survey, with levels ascending from “strongly disagree” to “strongly agree” (Table 2). The experiments were conducted over two days for each group: the first day without the robot and the second day with the robot.

So that participants could take part in a relaxed state, we used a room that mimicked a living room. The robot prototype was placed on a table (see Fig. 3 (a)) and worked independently, without control by the experimenter. The ratio of disclosure to question utterances for the robot was set to 3:1, and the utterance frequency was set to once every 80 seconds on average.

4.1.2. Analysis Results

4.1.2.1 Number of human-utterances (talks)

In Table 1, the number of human utterances in G1 and G3 decreased with the robot, while the number increased in G2. The proportion of human utterances triggered by the robot was 37%, 12%, and 35%, respectively. Although we cannot conclude that the robot increases communication opportunities, we found from the recorded videos that all groups listened to the robot’s utterances with great interest and laughed much more frequently than without the robot. For example, when a person replied “I often shop at Ikebukuro”, the robot responded with news about a car accident at Ikebukuro (a famous commercial and entertainment district in Tokyo), whereupon the two people wondered how the robot could have known about the car accident. In the future, we plan to use FER (Yang et al., 2021) for quantitative analysis of how the robot changes the sentiment of communication.

4.1.2.2 Questionnaire

We designed the questionnaire to assess the users’ interest and the acceptability of the utterances, using seven levels ascending from “strongly disagree” to “strongly agree”. As for the users’ interest, Q1 and Q2, about the overall experience and the conversation features, were rated 6.5 and 5.0 on average (Table 2), with standard deviations (SD) of 0.46 and 1.51, respectively. During the interview, all participants said that the conversation with the robot was interesting enough. In the acceptance test of the utterances, the length, frequency, timing, and content of the robot’s utterances were rated 5.0, 3.5, 3.5, and 2.5, respectively. In the interview, although most participants found the length of the utterances (Q3-1) to be “appropriate”, they had a negative impression of the other three features, because the robot talked without considering that the participants were focusing on the TV program. They also said that the generated utterances sometimes seemed unnatural (e.g., “Do you eat beds?”). We also analyzed the conversation logs and found that the robot-human conversations consisted of 7.4 utterances on average, which is close to the average for human-to-human conversations when watching TV (Hoshi et al., 2020). This also confirms how effective the robot is in continuing a conversation. Meanwhile, for the questionnaire item related to the content of the disclosure utterances (Q3-4), four out of six subjects responded negatively, and a subject in group 2 said in the interview that “the robot sometimes said things which were unrelated to the TV program.” Considering these issues, we are developing an engagement AI method (Wu et al., 2020b) so that the robot can register how intently users are watching TV. In addition, we plan to improve the accuracy of the utterance generation method in future research.

4.1.2.3 The sample and the number of turns in the conversation between the subject and robot

Finally, we verified the quality of the robot’s response utterances generated by “KACTUS” with reference to the number of turns in the conversation between the subject and robot. The number of utterances exchanged by the subject and robot in a continuous series of conversations on a single topic was defined as the number of turns and counted. Table 3 shows a continuous conversation sample and Table 4 shows the statistics of the turns. Human-to-human conversations consisted of approximately 3 to 6 turns, whereas 80% of all human-to-robot conversations consisted of approximately 5 to 8 turns. Group 3 in particular, which was presumed to have enjoyed the conversations with the robot, tended to have a higher number of turns in a single conversation than the other groups, which confirmed the robot’s effectiveness in continuing a conversation by switching the dialog topic.

4.1.2.4 The conclusion and consideration of improvement for the extended experiment

Although we cannot conclude from the result that the robot can increase communication between families and friends, the result shows that all participants enjoyed talking with the robot, and the question about their interest in the robot was rated 6.5 on average (7-level scale ascending from “strongly disagree” to “strongly agree”). This indicates that watching TV with the robot has the potential to make our daily lives more enjoyable. On the other hand, considering the relatively low scores for Q3-2 to Q3-4, we made the following three adjustments for the extended experiment.

• We add a cancel button that makes the robot stop talking immediately when the user is focused on a TV program such as a drama or movie.

• We add more TV program words to the dictionary for better matching.

• The WMD threshold is set to a higher value to prevent the robot from talking about topics that are irrelevant to the TV program.

4.2. Extended Experimental Results

To investigate “Research Question 2”, namely whether users feel that the robot’s conversation features improve well-being and relationships among family or friends, we conducted the extended experiment as follows. We also revised the questionnaires so that we could assess the impact of the TV-watching robot on well-being and relationships.

4.2.1. Experimental Setup

A total of 64 subjects participated, forming 32 groups of two individuals (family members or friends) who were friendly with each other and liked watching TV. They were all recruited through a temporary employment agency, regardless of age or gender. The experiment outline is shown in Table 9 and the attributes of subjects are shown in Table LABEL:tab:QA. As in the initial experiment, the experiment lasted approximately six hours, including a one-hour lunch break and the time for explanation and questionnaires. The exact time allocation can be found in Table 9. Subjects were allowed to watch any TV program they freely selected from the content server, which contains ten channels of recorded TV programs from the past seven years. A questionnaire survey and interview were conducted at the end of the experiment.

A randomized controlled experiment was conducted to investigate the effects of the presence or absence of a robot during TV watching. The intervention group was the group that watched TV with the robot, and the control group was the group that watched TV with humans only.

To let subjects take part in a relaxed state, we used a room that mimicked a living room. The robot prototype was placed on a table (see Fig. 3 (a)) and worked independently, without control by the experimenter. The ratio of the “TV-watching mode” to the “Conversation mode” for the robot was set to 3:1, and the frequency of utterances was set to once every 80 seconds on average. We used AWS to collect dialog data.
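As an illustration of the two settings above, the following sketch draws the mode for each utterance with weights 3:1 and spaces utterances 80 seconds apart on average. The paper does not state how the ratio and the frequency are enforced; the random mode selection, the exponentially distributed waiting times, and all names in the code are assumptions made for this example.

```python
import random

TV_WATCHING = "TV-watching mode"
CONVERSATION = "Conversation mode"

def next_utterance(rng, mode_ratio=(3, 1), mean_interval_s=80.0):
    """Pick the mode of the next utterance and the wait before it.

    Assumptions (not from the paper): the mode is drawn at random with
    weights 3:1, and waits are exponential with a mean of 80 seconds.
    """
    mode = rng.choices([TV_WATCHING, CONVERSATION], weights=mode_ratio, k=1)[0]
    wait_s = rng.expovariate(1.0 / mean_interval_s)
    return mode, wait_s

rng = random.Random(0)  # fixed seed so the simulation is repeatable
schedule = [next_utterance(rng) for _ in range(10000)]

# Empirical check: share of TV-watching utterances and average wait.
tv_share = sum(mode == TV_WATCHING for mode, _ in schedule) / len(schedule)
avg_wait = sum(wait for _, wait in schedule) / len(schedule)
print(f"TV-watching share: {tv_share:.2f}, average wait: {avg_wait:.1f} s")
```

Over many utterances the TV-watching share approaches 0.75 and the average wait approaches 80 seconds; a fixed schedule (for example, three TV-watching utterances followed by one question) would satisfy the same settings.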

Before the experiment, we explained the purpose and procedure of the experiment. After the experiment, we conducted a questionnaire and interview to ask participants about their impressions of the experiment.

At the beginning of the experiment, the following instructions were given. Instruction text: “Today is a vacation. You are always busy, but today you decided to spend the day relaxing with your good friends (family), watching your favorite TV shows and chatting.”

4.2.2. Questionnaires

The questionnaire used in the viewing experiment consisted of the following two sets of question items, I and II. Questionnaire I was completed by the control groups, while Questionnaires I and II were completed by the intervention groups after the experiment (Table 6). The questions for Questionnaires I and II are listed in Tables 7 and 8.

• Questionnaire I: Question items to examine acceptance of general conversational robot impressions.

• Questionnaire II: Question items to examine the impressions of the proposed conversational features.

4.2.3. Analysis Results

The following two factors were analyzed based on the results of the questionnaire.

• Factor (1): The acceptance of general conversational robots at home.

• Factor (2): The impact/impression of TV-watching robots on well-being and relationships after the experiment.

For the analysis of Factor (1), we compared the responses to Questionnaire I between the control groups and the intervention groups. The analysis of Factor (2) is based on the responses to Questionnaire II in the intervention groups.

4.2.4. Analysis for Questionnaire I (Factor(1))

In this analysis, we investigated for which question items the impression of general communication robots changes depending on whether the robot is used or not. Table 9 shows the answers to questionnaire item I-1, “Would you like to have a talking robot at home?” (two options: 1: I think so, 2: I do not think so). The intervention group showed a higher acceptance of the conversational robot (43.75%) than the control group (18.75%). A chi-square test on I-1 was performed to detect significant differences between the two groups at a 5% significance level.

• (I-1): Would you like to have a conversational robot at home?
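The chi-square comparison for I-1 can be sketched as follows. The counts are not given in this section: they are reconstructed from the reported percentages under the assumption that each arm has 32 respondents (43.75% of 32 is 14, 18.75% of 32 is 6). The paper also does not say whether a continuity correction was applied, so the sketch exposes it as an optional flag; the function and variable names are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:  # optional continuity correction
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Assumed counts (not from the paper): 32 respondents per arm.
per_arm = 32
yes_intervention = 14  # 43.75% of 32
yes_control = 6        # 18.75% of 32

stat, p = chi_square_2x2(
    yes_intervention, per_arm - yes_intervention,
    yes_control, per_arm - yes_control,
)
CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square critical value, df=1, alpha=0.05
print(f"chi2={stat:.3f}, p={p:.3f}, significant={stat > CRITICAL_5_PERCENT_DF1}")
```

Under these assumed counts and without a continuity correction, the statistic exceeds the 5% critical value of 3.841.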

4.2.5. Analysis for Questionnaire II (Factor(2))

To understand the users’ impressions and verify the effect of the designed conversational features A-D, we examined the responses to Questionnaire II given by the intervention groups (Table 10).

First, questions II-1 to II-5 in Questionnaire II ask directly whether the conversational features can improve well-being and relationships in the family or among friends. Responses were given on a 7-point Likert scale from “7: very much” to “1: very little.” II-1, II-2, and II-4 scored relatively high, while II-3 scored relatively low. In the additional comments, there were relatively many positive opinions for “the presence of the robot has strengthened the conversation between two people” and “the presence of the robot has relaxed the place”. On the other hand, there were some negative opinions for “the robot’s talking has drawn my attention to TV”. The current performance of the conversation functions did not have the effect of directing people’s attention to the TV while they were watching or interacting with other people. As to II-5, “Do you think your daily life would be better if you had a robot in your home?”, about 75% of the total respondents enjoyed interacting with the robot. Most of them gave positive comments, including “I was able to enjoy watching TV shows with friends and family”.

Second, questions II-6 and II-7 ask directly about the users’ impressions of the two modes of the conversational robot.

– II-6: How did you feel when the robot was talking to itself (“TV-watching mode”)? More than 72% of the total respondents felt it was interesting. There were also some negative comments, such as “sometimes it did not match the program” and “sometimes it said the same thing again”.

– II-7: How did you feel after interacting with the robot (“Conversation mode”)? About 76% of the total respondents enjoyed interacting with the robot. Positive comments included “they enjoyed the conversation between the robot and their family or friends”. Negative comments included “it did not match the TV program sometimes” and “sometimes it said the same topic repeatedly”.

As described above, about 70% of respondents thought that the robot’s conversations were interesting, but there were some problems, such as content overlap.

Third, the answers to the remaining questions provide information about user acceptance and the factors to improve in the future.

– II-8: 72% of users would like to have the conversational robot if the conversational features were improved, a higher acceptance rate than the 43.75% in I-1.

– II-9-1 and II-9-2: These show the top two improvements that users expect for the conversation functions in the future.

Finally, as in the initial experiment, we analyzed the number of conversation turns. Table 11 shows the statistics of the turns in the extended experiment. We found that the average and maximum number of turns in the conversations did not change significantly, which also suggests that the intervention groups enjoyed the conversations with the robot and confirms the effectiveness of the topic-linked conversation management method in continuing a conversation.

4.2.6. Consideration

From the result, the intervention groups showed higher demand for “It would be nice to have a TV-watching robot at home” compared with the control groups. The result also indicates that the proposed conversational features have the potential to improve the well-being and relationships of families or friends. Furthermore, we identified points for future improvement that should raise user acceptance of the conversational robot, such as the conversation timing when the user is focused on the TV, conversation topics that are inconsistent with the TV program, and unsuitable topic transitions. We will improve the robot to resolve these issues and conduct future experiments with more subjects to verify the effect of our robot on humans while watching TV.

5. CONCLUSION

In this study, we presented the functional requirements of a robot that watches TV with people, based on analyses of the dialog between people when watching TV. In addition, we assessed the robot prototype developed from these functional requirements as the first step in realizing this robot, and reported the results of the operational verification based on TV-watching experiments. Inspired by conversational analysis of personal conversation behaviors when watching TV, the robot works by switching between two modes, “TV-watching mode” and “Conversation mode”. In the “TV-watching mode”, the robot first extracts keywords from the TV program, and then generates disclosure utterances based on the extracted keywords as if enjoying the TV program. In the “Conversation mode”, the robot generates question utterances with keywords in the same way, and then employs a topics-based dialog management method consisting of multiple dialog engines for rich conversations related to the TV program.

The initial experiment was conducted with participants divided into three groups of two participants each. Although we cannot conclude that the robot can increase communication opportunities between families or friends when they are watching TV, the result shows that all participants from the three groups enjoyed talking with the robot. This also indicates that the proposed conversational features of the TV-watching companion robot have the potential to make our daily lives more enjoyable.

Furthermore, we conducted an extended experiment with 32 groups of two participants each, half of which watched TV without the robot as the control groups while the other half watched TV with the robot as the intervention groups. The intervention groups showed higher demand for “It would be nice to have a TV-watching robot at home” compared with the control groups, and a significant difference was confirmed. We also found that many of the intervention groups felt that the conversational functions of the robot “made the place more relaxed”, “increased conversation between the two of us”, and “made it more interesting than usual”. The result indicates that the proposed conversational features have the potential to improve the well-being and relationships of families or friends. This study also obtained improvement points based on the questionnaire and the operational review.

Specifically, the improvement points include the conversation timing when the user is focused on the TV, conversation topics that are inconsistent with the TV program, and unsuitable topic transitions. We will improve the robot to resolve these issues and conduct future experiments with more subjects to verify the effect of our robot on humans while watching TV.

Acknowledgements.

The authors would like to thank the experts from Systemsoft Inc., Zhiguang Zhou, Zheshuang Lyu, Wataru Nishioka, and Megumi Komiya, for their great support and advice on our proposed method and its optimization.

Bibliography


Azure (2021) Microsoft Azure. 2021. Azure Cognitive Services. Microsoft. Retrieved Oct 3, 2021 from https://azure.microsoft.com/en-us/services/cognitive-services/
Hagio et al. (2021) Yuta Hagio, Marina Kamimura, Yuta Hoshi, Yutaka Kaneko, and Masao Yamamoto. 2021. TV-watching Robot: Toward Enriching Media Experience and Activating Human Communication. In International Broadcasting Convention, 2021.
Hidekazu Minami and Hagita (2016) Masayuki Kanbara Hidekazu Minami, Hiromichi Kawanami and Norihiro Hagita. 2016. Chat robot coupling machine responses and social media comments for continuous conversation. In 2016 IEEE International Conference on Multimedia Expo Workshops (ICMEW). IEEE Press, 1–6.
Hoshi et al. (2020) Yuta Hoshi, Yutaka Kaneko, Michihiro Uehara, Yuta Hagio, Yasuhiro Murasaki, Satoshi Nishimura, and Masao Yamamoto. 2020. Utterance Function for Companion Robot for Humans Watching Television. In 2020 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, January 4-6, 2020. IEEE, Las Vegas, NV, USA, 1–5.
House et al. (1988) James S House, Karl R Landis, and Debra Umberson. 1988. Social relationships and health. Science 241, 4865 (1988), 540–545.
Iio et al. (2020) Takamasa Iio, Yuichiro Yoshikawa, Mariko Chiba, Taichi Asami, Yoshinori Isoda, and Hiroshi Ishiguro. 2020. Twin-robot dialogue system with robustness against speech recognition failure in human-robot dialogue with elderly people. Applied Sciences 10, 4 (2020), 1522.
Jokinen et al. (2019) Kristiina Jokinen, Satoshi Nishimura, Kentaro Watanabe, and Takuichi Nishimura. 2019. Human-robot dialogues for explaining activities. In 9th International Workshop on Spoken Dialogue System Technology. Springer, 239–251.