A Study on Intelligent Dialogue Agent for Older Adults’ Preventive Care-Towards Development of a Comprehensive Preventive Care System

Article history: Received: 30 August, 2020 Accepted: 18 October, 2020 Online: 08 November, 2020


Introduction
This paper is an extension of work originally presented at the 2019 International Conference on Technologies and Applications of Artificial Intelligence [1].
Japan is currently known as the most advanced super-aged society. In 2018, people aged 65 and older accounted for 28.1% of Japan's total population [2]. As the older population grows, so does the number of people who require long-term care, nursing care, or other human support; in Japan, the number of people certified as requiring such care under the Long-Term Care Insurance Act has reached 6.41 million [3]. Preventive care approaches, which aim to decrease the number of people who require nursing care or other human support by slowing the decline in individuals' mental and/or physical functions, have attracted considerable attention. These approaches also aim to contain rising medical costs and the associated burden, because the number of older adults is increasing while the working population is expected to shrink owing to declining total fertility rates. In recent years, many researchers in various fields have focused on preventing not only dementia but also falls, one of the causes of the need for nursing care, and have shown a certain degree of effectiveness in preventive care [4]-[12]. However, some older adults experience physical and/or psychological distress due to preventive care activities and therefore have trouble engaging in these activities on a regular basis.
At the same time, over the past decade, various voice-interaction-based devices and software (Dialogue Agents), such as smart speakers (e.g., Google Home and Amazon Echo) and voice assistants (e.g., Siri and Cortana) [13], have been developed and employed to provide useful information and a variety of services in response to users' prompts. There has also been much research into the use of Dialogue Agents for older adults with dementia and mild cognitive impairment [14]-[17]. It should be noted, however, that these devices and Dialogue Agents generally do not talk to users without a specific request or prompt. We believe that a function allowing Dialogue Agents to speak to users spontaneously in order to elicit a reaction will play an important role in giving the agents a sense of humanity and familiarity.
Against this background, our previous study examined Intelligent Dialogue Agents (IDAs) that allow natural and flexible communication with, and monitoring of, older adults through a smart speaker, a friendly interface, and investigated older adults' impressions of the smart speaker and the basic characteristics of the IDA [18]. The IDA employed a Spontaneous Talking Function (STF) that allows it to talk to older adults without a request from the user, encouraging them to engage actively in dialogue. Our research group has shown that the STF is effective in some respects, particularly in providing increased convenience and familiarity to older adults. Our research group is also developing a cognitive training system based on a memory game on a tablet device [19] and a Fall Prevention System for regular preventive care exercise [3] to prevent dementia and falls, which are among the main factors leading to a need for nursing care. Additionally, our research group is conducting a study to estimate the behavior of older adults who live alone from data gathered by monitoring sensors installed in their homes [20]. In this paper, we introduce a Speech Content Coordinating Function (SCCF) into the IDA to further improve the user's familiarity with, and interest in, the IDA. The SCCF uses a reinforcement learning algorithm to determine the content of speech according to the user's preferences and condition. Several experiments were conducted to investigate the basic characteristics of the SCCF and users' impressions of it, and to evaluate whether the SCCF was able to acquire an appropriate policy for each user. For these experiments, we developed a text-based agent in the form of a chat-bot and introduced the SCCF into it.
In addition, this paper considers the feasibility of a Comprehensive Preventive Care System (CPCS). Although we aim to develop the CPCS by integrating our preventive care systems, it is currently in the conceptual phase. Once fully developed, the CPCS will encourage active conversation and provide not only effective monitoring of the health condition of older adults but also regular cognitive training and fall prevention exercise. As a first step toward examining the adequacy of the CPCS and users' impressions of it, we conducted an experiment with a prototype version of the CPCS using both the IDA with a sensor and the Mechanism for Cognitive Training (MCT), which is at present equivalent to the cognitive training system. Neither the IDA nor the prototype CPCS has yet been tested with older adults, because their basic characteristics and users' impressions of them have not been evaluated. Therefore, as a first phase toward confirming the future applicability of the IDA and CPCS for older adults, we enrolled students as participants in the experiments.

Preliminaries (Reinforcement Learning)
Before describing our study, we first introduce the concept of Reinforcement Learning (RL), a machine learning method [21]. An RL agent attempts to acquire an appropriate policy (a state-to-action map) based on observations and trial-and-error interactions with its environment. A policy characterizes an agent's behavior and is described by the function

$$\pi : S \rightarrow A, \tag{1}$$

where S and A denote the sets of states and actions, respectively. The pair (s, a) (∀s ∈ S, ∀a ∈ A) is referred to as a rule, which states that "if an agent observes state s, it outputs action a". At each time step t, an agent observes state $s_t$ and selects action $a_t$ using π. RL algorithms are roughly classified into two approaches: exploration-oriented and exploitation-oriented. The former, which include Q-learning, Sarsa, and so on, aim to optimize the total discounted reward, typically in environments modeled as a Markov Decision Process (MDP). The latter, such as Bucket Brigade and REINFORCE, are based on the notion of credit assignment and aim to increase learning speed and to acquire reasonable policies even in non-MDP environments, although they are not guaranteed to acquire the optimal policy. Because the environments we treat in this study appear to be non-MDP, we apply exploitation-oriented RL algorithms. Brief explanations of REINFORCE and Bucket Brigade, which are employed in the IDA and the MCT (described later), respectively, are presented below.
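To make the notions of policy, rule, and time step concrete, the following minimal Python sketch rolls out a deterministic policy and records which rule (s, a) fired at each step along with the reward it received. The two-state environment `toy_step` and its state and action names are hypothetical and serve only for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = str
Action = str
Rule = Tuple[State, Action]  # "if the agent observes s, it outputs a"


def run_episode(policy: Dict[State, Action],
                step: Callable[[State, Action], Tuple[State, float]],
                s0: State, horizon: int) -> List[Tuple[Rule, float]]:
    """Roll out a deterministic policy pi: S -> A and record (rule, reward) pairs."""
    trace: List[Tuple[Rule, float]] = []
    s = s0
    for _ in range(horizon):
        a = policy[s]            # select a_t = pi(s_t)
        s_next, r = step(s, a)   # environment returns s_{t+1} and a reward
        trace.append(((s, a), r))
        s = s_next
    return trace


def toy_step(s: State, a: Action) -> Tuple[State, float]:
    """Hypothetical two-state environment, for illustration only."""
    if s == "idle" and a == "talk":
        return "chatting", 1.0
    if s == "chatting" and a == "wait":
        return "idle", 0.0
    return s, 0.0


pi = {"idle": "talk", "chatting": "wait"}
print(run_episode(pi, toy_step, "idle", 4))
```

Rules that precede a reward are exactly what exploitation-oriented methods assign credit to.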

REINFORCE
REINFORCE [22] is an exploitation-oriented approach based on policy-gradient reinforcement learning. Policy-gradient reinforcement learning algorithms, such as natural policy gradients [23] and stochastic gradient methods [24], learn policies directly by updating the parameters θ that characterize a policy $\pi_\theta$, as opposed to many conventional reinforcement learning algorithms such as Q-learning [25] and Sarsa [26]. The values of θ are related to the conditional probabilities $p(a \mid s)$. In other words, policy-gradient reinforcement learning algorithms aim to acquire a probability distribution that selects the optimal action in each state so as to maximize the sum of rewards. The goal of these algorithms is thus to obtain the parameters θ that characterize the probabilistic policy $\pi_\theta$ maximizing the sum of rewards; θ is updated according to

$$\theta \leftarrow \theta + \eta \, \Delta J(\theta), \tag{2}$$

where η is the learning rate. REINFORCE computes $\Delta J(\theta)$ as

$$\Delta J(\theta) = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=1}^{T} \left( r_{m,t} - b(\theta) \right) \nabla_\theta \ln \pi_\theta(a_{m,t} \mid s_{m,t}), \tag{3}$$

where $s_{m,t}$, $a_{m,t}$, and $r_{m,t}$ are the state, action, and reward at the t-th time step in the m-th episode (t = 1, ..., T; m = 1, ..., M), respectively. Here, an episode denotes the sequence of rules selected between rewards, and one time step is counted each time the agent selects a rule. The term b(θ), called the baseline ($b(\theta) = \sum_{m=1}^{M} \sum_{t=1}^{T} r_{m,t}$), reduces the variance of the estimate and serves to stabilize the policy learning process. In the present study, because both the state and action spaces are discrete, the probability that the IDA selects action a in state s is computed by the softmax function with inverse temperature parameter β:

$$\pi_\theta(a \mid s) = \frac{\exp(\beta \, \theta(s,a))}{\sum_{a' \in A} \exp(\beta \, \theta(s,a'))}. \tag{4}$$
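A tabular softmax policy with a REINFORCE-style update can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: θ is stored as a NumPy array indexed by (state, action), each episode is a list of (s, a, r) tuples, the estimate is averaged over episodes, and the baseline is passed in as a plain number rather than computed by any particular rule.

```python
import numpy as np


def softmax_policy(theta: np.ndarray, s: int, beta: float) -> np.ndarray:
    """pi_theta(a|s) = exp(beta*theta[s,a]) / sum_b exp(beta*theta[s,b])."""
    z = beta * theta[s]
    z = z - z.max()  # numerical stability; leaves the probabilities unchanged
    p = np.exp(z)
    return p / p.sum()


def grad_log_pi(theta: np.ndarray, s: int, a: int, beta: float) -> np.ndarray:
    """Gradient of ln pi_theta(a|s) w.r.t. theta; only row s is nonzero."""
    g = np.zeros_like(theta)
    g[s] = -beta * softmax_policy(theta, s, beta)
    g[s, a] += beta
    return g


def reinforce_update(theta, episodes, eta, beta, baseline):
    """theta <- theta + eta * Delta J, with Delta J averaged over the episodes."""
    delta = np.zeros_like(theta)
    for episode in episodes:
        for s, a, r in episode:
            delta += (r - baseline) * grad_log_pi(theta, s, a, beta)
    delta /= len(episodes)
    return theta + eta * delta


theta = np.zeros((2, 3))  # 2 states x 3 actions, uniform policy initially
theta = reinforce_update(theta, [[(0, 1, 1.0)]], eta=0.1, beta=1.0, baseline=0.0)
print(softmax_policy(theta, 0, 1.0))  # action 1 is now more likely in state 0
```

A rewarded action gains probability in the state where it was taken, while other states are untouched.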

Bucket Brigade
In the Bucket Brigade algorithm [27], a policy is characterized by rule weights $w(s_t, a_t)$, which are used as the agent's policy π. When an agent acquires reward R after selecting rule $(s_t, a_t)$ at time step t, the weights $w(s_t, a_t)$ and $w(s_{t-1}, a_{t-1})$ are reinforced as follows:

where $C_{bid}$ denotes a "bid" parameter specifying the degree to which the weight $w(s_t, a_t)$ is propagated to the weight $w(s_{t-1}, a_{t-1})$ (the rule $(s_{t-1}, a_{t-1})$ plays the role of a "trigger" for selecting the next rule $(s_t, a_t)$), and $C_{tax}$ denotes a "tax" parameter that determines the discount rate for the weight of the rule selected at step t. As described earlier, Bucket Brigade is categorized as an exploitation-oriented approach. The above formulae mean that this algorithm updates a rule weight at each step, even when no reward is given, and that it is relatively easy to implement compared with other conventional RL methods. Moreover, an agent using Bucket Brigade can acquire reasonable policies in many cases even without a sufficient number of trials, because it actively employs rewarded past experience. This study applies the algorithm to the difficulty adjustment function of the mechanism that provides users with cognitive training (described in the next section), so that the difficulty level is tailored to each user.
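The bid/tax mechanics can be illustrated with a short Python sketch. The exact update equations are not reproduced in this text, so the formulation below is an assumption (one common variant): the rule fired at step t pays a bid $C_{bid} \cdot w(s_t, a_t)$ to its trigger rule, pays a tax $C_{tax} \cdot w(s_t, a_t)$, and receives the reward R. The function name `bucket_brigade_step` and the initial weight of 1.0 are hypothetical.

```python
from collections import defaultdict


def bucket_brigade_step(w, prev_rule, rule, reward, c_bid, c_tax):
    """Update rule weights after `rule` = (s_t, a_t) fires and yields `reward`.

    Assumed variant, not taken from the paper:
      w(s_t, a_t)         <- w(s_t, a_t) + R - C_bid*w(s_t, a_t) - C_tax*w(s_t, a_t)
      w(s_{t-1}, a_{t-1}) <- w(s_{t-1}, a_{t-1}) + C_bid*w(s_t, a_t)
    Both updates use the weight of (s_t, a_t) from before this step.
    """
    bid = c_bid * w[rule]
    tax = c_tax * w[rule]
    w[rule] += reward - bid - tax
    if prev_rule is not None:
        w[prev_rule] += bid  # pass part of the current weight back to the trigger rule
    return w


w = defaultdict(lambda: 1.0)  # hypothetical initial weight for every rule
bucket_brigade_step(w, None, ("s0", "a0"), 0.0, c_bid=0.1, c_tax=0.01)         # no reward
bucket_brigade_step(w, ("s0", "a0"), ("s1", "a1"), 1.0, c_bid=0.1, c_tax=0.01)  # rewarded
print(dict(w))
```

Even the unrewarded first step changes a weight, which is the "learning without reward" property noted above; the bid then carries credit from the rewarded rule back to its trigger.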

Preventive Care Systems
In this section, we present the preventive care systems that are currently under development or about to start development. We first give an overview of the Intelligent Dialogue Agent (IDA), which our research group is currently developing, then introduce the Mechanism for Cognitive Training (MCT) as another preventive care framework, and finally provide a brief outline of the Comprehensive Preventive Care System (CPCS), which will consist of the above two components and whose development is our final goal.

Intelligent Dialogue Agent (IDA)
We have developed the Intelligent Dialogue Agent (IDA) with the aim of encouraging natural and flexible conversation with its users (older adults) and of monitoring older adults' health condition based on their behavior and dialogue history. The IDA talks spontaneously about a variety of content, such as notifications of daily routines (e.g., taking medicine, walking), trivia, news, weather and traffic information, and food information. The sensor mounted on the IDA can also monitor older adults within its detection area.
3.1.1 Architecture of the IDA
Figure 1 shows the final design of the IDA. The IDA consists primarily of an input/output (I/O) unit, a learning unit, and a database.
(1) I/O Unit
The I/O unit is composed of a microphone, a speaker, and a sensor. The user's motion is recorded by the sensor in order to track his or her characteristic behavior, such as lifestyle activities (e.g., waking and sleeping times, regular exercise). The pyroelectric sensor adopted in this study, which is embedded in the IDA, is both inexpensive and easy to use. Moreover, our research group has already employed a similar type of sensor and confirmed that it worked effectively to capture the motions of individuals [20]. The sensor can also detect objects other than the user, as well as even the slightest movements of people other than the user. Although the IDA currently uses the microphone and speaker of its built-in smart speaker (Google Home Mini) as its I/O unit, these components have certain limitations; for example, the smart speaker cannot acquire the user's responses and dialogue history in real time. We thus plan to develop an IDA that implements its own speech recognition and synthesis functions without using a smart speaker.
(2) Database
The database stores logs of the user's activity detected by the sensor and the history of dialogue between the IDA and the user, which is converted from speech into text by the Speech Recognition Function, a component of the learning unit. The data stored in the database are used to estimate the user's condition and to adjust the behavior of the IDA. We adopted a cloud-hosted NoSQL database, the Firebase Realtime Database (https://firebase.google.com), for this component because it is compatible with the Google Home Mini employed in the IDA.
(3) Learning Unit
The learning unit has four functions: the Spontaneous Talking Function (STF), the Speech Recognition Function, the Response Function, and the Speech Content Coordinating Function (SCCF). These four functions are briefly explained as follows. Using the STF, the IDA can speak at appropriate times without being prompted by the user; the STF triggers speech based on the user's daily routines and the behavioral logs stored in the database. The Speech Recognition Function converts voice information obtained from the microphone, such as questions from the user or the user's responses to the IDA's speech, into text. The Response Function converts the next speech content into voice information and transfers it to the speaker. In addition, the SCCF provides appropriate speech content depending on the user's preferences and condition. We expect that introducing the SCCF into the IDA will improve the convenience and familiarity of the IDA and hence the user's willingness to continue using it. In order to adjust the content, timing, and frequency of speech, the following history information should be taken into account:
1. The behavioral history of the user (i.e., the frequency and number of times the user is detected) obtained by the sensor in the I/O unit.
2. The history and frequency of dialogue between the user and the IDA.
3. What kinds of speech content were previously produced by the IDA, and in which time slots.
4. Basic information about the user (e.g., age, gender, address, preferred topics for conversation).

The Speech Content Coordinating Function (SCCF)
The results of the experiments and questionnaires conducted in our previous study confirmed that the provision of appropriate information by the IDA increases the willingness of users to continue participating in dialogue with the IDA [1]. In the present study, we introduced the SCCF into the IDA and investigated its basic characteristics and performance.
In order to acquire adequate behavior (a policy) that produces natural responses depending on the particular user's preferences and/or circumstances, we apply the REINFORCE algorithm described in Section 2.1 to the SCCF as the policy learning algorithm. To conduct policy learning, we defined the state, action, and reward for the IDA. The IDA's action is defined as the speech category c ∈ C that the agent employs to talk to the user, where C is the set of speech categories. There are 13 speech categories: C = {time signal, greeting, today's weather, weekly weather forecast, today's trivia, domestic news, international news, economic news, technology news, science news, entertainment news, sports news, each user's daily routine work}. The state of the IDA is defined as the pair of speech categories $(c_1, c_2)$ ($c_1 \in C$, $c_2 \in C$) output by the agent in the last two time steps. This state definition is based on the assumption that, in human-to-human conversation, past speech content bears some relationship to the current content. The user's feedback on the speech content spoken by the IDA is employed as the reward. All feedback is classified into one of three categories, positive (e.g., good, useful, beneficial), negative (e.g., bad, not useful, detrimental), or uninterested, and the reward value obtained by the IDA is determined according to the feedback category. Before establishing the reward settings, we asked a speech therapist at a long-term care insurance facility for older adults to comment on "how older adults come to have a good impression of speech content". She responded as follows:
[Comments of speech therapist] "Even if older adults express negative responses to the information provided by the agent's speech, they may not necessarily have a negative evaluation of the corresponding speech (i.e., any user reaction to the speech is evidence of interest in the speech content)."
Based on this information from the speech therapist, we set up the SCCF with two reward settings: one based on the speech therapist's comments, and the other based on the alternative assumption that "the users' interest in speech directly corresponds to interest in the content of the speech." When the IDA acquires a reward, the parameters θ are updated, which changes the probabilities of selecting each action.
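The state and action definitions of the SCCF can be sketched in Python as follows. This is a minimal illustration: the ordering of the pair $(c_1, c_2)$ (older utterance first), the shortened label "daily routine work", and the helper name `next_state` are assumptions, not taken from the paper.

```python
from itertools import product
from typing import Tuple

# The 13 speech categories (actions) listed in the text.
CATEGORIES = [
    "time signal", "greeting", "today's weather", "weekly weather forecast",
    "today's trivia", "domestic news", "international news", "economic news",
    "technology news", "science news", "entertainment news", "sports news",
    "daily routine work",
]

# State: (c1, c2) = categories spoken two steps ago and one step ago (assumed order).
State = Tuple[str, str]


def next_state(state: State, spoken: str) -> State:
    """Shift the two-step speech history after the agent speaks category `spoken`."""
    _, c2 = state
    return (c2, spoken)


ALL_STATES = list(product(CATEGORIES, repeat=2))  # |C| x |C| possible states
print(len(ALL_STATES))
```

With all 13 categories the state space has 13 × 13 = 169 states; restricting the categories shrinks it quadratically.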
We substituted the softmax function (4) into the REINFORCE update equations (2) and (3), and set the number of time steps T to one because we defined an episode as ending after a single utterance. The following formula, derived from (2), (3), and (4), is applied to policy learning (parameter update) in the SCCF.
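As a worked sketch of how such a per-utterance update can be derived, assume a softmax policy over the speech categories C, a single episode per update (M = 1) of a single step (T = 1), and a generic baseline b. These are assumptions for illustration; the result is not a reproduction of the paper's own expression. Differentiating the log-softmax gives:

```latex
% Assumptions (not from the paper): M = 1, T = 1, generic baseline b.
\pi_\theta(a \mid s) = \frac{\exp(\beta\,\theta(s,a))}{\sum_{a' \in C} \exp(\beta\,\theta(s,a'))}

\frac{\partial \ln \pi_\theta(a \mid s)}{\partial \theta(s,c)}
  = \beta \bigl( \mathbb{1}[c = a] - \pi_\theta(c \mid s) \bigr),
\qquad
\frac{\partial \ln \pi_\theta(a \mid s)}{\partial \theta(\tilde{s},c)} = 0 \quad (\tilde{s} \neq s)

\theta(s,c) \leftarrow \theta(s,c) + \eta\,\beta\,(r - b)\bigl( \mathbb{1}[c = a] - \pi_\theta(c \mid s) \bigr)
  \qquad \forall c \in C
```

Here s is the state in which category a was spoken and r is the reward it earned. When r exceeds b, θ(s, a) rises while the parameters of competing categories in the same state fall in proportion to their current probabilities; parameters of other states are unchanged.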

The Mechanism for Cognitive Training (MCT)
Our research group has developed the Mechanism for Cognitive Training (MCT; originally proposed as the cognitive training system), which encourages older adults to engage in cognitive training to help prevent dementia [19]. As shown in Figure 2, the MCT provides older adults with a brain-teaser memory game as cognitive training, delivered so that they can play it regularly through repeated interactions with a software agent on a tablet device. The software agent serves both as an opponent who plays the game with older adult users and as a conversation partner; the goal of the latter role is to increase older adults' familiarity with the MCT. Our system is expected to help prevent dementia by preserving and improving cognitive functions such as short-term memory and by encouraging and improving users' motivation to exercise them. Furthermore, the MCT has a difficulty adjustment function that sets an appropriate game difficulty level according to differences between users and each user's current circumstances. This creates an environment in which users can continue cognitive training for long periods, since they neither get bored because the game is too easy nor give up because it is too difficult. The database stores information on each user's game-playing conditions, such as memory time, response time, accuracy rate, and difficulty, and these data are used to adjust the difficulty level and to provide users with feedback on their memory game results.
The difficulty adjustment function employs the Bucket Brigade algorithm described in Section 2.2 to coordinate the game difficulty level. The Bucket Brigade algorithm is known to have two characteristics: (i) it advances policy learning even when no reward is given; and (ii) it acquires reasonable policies in many cases, even if the agent does not have a sufficient number of trials, by actively employing rewarded past experiences.
The software agent in the MCT is based on the concept of Human-Agent Interaction (HAI), which aims to design an appropriate framework for interactions between humans and agents. The MCT currently has a function that coordinates speech content depending on the game-playing frequency and/or the time interval between play sessions, so that users can play the game (engage in the cognitive training exercise) comfortably. However, because the software agent is intended to run on a tablet device, the device's functional and performance limitations may prevent it from fully achieving the desired functionality. Therefore, both the functions for coordinating speech content and the sensors for detecting the user's motions will be implemented in the IDA. In addition, the functions currently embedded in the MCT's software agent will be implemented in the CPCS as functions of the IDA.
Through several experiments, we have confirmed that older adults were able to increase the difficulty level and their accuracy in the cognitive training by using the MCT. Furthermore, the difficulty adjustment function made it possible to coordinate the training difficulty appropriately according to the circumstances of the older adult playing the game. Although the use of the MCT had a clear positive effect on maintaining/improving the short-term memory and cognitive function of older adults, we also confirmed that experiments over a longer period with a much larger number of participants will be needed for a detailed assessment of the MCT's effectiveness in dementia prevention.

Comprehensive Preventive Care System (CPCS)
The Comprehensive Preventive Care System (CPCS) aims to allow older adults to enjoy cognitive training while being monitored by their family and friends through active conversation. The CPCS consists of the IDA, which can carry out natural conversation with users, and the MCT (detailed in Section 3.3), which provides users with a memory game on a tablet device as a form of cognitive training (Figure 2).
Our research group is carrying out research on and development of each component with a view to their eventual integration into the CPCS. Because integrating the IDA and the MCT will allow the CPCS to use the data stored in both in parallel, the system is expected to produce a synergistic effect between the cognitive training and monitoring functions. For example, the cognitive training status obtained by the MCT will be combined with the dialogue/behavior history acquired by the IDA, enabling a more detailed understanding of the user's condition (e.g., the user's degree of fatigue, or whether he or she has taken a break during cognitive training). One of the final purposes of the CPCS is to notify family and friends of the circumstances of an older adult user as observed by the system; the CPCS also aims to accomplish the following three goals:
1. To maintain and improve the cognitive function of older adults;
2. To promote active conversation between older adults and their family, friends, and the IDA; and
3. To create an effective environment in which older adults and their families and/or community groups can be mutually involved, to better monitor the older adult's health condition and to promote their care and dementia prevention.

Experiments
As described in Section 1, we have not yet had an opportunity to involve older adults in the evaluation of the SCCF and the CPCS. In addition, many older adults' households lack the Wi-Fi environment needed to use and evaluate the CPCS at home. For these reasons, we conducted several experiments with students to confirm that the SCCF can appropriately coordinate speech content depending on the user's circumstances and to evaluate the basic characteristics of the CPCS and users' impressions of it. Moreover, we used a prototype version of the CPCS (described in a later section) in these experiments, because at present we have evaluated only its respective components (the IDA and the MCT), not the full version of the CPCS.
Before conducting the experiments, a simple written questionnaire was administered to older adults to confirm their awareness of smart speakers, the internet environment in their homes, and their degree of interest in, and expectations of, spontaneous talking by the IDA (smart speaker). The questionnaire results showed that (i) older adults' awareness of smart speakers was low; (ii) the majority of them did not have an internet (Wi-Fi) connection at home that would permit the use of a smart speaker; however, (iii) they had a high level of interest in conversation with the IDA (smart speaker), particularly its spontaneous talking function. Regarding item (iii), the older adults had on average 2.3 topics that they wanted the IDA to talk about, and none answered that the talking function was "unnecessary" or that they had "no topics which they wanted to make the smart speaker talk about". The detailed results of the questionnaire are presented in Appendix (A).

Evaluation of the SCCF
In our first experiment, we evaluated the basic characteristics of the SCCF using a prototype of the IDA (henceforth referred to simply as the IDA) developed as a chat-bot on the Slack messaging application. The chat-bot periodically posts information and learns from its users' feedback. The experiment was conducted over six days with 10 students (mean age: 18.9 years) as participants.
As the speech content of the chat-bot, six of the 13 categories were randomly selected before the experiment started. The total number of states was 6 × 6 = 36, because each state is a pair $(c_1, c_2)$ of the chat-bot's two previous categories. From the set of selected speech categories C' (|C'| = 6), the chat-bot selects a speech category c according to the current policy. All parameters θ that characterize the policy $\pi_\theta$ were initialized to the same value $\theta_0$; therefore, the chat-bot chooses each speech category with uniform probability at the beginning of the experimental period. Each participant responded to the speech content provided by the IDA with one of three kinds of feedback (P/N: positive/negative impression; U: uninterested). Reward values were determined by the collective feedback of the users. The following two reward settings were used, based on the comments received from the speech therapist.
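The experimental setup above (six randomly chosen categories, 36 states, uniform initial action probabilities) can be sketched in Python. This is a hypothetical illustration: the function names, the seed, and the default $\theta_0 = 0$ are assumptions; any common initial value yields the same uniform softmax probabilities.

```python
import random
from itertools import product

import numpy as np

CATEGORIES = [
    "time signal", "greeting", "today's weather", "weekly weather forecast",
    "today's trivia", "domestic news", "international news", "economic news",
    "technology news", "science news", "entertainment news", "sports news",
    "daily routine work",
]


def setup_experiment(categories, k=6, theta0=0.0, seed=None):
    """Randomly pick k categories (C') and give every state identical parameters."""
    rng = random.Random(seed)
    selected = rng.sample(categories, k)             # C' with |C'| = k
    states = list(product(selected, repeat=2))       # (c1, c2) pairs: k * k states
    theta = {s: np.full(k, theta0) for s in states}  # same theta0 everywhere
    return selected, states, theta


def action_probs(theta_s, beta=1.0):
    """Softmax over the k selected categories for one state."""
    z = beta * theta_s
    p = np.exp(z - z.max())
    return p / p.sum()


selected, states, theta = setup_experiment(CATEGORIES, seed=0)
print(len(states), action_probs(theta[states[0]]))  # 36 states, each action 1/6
```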

[Setting $R_A$]
Under this setting, in accordance with the speech therapist's assumption that "older adults would be interested in a topic if their evaluation was not U," a positive reward (+1) was assigned if the participant's feedback was P or N, and no reward was given otherwise.
[Setting $R_B$]
Under this setting, a positive reward (+1), a negative reward (-1), or no reward was given if the user's evaluation of a topic was P, N, or U, respectively.
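The two reward settings map feedback labels to reward values, which can be written directly in Python. The per-feedback mappings follow the text; how individual feedback was combined into a "collective" reward is not specified, so the summation in `collective_reward` is a hypothetical choice for illustration.

```python
from typing import Callable, Iterable


def reward_ra(feedback: str) -> float:
    """Setting R_A: any reaction (P or N) counts as interest; U earns nothing."""
    return {"P": 1.0, "N": 1.0, "U": 0.0}[feedback]


def reward_rb(feedback: str) -> float:
    """Setting R_B: P -> +1, N -> -1, U -> 0."""
    return {"P": 1.0, "N": -1.0, "U": 0.0}[feedback]


def collective_reward(feedbacks: Iterable[str],
                      setting: Callable[[str], float]) -> float:
    """Hypothetical aggregation: sum of the individual participants' rewards."""
    return sum(setting(f) for f in feedbacks)


votes = ["P", "N", "U"]
print(collective_reward(votes, reward_ra), collective_reward(votes, reward_rb))
```

The same feedback can therefore reinforce a category under $R_A$ while leaving it neutral under $R_B$.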
Participants gave feedback on approximately 300 utterances under each setting. To avoid any influence of the order of settings $R_A$ and $R_B$ on the participants' evaluations, the participants were divided into two equally sized groups, and the order of the settings was reversed between the groups. The parameters used in policy learning and action selection are shown in Table 1. In addition, subjective evaluations were conducted at the end of each reward setting's three-day period and at the end of the experiment to examine the participants' impressions of the SCCF. We administered the questionnaire on the level of interest in each category only at the end of the experiment, so that it would not affect the participants' evaluation of the chat-bot's speech. The speech categories that users most preferred varied among users. A user's favorite speech categories and/or preferred order of topics could also vary depending on the current circumstances (e.g., on a holiday, at work, while eating, before bedtime). For these reasons, it was difficult to acquire a policy that provides optimal speech content in the experiments. However, we can say that learning in the SCCF acquired a policy that better expresses the user's preferences if the speech categories preferred by the user became more likely to be selected by the IDA than the other categories. We therefore examined the probability of the IDA generating speech in each category after learning, and investigated the users' favorite categories via the questionnaire. By comparing these results, we discuss the adequacy of the SCCF by evaluating whether it can learn users' preferences and how accurately the learning result matches users' actual preferences.

Preliminary Experiments with the CPCS
Several experiments were conducted to investigate the synergistic effect of using the MCT in conjunction with the IDA. In preparation for conducting the experiment with older adults, we asked the same 10 students who participated in the SCCF evaluation to take part in these preliminary experiments. After one participant (participant F) was excluded because an equipment malfunction resulted in incomplete sensor data, a total of nine participants were included.
A prototype version of the CPCS was employed for these experiments. The prototype consists primarily of the MCT and the IDA, but does not yet have functions that allow the two mechanisms to communicate or that allow either component to use data collected by the other. Nevertheless, using both mechanisms in parallel allows the log data obtained by each to be used together. We conducted two kinds of experiments to investigate differences in impressions of the CPCS when the IDA talked spontaneously and when it did not. The experiment was conducted in two periods, a speech period and a non-speech period, of three days each, for a total duration of six days. To ensure that the order of the two periods did not affect the participants' impressions of the CPCS, the order was randomized for each participant. The IDA employed in the experiment consisted of a smart speaker (Google Home Mini) and a pyroelectric sensor with a simple STF, without the SCCF; during the speech period, it provided information (speech content c ∈ C \ {each user's daily routine work}) selected at random every 15 min between 9:00 and 23:00. We also asked participants to listen to the IDA's speech at least eight times a day. The sensor detected the participants' behavior throughout the experimental period (sampling interval: 1 s; detection range: 115° within 2 m). The MCT included the difficulty adjustment function. We asked the participants to play a 10-min memory game every day near the location where the sensor was installed.
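The speech-period schedule (a randomly chosen category every 15 min from 9:00 to 23:00, excluding daily routine work) can be generated with a short Python sketch. Treating 23:00 as an included slot is an assumption, and the function and constant names are hypothetical; under that assumption a day has 14 × 4 + 1 = 57 utterance slots.

```python
import random
from datetime import datetime, timedelta

# The 12 categories left after removing "each user's daily routine work".
SPEECH_CATEGORIES = [
    "time signal", "greeting", "today's weather", "weekly weather forecast",
    "today's trivia", "domestic news", "international news", "economic news",
    "technology news", "science news", "entertainment news", "sports news",
]


def speech_schedule(day, start_hour=9, end_hour=23, interval_min=15, seed=None):
    """Return (time, category) pairs every `interval_min` minutes, end time included."""
    rng = random.Random(seed)
    t = day.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = day.replace(hour=end_hour, minute=0, second=0, microsecond=0)
    schedule = []
    while t <= end:
        schedule.append((t, rng.choice(SPEECH_CATEGORIES)))  # uniform random topic
        t += timedelta(minutes=interval_min)
    return schedule


plan = speech_schedule(datetime(2020, 1, 1), seed=1)
print(len(plan), plan[0][0].time(), plan[-1][0].time())
```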
Written questionnaires were also administered after the first half of the experiment and at the end of the experiment, asking about the number of times participants heard the IDA speaking, whether the difficulty level of the memory game was appropriate, and their [...]
As with the IDA, how efficiently the CPCS prototype can monitor a user's behavior is likely to be influenced by its installation site and the user's daily habits. However, synergistic effects are also expected from the concomitant use of the IDA and the MCT, which could have a positive impact on the frequency with which users engage in cognitive training. To clarify these issues, we evaluated and discussed users' impressions, the effectiveness of the system at monitoring users, and the relationship between dialogue with the IDA and the frequency of cognitive training by examining the sensor logs, the results of the questionnaires/interviews with the participants, and the training frequency together.

Results and Discussion
This section describes our experimental results and discusses the basic characteristics and performance of the SCCF, as well as the effectiveness of the CPCS prototype presented in Section 4.

Performance Evaluation of the SCCF
Let us first discuss the performance and adequacy of the SCCF in providing appropriate information according to the user's preferences and condition. Table 2 lists, for all participants, the probability that the chat-bot talked about each category over its last 100 utterances in the three-day experimental period. In this table, "ranking" corresponds to the participants' interest level in each speech category, obtained from the questionnaire administered after the experiment. The table shows that the speech probability acquired through learning tends to be high for categories ranked high in interest, especially under reward setting R_B. Bold digits indicate categories whose probability increased from its initial value (1/6 ≈ 16.7%). Table 3 shows the questionnaire results on the participants' interest level in each category on a five-point scale (the other questionnaire results are shown in Appendix B). In Table 3, digits in boldface indicate an interest level greater than 3. Table 2 shows that, under reward setting R_B, the probabilities of the top three speech categories tend to be larger than the initial value, while those of the bottom three categories tend to be smaller. Additionally, Tables 2 and 3 together show that, under reward setting R_B, the chat-bot's speech probabilities at the end of the experiment correlate positively with the participants' interest in the actual content of the speech. In other words, over the course of the experiment the chat-bot became more likely to provide topics of interest and less likely to provide topics for which many participants' feedback was U (no interest). We therefore believe that the chat-bot with reward setting R_B was able to provide information that reflected the users' thoughts and states.
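The positive correlation described above can be checked with Spearman's rank correlation. The two inputs would be the final per-category probabilities (Table 2) and the interest ratings (Table 3). The paper does not state which correlation measure was used, so the sketch below is only one plausible way to quantify the relationship. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ascending ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if either input is constant


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def above_initial(probs, n_categories=6):
    """Flag categories whose learned probability exceeds the uniform start 1/n."""
    return [p > 1.0 / n_categories for p in probs]
```

For one participant, x would hold the six final speech probabilities and y the six interest ratings. Because ties receive average ranks, scipy.stats.spearmanr would return the same coefficient.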
On the other hand, the chat-bot with reward setting R_A failed to acquire an appropriate policy. A notable feature of this chat-bot was that, for a specific participant, the probability of a particular category could become very large (up to 98%), compared with at most 41% under setting R_B. Critically, reward setting R_A does not include negative rewards: a positive reward is given regardless of whether the user's feedback is positive or negative. The chat-bot with R_A is therefore much more likely to receive a positive reward than the chat-bot with R_B. As a result, its speech probabilities appear to have changed too drastically under the current parameter settings.
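The following is one plausible mechanism, offered as an illustration rather than the authors' analysis. Consider a REINFORCE-style policy gradient without a baseline. A constant positive reward has zero expected gradient, because the expectation of ∇ log π(a) under π is zero, so the reward carries no information about preferences. Each individual update nevertheless raises the probability of whichever category was just sampled. The policy therefore performs a random walk that can drift toward a single category. The sketch assumes a stateless softmax policy over six categories, a constant reward of +1 for an R_A-like setting, ±1 for an R_B-like setting, and a simulated user. None of these values come from the paper.

```python
import math
import random


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def reward_a(feedback):
    # Assumed R_A-like setting: every response earns a positive reward.
    return 1.0


def reward_b(feedback):
    # Assumed R_B-like setting: "U" (no interest) earns a negative reward.
    return -1.0 if feedback == "U" else 1.0


def user(category):
    # Hypothetical user: no interest ("U") in categories 3-5, interest ("I") otherwise.
    return "U" if category >= 3 else "I"


def simulate(reward_fn, feedback_fn, n_steps=100, n_cat=6, lr=0.1, seed=0):
    """REINFORCE on a stateless softmax policy over speech categories.

    Returns the final probabilities and, per step, (action, p_before, p_after).
    """
    rng = random.Random(seed)
    theta = [0.0] * n_cat  # uniform start: every category has p = 1/n_cat
    history = []
    for _ in range(n_steps):
        pi = softmax(theta)
        a = rng.choices(range(n_cat), weights=pi)[0]
        r = reward_fn(feedback_fn(a))
        # d log pi(a) / d theta_k = 1[k == a] - pi(k); no baseline is subtracted
        theta = [t + lr * r * ((1.0 if k == a else 0.0) - pi[k])
                 for k, t in enumerate(theta)]
        history.append((a, pi[a], softmax(theta)[a]))
    return softmax(theta), history
```

Under reward_a, every step increases the probability of the sampled category, whatever the user's feedback. Under reward_b, categories that receive "U" lose probability. Subtracting a baseline, such as a running mean of the rewards, is the standard way to remove the first effect.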

Basic Characteristics of the CPCS
This section discusses whether the prototype version of the CPCS can monitor and/or understand a user's circumstances in detail using the log data collected by the IDA and MCT, and evaluates users' impressions of the system as well. The sensor data of participant F were incomplete due to an equipment malfunction and are therefore not included in these results.

Differences in the Participants' Amounts of Activity
Let us consider the amount of activity detected by the pyroelectric sensor of the CPCS. A participant's amount of activity Act_x is defined as the number of times the sensor detects his/her motion over x min. Because the sensor samples once per second, the maximum and minimum values of Act_x are 60x and 0, respectively. Since the sensor may also detect objects other than the user, the value of Act_x must be corrected according to the experimental circumstances. Table 4 shows the maximum amount of activity per 30 min for each participant over the course of the experiment. As shown in the table, Participant B had the greatest amount of activity (number in red), while Participant A had the lowest (blue). Figure 3 shows sketches of the rooms usually used by Participants A and B, along with the location and orientation of the sensor in each room.
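The definition of Act_x translates directly into code. The sketch below is minimal and assumes the sensor log is a list holding one truthy/falsy reading per second. The function names are hypothetical, windows are taken as consecutive and non-overlapping, and the correction for objects other than the user is not modeled.

```python
def activity(detections, x_min, start_s=0):
    """Act_x for the x-minute window starting `start_s` seconds into the log.

    `detections` holds one truthy/falsy sensor reading per second (1-s sampling),
    so the result lies between 0 and 60 * x_min.
    """
    window = detections[start_s:start_s + 60 * x_min]
    return sum(1 for d in window if d)


def max_activity(detections, x_min=30):
    """Largest Act_x over consecutive, non-overlapping x-minute windows.

    Raises ValueError on an empty log.
    """
    step = 60 * x_min
    return max(activity(detections, x_min, s) for s in range(0, len(detections), step))
```

With x_min=30, max_activity corresponds to the per-participant quantity reported in Table 4, assuming that table uses non-overlapping windows.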
Installing the sensor where it can appropriately detect the participant's motion is essential for estimating behavior accurately. In the interview after the experimental period, Participant B said that she often reads at her desk for long periods of time. We believe her amount of activity was greater than that of the other participants because she spent this time within the sensor's detection range. These results suggest that environmental factors, such as the location and direction of the sensor, may influence the recorded amount of activity (noise and sensing errors may also affect the results).
Therefore, if the CPCS in its current form is to be used in practice by older adults, it is necessary to decide in advance where the sensors should be installed so that the older adults' amount of activity can be tracked effectively. Conversely, if the CPCS is installed at an appropriate location, we can expect it to capture their behavior properly.
Figure 4 depicts the average number of cognitive-training memory games played every 30 min on weekdays (bar graph) and the activity patterns (i.e., transitions in the amount of activity) recorded by the sensor (line graph). The results are averages for seven participants who showed similar behavioral trends. Participant I was excluded because his academic year and life patterns differed from those of the others. Interval (b) in Figure 4 is characterized by low activity because the participants were at school and not near the CPCS installation site. As interval (c) shows, participants tended overall to play the memory game more often at night, and their activity level rose at the same time. In contrast, interval (a) shows more activity than interval (b) but almost no gameplay history, except for Participant E. This suggests that the participants were preparing to go to school during this period.
At the same time, the participants' behavior on weekdays differed significantly from their behavior on weekends in terms of life rhythms, such as sleeping hours and daily habits. Figure 5 shows the weekend activity patterns of Participants A and G, which were particularly distinctive. These patterns allow us to estimate the participants' waking times and bedtimes. For example, Participant A's activity decreased early on Sunday morning ((d) in Figure 5) and increased around 15:00 ((e) in Figure 5).
In the interview after the experiment, Participant A confirmed that this activity pattern corresponded to his sleep pattern. The other participants' sleep patterns could likewise be estimated from their activity logs. These results demonstrate the usefulness of the users' stored behavioral log data. On the other hand, it was not possible to fully estimate Participant G's weekend sleep pattern. Figure 5 suggests that Participant G went to bed at 3:00 on Saturday and woke at around 19:00. However, in the interview after the experiment, we learned that she was awake but not in the room with the sensor at certain times of the day. This was the case from 7:30 to 17:00 on Saturday (interval (f) in Figure 5) and from 8:00 to around 18:00 on Sunday (interval (g) in Figure 5). The fact that Participant G spent less time in the room where the CPCS was installed may also have contributed to her lower amount of activity.
These results demonstrate that the use of sensor logs and gameplay histories obtained by the CPCS allows us to estimate not only daily habits such as waking and sleeping times but also whether users go out for long periods of time.
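The following is a minimal sketch of how the two logs might be combined. It assumes gameplay start times are available as datetime objects and that the sensor log has already been aggregated into Act_30 values per 30-min slot. The names weekday_slot_profile, quiet_runs, and label_runs, the thresholds, and the labeling rule are hypothetical and are not the paper's procedure.

```python
from collections import Counter
from datetime import datetime

SLOTS_PER_DAY = 48  # 24 h split into 30-min slots


def weekday_slot_profile(starts, n_weekdays):
    """Average number of game starts per 30-min slot over `n_weekdays` weekdays.

    Slot k covers minutes [30k, 30k + 30) after midnight; weekend starts are skipped.
    """
    counts = Counter()
    for t in starts:
        if t.weekday() < 5:  # Monday = 0 ... Friday = 4
            counts[(t.hour * 60 + t.minute) // 30] += 1
    return [counts[k] / n_weekdays for k in range(SLOTS_PER_DAY)]


def quiet_runs(activity, threshold=0, min_len=4):
    """(start, end) slot indices, end exclusive, of runs where activity <= threshold
    lasting at least `min_len` slots (4 slots = 2 h)."""
    runs, start = [], None
    for i, a in enumerate(list(activity) + [threshold + 1]):  # sentinel ends a trailing run
        if a <= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def label_runs(runs, night_slots, game_slots=()):
    """Hypothetical rule: a quiet run containing a game start is 'present?'
    (user out of sensor view), one overlapping night slots is 'sleep?',
    and any other is 'away?'."""
    labels = []
    for s, e in runs:
        slots = range(s, e)
        if any(k in game_slots for k in slots):
            tag = "present?"
        elif any(k in night_slots for k in slots):
            tag = "sleep?"
        else:
            tag = "away?"
        labels.append((s, e, tag))
    return labels


if __name__ == "__main__":
    monday_games = [datetime(2020, 11, 9, 21, 10), datetime(2020, 11, 9, 21, 40)]
    profile = weekday_slot_profile(monday_games, n_weekdays=1)
    print([k for k, v in enumerate(profile) if v > 0])
```

Participant G's case shows the limits of such a rule. A quiet run from 3:00 to 19:00 overlaps the night and would be tagged "sleep?", although she was awake and out of the room for most of it. This is why the interviews were needed to interpret the logs.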

Using Sensor Logs to Capture Users' Lifestyle Habits
This section focuses on the sensor logs obtained by the IDA, a component of the CPCS, before, during, and after play of the memory game on the MCT. Figure 6 shows the amount of activity recorded every minute for about 20 min around the time Participant B played the memory game. The memory game has 13 difficulty levels and records three degrees of fatigue: 0, Bad; 1, Normal; 2, Good. Figure 7 shows the transitions in difficulty level and degree of fatigue while Participant B played the memory game. Note that the entire time span of Figure 7 corresponds to interval (h) in Figure 6. The red circles on the red line plot in Figure 7 indicate the times at which the memory game was started. In her interview after the experimental period, Participant B noted that she played the memory game at her desk, near the location where the CPCS was installed. During the gameplay (interval (h) in Figure 6), the amount of activity increased compared with the period before play. After 22:12 (interval (i) in Figure 7, corresponding to the second half of (h) in Figure 6), the participant temporarily stopped playing the memory game. Figure 6 confirms that her amount of activity nevertheless remained high. One possible explanation is that Participant B felt fatigued by the memory game and stopped playing to do another task or take a break. Figure 8 depicts the amount of activity recorded every minute for about 20 min around the time Participant E played the memory game. In contrast to Participant B, Participant E's activity tended to be lower while he was playing ((j) in Figure 8) than before and after playing. The interviews revealed that Participant E tended to play the memory game in bed. We inferred that this position was responsible for the low activity levels recorded by the system, since the location of the sensor would have made it difficult to detect someone in bed.
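The before/during/after comparison used for Participants B and E can be written as a small helper. The sketch assumes per-minute activity values (Act_1) indexed by minute, and game start and end minutes taken from the MCT log. The name phase_means and the 5-min margin are hypothetical choices.

```python
from statistics import mean


def phase_means(per_minute, game_start, game_end, margin=5):
    """Mean per-minute activity (before, during, after) a game session.

    per_minute: Act_1 values indexed by minute.
    game_start, game_end: minute indices of the session (end exclusive).
    margin: number of minutes examined on each side (an assumed value).
    Empty segments yield NaN.
    """
    before = per_minute[max(0, game_start - margin):game_start]
    during = per_minute[game_start:game_end]
    after = per_minute[game_end:game_end + margin]
    return tuple(mean(seg) if seg else float("nan") for seg in (before, during, after))
```

In these terms, the pattern described for Participant B is a "during" value above the "before" value. For Participant E, the "during" value falls below both neighbors.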
Interviews with the other participants indicated that the sensor logs adequately reflected their behavior. These results indicate that the data obtained from the IDA and the MCT can be used to infer detailed information about the user's circumstances.

Questionnaire Investigation of the CPCS
Next, we discuss the participants' impressions of the CPCS and the synergistic effects of combining the IDA and the MCT. The discussion is based on subjective evaluations by the nine participants (college students), conducted after the experimental period. Of the nine participants, five stated that they felt a need to use the CPCS; however, only one participant wanted to use both the IDA and the MCT as individual components of the CPCS. Thus, although some student participants expressed interest in components of the CPCS, on the whole they did not have a positive impression of the prototype. One reason may be that students attend classes daily and have frequent opportunities to talk with the people around them. Further research is needed to investigate how these trends differ across age groups.
Table 5 shows the average number of times per day each participant played the memory game provided by the MCT, with and without speech from the IDA.
Table 5: Differences in the number of times the memory game was played with and without speech from the IDA.
When the IDA spoke, each participant played the game more often than when the IDA did not speak (average: 8.1 times). We attribute this result to a synergy between the IDA and the MCT: the dialogue between the IDA and the participants appears to have promoted habitual game playing. The other questionnaire results can be found in Appendix C.
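The comparison in Table 5 amounts to averaging each participant's daily play counts within each period. The following is a minimal sketch. It assumes daily counts are available separately for the speech and nonspeech periods. The data layout and the function names are hypothetical, and no values from Table 5 are reproduced here.

```python
def compare_periods(logs):
    """Average games per day with and without IDA speech, per participant.

    logs maps participant -> {"speech": [daily counts], "nonspeech": [daily counts]}.
    Returns participant -> (avg_speech, avg_nonspeech, difference).
    """
    result = {}
    for pid, periods in logs.items():
        s = sum(periods["speech"]) / len(periods["speech"])
        n = sum(periods["nonspeech"]) / len(periods["nonspeech"])
        result[pid] = (s, n, s - n)
    return result


def count_more_with_speech(summary):
    """Number of participants who played more per day during the speech period."""
    return sum(1 for _, _, diff in summary.values() if diff > 0)
```

Because each participant experienced both periods (in randomized order), comparing each person with themselves, as the difference column does, controls for individual differences in how much people play.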

Conclusion
In the present study, we introduced the Speech Content Coordinating Function (SCCF) into the Intelligent Dialogue Agent (IDA) and developed a prototype version of the Comprehensive Preventive Care System (CPCS), consisting of the IDA and the Mechanism for Cognitive Training (MCT). The SCCF was developed as a text-based agent in the form of a chat-bot that employs reinforcement learning based on the policy gradient method. This system was tested in an experiment with student participants. The experimental results confirmed that, with reward setting R_B, the SCCF acquires appropriate policies based on the user's preferences and conditions. Furthermore, the experiments with the prototype CPCS confirmed that using data from both the IDA and the MCT can produce synergistic effects. These effects make it possible to monitor users in greater detail, better understand their circumstances, and increase the frequency of cognitive training.
In future research, we will further improve the convenience and familiarity of the SCCF by determining not only the content of speech but also the appropriate frequency and timing of speech based on the user's circumstances at any given time. At the same time, because this improvement may cause an exponential increase in the number of states and actions compared to the learning environment of the experiments conducted in the present study, we plan to improve the current learning algorithm or apply more powerful ones, such as deep reinforcement learning algorithms. The adaptability and familiarity of the improved SCCF will need to be evaluated through more practical experiments with older adults in a setting that more closely replicates real-life circumstances.
We also expect further synergistic effects to arise from allowing the components of the prototype CPCS to communicate and share their data. It will also be necessary to investigate the effects of the CPCS on older adults and their impressions of the system by fully examining how trends differ across age groups. We also plan to integrate into the CPCS a fall prevention system that our research group is currently developing in parallel. Because that system involves physical exercise and a certain risk of injury, we will integrate it in stages, starting with older adults who are at little risk of injury. Similarly, we will continue to work on introducing other preventive care systems and expect to see synergies similar to those obtained in the present study.
(1) Comparing the first and third of these 3 days, did you feel the speech contents from the chat-bot changed?
(2) The IDA's setting to learn your preferences for its speech content was different between the first 3 days and the last 3 days. In which period do you think the IDA provided more interesting speech contents, the first 3 days or the last 3 days?
(3) Could you tell me the reason for your answer to the above question?
Impression of the chat-bot's speech content
(4) Which of the following best describes your impression of the IDA's speech content on this particular day?
Participants' interest in the chat-bot's speech content
(5) Could you rank the topics provided by the chat-bot in terms of how much they interested you?
(6) Could you rate each topic provided by the chat-bot during the experiment on a 5-point scale?
Participants' interest in the topics in daily life
(7) Could you rank the topics provided by the chat-bot according to how much they usually interest you?
(8) Could you rate the topics provided by the chat-bot on a 5-point scale based on how much they interest you in your daily life?
Table B2: Older adults' impression of the chat-bot at each reward setting (answers to questions (1) and (2) in Table B1, 3-point scale, with 3 being best).
Item | R_A | R_B
Changes in speech content from chat-bot | 2.3 | 1.9
Impression of the chat-bot's speech content | 2.4 | 2.3
Table B3: Which reward setting, R_A or R_B, was preferred by older adults (answers to question (6) in Table B1)
The parts of Tables C7 and C8 expressed as red-colored text are the questions and the answers discussed in Section 5.2.4.
(1) in Table C1.

Equipment | Number of people with the equipment
Display that can be connected with HDMI | 9
HDMI cable | 9
Wired/wireless keyboard | 8
Wired mouse | 7
Table C3: Awareness of smart speakers. The number of respondents by choice for each question ((6), (7), (8) in Table C1).

Question | Answer | Number of respondents
(6) | I know about them. | 5
(6) | I only know the name. | 5
(6) | I don't know about them. | 0
(7) | Yes. | 2
(7) | No. | 8
(8) | I want to use one. | 6
(8) | I'm not sure. | 2
(8) | I don't want to use one. | 0
(8) | I have one and I want to keep using it. | 2
(8) | I have one, but I don't want to keep using it. | 0
Table C4: Items in the questionnaire administered during the preliminary experiment
Frequency of listening to the IDA's speech
(1) Could you tell me the average number of times per day you listened to the speech from the IDA?
Impression of the spontaneous talking function
(2) Which of the following best describes your impression of the spontaneous talk from the IDA?
(3) What is your impression of the IDA's spontaneous talk?
(4) Could you tell me the reason you chose that item as the answer to the above question?
(5) Could you tell me the contents that you want the IDA to talk about? (if any)
Willingness to keep using the CPCS
(6) Do you want to keep using the cognitive training mechanism and the IDA?
Impression of the CPCS
(7) Which of the following best describes your impression of the prototype of the CPCS (the cognitive training mechanism and the IDA)?
(8) Could you tell me the reason for your answer?