Human-Robot Multilingual Verbal Communication – The Ontological knowledge and Learning-based Models

Article history: Received: 30 June, 2020 Accepted: 09 July, 2020 Online: 09 August, 2020

In their verbal interactions, humans often face language barriers as well as communication problems and disabilities. These problems are even more serious in education and health care for children with special needs. Robotic agents, notably humanoids integrated within human groups, are an important option for overcoming these limitations. Many research projects address these communication problems by developing intelligent robotic agents with natural language communication abilities. Such agents can help children with verbal communication disorders, particularly in education and medicine. In addition, introducing robotic agents into a child's environment stimulates more verbal interaction, which may improve the child's ability to interact with peers. In this paper, we propose a new approach to human-robot multilingual verbal interaction. It hybridizes a recent, high-performing machine translation approach, namely a neural network model, with a large distributed domain-ontology knowledge base. We constructed this ontology by crawling a large number of educational websites that provide multilingual parallel texts and speeches. Furthermore, we present the design of augmented LSTM neural network models and their implementation, which allow robots and children to communicate in multiple natural languages in a learning context. We also produce a general ontology model for multilingual verbal communication that describes a set of linguistic and semantic entities, their properties and their relationships. This model serves as an ontological knowledge base representing the verbal communication of robots with children.


Introduction
The considerable progress in the theories and tools of artificial intelligence, together with technological achievements in industrial robotics, has led to the advent of intelligent robots endowed with both physical and intellectual capacities. Integrating such robots efficiently and rationally into real working environments alongside human beings remains a challenge and an important objective for engineers and scientists alike. Several fields of application are directly concerned by this integration and consider it very promising for greater productivity and efficiency. These are mainly industrial sectors and, to a lesser but growing extent, education and health. In education and health especially, the most important factor in the success or failure of this integration is the capacity for verbal interaction between humans and intelligent robots. Verbal exchanges are essential in these two sectors, both for students' learning activities and for exchanges with patients during medical procedures. They are even more important for students or patients with special needs, especially autistic children.

ASTESJ ISSN: 2415-6698
This research work falls within this context and aims to produce generic models that address the problem of verbal interaction between intelligent robots and several groups of children who speak different natural languages. We also implement and experiment with these models for the three natural languages most commonly used in formal contexts in North Africa, namely Arabic, French and English. Our models allow a smart robot to detect, in real time, the natural language of the children in front of it and to communicate with them in that language. This will be of great use to teachers or tutors carrying out collaborative learning activities in which the robot plays the role of teacher, tutor or even student.
The paper is organized as follows. Section 2 surveys human-robot interaction systems. Section 3 presents our architecture model, which illustrates the three main transformation modules a robot needs in order to interact with humans in multiple natural languages, and discusses each of these modules. Section 4 details the first module, in particular the data structure used and the scraping approach we developed to extract parallel text and audio corpora, mainly in JSON format. The end of this section presents the Extract-Transform-Load model describing how data and metadata are mapped from the corpus to the ontology database. Section 5 shows how the performance of the machine translation algorithm can be improved using our ontological knowledge base. Section 6 concludes on the different aspects of our models and ends with a critical discussion identifying new avenues for future research.

Classes of Human-Robot interaction
In a common work environment, an intelligent robot can interact with humans in two ways [1]: verbally or non-verbally. Non-verbal interaction conveys emotions or meaning without words or speech, using only facial expressions or body gestures. Verbal interaction, on the other hand, necessarily involves natural language expressions in oral and/or written form. Figure 1 illustrates the key idea of our work: an intelligent robot enables real-time verbal exchanges between two or more groups that do not speak the same language but share a common learning or therapy experience. The robot thus acts as mediator and communication interface between the different groups.

Overview of research findings on oral interactions between robotic agents and humans
Over the last decade, several high-quality studies have addressed the possibilities of oral communication between robots and humans, and a significant number of articles have been indexed in this area. Reviewing these references revealed the main orientations relevant to our work, which fall into three research areas: (1) smart and cognitive robotics [2]-[6], (2) knowledge systems and ontological databases [7]-[10], and (3) artificial intelligence and machine learning techniques [11]-[16].
These works highlight the inability of conventional robotic systems to respond to rapid changes in technical and functional production requirements. As a solution, they propose introducing cognitive robots into production lines: such robots can adapt quickly and can also interact and collaborate with human operators.

Systemic model of verbal communication of robots based on an ontological knowledge database
The diagram in Figure 3 gives an integrated systemic representation of our approach, which consists of three large independent parts: (A) a part devoted to building and updating the parallel text and audio corpus for an expandable list of languages; (B) a decisive part that drives the construction and extension of our ontological knowledge base by discovering entities, the relationships that connect them, and their properties [10], an engine that also extracts all occurrences associated with the entities found; and (C) a part for language identification, speech recognition and text translation. The objective of part (C) is to transform a given speech Sx, expressed in a language Lx, into an equivalent speech Sy expressed in another language Ly. This transformation is necessary to guarantee exchange in a multilingual context. We decompose it into three successive operations: (1) "Speech2Text", which transforms the speech Sx into an equivalent text Tx in the same language Lx; (2) the transformation of the text Tx into an equivalent text Ty in the language Ly; and (3) the transformation of the text Ty into an equivalent speech Sy in Ly.
Figure 4 shows the sequence of the three operations and their relationship to the ontological knowledge management system. All three operations rely on machine learning algorithms trained and validated on a high-quality dataset of parallel speeches and texts in Arabic, French and English. The knowledge system also serves as a storage base [17] for all conversations collected or generated by the system.
Figure 5 describes the most complex transformation in our system, which is carried out in five phases. Phase (1) pre-processes the input speech: it applies filters to remove possible noise and segments the speech into pieces using pause detection, framing and windowing. Phase (2) applies an algorithm that identifies the language of the speech. Phase (3) uses the results of these two phases to transform each segment into a sequence of words with a "multi-layer perceptron" neural network. Phase (4) performs semantic and contextual integration: it completes the generated text by analyzing and discovering the semantic relationships provided by the ontological knowledge base. Finally, the completeness of the generated text allows phase (5) to associate a set of tags with the speech that has just been processed.
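Phase (1) relies on standard short-time speech analysis. The following Python sketch illustrates framing, Hamming windowing and a simple energy-based pause detector on a synthetic signal. It is a minimal illustration under stated assumptions, not the authors' implementation: the 25 ms frame length, the 10 ms hop, the relative energy threshold and the function names frame_signal, detect_pauses and split_on_pauses are all hypothetical choices.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames and apply a Hamming window.

    Frame and hop durations are illustrative defaults, not values from the paper.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def detect_pauses(frames, rel_threshold=0.1):
    """Flag frames whose short-time energy is below a fraction of the peak energy."""
    energy = np.sum(frames ** 2, axis=1)
    return energy < rel_threshold * energy.max()

def split_on_pauses(is_pause):
    """Return (start, end) frame ranges of contiguous non-pause runs."""
    segments, start = [], None
    for i, pause in enumerate(is_pause):
        if not pause and start is None:
            start = i
        elif pause and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(is_pause)))
    return segments

# Synthetic input: 0.3 s tone, 0.2 s silence, 0.3 s tone at 16 kHz.
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
tone = np.sin(2 * np.pi * 220 * t)
signal = np.concatenate([tone, np.zeros(int(0.2 * sr)), tone])

frames = frame_signal(signal, sr)
segments = split_on_pauses(detect_pauses(frames))
print(frames.shape)   # (78, 400)
print(len(segments))  # 2
```

A fixed relative threshold is the simplest possible pause detector; a full pre-processing stage would apply the noise filters mentioned above first and might use an adaptive threshold instead.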

Language identification algorithm
The language of a speech is often identified by selecting and analyzing a portion of it, as reported in [8] and [18]. As reported in [11]-[13] and [19], this identification can be performed by a classification algorithm based on a "deep neural network" trained on a multilingual speech dataset recorded from several speakers. For its simplicity and the precision of its results, we chose a classification algorithm based on a recurrent neural network (RNN) [20]. The network is trained on a large number of features generated by an extraction function based on the knowledge base. To implement and test the models proposed in Figures 5 and 6, we developed components in Python, because of the wide range of libraries it offers for numerical computation, machine learning and natural language processing [21]. To define the corpus of parallel speeches and texts for the three languages considered, we identified a large list of websites offering free multilingual resources, such as multilingual newspapers, online multilingual TV channels, online podcasts, and forums and tweets of thematic conversations.
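The recurrent classifier described above can be sketched as follows. This is a forward pass only. It assumes that each utterance has already been converted into a sequence of feature vectors (13 values per frame here, an arbitrary choice) and uses a plain Elman RNN with randomly initialized, untrained weights. The class name RNNLanguageClassifier and the language codes are hypothetical, and training by backpropagation through time is omitted.

```python
import numpy as np

LANGUAGES = ["ar", "fr", "en"]  # Arabic, French, English (hypothetical label codes)

class RNNLanguageClassifier:
    """Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b_h), p = softmax(W_o h_T + b_o)."""

    def __init__(self, n_features, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1  # small random weights; a real model would learn these
        self.W_x = rng.normal(0.0, scale, (n_hidden, n_features))
        self.W_h = rng.normal(0.0, scale, (n_hidden, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.W_o = rng.normal(0.0, scale, (n_classes, n_hidden))
        self.b_o = np.zeros(n_classes)

    def predict_proba(self, features):
        """features: array of shape (n_frames, n_features)."""
        h = np.zeros(self.b_h.shape[0])
        for x_t in features:  # consume the utterance frame by frame
            h = np.tanh(self.W_x @ x_t + self.W_h @ h + self.b_h)
        logits = self.W_o @ h + self.b_o
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()

    def predict(self, features):
        return LANGUAGES[int(np.argmax(self.predict_proba(features)))]

model = RNNLanguageClassifier(n_features=13, n_hidden=32, n_classes=len(LANGUAGES))
utterance = np.random.default_rng(1).normal(size=(50, 13))  # 50 frames of 13 features
probs = model.predict_proba(utterance)
print(dict(zip(LANGUAGES, probs.round(3))))
print(model.predict(utterance))
```

Because the weights are untrained, the predicted label is arbitrary. The sketch only shows how the final hidden state of the RNN is mapped to a probability distribution over Arabic, French and English.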

Text to Text Translation
Several works on machine translation have been published. The most recent adopt machine learning based on LSTM neural network models, with variants obtained by adding extra layers to improve translation accuracy. Among recent approaches, we can cite tree-LSTM-based methods with attention [14] and LSTM-based methods with transformers [15]-[16]. As depicted in Figure 8, our approach builds on the latter model, whose learning is improved by adding a "labeling" module and by reducing the training dataset thanks to our ontological knowledge base.
Figure 7 describes the two-step machine translation process and its interaction with the ontological knowledge base through the ontological engine. First, the input sequence expressed in a language Li is encoded by an augmented Long Short-Term Memory (LSTM) network [22] into a common representation that is independent of the source and target languages. Second, this representation is decoded into the target language by the decoding LSTM.
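The two-step encode/decode process can be illustrated with a minimal NumPy sketch. It assumes a standard LSTM cell (not the augmented variant of [22]), integer token IDs, no attention and untrained random weights, and it uses the encoder's final hidden and cell states as the language-independent representation handed to the decoder. The names LSTMCell, Seq2Seq and greedy_decode and the vocabulary sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One stacked weight matrix for the four blocks [i, f, o, g].
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        n = self.n_hidden
        i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
        g = np.tanh(z[3 * n:])
        c = f * c + i * g          # update the cell memory
        h = o * np.tanh(c)         # expose a gated view of the memory
        return h, c

class Seq2Seq:
    def __init__(self, src_vocab, tgt_vocab, n_emb=16, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.src_emb = rng.normal(0.0, 0.1, (src_vocab, n_emb))
        self.tgt_emb = rng.normal(0.0, 0.1, (tgt_vocab, n_emb))
        self.encoder = LSTMCell(n_emb, n_hidden, rng)
        self.decoder = LSTMCell(n_emb, n_hidden, rng)
        self.W_out = rng.normal(0.0, 0.1, (tgt_vocab, n_hidden))

    def encode(self, src_ids):
        """Step 1: compress the source sentence into a fixed-size state (h, c)."""
        h = np.zeros(self.encoder.n_hidden)
        c = np.zeros_like(h)
        for token in src_ids:
            h, c = self.encoder.step(self.src_emb[token], h, c)
        return h, c

    def greedy_decode(self, src_ids, bos_id, eos_id, max_len=10):
        """Step 2: generate target tokens starting from the encoder's final state."""
        h, c = self.encode(src_ids)
        token, output = bos_id, []
        for _ in range(max_len):
            h, c = self.decoder.step(self.tgt_emb[token], h, c)
            token = int(np.argmax(self.W_out @ h))  # most likely next target token
            if token == eos_id:
                break
            output.append(token)
        return output

model = Seq2Seq(src_vocab=20, tgt_vocab=25)
print(model.greedy_decode([3, 7, 11, 2], bos_id=0, eos_id=1))
```

With untrained weights the decoded token IDs are meaningless. The point is the data flow: the decoder is initialized from the encoder's final state and emits one target token per step until an end-of-sequence token or the length limit is reached.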

Ontology capabilities and usage
Formally modeling the knowledge of a field of activity, and representing the dependency relationships between its elements, has long been sought by engineers and scientists alike [23]. Such modeling structures knowledge into a knowledge base on which new analytical studies can be carried out and theories validated [10]. Among the models widely adopted by researchers, ontological models stand out. Moreover, ontological models have a great capacity to represent the semantic and contextual aspects of a domain [24]-[25]. Ontologies are used extensively in Web applications, where they are considered a source of semantics with rich and formal representations.
These representations make a wide range of reasoning mechanisms and data and metadata manipulations possible [4]. Because of these advantages, ontological models have been adopted in cognitive robotics and artificial intelligence. They also enable the management of knowledge related to collaboration between robots and humans through verbal communication [7]. In this research work, we exploit these possibilities through a domain ontology, illustrated in Figure 8, that serves as the model of a knowledge base for a human-robot interaction system. This model should allow the domain to be organized hierarchically into sub-domains of interest (education, health, special education, therapy, ...) [26]. Our model also provides the linguistic entities for the natural language processing operations defined in Section 3. We adopted the OWL language to create and use our ontological knowledge base. Figure 8 corresponds to the ontological model we designed. It is a general ontology covering several fields of knowledge, such as education, health and rehabilitation. In connection with these domains, our ontological model also represents the set of conversations that may occur between human and/or robotic agents, together with a detailed representation of their linguistic components. A conversation is represented as a set of textual or oral sequences. The ontology expresses the relationships between all the entities and their main properties.
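The authors build their ontology in OWL. As a language-neutral illustration of the kind of structure and reasoning involved, the following sketch encodes a small, hypothetical fragment of such an ontology as subject-predicate-object triples in plain Python and computes the transitive closure of subClassOf, which is the inference an OWL reasoner performs when asked for all sub-domains of a domain. The class and property names are illustrative assumptions, not the vocabulary of Figure 8.

```python
from collections import defaultdict

# Hypothetical ontology fragment: domains, conversations and their sequences.
TRIPLES = [
    ("Education", "subClassOf", "Domain"),
    ("Health", "subClassOf", "Domain"),
    ("SpecialEducation", "subClassOf", "Education"),
    ("Therapy", "subClassOf", "Health"),
    ("Rehabilitation", "subClassOf", "Health"),
    ("TextSequence", "subClassOf", "Sequence"),
    ("OralSequence", "subClassOf", "Sequence"),
    ("Conversation", "hasSequence", "Sequence"),
    ("Conversation", "belongsToDomain", "Domain"),
    ("Sequence", "hasLanguage", "Language"),
]

def children_index(triples):
    """Map each class to its direct subclasses."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "subClassOf":
            index[obj].add(subject)
    return index

def subclasses(cls, triples):
    """All direct and indirect subclasses of cls (subClassOf is transitive)."""
    index = children_index(triples)
    result, stack = set(), [cls]
    while stack:
        for child in index[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

print(sorted(subclasses("Health", TRIPLES)))  # ['Rehabilitation', 'Therapy']
print(sorted(subclasses("Domain", TRIPLES)))
```

In a real OWL toolchain this closure would come from the reasoner rather than hand-written code; the sketch only makes the hierarchy that the following paragraphs exploit concrete.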

Ontology use for tagging sequences and reducing dataset
The ontology represented in Figure 8 is used as a knowledge management system for information related to human-robot verbal interactions, and it helps capture important features of multilingual interactions for a specific domain or sub-domain [9]. Indeed, building on a hierarchy of interaction domains and sub-domains and their links to a set of linguistic and semantic concepts, our ontological knowledge base allows the extraction of relevant, reduced and contextual datasets that significantly facilitate learning. It also enables relevant labeling of these datasets, producing semantic learning attributes. Figure 9 shows the principle followed to reduce the training datasets by taking the hierarchy of domains into account. Exploiting the domain hierarchy provided by our ontology makes it possible to generate a reduced and relevant dataset (Table 1) for the considered domain (education, therapy, ...). Furthermore, an ontology-based feature optimization method is used to significantly reduce the dimensionality of the feature space [10]. We are thus able to extract dimension-reduced datasets of multilingual parallel texts for training and validating the LSTM-based learning model. This method is conducted in two steps:
• first, concepts of the general ontology are generated by transforming the terms of the vector space model, and the frequency weights of these concepts are computed from the frequency weights of the terms;
• second, according to the structure of the general ontology, similarity weights are associated with the concept features.
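The two-step feature optimization can be sketched as follows. The paper does not specify how similarity weights are computed, so this sketch swaps in the Wu-Palmer similarity between each concept and the target domain. The term-to-concept mapping, the concept hierarchy and the function names are hypothetical. Mapping several terms onto one concept is what reduces the dimensionality of the feature space.

```python
from collections import Counter

# Hypothetical term-to-concept mapping and concept hierarchy (child -> parent).
TERM_TO_CONCEPT = {
    "teacher": "Tutor", "tutor": "Tutor",
    "lesson": "Lesson", "exercise": "Lesson",
    "therapist": "Therapist", "session": "TherapySession",
}
PARENT = {
    "Tutor": "Education", "Lesson": "Education",
    "Therapist": "Therapy", "TherapySession": "Therapy",
    "Education": "Domain", "Therapy": "Domain",
}

def ancestors(concept):
    """Path from the concept up to the root, the concept itself included."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(concept):
    return len(ancestors(concept))  # the root has depth 1

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in ancestors_b)  # lowest common subsumer
    return 2 * depth(lcs) / (depth(a) + depth(b))

def concept_features(tokens, domain):
    # Step 1: a concept's frequency weight is the sum of its terms' frequencies.
    concept_freq = Counter()
    for term, freq in Counter(tokens).items():
        if term in TERM_TO_CONCEPT:
            concept_freq[TERM_TO_CONCEPT[term]] += freq
    # Step 2: scale each concept by its similarity to the target domain.
    return {c: f * wu_palmer(c, domain) for c, f in concept_freq.items()}

tokens = "the teacher starts the lesson then the tutor gives an exercise".split()
print(concept_features(tokens, "Education"))  # {'Tutor': 1.6, 'Lesson': 1.6}
print(concept_features(tokens, "Therapy"))    # {'Tutor': 0.8, 'Lesson': 0.8}
```

Four distinct terms collapse into two concept features, and concepts lying under the target domain keep a higher weight than they would for an unrelated branch of the hierarchy.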

The used dataset
The results published in this paper are obtained on a parallel dataset constructed for Arabic, French and English (Table 2). The choice of these languages reflects our objective of deploying the results in the Moroccan context, where these three languages are dominant. The table groups the main data characterizing the distribution of sentences in the dataset. Figure 10 provides a graphical representation of this distribution, which is very similar for the three languages considered.
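Statistics of the kind grouped in Table 2 can be computed directly from an aligned corpus. The sketch below assumes a hypothetical JSON layout in which each record holds one sentence per language. The field names and the two example sentences are illustrative only and are not taken from the authors' dataset.

```python
import json
from statistics import mean

# Hypothetical JSON layout for aligned sentence triples; field names are illustrative.
CORPUS_JSON = """
[
  {"id": 1, "domain": "education",
   "ar": "صباح الخير", "fr": "Bonjour", "en": "Good morning"},
  {"id": 2, "domain": "education",
   "ar": "افتح كتابك من فضلك", "fr": "Ouvre ton livre s'il te plaît",
   "en": "Open your book please"}
]
"""

def length_stats(records, languages=("ar", "fr", "en")):
    """Per-language sentence count and sentence length in whitespace-separated words."""
    stats = {}
    for lang in languages:
        lengths = [len(record[lang].split()) for record in records]
        stats[lang] = {
            "sentences": len(lengths),
            "mean_words": mean(lengths),
            "max_words": max(lengths),
        }
    return stats

records = json.loads(CORPUS_JSON)
for lang, values in length_stats(records).items():
    print(lang, values)
```

Whitespace splitting is a rough proxy for word counts, especially for Arabic, where clitics attach to words; a language-specific tokenizer would give more faithful length distributions.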

Model training
Figure 11 shows the learning result of our LSTM neural network model. In particular, it lists the layers created and the order in which they are connected: an LSTM layer followed by a final Dense layer, which takes the output of the LSTM layer and produces the output predictions. Reshape functions are used to transform the dataset successively from 2D to 3D format.
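The 2D-to-3D transformation mentioned above reflects the fact that a Keras LSTM layer expects inputs of shape (samples, timesteps, features), whereas tabular data is usually stored as (rows, features). The following NumPy sketch shows one way to perform this reshaping. It assumes that consecutive rows form one sequence and that incomplete trailing windows can be dropped; the helper names and the window length are hypothetical.

```python
import numpy as np

def to_lstm_input(matrix, timesteps):
    """Reshape (n_rows, n_features) into (n_samples, timesteps, n_features).

    Trailing rows that do not fill a complete window are dropped.
    """
    n_rows, n_features = matrix.shape
    n_samples = n_rows // timesteps
    trimmed = matrix[: n_samples * timesteps]
    return trimmed.reshape(n_samples, timesteps, n_features)

def to_dense_input(tensor):
    """Flatten (n_samples, timesteps, n_features) back to (rows, features)."""
    n_samples, timesteps, n_features = tensor.shape
    return tensor.reshape(n_samples * timesteps, n_features)

data_2d = np.arange(24, dtype=float).reshape(12, 2)  # 12 rows, 2 features
data_3d = to_lstm_input(data_2d, timesteps=4)
print(data_3d.shape)                                  # (3, 4, 2)
print(np.array_equal(to_dense_input(data_3d), data_2d))  # True
```

Dropping incomplete windows is one simple policy; padding the last window is an alternative when no data may be discarded.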

Conclusion and future works
In this paper, we presented several models and algorithms for a new approach to human-robot multilingual verbal interaction. The strength of our method lies in combining the semantic representation, reasoning and inference capabilities of ontological models with the capacity of LSTM-based machine learning models. We implemented these models using the Keras machine learning library and the NLTK Python framework. These implementations were tested in a multi-agent environment made up of several robot and human agents that interact verbally through a number of conversations in the three languages.
Developing new upgrades to our system will be the focus of our future research. We also aim to integrate human-robot interactions into therapy and special education in order to accumulate data in these fields, including emotional data. Such data would allow our robots to mimic human emotion expressed through facial expressions, word choice and voice intonation, and to detect the emotional state of patients and act accordingly. In addition, we will consider implementing the proposed models in real interaction situations between NAO robots and children, especially in two areas: the therapeutic rehabilitation of children with special needs and the educational system for these children.