Recognition and Position Estimation for Multiple Labware Transportation Using Kinect V2 and Mobile Robots

Mobile robots can be used to transport different objects. Such tasks have to be carried out carefully, so an accurate approach for object recognition and position estimation is required. This work presents a concept for the identification and position estimation of multiple labware. The labware, which contain chemical and biological components, have to be manipulated and transported in life science laboratories using H20 mobile robots. The H20 robot has dual 6-DOF arms with 2-DOF grippers. Different marks are attached to the labware lids for the identification process. The Kinect V2 sensor is used to recognize and localize the mark of the required labware on a wide workstation. The difference in performance between the Kinect V1 and V2 is illustrated. The SURF (Speeded-Up Robust Features) algorithm is used to recognize the target according to its local features. Some preprocessing steps are applied to the RGB frame to enhance the image features. The effects of strong lighting conditions are reduced by polarization and intensity filters attached to the Kinect camera. The position estimation step is performed by applying a mapping process from the color frame to the depth frame of the Kinect. Communication between the Kinect platform and the other robot platforms uses a client-server model. A high success rate is obtained under different lighting conditions.


Introduction
This paper is an extension of work originally presented at the International Symposium on Computational Intelligence and Informatics (CINTI 2016) [1]. This work presents an approach to identify the required labware for mobile robot transportation in life science laboratories. In general, object transportation using mobile robots increases productivity and saves human resources in the working environment. It requires several prerequisites such as object recognition with position estimation, arm control, and a robot navigation system. The navigation system includes path planning, mapping, and robot localization in the working environment. For object manipulation, the robotic arm has to be guided to the target pose. The object pose can be acquired visually using a suitable sensor with a proper recognition algorithm. The Kinect V2 sensor fixed on the H20 robot is used in this work to identify and localize multiple labware. The H20 robot is a wireless networked autonomous humanoid mobile robot. It has a tablet PC, dual arms, and an indoor GPS navigation system. A Kinect holder is installed on the H20 body so that the labware on the workstation can be seen clearly. Several technical developments have been made at the Center for Life Science Automation (celisca, University of Rostock) to improve the transportation system of the H20 mobile robots [2]-[4]. The H20 mobile robot with different labware and tube racks is shown in Fig. 1. The Kinect sensor provides a color frame and a depth frame of the view. The RGB frame has to be processed to find the required target. This process includes the extraction of meaningful features from the image that lead to the identification of the object. Different target features can be used for this procedure such as color, shape, edges, size, and local features. 
Some algorithms use multiple features related to the target to create a robust recognition strategy. In order to identify the required object using visual sensors, different techniques have to be applied to the captured image. Several features can be extracted from the image to find the target. In general, these features are divided into two categories. The first is the appearance features of objects such as color and intensity. The second is the local features of the target itself. The local features have to be extracted and matched with the features in the database related to the object of interest. Image segmentation using the color feature is one of the easiest methods to find the required target in the view. This method is suitable for real-time applications because it does not require detailed prior information about the target. Color detection can be performed using different color representations such as RGB, HSV, and YCbCr. The HSV (hue, saturation, value) color system is more robust under changing lighting conditions because it separates color information from brightness and is therefore less sensitive to illumination. Sanchez-Lopez et al. used the HSV color system to track the target for service robot applications [5]. Edge detection can also be used to extract the edges of the target. Filters such as Canny, Sobel, and Laplace are commonly used for this purpose. Furthermore, a set of rules for shape primitives such as circles, triangles, and rectangles can be applied for shape recognition. Yamazaki et al. used edge and shape detection with HSV color segmentation to recognize some foods and kitchen tools [6]. The success rate of object recognition can be improved by using multiple features related to the target at the same time.
In order to use specific local textures of the target as an identification reference, feature matching algorithms can be used. SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and FAST (Features from Accelerated Segment Test) are the most common algorithms for local feature detection and description in object recognition [7]-[9]. These algorithms are largely robust to changes in scale, illumination, and orientation. Collet et al. extracted local descriptors for the registration and recognition of learned metric 3D models of household objects using a multi-view system [10]. Marchand et al. used features such as corners and contours of known objects to perform visual tracking [11]. The SURF algorithm can be considered an efficient object recognition method with a fast scale- and rotation-invariant detector and descriptor. An object tracker decreases the required time by providing the full region occupied by the target in the image at every frame. The target region is jointly estimated by iteratively updating the region information and location obtained from the previous frame. The region of interest (ROI) which contains the target is obtained using the object position in image coordinates. In the successive images, only the interest points in this updated ROI are extracted and matched to find the object. Anh et al. proposed an object tracking method based on SURF for safe grasping tasks [12]. Zickler et al. used humanoid robots to detect and localize multiple objects on a kitchen desk [13]. Katsuki et al. attached specific marks to the required objects so that a robot system could handle various objects in home or office environments [14]. For localizing an object in the real environment, a 3D visual sensor such as the Microsoft Kinect is preferable. Chung et al. 
used the Kinect sensor installed on a service robot to help humans in object transportation [15]. Ramisa et al. used the Kinect for cloth grasping by finding the manipulation points in the depth frames [16].
In this work, an approach for the recognition and position estimation of different labware is presented. Several concepts and challenges are taken into consideration to achieve efficient performance. Visual feedback is necessary to perform high-precision manipulation of labware that contains chemical and biological components. Strong and glossy lighting conditions, in addition to the appearance of the labware, affect the identification process. Therefore, a specific mark is attached to each labware so that the labware can be distinguished from each other on a wide workstation. The SURF algorithm and HSV color segmentation are used to recognize the different marks. Some preprocessing steps are applied to the image to enhance its features for the recognition step. The Kinect V2 is used for this work. The difference in performance between the Kinect V1 and V2 is illustrated. Polarization and intensity filters are attached to the Kinect to reduce the effects of strong and glossy light. The connection between the Kinect platform and the other robot platforms is established through a client-server model. This paper is organized as follows: in section 2, the problem definition is presented. The proposed methodology is given in section 3. Section 4 shows the labware identification and localization process, which is followed by the system integration. Finally, the results are discussed and conclusions are drawn.

Problem Definition and Restriction
The future of life science laboratories depends significantly on innovations in automated solutions across the entire scope. Different scientific tasks, such as biological testing and sample preparation, are performed in the laboratories using automation equipment. This leads to high throughput, workflow optimization, and reliable measurement results. Robots, and especially mobile robots, are very important in the life science field. Mobile robots increase productivity and save human resources by connecting all workstations in different laboratories. For labware transportation, an intelligent procedure is required to grasp the target object and place it at the right position on the workstation. The success of this procedure depends significantly on the success of recognition and position estimation for the required labware. Fig. 2 shows the wide workstation, which consists of 8 positions of labware containers. The required labware and its position on the workstation have to be identified, and the H20 robot has to change its position to manipulate it. The laboratories and the workstations have strong lighting conditions, especially in sunny weather. Glare from the ceiling light can blind the visual sensor, which affects the identification process since it changes the labware appearance. The workstation can also be affected by sunlight if it is close to a window or is surrounded by objects which reflect light. Given this situation, a proper visual sensor has to be mounted in a suitable pose on the robot at a specific height to provide an adequate view of the workbench. Moreover, the view angle of the visual sensor relative to the workbench plays a role in the identification process by reducing the light effects. The H20 robot has stereo cameras in the head. The stereo vision system is fixed on a pan-tilt joint module for object tracking. 
The head cameras are not appropriate for the labware identification task because of the limited height and distance between the head and the workstation. The cameras do not provide a clear view of the whole workbench, especially at its far left and right ends. Moreover, several complex steps have to be performed in stereo vision systems to obtain the depth data. The images of the stereo cameras have to be processed by steps such as undistortion, rectification, correspondence, and reprojection. The video quality, resolution, and contrast of both cameras have to match. The two cameras have to be mounted accurately to guarantee that all their axes are parallel. Also, the kinematic solution of the head joints has to be derived, and the target has to be tracked so that it is seen by both cameras. To avoid these complex issues, the Kinect sensor can be considered a proper choice for this work. It provides a sufficient and clear view of the whole workstation after it is fixed in a suitable pose on a holder. The Kinect also provides the depth data directly, without the complex steps mentioned above for the stereo vision system.

Proposed Methodology
In the Center for Life Science Automation, different automation islands are connected with each other through the cooperation of stationary and mobile robots. This cooperation requires an appropriate hierarchical management system. The mobile robots are used for moving between adjacent laboratories to transport multiple labware and tube racks. The overall workflow starts with the user or with the hierarchical workflow management system (HWMS), which decides which target has to be manipulated and transported [17]. The manipulation system can be split into two parts: the target localization and the arm controller. The target localization software uses the visual sensor to detect the target and to estimate its pose. The pose information is sent to the arm controller software to guide the robotic arm. The information exchange between the two parts is performed over a TCP/IP socket using a client-server communication model. Fig. 3 shows the block diagram of the manipulation process.
The target pose has to be calculated relative to the arm base. Then, an accurate kinematic model has to be used to control the arm joints. Kinematic analysis describes the motion of the arm links without considering the forces that cause it. It involves two problems: forward kinematics (FK) and inverse kinematics (IK). Using the forward kinematics model, the end-effector pose relative to the arm base can be found from the given joint angles. The inverse kinematics model, on the other hand, describes how to find the required joint angles for a given end-effector pose. An analytic solution of the IK problem has been found and applied physically on the H20 arms to guide them to the target [18]-[20]. The Kinect sensor can be considered an appropriate choice for the labware manipulation approach. The Kinect provides high-quality color and depth information, which is obtained directly without applying complicated steps to the image as in stereo vision. Both versions V1 and V2 of the Kinect sensor have been used for this work. The Kinect V2 has a wider horizontal and vertical view than V1. Its RGB and depth cameras have a higher resolution, and its depth measurements are more accurate. Also, the Kinect V2 uses the time-of-flight principle, whereas the Kinect V1 is based on structured light to provide the depth data [21]. Fig. 4 shows the two versions of the Kinect and Table 1 shows the differences between them. A holder has to be installed on the H20 body for fixing the Kinect sensor. The height and tilt angle of the holder have to be appropriate to give a wide and clear view of all the labware on the workstation. The distance between the Kinect and the workstation has to be suitable for obtaining the target position, since the minimum depth range of the Kinect V2 is 50 cm. Also, the FOVs of the RGB camera and the depth camera do not coincide. 
As a result, not every point in the RGB frame has a related 3D position value, since the FOV of the RGB camera is wider than the FOV of the depth camera. The Kinect holder should not obstruct the head movement. In addition, the FOV of the StarGazer, which is located behind the H20 head, must not be affected by the holder or the Kinect (see Fig. 5). The StarGazer module is used to detect passive ceiling landmarks for mobile robot localization [22]. Moreover, any change in the Kinect position and orientation has to be avoided during the robot movements. The Kinect sensor is powered by a 12 V battery with a stabilizer. The Kinect V2 is interfaced with the H20 robot through a USB 3 port of the H20 tablet. The SURF algorithm for local feature description and recognition is used to identify multiple labware and tube racks. This is performed by extracting local features from the reference image and matching them with the current image. The C# programming language has been used to develop the manipulation system of the H20 robots. Fig. 6 shows the structure of the labware manipulation system. The arm gripper design plays an important role in the realization of secure manipulation tasks. Several models of arm grippers and labware containers have been designed for arm manipulation [18]-[20]. Fig. 7 shows the final 3D model of the gripper with the container. The goal of this design is to handle heavy labware by decreasing the lever arm of the wrist joint. This brings the labware center of mass closer to the wrist, so less torque is required from the wrist joint to lift the labware. The maximum payload which can be handled with this design is 700 g. To implement the visual manipulation, the labware itself has to be recognized and localized. Different algorithms, such as SIFT, SURF, and FAST, are widely used for object recognition applications. 
The next section illustrates the process of labware recognition and position estimation using SURF algorithm with Kinect V2 sensor.
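The benefit of a shorter wrist lever arm described for the gripper design can be illustrated with the static holding torque, torque = mass x g x lever arm. The following is a minimal sketch under stated assumptions: the 700 g payload is taken from the text, while the two lever-arm lengths are hypothetical values chosen only for illustration and are not measurements of the H20 gripper.

```python
G = 9.81  # gravitational acceleration in m/s^2


def holding_torque(mass_kg, lever_arm_m):
    """Static torque (N*m) needed to hold a mass at a horizontal distance from the joint."""
    return mass_kg * G * lever_arm_m


PAYLOAD_KG = 0.7  # maximum payload stated in the text (700 g)

# Hypothetical lever arms: labware center of mass far from vs. close to the wrist.
long_arm = holding_torque(PAYLOAD_KG, 0.12)
short_arm = holding_torque(PAYLOAD_KG, 0.06)
print(f"{long_arm:.3f} N*m vs {short_arm:.3f} N*m")
```

Halving the assumed lever arm halves the required wrist torque, which is the effect the gripper design aims for.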

Labware Identification and Localization
Since the H20 robots deal with different kinds of labware, a reliable technique is required to recognize them. Different tests have been performed to show the performance of identification and localization for labware at different positions on the workstation and under different lighting conditions. The identification tests have been performed using Microsoft Visual Studio 2015 with the C# programming language. The SURF algorithm is the most appropriate method that can be implemented using this programming language. The project runs on a Windows 10 platform on the H20 tablet. The process starts with an offline step in which an image of the target is captured and saved in the database as a matching reference. Strong and glossy light as well as sunlight may affect the recognition process. Therefore, in the online process, some preprocessing steps are applied to the live image to decrease the light effects and ensure better performance with the SURF algorithm. Different procedures have been applied, such as contrast and brightness correction, histogram equalization, HSV conversion, and grayscale conversion. The average time required for the recognition process is about 3 seconds, which is long from the application perspective. This long time is due to feature extraction and identification on the high-resolution Kinect image (1920×1080). To cope with this issue, the region of interest (ROI) can be extracted from the image using a cropping technique. The area of the workstation with the labware is cropped along the Y-axis of the image only, as shown in Fig. 8. The cropping area can be estimated according to the error range of the distance between the robot and the workbench, which is about ±3 cm. The resolution of the image after the cropping step is 1920×400. The recognition process requires about 1.1 to 1.5 s using the cropped image. Thus, the required time is roughly halved by using the cropped image.
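The row-only cropping described above can be sketched as follows. This is an illustrative Python sketch rather than the C# implementation used in the work; the frame is a synthetic list of rows, and the row offset `top=340` is a hypothetical value, since the text does not state where the 400-row band starts.

```python
FRAME_WIDTH, FRAME_HEIGHT = 1920, 1080  # Kinect V2 color frame size from the text
ROI_HEIGHT = 400                        # height of the cropped band from the text


def crop_rows(frame, top, height=ROI_HEIGHT):
    """Return the horizontal band frame[top:top + height]; all columns are kept."""
    if top < 0 or top + height > len(frame):
        raise ValueError("crop band lies outside the frame")
    return frame[top:top + height]


# Synthetic frame: a list of rows, each row a list of pixel values.
frame = [[0] * FRAME_WIDTH for _ in range(FRAME_HEIGHT)]
roi = crop_rows(frame, top=340)  # hypothetical offset of the workstation band

# Fraction of the original pixels that remain after cropping.
pixel_ratio = (len(roi) * len(roi[0])) / (FRAME_WIDTH * FRAME_HEIGHT)
print(len(roi[0]), len(roi), round(pixel_ratio, 2))
```

The cropped band keeps 400/1080 of the pixels, so fewer interest points have to be extracted and matched, which is in line with the shorter recognition time reported for the cropped image.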

Labware Identification Process
The recognition procedure has first been performed for some tube racks without a lid. In general, the lids are entirely either white or transparent. These lids are necessary to protect the tube contents from cross contamination and evaporation. Fig. 9 shows the 6-tubes rack, which has different top views at different positions on the workstation. The variations in the top view are related to the 3D features of the tubes and to the angle of the Kinect view. The lighting conditions also strongly affect the appearance of the tube rack at different positions. The 6-tubes rack has been placed at 8 different positions on the workstation for the recognition process. The 8 positions are ordered from left to right as P1 to P8. Fig. 10 shows the recognition outcome for the 6-tubes rack, where a polygon is drawn around it with a cross marking the center point. Fig. 11 shows the success rate of the recognition process, which has been tested 20 times at each position using different preprocessing steps and under different lighting conditions. The first test has been done without turning on the glossy ceiling light. The image acquired from the Kinect has been used directly without applying any preprocessing steps. The identification results of the first test at each position are labeled (N) in Fig. 11. The success rate at P4 is 100% (20/20) since the top view of the tube rack is very clear at this position. At P3, the success rate is 65% (13/20), while it is zero at the remaining positions. The overall success rate for the first test is about 20% (33/160). This rate is related to the 3D features of the tubes' top view, which gives a different appearance at each position according to the angle of the Kinect view. The second test has been done after turning on the glossy ceiling light. The overall success rate of the second test (N_L) is about 19% (31/160), as shown in Fig. 11. 
The third and fourth tests have been performed after applying the brightness and contrast corrections to the Kinect image. The overall success rates are about 9% and about 22%, respectively. The fifth and sixth tests have been performed after converting the Kinect image to grayscale and applying histogram equalization. The overall success rates are 20% and about 17%, respectively.
Figure 11. Success rate of 6-tubes rack identification.
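The overall rates quoted for these tests are plain ratios of successful recognitions to trials, with 8 positions and 20 trials per position giving 160 trials per test. The short sketch below reproduces the arithmetic for the first test from the per-position counts stated in the text (20/20 at P4, 13/20 at P3, zero elsewhere); the function and variable names are illustrative only.

```python
POSITIONS = 8             # P1..P8
TRIALS_PER_POSITION = 20  # repetitions at each position


def success_rate(successes, trials):
    """Success rate in percent."""
    return 100.0 * successes / trials


# Successes per position (P1..P8) for the first test, as stated in the text.
first_test = [0, 0, 13, 20, 0, 0, 0, 0]

total_trials = POSITIONS * TRIALS_PER_POSITION
overall_first = success_rate(sum(first_test), total_trials)   # 33 of 160
overall_second = success_rate(31, total_trials)               # 31 of 160
print(round(overall_first, 1), round(overall_second, 1))
```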
The best overall success rate (about 22%) has been obtained after applying the brightness and contrast correction in the lighted environment. The false positive rate for this case is about 13% (21/160). A false positive means that a wrong target in the view is identified, which has to be avoided because it affects the manipulation process. From the overall results, it can be concluded that labware without a lid has to be located at a fixed position on the workstation to be identified. This can be seen clearly from the overall success rate at P4, which is about 92% (110/120). Table 2 summarizes the success rates of the identification tests for the 6-tubes rack, where O.S.R.T denotes the overall success rate for each test and O.S.R.P denotes the overall success rate at each position. Specific marks have been attached to each labware lid to improve the identification process at each position on the workbench. Fig. 12 shows different lid marks which can be used to distinguish different labware. The mark contains information such as the labware name and its type, together with a specified number. The number is used for differentiation and classification purposes. This information, together with the background picture, gives sufficient features to differentiate multiple labware. The more features the mark has, the higher the identification success rate will be. It is also necessary to select suitable background colors for the mark to reduce the effects of strong lighting conditions. Dark colors are preferable because they reflect less light. The Kinect V2 has been used to recognize the required tube rack (6 microwave tubes) according to the related mark with number 8, as shown in Fig. 13. The marks have been printed on coarse paper since it is more robust against light reflection. 
The identification process has been performed for the lid mark at different positions on the workbench and under different lighting conditions. The test has been repeated 20 times at each position. Different preprocessing steps have been used before applying the SURF algorithm to the Kinect image. Fig. 14 shows the recognition success rate of the mark with number 8 at each position. The improvement in the success rate results from the 2D nature of the mark, which is not easily influenced by the angle of the Kinect view. Also, the adequate features in the mark help to find the required one easily. In comparison with Fig. 11, it is clear that the success rate has improved at the positions located at the horizontal ends of the image, such as P1, P7, and P8. This is related to the reflected light from the mark and to the angle of view at these positions. It can also be noticed that the brightness and contrast correction improves the recognition process, which is only slightly affected by strong lighting conditions. The overall success rates for this case at all 8 positions without and with the glossy ceiling light are about 98% and about 97%, respectively. On the other hand, the grayscale conversion with histogram equalization reduced the success rate of mark recognition. Furthermore, a valuable improvement is that the false positive rate is zero. Table 3 summarizes the success rates of the identification tests for the lid mark.
Figure 14. Success rate of lid mark identification.
A 100% success rate is the goal, especially under strong lighting conditions. The most appropriate solution to deal with lighting effects is to change the camera exposure time. Decreasing the exposure time reduces the light effects on the image acquired from the camera. However, Microsoft has locked the camera settings of the Kinect V2, which prevents changing the exposure time or any other camera setting. 
Therefore, polarization and intensity filters have been affixed to the Kinect V2 camera to decrease the lighting effects, as shown in Fig. 15. The intensity filters are used to decrease the brightness in the image and sharpen its edges and features. The polarization filters, on the other hand, are used to increase the color saturation and decrease the reflections from glass, metal, or other shiny surfaces. With the filters, the success rate of mark identification under strong and glossy lighting conditions is 100% at all 8 positions on the workstation. Table 4 summarizes the success rates of the identification tests for the lid mark using filters. The recognition process requires between 1 and 1.5 seconds after cropping the ROI. To decrease this time, the unwanted and unimportant features have to be removed from the cropped image. For this purpose, lid marks have been designed with red color features, as shown in Fig. 16. Color filters can be used to extract the required color from the image after removing all the unwanted colors. This has been performed by using HSV color filtering as a preprocessing step for the cropped image. The process of red mark identification starts with applying brightness correction and histogram equalization to increase the color saturation. Then, the color system is converted from RGB to HSV because it is more robust against strong lighting conditions. The next step is to extract the red color by removing all the unwanted colors in the image using color segmentation. The final image is converted to a binary image (black and white) to be prepared for the SURF process. Fig. 16 shows the result of this method, which requires about 0.5 s. The best success rate for this method is about 96% over all 8 positions on the workstation when the filters are used in the lighted environment. Table 5 summarizes the results of this test. 
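The HSV-based extraction of red features can be sketched as a per-pixel threshold that produces a binary mask. This is a minimal illustration in Python using the standard `colorsys` module, not the C# implementation used in the work; the hue tolerance and the saturation and value thresholds are assumed values, and the brightness correction, histogram equalization, and SURF steps are not included.

```python
import colorsys


def is_red(r, g, b, hue_tol=0.05, min_sat=0.5, min_val=0.3):
    """True if an 8-bit RGB pixel counts as red in HSV terms (thresholds are assumed)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Red sits at both ends of the hue circle, so accept hues near 0 and near 1.
    near_red = h <= hue_tol or h >= 1.0 - hue_tol
    return near_red and s >= min_sat and v >= min_val


def red_mask(image):
    """Binary mask (1 = red, 0 = everything else) for a list-of-rows RGB image."""
    return [[1 if is_red(*px) else 0 for px in row] for row in image]


# Tiny synthetic image: reds, white, blue, and gray pixels.
image = [
    [(200, 20, 30), (255, 255, 255), (30, 30, 200)],
    [(180, 10, 10), (90, 90, 90), (250, 40, 60)],
]
mask = red_mask(image)
print(mask)
```

Checking the hue at both ends of the circle matters because red hues wrap around 0; a single range test would miss reds that lean slightly toward magenta.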
Table II summarizes the best results of the overall tests after applying the brightness and contrast correction to the raw image under strong lighting conditions. Using marks for labware identification and manipulation is very useful. The marks can be recognized even when they are partially occluded by another object, as shown in Fig. 17, which is one of the advantages of this approach. The mobile robot can still find and grasp the required labware even if the related mark is only partially visible to the Kinect.

Holder Identification
The required holder also has to be identified and localized so that placing tasks can be performed visually. For this purpose, a placing mark has been designed and fixed in front of each holder, as shown in Fig. 18. Each placing mark has a number and is used for recognition and position estimation of the required holder to guide the robotic arm to the right place. When labware is present on the workbench, it does not block the Kinect sensor's view of the placing marks.

Localization Process
Since the Kinect sensor provides the depth data directly, it is simple to find the position of any point in the view. For labware or holder localization, the related mark is recognized and its center point is found in image coordinates. The center point can be determined as the centroid of the bounding box drawn around the object, using the positions of the box corners. To find the position of the center point relative to the Kinect, two mapping steps are required. The first maps the point from the color frame to the depth frame. This step is essential because the resolutions of the depth frame and the color frame differ. The second maps the center point from the depth frame to the Kinect space coordinates. The position of the mark's center point relative to the Kinect is used as a reference to locate the manipulating point which the robotic arm has to reach. Finally, the position information has to be transformed into the arm shoulder space. Then, the kinematic model is used to control the arm joints and guide the arm to the target.
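The chain from bounding-box corners to a position in the arm shoulder space can be sketched as follows. This is an illustrative Python sketch under several assumptions: the color-to-depth mapping is assumed to have been done already by the sensor's calibrated mapping and is not reproduced here; the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) and the camera-to-shoulder rotation and translation are hypothetical values, not calibration data from the H20 setup; and image-style axes (y pointing down) are used for simplicity.

```python
def bbox_center(corners):
    """Centroid of the four bounding-box corners, in pixel coordinates."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def depth_pixel_to_camera(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of a depth pixel to camera-space coordinates in meters."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def to_shoulder_frame(p, rotation, translation):
    """Rigid transform p' = R p + t, with R given as three rows and t as a 3-tuple."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Corners of the polygon drawn around a recognized mark (color-frame pixels).
corners = [(100, 50), (140, 50), (140, 90), (100, 90)]
center = bbox_center(corners)

# Assumed result of the color-to-depth mapping for that center, with its depth.
depth_u, depth_v, depth_m = 300, 200, 0.8

# Hypothetical depth-camera intrinsics and camera-to-shoulder transform.
cam = depth_pixel_to_camera(depth_u, depth_v, depth_m, fx=365.0, fy=365.0, cx=256.0, cy=212.0)
R = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
t = (0.10, -0.20, 0.05)
shoulder = to_shoulder_frame(cam, R, t)
print(center, cam, shoulder)
```

In a real setup the rotation would encode the tilt of the Kinect holder and the translation its offset from the shoulder; both would come from measurement or calibration.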

System Integration
Integrating multiple control programs, which may be developed in the same or different programming languages, into a single one can be complicated. To cope with this issue, an interface can be developed to interact with all coding platforms simultaneously. It is useful to separate the system into multiple platforms to simplify the coding tasks and to make bugs easier to find. These platforms can communicate with each other using a client-server model. The labware recognition and localization part has been integrated with, and communicates with, the multifloor navigation platform [22] and the arm manipulation platform. Using this communication model, these parts can exchange orders and information with each other. The communication process between the client and the server is described by the sequence diagram shown in Fig. 19. Initially, sockets are created on the server and the client. The client requests a socket connection from the server using a specific port number and IP address. If the requested port is free to use, the server establishes a connection to communicate with the client. Once the connection is established, the client program sends information to and receives information from the server program. Both sockets are closed when the data transfer has succeeded. The information is exchanged between the client and the server in the form of strings.
Figure 19. Diagram of client-server model.
In the described system, the multifloor navigation platform sends the work information to the arm manipulation platform whenever the robot reaches the workstation. This information includes the required task (grasping/placing) and the number of the required labware or holder. This information is then sent to the identification and localization platform (Kinect control). The Kinect platform finds the position of the target and transfers this information to the arm manipulation platform to perform the required task [18]. 
The process flowchart and the GUI of the identification and localization system are shown in Fig. 20 and Fig. 21, respectively. The GUI window shows how the required labware mark has been recognized after cropping the image. The upper part of the GUI represents the server socket and shows the IP address and port number in use. It also shows the received command for the required target, which has been sent from the arm manipulation platform. The bottom of the GUI shows the position of the target relative to the Kinect and the recognition time in milliseconds.
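The string-based exchange between the arm manipulation platform (client) and the Kinect platform (server) can be sketched with plain TCP sockets. This is a minimal Python illustration, not the C# implementation used in the work. The message formats (`"GRASP;8"` for the request and `"x;y;z"` for the reply), the `fake_locate` stand-in for the recognition step, and the returned coordinates are all hypothetical.

```python
import socket
import threading


def handle_one_request(server_sock, locate):
    """Accept one client, read one command string, reply with a position string."""
    conn, _ = server_sock.accept()
    with conn:
        command = conn.recv(1024).decode("utf-8")  # e.g. "GRASP;8" (hypothetical format)
        task, number = command.split(";")
        x, y, z = locate(task, int(number))
        conn.sendall(f"{x:.3f};{y:.3f};{z:.3f}".encode("utf-8"))


def request_position(host, port, task, number):
    """Client side: send a task string and parse the returned position."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(f"{task};{number}".encode("utf-8"))
        reply = sock.recv(1024).decode("utf-8")
    return tuple(float(value) for value in reply.split(";"))


def fake_locate(task, number):
    """Stand-in for the recognition and localization step; returns a fixed position."""
    return (0.10, -0.25, 0.80)


server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the operating system pick a free port
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=handle_one_request, args=(server, fake_locate))
worker.start()
position = request_position("127.0.0.1", port, "GRASP", 8)
worker.join()
server.close()
print(position)
```

Both ends run in one process here only to keep the sketch self-contained; in the described system the client and server are separate platforms, and both sockets are closed once the transfer has completed.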

Conclusion
Vision-based manipulation plays an important role in mobile robot transportation systems. The ability to identify and localize the required object visually is essential for successful tasks. In this paper, an approach for the identification and localization of multiple labware and holders for manipulation in life science laboratories has been presented. A suitable design of grippers and labware containers is very important for the secure transportation of the required labware. The Kinect V2 sensor with the SURF algorithm is used for this work. The Kinect is fixed on the mobile robot using a carefully designed holder. Specific marks have been designed for labware and holder identification. The marks improve the success rate of the identification process for the labware and holders at any position on the workstation. The image acquired from the Kinect is preprocessed to enhance its features and to decrease the execution time. This step improves the identification success rate of the SURF algorithm. HSV color filtering has been used with red marks to decrease the identification time. Polarization and intensity filters are used with the Kinect V2 to reduce the glossy lighting effects in the working environment. The Kinect V2 provides a high-resolution image, a wide FOV, and accurate position data directly, which makes it very suitable for such tasks. The position of the mark relative to the robot guides the arm to perform the required task. The client-server model has been used to integrate and connect the identification and localization part with the arm manipulation and multifloor transportation systems.