Integrating Speech and Gesture for Generating Reliable Robotic Task Configuration
Volume 9, Issue 4, Page No 51-59, 2024
Authors: Shuvo Kumar Paul, Mircea Nicolescu, Monica Nicolescu
Department of Computer Science and Engineering, University of Nevada, Reno, 89557, USA
a) Author to whom correspondence should be addressed. E-mail: shuvokumarp@unr.edu
Adv. Sci. Technol. Eng. Syst. J. 9(4), 51-59 (2024); DOI: 10.25046/aj090406
Keywords: Task Configuration, Robotic Task, Gesture Recognition
This paper presents a system that combines speech, pointing gestures, and four distinct hand gestures to precisely identify both the object of interest and the parameters of a robotic task. We utilized skeleton landmarks to detect pointing gestures and determine their direction, while a pre-trained model, trained on 21 hand landmarks from 2D images, was employed to interpret hand gestures. Furthermore, a dedicated model was trained to extract task information from verbal instructions. The framework integrates the task parameters derived from verbal instructions with the inferred gestures to detect and identify the objects of interest (OOI) in the scene, which is essential for creating accurate final task configurations.
Received: 24 April 2024, Revised: 17 July 2024, Accepted: 25 July 2024, Published Online: 02 August 2024
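To make the hand-landmark step concrete, the short Python sketch below (an illustration only, not the authors' implementation) uses MediaPipe Hands to extract the 21 landmarks from a single 2D image and derives a rough image-plane pointing direction from the index finger. The paper's trained four-gesture classifier, its skeleton-based pointing detector, and the speech model are omitted here, and the input file name is a placeholder.

```python
# Minimal sketch: extract MediaPipe's 21 hand landmarks from a 2D image and
# estimate a crude 2D pointing direction from the index finger.
# Assumptions: mediapipe, opencv-python, and numpy are installed;
# "frame.png" is a placeholder image path. This is NOT the paper's trained
# gesture classifier or its skeleton-based pointing module.
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands

image = cv2.imread("frame.png")                  # placeholder input frame
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)     # MediaPipe expects RGB input

with mp_hands.Hands(static_image_mode=True,
                    max_num_hands=1,
                    min_detection_confidence=0.5) as hands:
    results = hands.process(rgb)

if results.multi_hand_landmarks:
    lm = results.multi_hand_landmarks[0].landmark  # 21 normalized (x, y, z) points
    # Vector from the index-finger MCP (landmark 5) to its tip (landmark 8)
    # gives a simple 2D pointing direction in the image plane.
    mcp = np.array([lm[5].x, lm[5].y])
    tip = np.array([lm[8].x, lm[8].y])
    direction = (tip - mcp) / np.linalg.norm(tip - mcp)
    print("Approximate pointing direction (image plane):", direction)
else:
    print("No hand detected in the image.")
```

A gesture classifier such as the one described in the abstract would consume the full 21-landmark vector rather than the two points used here; this sketch only shows how the landmarks are obtained.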