Integrating Speech and Gesture for Generating Reliable Robotic Task Configuration

Volume 9, Issue 4, Page No 51-59, 2024

Authors: Shuvo Kumar Paul, Mircea Nicolescu, Monica Nicolescu

Department of Computer Science and Engineering, University of Nevada, Reno, 89557, USA

a) Author to whom correspondence should be addressed. E-mail: shuvokumarp@unr.edu

Adv. Sci. Technol. Eng. Syst. J. 9(4), 51-59 (2024); DOI: 10.25046/aj090406

Keywords: Task Configuration, Robotic Task, Gesture Recognition

This paper presents a system that combines speech, pointing gestures, and four distinct hand gestures to precisely identify both the object of interest and the parameters of a robotic task. We utilized skeleton landmarks to detect pointing gestures and determine their direction, while a pre-trained model, trained on 21 hand landmarks from 2D images, was employed to interpret hand gestures. Furthermore, a dedicated model was trained to extract task information from verbal instructions. The framework integrates the task parameters derived from verbal instructions with the inferred gestures to detect and identify the object of interest (OOI) in the scene, which is essential for creating accurate final task configurations.

Received: 24 April 2024, Revised: 17 July 2024, Accepted: 25 July 2024, Published Online: 02 August 2024
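
As a rough illustration of the pointing-gesture step described in the abstract, the Python sketch below estimates a pointing ray from two arm landmarks (for example, elbow and wrist coordinates such as a skeleton estimator would provide) and selects the scene object whose centroid lies closest to that ray. All function names, coordinates, and object labels are hypothetical and are not taken from the paper's implementation.

# Hypothetical sketch of pointing-based object selection; not the authors' code.
# Assumes 3D elbow/wrist landmarks (e.g., from a skeleton/pose estimator) and
# known object centroids, all expressed in the same camera or world frame.
import numpy as np

def pointing_ray(elbow_xyz, wrist_xyz):
    """Return the ray origin (wrist) and the unit direction elbow -> wrist."""
    origin = np.asarray(wrist_xyz, dtype=float)
    direction = origin - np.asarray(elbow_xyz, dtype=float)
    return origin, direction / np.linalg.norm(direction)

def select_object_of_interest(origin, direction, object_centroids):
    """Pick the centroid with the smallest perpendicular distance to the ray."""
    best_name, best_dist = None, float("inf")
    for name, centroid in object_centroids.items():
        v = np.asarray(centroid, dtype=float) - origin
        t = float(np.dot(v, direction))
        if t <= 0.0:               # centroid lies behind the hand; skip it
            continue
        perp = np.linalg.norm(v - t * direction)
        if perp < best_dist:
            best_name, best_dist = name, perp
    return best_name, best_dist

# Example with made-up coordinates (metres):
origin, direction = pointing_ray(elbow_xyz=[0.10, 0.00, 0.60],
                                 wrist_xyz=[0.30, 0.10, 0.70])
objects = {"red_cup": [0.90, 0.40, 1.00], "book": [0.20, -0.50, 1.20]}
print(select_object_of_interest(origin, direction, objects))

In a full system of the kind the abstract describes, the selected object would then be combined with the task parameters extracted from the verbal instruction to form the final task configuration.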
