Vision-Language System using Open-Source LLMs for Consent and Instruction Gestures in Medical Interpreter Robots

Author ORCID Identifier

https://orcid.org/0009-0004-0065-8600

Document Type

Conference Paper

Disciplines

Computer Sciences

Publication Details

Companion Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI Companion ’26), March 16–19, 2026, Edinburgh, Scotland, UK.

doi:10.1145/3776734.3794357

Abstract

Effective communication is vital in healthcare, especially across language barriers, where non-verbal cues and gestures are critical. This paper presents a privacy-preserving vision-language framework for medical interpreter robots that detects specific speech acts (consent and instruction) and generates corresponding robotic gestures. Built on locally deployed open-source models, the system uses a Large Language Model (LLM) with few-shot prompting for intent detection. We also introduce a novel dataset of clinical conversations annotated for speech acts and paired with gesture clips. Our identification module achieved 0.90 accuracy, 0.93 weighted precision, and 0.91 weighted F1-score. Our approach significantly improves computational efficiency and, in user studies, outperforms the speech-gesture generation baseline in human-likeness while maintaining comparable appropriateness.

DOI

https://doi.org/10.1145/3776734.3794357

Recommended Citation

Thanh-Tung Ngo, Emma Murphy, and Robert J. Ross. 2026. Vision-Language System using Open-Source LLMs for Consent and Instruction Gestures in Medical Interpreter Robots. In Companion Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI Companion ’26), March 16–19, 2026, Edinburgh, Scotland, UK. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3776734.3794357

Funder

Research Ireland

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Included in

Computer Sciences Commons
