Explainable Machine Learning Using EMG and Accelerometer Sensor Data to Quantify Surgical Skill: Identification of Biomarkers of Expertise

Abstract

Traditional evaluations of surgical skill rely heavily on subjective assessments, limiting precision and scalability in modern surgical education. With the emergence of robotic platforms and simulation-based training, there is a pressing need for objective, interpretable, and scalable tools to assess technical proficiency in surgery. This study introduces an explainable machine learning (XAI) framework using surface electromyography (sEMG) and accelerometer data to classify surgeon skill levels and uncover actionable neuromuscular biomarkers of expertise. Twenty-six participants, including novices, residents, and expert urologists, performed standardized robotic tasks (suturing, knot tying, and peg transfers) while sEMG and motion data were recorded from 12 upper-extremity muscle sites using Delsys® Trigno™ wireless sensors. Time- and frequency-domain features, along with nonlinear dynamical measures such as Lyapunov exponents, entropy, and fractal dimensions, were extracted and fed into multiple supervised machine learning classifiers (SVM, Random Forest, XGBoost, Naïve Bayes). Classification performance was evaluated using accuracy, F1-score, MCC, and AUC. To ensure interpretability, SHAP and LIME were employed to identify and visualize key features distinguishing skill levels. Ensemble models (XGBoost and Random Forest) outperformed others, achieving classification accuracies above 72%, with high F1-scores for all classes. Nonlinear features, particularly Mean_Long_Lyapunov exponent, Correlation Dimension, Approximate Entropy, and Hurst exponent, consistently ranked among the top predictors. Expert surgeons exhibited higher movement complexity and temporal consistency, reflected in higher entropy and correlation dimension, and lower Lyapunov exponents compared to novices. XAI methods revealed that different classes were driven by distinct feature sets: entropy measures best identified novice patterns, while fractal and stability features were more predictive of expert performance. SHAP and LIME enabled both global and instance-specific interpretation of classifier decisions, enhancing transparency and enabling targeted feedback. This study demonstrates the feasibility and utility of combining multimodal wearable sensor data with explainable machine learning to assess robotic surgical skill. The identified biomarkers capture nuanced aspects of motor control—such as adaptability, complexity, and stability—that distinguish novice, intermediate, and expert surgeons. Beyond classification, the explainable framework offers interpretable insights into why specific skill levels were assigned, providing a pathway for personalized surgical feedback and training. This approach advances the development of objective, transparent, and clinically meaningful assessment tools in surgical education.
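The article itself does not include code. As an illustration of the kind of nonlinear dynamical features described in the abstract, the sketch below implements approximate entropy (Pincus, 1991) for a single sEMG channel. The parameter choices (embedding dimension m = 2, tolerance 0.2 × standard deviation) are conventional defaults, not values reported by the authors, and the implementation is a straightforward, unoptimized one.

```python
import numpy as np

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy (ApEn) of a 1-D signal, per Pincus (1991).

    Higher values indicate greater irregularity/complexity of the signal.
    Note: this naive implementation is O(N^2) in memory and time.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)  # tolerance scaled to signal variability

    def phi(m):
        # All overlapping templates of length m.
        templates = np.array([x[i:i + m] for i in range(n - m + 1)])
        # Chebyshev distance between every pair of templates.
        dists = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Fraction of templates within tolerance r (self-matches included,
        # so the logarithm is always defined).
        counts = np.mean(dists <= r, axis=1)
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)


# Example on a synthetic signal (placeholder, not study data):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * rng.normal(size=1000)
    print("ApEn:", approximate_entropy(signal))
```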
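The abstract also describes feeding the extracted features to ensemble classifiers (e.g. XGBoost) and explaining their predictions with SHAP. A minimal sketch of such a pipeline is shown below, assuming a precomputed feature matrix and skill labels; the synthetic data, hyperparameters, and feature count are placeholders and not taken from the paper.

```python
import numpy as np
import shap
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef, roc_auc_score

# Placeholder data standing in for the extracted sEMG/accelerometer features.
rng = np.random.default_rng(0)
X = rng.normal(size=(156, 40))        # e.g. trials x features
y = rng.integers(0, 3, size=156)      # skill labels: 0=novice, 1=resident, 2=expert

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Metrics named in the abstract: accuracy, F1, MCC, AUC.
print("accuracy:", accuracy_score(y_te, y_pred))
print("macro F1:", f1_score(y_te, y_pred, average="macro"))
print("MCC:", matthews_corrcoef(y_te, y_pred))
print("AUC (OvR):", roc_auc_score(y_te, clf.predict_proba(X_te), multi_class="ovr"))

# Global interpretability: mean |SHAP| per feature highlights which features
# (e.g. Lyapunov exponent, entropy, fractal dimension) drive each prediction.
explainer = shap.TreeExplainer(clf)
sv = np.asarray(explainer.shap_values(X_te))
# Depending on the shap version, the class axis may come first; move it last.
if sv.shape[0] == clf.n_classes_:
    sv = np.moveaxis(sv, 0, -1)
mean_abs_shap = np.abs(sv).mean(axis=(0, 2))
print("top features by mean |SHAP|:", np.argsort(mean_abs_shap)[::-1][:10])
```

Instance-level attributions (as produced by LIME or per-sample SHAP values) would follow the same pattern, explaining why an individual trial was assigned to a given skill class.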
