Predicting postoperative delirium after lung cancer resection: the utility of synthetic data and LIME algorithm for model interpretation.
This study aimed to construct an efficient predictive model for postoperative delirium (POD) after lung cancer resection using artificial intelligence, with a focus on synthetic data (generated with the Synthetic Data Vault [SDV] framework) for small-sample training scenarios. We also aimed to improve clinical interpretability with the local interpretable model-agnostic explanations (LIME) method, thereby addressing the existing research gap in AI-driven POD prediction following pneumonectomy.
The SDV framework was used to generate 2,000 synthetic data points, which served as the training set, and real-world data from Figshare (n = 570 cases) served as the test set for model validation. Twelve machine learning algorithms (e.g., Gaussian Naive Bayes [GNB] and random forest) were compared, with accuracy, recall, and area under the curve (AUC) among the performance metrics, evaluated via 50% cross-validation. LIME was applied to interpret individual predictions and to analyze the contributions of key features to POD risk; specifically, it was used to explain the prediction of a Decision Tree classifier for a single sample in the POD dataset.
The machine learning analysis identified preoperative blood glucose level, forced expiratory volume (VC), mean corpuscular volume (MCV), and preoperative albumin level as the four most important factors influencing delirium. GNB achieved an accuracy of 89.8% on the real-world dataset. GNB ranked first in precision and also performed best in recall (0.263) and F1 score (0.256). LinearSVC achieved the highest AUC (0.763), followed by Logistic Regression (0.754), MLPC (0.752), and GNB (0.727).
The study demonstrates the effectiveness of using synthetic data to train AI models for predicting postoperative delirium.
The findings suggest that GNB could be a preferred algorithm for predicting postoperative delirium.
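The combination of high accuracy (89.8%) and low recall (0.263) reported above can arise when positive cases are rare, because correctly classified negatives dominate the accuracy. The following is a minimal Python sketch of how accuracy, precision, recall, F1 score, and AUC are computed from binary predictions. It uses invented toy labels, not the study's data, and the function names `classification_metrics` and `auc_from_scores` are hypothetical, chosen only for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = delirium)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption, NOT from the study): 100 cases, 10 with delirium.
# The classifier finds 3 of the 10 positives and raises 4 false alarms.
toy_true = [1] * 10 + [0] * 90
toy_pred = [1] * 3 + [0] * 7 + [1] * 4 + [0] * 86
toy_metrics = classification_metrics(toy_true, toy_pred)

# Tiny scored example for AUC (also invented).
toy_auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

if __name__ == "__main__":
    for name, value in toy_metrics.items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {toy_auc:.3f}")
```

In this toy case the accuracy is 0.89 even though recall is only 0.30, which is why accuracy alone is a weak basis for comparing POD classifiers and why recall, F1 score, and AUC are reported alongside it.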