A COMPARATIVE ANALYSIS OF DEEP LEARNING AND TRADITIONAL METHODS FOR IMPUTATION OF MISSING BODY WEIGHT DATA IN HOSPITAL RECORDS: SIMULATION IN IPD-ICU SETTING

Authors

  • Metas MUANGNAK
  • Vitara PUNGPAPONG

Abstract

Missing data in hospital records, especially in Intensive Care Units (ICU) and Inpatient Departments (IPD), is a common problem that can affect patient care and research accuracy. This study compares ten imputation methods to evaluate their performance in handling missing body weight data: five traditional methods (mean, median, k-NN, MICE, MissForest), one hybrid method (HyperImpute), and four deep learning (DL) methods (MLPRegressor, AEImputer, MIWAE, GAIN). A simulated dataset with 63 features was created under controlled conditions, varying across three sample sizes (5,000, 25,000, and 50,000), three missingness mechanisms (MCAR, MAR, MNAR), and six missingness rates (10% to 60%). Performance was assessed using RMSE, MAPE, runtime, and memory usage. The results show that simple methods such as mean and median imputation still perform well, offering solid baseline performance with minimal resource usage. MissForest and HyperImpute offer a good trade-off between accuracy and computational efficiency, making them suitable for moderate missingness scenarios. Although widely used, MICE showed limited adaptability to non-parametric or complex data structures, leading to suboptimal results in several conditions. Deep learning models, in contrast, gave mixed results: they sometimes performed well but often required intensive hyperparameter tuning and consumed more runtime and memory. While the autoencoder-based method (AEImputer) showed stable performance, GAIN was sensitive to both missing data patterns and sample size, leading to inconsistent outcomes. Overall, while deep learning has potential, it comes with challenges such as sensitivity to hyperparameters and high computational demands. In many practical cases, traditional or hybrid methods may be more effective and easier to implement.
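The evaluation design described above (mask known body weights under a chosen mechanism and rate, impute, then score against the held-out truth) can be illustrated with a minimal sketch. This is not the authors' simulation code: the two-variable toy data, the helper names (`simulate_patients`, `make_mask`, `evaluate_simple`), and the noise scales are all assumptions made for illustration, and only the mean and median baselines are shown.

```python
import numpy as np


def simulate_patients(n, rng):
    """Toy data (hypothetical): height in cm and a correlated weight in kg."""
    height = rng.normal(165.0, 9.0, n)
    weight = 0.9 * (height - 100.0) + rng.normal(0.0, 8.0, n)
    return height, np.clip(weight, 30.0, None)  # keep weights positive for MAPE


def make_mask(weight, covariate, mechanism, rate, rng):
    """Boolean mask (True = missing) hiding roughly `rate` of the weights."""
    n = weight.shape[0]
    if mechanism == "MCAR":    # missingness independent of all data
        score = rng.random(n)
    elif mechanism == "MAR":   # missingness driven by an observed covariate
        score = covariate + rng.normal(0.0, covariate.std(), n)
    elif mechanism == "MNAR":  # missingness driven by the weight itself
        score = weight + rng.normal(0.0, weight.std(), n)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Hide the rows whose score falls in the top `rate` fraction.
    return score > np.quantile(score, 1.0 - rate)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def evaluate_simple(weight, mask, strategy="mean"):
    """Impute masked weights with the observed mean or median and score them."""
    observed = weight[~mask]
    fill = np.mean(observed) if strategy == "mean" else np.median(observed)
    pred = np.full(int(mask.sum()), fill)
    truth = weight[mask]
    return rmse(truth, pred), mape(truth, pred)


def run_grid(n=5000, rates=(0.1, 0.3, 0.6), seed=0):
    """Score both baselines over every mechanism and rate in a small grid."""
    rng = np.random.default_rng(seed)
    height, weight = simulate_patients(n, rng)
    results = {}
    for mechanism in ("MCAR", "MAR", "MNAR"):
        for rate in rates:
            mask = make_mask(weight, height, mechanism, rate, rng)
            for strategy in ("mean", "median"):
                results[(mechanism, rate, strategy)] = evaluate_simple(
                    weight, mask, strategy
                )
    return results


if __name__ == "__main__":
    for key, (r, m) in run_grid().items():
        print(key, f"RMSE={r:.2f} kg", f"MAPE={m:.1f}%")
```

Any other imputer can be dropped into the same loop in place of `evaluate_simple`, provided it sees only the observed weights and covariates and is scored only on the masked entries.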

Published

2025-05-06