Title : Mining ICDDR, B Hospital Surveillance Data and Exhibiting Strategies for Balancing Large Unbalanced Datasets


Authors : Adnan Firoze, Rashedur M. Rahman

Abstract : This research applies several classifier models to hospital surveillance data to classify admitted patients by the criticality of their condition, using three class labels. It also sets forth two distinct approaches to the over-fitting problem in the unbalanced dataset, which arises because instances of the class 'low' are significantly more frequent than those of the other two classes. Apart from trimming the dataset to balance the classes, this work addresses the over-fitting problem by introducing the Synthetic Minority Over-sampling Technique (SMOTE) algorithm coupled with Locally Linear Embedding (LLE). Three models were constructed that apply neural network and multinomial logistic regression classification, and their performance was compared with the models developed by Rahman and Hasan (2011), who used several decision tree models to classify the same dataset using tenfold cross-validation. Additionally, for a comprehensive comparative analysis, this work has compared the classification performance of the authors' novel third model using a support vector machine (SVM). The comparison shows that one of the authors' models surpasses all prior models in classification performance once the performance-time trade-off is taken into account, yielding a model that handles large-scale unbalanced datasets efficiently with standard classification performance. The models developed in this research can become essential tools for doctors when large numbers of patients arrive in a short interval, especially during epidemics. Since machine intervention becomes a necessity when doctors are scarce, computer applications powered by these models can help diagnose newly arrived patients and measure their criticality with the help of the historical data kept in the surveillance database.
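
The core idea behind SMOTE, as named in the abstract, can be illustrated with a minimal sketch: each synthetic minority sample is interpolated between an existing minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation and does not include the LLE step; the function name `smote`, its parameters, and the toy feature vectors are all hypothetical and chosen only for illustration.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea).
    Hypothetical helper for illustration, not the paper's code."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base within the minority class, excluding base
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Made-up two-feature vectors standing in for an under-represented class
minority_class = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority_class, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class, which is what distinguishes this approach from simply duplicating existing records.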


Journal : International Journal of Healthcare Information Systems and Informatics (IJHISI) Volume : 10 Year : 2015 Issue : 1
Pages : 39-66