Show simple item record

dc.contributor.author: Perera, KVU
dc.contributor.author: Viswakula, SD
dc.date.accessioned: 2018-06-08T09:47:50Z
dc.date.available: 2018-06-08T09:47:50Z
dc.date.issued: 2017
dc.identifier.uri: http://ir.kdu.ac.lk/handle/345/1681
dc.description: Article Full Text [en_US]
dc.description.abstract: Predicting a precise response for previously unseen input variables is a vital and challenging task, because precise predictions support correct decisions and thereby reduce risk in many domains. The main objective of this study was to compare the performance of several classical statistical and machine learning techniques, treating the prediction task as binary classification. Logistic Regression (LR) and Linear Discriminant Analysis (LDA) were considered as classical statistical techniques, while Random Forest (RF), Naïve Bayes (NB), Boosting (BT) and Bagging (BA) were considered as machine learning techniques. The performance of these techniques was compared under two aspects using five real datasets. In the first aspect, class imbalance was artificially introduced into the datasets by resampling. In the second aspect, sampling approaches (undersampling, oversampling and a hybrid approach that mixes the two) were applied to overcome class imbalance in the training set. Several evaluation measures, namely accuracy, precision, F-measure, G-mean and the area under the Receiver Operating Characteristic curve (ROC AUC), were used to evaluate the classification techniques. The results indicated that Random Forest and Boosting performed better than the other techniques in both aspects, that is, under artificially introduced imbalance and when sampling was used to overcome imbalance. In many cases, when the training set was balanced, both the machine learning techniques and the statistical techniques performed better. [en_US]
dc.language.iso: en [en_US]
dc.subject: Statistics [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Classification [en_US]
dc.subject: Resampling [en_US]
dc.subject: Class Imbalance [en_US]
dc.title: A Comparison of Classical Statistical & Machine Learning Techniques in Binary Classification [en_US]
dc.type: Article Full Text [en_US]
dc.identifier.journal: KDU IRC [en_US]

