    •   IR@KDU Home
    • INTERNATIONAL RESEARCH CONFERENCE ARTICLES (KDU IRC)
    • 2017 IRC Articles
    • Computing

    A Comparison of Classical Statistical & Machine Learning Techniques in Binary Classification

    Date
    2017
    Author
    Perera, KVU
    Viswakula, SD
    Abstract
    Predicting an accurate response for previously unseen inputs is a vital and challenging task, because accurate predictions support correct decisions and thereby reduce risk across many domains. The main objective of this study was to compare the performance of several classical statistical and machine learning techniques, treating the prediction task as binary classification. Logistic Regression (LR) and Linear Discriminant Analysis (LDA) were considered as classical statistical techniques, while Random Forest (RF), Naïve Bayes (NB), Boosting (BT) and Bagging (BA) were considered as machine learning techniques. The performance of these techniques was compared under two different aspects using five real datasets. In the first aspect, class imbalance was artificially introduced into the datasets by resampling. In the second aspect, sampling approaches (undersampling, oversampling, and a hybrid approach mixing the two) were applied to overcome class imbalance in the training set. Performance was evaluated using accuracy, precision, F-measure, G-mean and the area under the Receiver Operating Characteristic curve (ROC AUC). The results indicated that Random Forest and Boosting performed better than the other techniques in both aspects, that is, when class imbalance was introduced by resampling and when it was corrected by sampling. In many cases, balancing the training set improved the performance of the statistical techniques as well as the machine learning techniques.
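    The sampling approaches and evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `binary_metrics` and `resample` are hypothetical, G-mean is taken here as the geometric mean of sensitivity and specificity (a common definition, assumed rather than confirmed by the abstract), and only plain random undersampling and oversampling are shown, not the hybrid approach or ROC AUC.

```python
import math
import random


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F-measure and G-mean for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # sensitivity (true positive rate)
    specificity = ratio(tn, tn + fp)     # true negative rate
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        # Assumed definition: sqrt(sensitivity * specificity).
        "g_mean": math.sqrt(recall * specificity),
    }


def resample(rows, labels, strategy, seed=0):
    """Balance classes by random 'under'- or 'over'-sampling.

    'under' shrinks every class to the size of the smallest class;
    'over' grows every class to the size of the largest class by
    duplicating randomly chosen rows.
    """
    if strategy not in ("under", "over"):
        raise ValueError("strategy must be 'under' or 'over'")
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    sizes = [len(group) for group in by_class.values()]
    target = min(sizes) if strategy == "under" else max(sizes)

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        if len(group) >= target:
            chosen = rng.sample(group, target)  # draw without replacement
        else:
            extra = rng.choices(group, k=target - len(group))  # duplicates
            chosen = group + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels
```

    In a setup like the one described, resampling would be applied only to the training set, with the measures then computed on untouched test data. G-mean and F-measure are the more informative of these measures under imbalance, since accuracy can stay high even when the minority class is mostly misclassified.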
    URI
    http://ir.kdu.ac.lk/handle/345/1681

    Library copyright © 2017  General Sir John Kotelawala Defence University, Sri Lanka