ISM Research Memorandum
No.
1060
Title:
Flexible Combinations of Markers by Boosting the Area under the ROC Curve
Author(s):
Osamu, Komori (Department of Statistical Science, The Graduate University for Advanced Studies);
Shinto, Eguchi (The Institute of Statistical Mathematics and Department of Statistical Science, The Graduate University for Advanced Studies)
Key words:
ROC curve; AUC; Boosting; Classification; Smoothing
Abstract:
In this paper, we discuss the application of the receiver operating characteristic (ROC) curve to disease classification problems. We propose a statistical method for combining multiple markers, based on a boosting algorithm that maximizes the area under the ROC curve (AUC). The method iteratively searches a predetermined set of weak classifiers for the one most closely associated with the disease classification. After a moderate number of iterations, a powerful classifier is produced by a weighted majority vote. To prevent overfitting, our algorithm includes a regularization procedure based on a penalty term for non-smoothness. This regularization not only improves classification performance but also clarifies how each marker is related to the disease. Score plots constructed by our boosting method, which express each single marker's contribution to the combination of multiple markers, provide clinically interesting information. To illustrate the utility of our boosting method, we describe two simulation studies and an application to prostate cancer data.
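
The iterative search described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes decision stumps as the weak classifiers, picks each vote weight from a small fixed grid, and maximizes the raw empirical AUC with no smoothness penalty. All function and variable names (empirical_auc, stump, boost_auc, X_neg, X_pos) are hypothetical and chosen for illustration only.

```python
def empirical_auc(scores_neg, scores_pos):
    """Share of (non-diseased, diseased) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for s0 in scores_neg:
        for s1 in scores_pos:
            if s1 > s0:
                total += 1.0
            elif s1 == s0:
                total += 0.5
    return total / (len(scores_neg) * len(scores_pos))


def stump(x, marker, threshold, sign):
    """Weak classifier (assumed form): sign if the marker exceeds the threshold, else -sign."""
    return sign if x[marker] > threshold else -sign


def boost_auc(X_neg, X_pos, n_rounds=10, weights=(0.5, 1.0, 2.0)):
    """Greedy stagewise search; returns (chosen terms, final empirical AUC)."""
    # Candidate stumps: each observed marker value except the largest, both signs.
    candidates = []
    for k in range(len(X_neg[0])):
        values = sorted({x[k] for x in list(X_neg) + list(X_pos)})
        for t in values[:-1]:
            for sign in (1, -1):
                candidates.append((k, t, sign))

    F_neg = [0.0] * len(X_neg)  # combined score for non-diseased subjects
    F_pos = [0.0] * len(X_pos)  # combined score for diseased subjects
    current = empirical_auc(F_neg, F_pos)  # all ties at the start, so 0.5
    chosen = []
    for _ in range(n_rounds):
        best = None
        for k, t, sign in candidates:
            for w in weights:
                new_neg = [F + w * stump(x, k, t, sign) for F, x in zip(F_neg, X_neg)]
                new_pos = [F + w * stump(x, k, t, sign) for F, x in zip(F_pos, X_pos)]
                auc = empirical_auc(new_neg, new_pos)
                if best is None or auc > best[0]:
                    best = (auc, (k, t, sign, w), new_neg, new_pos)
        # Stop early when no candidate improves the AUC (an assumed stopping rule).
        if best is None or best[0] <= current:
            break
        current, term, F_neg, F_pos = best
        chosen.append(term)
    return chosen, current


# Toy data (made up for illustration): two markers per subject.
X_neg = [(0.1, 5.0), (0.4, 3.0), (0.3, 4.0)]
X_pos = [(0.6, 1.0), (0.2, 2.0), (0.9, 0.5)]
terms, auc = boost_auc(X_neg, X_pos)
```

On this toy data the second marker separates the two groups, so the search selects a single stump on that marker and stops. A faithful implementation would replace the empirical AUC with the smoothed objective and the non-smoothness penalty described in the abstract.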