Most widely accepted measures of performance for learning algorithms, such as accuracy and area under the ROC curve, provide information about behavior at the data set level. They say nothing about which instances are misclassified, whether two learning algorithms with the same classification accuracy on a data set misclassify the same instances, or whether there are instances misclassified by all learning algorithms. These questions about behavior at the instance level motivate our empirical analysis of instance hardness, a measure of expected classification accuracy for an instance. We analyze the classification of 57 data sets using 9 learning algorithms. Of the over 175,000 instances investigated, 5% are misclassified by all 9 of the considered learning algorithms, and 15% are misclassified by at least half. We find that the major cause of misclassification for hard instances is class overlap, manifested as outliers and border points, which can be exacerbated by class skew. We analyze these causes and show to what extent each leads to misclassifications, both in isolation and jointly. Of all misclassified instances, 19.8% are outliers, 71.3% are border points, and 21% belong to a minority class. We also find that 91.6% of all outliers and 38.3% of all border points are misclassified, whereas only 3.5% of instances without class overlap are misclassified. We propose a set of heuristics to predict when an instance will be hard to classify correctly. Additionally, we analyze how different learning algorithms perform on tasks with varying degrees of outliers, border points, and class skew.
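The notion of instance hardness above can be illustrated with a minimal sketch. Assuming hardness is estimated as the fraction of the considered learning algorithms that misclassify an instance (the function name and toy data below are illustrative, not from the thesis):

```python
import numpy as np

def instance_hardness(predictions: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Estimate per-instance hardness as the fraction of algorithms
    that misclassify each instance.

    predictions: shape (n_algorithms, n_instances), predicted class labels
    labels:      shape (n_instances,), true class labels
    returns:     shape (n_instances,), hardness values in [0, 1]
    """
    misclassified = predictions != labels  # broadcasts labels over algorithms
    return misclassified.mean(axis=0)      # average miss rate per instance

# Toy example: 3 algorithms classifying 4 instances
preds = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0]])
truth = np.array([0, 1, 1, 1])
print(instance_hardness(preds, truth))  # [0.  0.333...  0.667...  1.]
```

An instance with hardness 1.0 is misclassified by every algorithm (like the 5% of instances reported above), while hardness 0.0 indicates every algorithm classifies it correctly.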



College and Department

Physical and Mathematical Sciences; Computer Science




Keywords

instance hardness, outliers, border points, class skew