Keywords

automatic noise reduction, artificial neural networks

Abstract

During data collection and labeling, noise can be introduced into a data set. As a result, the quality of the data set degrades, and experiments and inferences derived from it become less reliable. In this paper we present an algorithm, called ANR (automatic noise reduction), as a filtering mechanism to identify and remove noisy data items whose classes have been mislabeled. The underlying mechanism behind ANR is based on a framework of multi-layer artificial neural networks. ANR assigns each data item a soft class label in the form of a class probability vector, which is initialized to the original class label and can be modified during training. When the noise level is reasonably small (< 30%), the non-noisy data is dominant in determining the network architecture and its output, and thus a mechanism for correcting mislabeled data can be provided by aligning the class probability vector with the network output. With a learning procedure that updates the class probability vector based on its difference from the network output, the probability of a mislabeled class gradually becomes smaller while that of the correct class becomes larger, which eventually corrects the mislabeled data after sufficient training. After training, those data items whose classes have been relabeled are treated as noisy data and removed from the data set. We evaluate the performance of ANR on 12 data sets drawn from the UCI data repository. The results show that ANR is capable of identifying a significant portion of noisy data. An average increase in accuracy of 24.5% can be achieved at a noise level of 25% by using ANR as a training data filter for a nearest neighbor classifier, compared with the same classifier without ANR.
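The soft-label mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the step size `rate`, and the simple "move toward the network output, then renormalize" update are all assumptions made for illustration, and the network output is held fixed here, whereas in ANR it changes as the network trains.

```python
# Hypothetical sketch of soft-label correction and filtering.
# The update rule and all names are illustrative assumptions,
# not the rule used in the paper.

def one_hot(label, num_classes):
    """Initial class probability vector: all mass on the original label."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def update_soft_label(p, output, rate=0.1):
    """Move the class probability vector p one step toward the network
    output, then renormalize so it remains a probability vector."""
    moved = [max(pk + rate * (ok - pk), 0.0) for pk, ok in zip(p, output)]
    total = sum(moved)
    return [m / total for m in moved]

def argmax(v):
    """Index of the largest entry of v."""
    return max(range(len(v)), key=lambda k: v[k])

def filter_relabeled(labels, soft_labels):
    """Keep the indices of items whose soft label still agrees with the
    original label; items that were relabeled are treated as noisy."""
    return [i for i, (y, p) in enumerate(zip(labels, soft_labels))
            if argmax(p) == y]

# Item 0 is labeled class 0, but the (assumed, fixed) network output
# favors class 1; item 1 is labeled class 1 and is left untouched.
labels = [0, 1]
soft = [one_hot(0, 2), one_hot(1, 2)]
network_output = [0.1, 0.9]
for _ in range(20):
    soft[0] = update_soft_label(soft[0], network_output)

kept = filter_relabeled(labels, soft)  # item 0 has been relabeled and is dropped
```

In this toy run the probability of the original class of item 0 shrinks at every step while that of the other class grows, so its most probable class eventually flips and the item is filtered out, which mirrors the relabel-then-remove behavior the abstract describes.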

Original Publication Citation

Zeng, X., and Martinez, T. R., "A Noise Filtering Method Using Neural Networks", Proceedings of the IEEE International Workshop on Soft-Computing Techniques in Instrumentation, Measurement, and Related Applications, pp. 26-31, 2003.

Document Type

Peer-Reviewed Article

Publication Date

2003-05-17

Permanent URL

http://hdl.lib.byu.edu/1877/2410

Publisher

IEEE

Language

English

College

Physical and Mathematical Sciences

Department

Computer Science
