Keywords

preprocessing, postprocessing, knowledge discovery, data mining, water supply networks

Start Date

July 1, 2010, 12:00 AM

Abstract

Pre- and post-processing are crucial tasks in Knowledge Discovery in Databases (KDD). In this contribution we present an application to a data set from a real water supply network (WSN) in the town of Calarcá (Colombia), located in the so-called "Eje Cafetero" coffee region. We use traditional, well-known pre- and post-processing techniques to show their importance in Data Mining (DM) and to emphasize the need for interpretable results when dealing with real data sets. Pre- and post-processing tools, as well as other DM tasks implemented in Clementine 9.0 (SPSS), have been used. Clementine 9.0 has a number of pre- and post-processing tools for working with the records (rows) and fields (columns) of a database. We mainly used selection and deriving operations for records, and type and filter operations for fields. The database consists of a record of requests, complaints and claims (PQRs in Spanish) for the year 2006, submitted to the Calarcá Water Supply Company Multipropósito, S.A. ESP. The database also includes the network hydraulic model, some climatic variables, and thematic maps of vulnerabilities and risk areas for natural phenomena. The PQR information consists of 846 records. First, the consistency of the PQRs was evaluated to identify outliers and missing information. Next, each point was located on the map of the town and its UTM coordinates were obtained. Then, each PQR was associated with its nearest pipe and node of the primary network. The graphical classification of variables shows trends that allow us to draw a priori conclusions in KDD. These data were used to feed the model and to obtain relationships between different variables and the damage type on the network as part of the post-processing task.
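The consistency check and the nearest-node association described in the abstract can be illustrated with a minimal Python sketch. This is not the Clementine 9.0 workflow used in the study: the field names (`easting`, `northing`), the node identifiers, and all coordinates are hypothetical, and the sketch assumes that all points lie in a single UTM zone, so that planar Euclidean distance in metres is a reasonable approximation. Association with the nearest pipe would additionally need a point-to-segment distance, which is omitted here.

```python
import math


def split_complete(records, required=("easting", "northing")):
    """Separate records that have all required fields from those missing any."""
    complete, incomplete = [], []
    for rec in records:
        if all(rec.get(key) is not None for key in required):
            complete.append(rec)
        else:
            incomplete.append(rec)
    return complete, incomplete


def nearest_node(record, nodes):
    """Return the id of the node closest to the record (planar distance, metres)."""
    return min(
        nodes,
        key=lambda node_id: math.hypot(
            record["easting"] - nodes[node_id][0],
            record["northing"] - nodes[node_id][1],
        ),
    )


# Hypothetical network nodes and PQR records (made-up local coordinates).
nodes = {"N1": (0.0, 0.0), "N2": (100.0, 0.0), "N3": (0.0, 100.0)}
records = [
    {"id": 1, "easting": 10.0, "northing": 5.0},
    {"id": 2, "easting": 90.0, "northing": 20.0},
    {"id": 3, "easting": None, "northing": 40.0},  # missing coordinate
]

complete, incomplete = split_complete(records)
assignments = {rec["id"]: nearest_node(rec, nodes) for rec in complete}
```

Records with missing coordinates are set aside rather than dropped silently, so they can be reviewed before the remaining records are linked to the network.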

Jul 1st, 12:00 AM

The tasks of pre and post-processing in Data Mining applied to a real world problem
