Keywords

Data Quality Evaluation; Fault Detection; Automated Monitoring Systems; Time Series Analysis; Data Validation; Environmental Time Series

Start Date

26-6-2018 9:00 AM

End Date

26-6-2018 10:20 AM

Abstract

With the increasing presence of online sensors to monitor environmental systems, automatic outlier detection procedures are necessary to handle the huge amount of data generated. Cleaned time series can afterwards be filtered, screened for fault detection, or used for process control or decision support.

Outliers in time series exhibiting stable signals can easily be removed by tracking the local mean and standard deviation and accepting all data within a set distance from the mean, e.g., a fixed multiple of the standard deviation. However, environmental time series often exhibit large discontinuities or highly unstable behaviour, which require more advanced outlier detection methods. Unfortunately, these advanced methods require complex parameterization that depends on the variable being measured, the local environmental conditions and/or the sensor at hand. Therefore, method tuning is necessary for each new time series, making outlier detection expensive to implement.
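The mean-and-standard-deviation screening described above can be sketched as follows. This is only an illustrative baseline, not the method developed in this work; the function name `flag_outliers`, the window length of 10, the multiplier `k = 3.0` and the synthetic signal are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


def flag_outliers(values, window=10, k=3.0, min_std=1e-9):
    """Return one boolean per value, True where the value is flagged.

    Hypothetical helper: tracks the mean and standard deviation of the
    last `window` accepted values and flags any new value lying more
    than `k` standard deviations from that local mean.
    """
    accepted = deque(maxlen=window)  # most recent accepted values only
    flags = []
    for x in values:
        if len(accepted) < window:
            # Warm-up: accept values until the window is full.
            accepted.append(x)
            flags.append(False)
            continue
        mu = mean(accepted)
        sigma = max(stdev(accepted), min_std)  # guard against zero spread
        is_outlier = abs(x - mu) > k * sigma
        flags.append(is_outlier)
        if not is_outlier:
            # Flagged values are kept out of the local statistics.
            accepted.append(x)
    return flags


# Synthetic stable signal with a single spike at index 12 (assumed data).
signal = [10.0, 10.1, 9.9, 10.2, 9.8] * 4
signal[12] = 25.0
flags = flag_outliers(signal)
flagged_indices = [i for i, f in enumerate(flags) if f]
print(flagged_indices)
```

Because flagged values never update the local statistics, a genuine step change in the signal would be rejected indefinitely by this sketch, which illustrates why such a baseline struggles with the discontinuities mentioned above.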

In this work, a neural network (NN), inspired by the Replicator Neural Network (Hawkins et al., 2002) and autoencoder neural networks (Chen et al., 2017), was developed to perform automatic outlier detection in time series. The NN’s objective is to reproduce the time series as closely as possible while having a structure simple enough to underfit the data. With such an NN, it is then possible to track the replication error to detect outliers.

The NN was applied to high-frequency water quality time series from rivers, sewers and wastewater treatment plants, covering different variables (conductivity, turbidity, temperature, pH, nitrogen). Disturbances such as fast pumping cycles, pollution spills and sensor failure were tested. An NN configuration with default settings allowed outlier detection over long time series (50 000 data points) in a short time (less than a minute) and with minimal misclassification.

Stream and Session

Proposed session: B2: Hybrid modelling and innovative data analysis for integrated environmental decision support

Organizers: Peter A. Khaiter, Marina G. Erechtchoukova

Neural Network for Tuning-Friendly Automatic Outlier Detection in Water Quality Time Series
