Keywords
dynamic emulation modelling, time series clustering, variable selection, data-driven models, physically-based models
Start Date
1-7-2012 12:00 AM
Abstract
Dynamic Emulation Modelling (DEMo) is emerging as a viable solution to combine computationally intensive simulation models and dynamic optimization algorithms. A dynamic emulator is a low-order surrogate of the simulation model, identified over a sample data set generated by the original simulation model itself. When applied to large 3D models, any DEMo exercise requires a pre-processing of the exogenous drivers and state variables in order to reduce, by spatial aggregation, the high number of candidate variables to appear in the final emulator. This work describes a hybrid clustering-variable selection approach to automatically discover compact and relevant representations of high-dimensional data sets. Time series clustering is adopted to identify spatial structures by objectively organizing data into homogeneous groups, where the within-group-object similarity is maximized. In particular, the proposed approach relies on a hierarchical agglomerative clustering method, which starts by placing each time series in its own cluster and then merges clusters into larger clusters until a compact, yet informative, representation of the original variables is obtained. This representation is then processed with the Recursive Variable Selection - Iterative Input Selection algorithm in order to single out the most relevant clusters. The approach is demonstrated on a real-world case study concerning the reduction of Delft3D, a spatially distributed hydrodynamic model used to simulate salt intrusion dynamics in the tropical lake of Marina Reservoir, Singapore. Results show that the proposed approach permits a parsimonious, though accurate, characterization of salinity concentration.
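The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure or linkage rule is used, so the correlation distance and average linkage below are assumptions, as are the function names, the synthetic data and the choice of two clusters. The subsequent variable selection step (Recursive Variable Selection - Iterative Input Selection) is not reproduced here.

```python
import math
from statistics import fmean


def correlation_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length series (assumed metric)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)


def agglomerative_clusters(series, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of series indices."""
    n = len(series)
    # Start with every time series in its own cluster.
    clusters = [[i] for i in range(n)]
    # Pairwise distances between the original series, computed once.
    dist = {
        (i, j): correlation_distance(series[i], series[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs of members.
        return fmean(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        pairs = (
            (p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def cluster_means(series, clusters):
    """Aggregate each cluster into one representative series (pointwise mean)."""
    length = len(series[0])
    return [
        [fmean(series[i][t] for i in members) for t in range(length)]
        for members in clusters
    ]


# Synthetic stand-in for grid-cell time series: three cells follow a sine
# pattern and three follow a cosine pattern, with different scales and offsets.
steps = range(24)
base_a = [math.sin(2 * math.pi * k / 12) for k in steps]
base_b = [math.cos(2 * math.pi * k / 12) for k in steps]
series = [[s * v + o for v in base_a] for s, o in [(1.0, 0.0), (1.5, 0.2), (0.8, -0.1)]]
series += [[s * v + o for v in base_b] for s, o in [(1.0, 0.0), (2.0, 0.5), (0.6, 0.1)]]

clusters = agglomerative_clusters(series, n_clusters=2)
representatives = cluster_means(series, clusters)
```

In this toy setting the six series collapse into two cluster-average series, which would then serve as the reduced set of candidate variables passed to a variable selection step. In practice the stopping point of the merging is a modelling choice rather than a fixed number.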
Improved dynamic emulation modeling by time series clustering: the case study of Marina Reservoir, Singapore