Keywords
Big data; clustering; k-means; land change science; model performance.
Location
Session C2: Big Data and Geoscience: Concept, Theory and Algorithm
Start Date
13-7-2016 10:30 AM
End Date
13-7-2016 10:50 AM
Abstract
Big data has received a great deal of attention during the last decade. Dealing with big data is a complex task, as it is challenging to analyse, store, model and make sense of. Using big data in land change science (LCS) is a new area of research. One of the main challenges in LCS is using big data effectively for model calibration. In this study, we propose using a clustering technique, based on the k-means approach, to handle big land-use data and overcome the limitations of model calibration. With clustering, data are partitioned into smaller subsets that are then analysed separately, much as in parallel processing. We evaluated the proposed approach using well-established goodness-of-fit metrics (e.g., an accuracy measure). We compared the results obtained by applying k-means clustering (k = 5) and by not applying it (k = 1). Each configuration was replicated 20 times to account for the randomness in partitioning the data into training and testing datasets. The experiments were based on land-use data from southeastern Wisconsin, USA. Results showed a close fit, with overall accuracies of 92.5% and 95.96% for k = 5 and k = 1, respectively.
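The workflow described in the abstract (partition the data with k-means, calibrate a model on each subset, then average accuracy over repeated random train/test splits) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the land change model, so a simple nearest-class-mean classifier stands in for it, and the 70/30 split, the toy data generator and all function names (`kmeans`, `fit_class_means`, `run_once`, `mean_accuracy`, `make_toy_data`) are assumptions made purely for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids

def fit_class_means(X, y):
    """Stand-in model (assumption): one mean feature vector per class."""
    model = {}
    for cls in set(y):
        rows = [x for x, t in zip(X, y) if t == cls]
        model[cls] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return model

def predict(model, x):
    return min(model, key=lambda cls: sq_dist(x, model[cls]))

def run_once(X, y, k, seed):
    """One replication: random 70/30 split, cluster, fit per cluster, score."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    centroids = kmeans([X[i] for i in train], k, seed=seed)
    fallback = fit_class_means([X[i] for i in train], [y[i] for i in train])
    models = {}
    for c in range(k):
        members = [i for i in train if nearest(X[i], centroids) == c]
        if members:
            models[c] = fit_class_means([X[i] for i in members],
                                        [y[i] for i in members])
    # Each test cell is scored by the model of its nearest cluster.
    hits = sum(predict(models.get(nearest(X[i], centroids), fallback), X[i]) == y[i]
               for i in test)
    return hits / len(test)

def mean_accuracy(X, y, k, reps=20):
    """Average accuracy over `reps` random partitions (20 in the abstract)."""
    return sum(run_once(X, y, k, seed) for seed in range(reps)) / reps

def make_toy_data(n, seed=0):
    """Hypothetical two-feature data; label 1 marks a 'changed' cell."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [1 if a + b > 1 else 0 for a, b in X]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data(500)
    for k in (1, 5):
        print(f"k = {k}: mean accuracy = {mean_accuracy(X, y, k):.3f}")
```

With k = 1 the sketch reduces to fitting a single model on the whole training set, which mirrors the unclustered baseline in the comparison. The accuracies it prints on the toy data say nothing about the figures reported for the Wisconsin land-use data.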
Included in
Civil Engineering Commons, Data Storage Systems Commons, Environmental Engineering Commons, Hydraulic Engineering Commons, Other Civil and Environmental Engineering Commons
Big Data in Land Change Science: Challenges and Implications of Advanced Technologies