Keywords

Big data; clustering; k-means; land change science; model performance.

Location

Session C2: Big Data and Geoscience: Concept, Theory and Algorithm

Start Date

13-7-2016 10:30 AM

End Date

13-7-2016 10:50 AM

Abstract

Big data has received a lot of attention during the last decade. Dealing with big data is a complex task, as it is challenging to analyse, store, model and make sense of. Using big data in land change science (LCS) is a new area of research, and one of the main challenges in LCS is using big data effectively for model calibration. In this study, we propose a clustering technique, based on the k-means approach, to handle big land-use data and overcome this limitation of model calibration. With clustering, data are partitioned into smaller subsets that are then analysed separately, much as in parallel processing. We evaluated the proposed approach using well-established goodness-of-fit metrics (e.g., overall accuracy). We compared the results obtained with k-means clustering (k = 5) and without clustering (k = 1). Each configuration was replicated 20 times to account for the randomness in partitioning the data into training and testing datasets. The experiments were based on land-use data from southeastern Wisconsin, USA. Results showed a good fit, with an overall accuracy of 92.5% for k = 5 and 95.96% for k = 1.
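The abstract does not name the land change model, the input variables, or the training/testing split ratio, so the following is only a minimal Python sketch of the workflow it describes, under stated assumptions. k-means (Lloyd's algorithm) partitions the training cells, each subset is calibrated separately, every test cell is routed to its nearest cluster centroid, and overall accuracy (correctly classified cells divided by all test cells) is averaged over 20 random train/test splits. The per-cluster model (`predict_1nn`, a 1-nearest-neighbour classifier), the 70/30 split, and the toy data generator (`synthetic_cells`) are hypothetical stand-ins, not the authors' method or data.

```python
import random
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=50, rng=None):
    """Lloyd's k-means; returns (cluster label per point, centroids)."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):  # update step: move each centroid to its members' mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def predict_1nn(train, x):
    """Hypothetical stand-in for the calibrated model: label of the nearest sample."""
    _, label = min(train, key=lambda s: sq_dist(s[0], x))
    return label


def run_once(samples, k, rng, train_frac=0.7):
    """One replicate: random split, cluster the training data, score the test set."""
    data = samples[:]
    rng.shuffle(data)
    cut = int(len(data) * train_frac)  # hypothetical 70/30 split
    train, test = data[:cut], data[cut:]
    labels, centroids = kmeans([f for f, _ in train], k, rng=rng)
    # Partition the training data; each subset is calibrated independently.
    subsets = {c: [s for s, lab in zip(train, labels) if lab == c] for c in range(k)}
    correct = 0
    for feats, truth in test:
        c = nearest(feats, centroids)  # route each test cell to its cluster
        correct += predict_1nn(subsets[c] or train, feats) == truth
    return correct / len(test)  # overall accuracy = correct / total


def synthetic_cells(n, rng):
    """Hypothetical toy data: two drivers per cell and a binary change label."""
    cells = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        cells.append(([x, y], int(x + y > 1.0)))
    return cells


if __name__ == "__main__":
    rng = random.Random(42)
    cells = synthetic_cells(500, rng)
    for k in (1, 5):  # k = 1 is the unclustered baseline
        accs = [run_once(cells, k, rng) for _ in range(20)]  # 20 replicates
        print(f"k={k}: mean overall accuracy {mean(accs):.3f}")
```

Setting k = 1 places all training cells in a single subset, which reproduces the unclustered baseline the abstract compares against; the printed accuracies come from synthetic data and say nothing about the study's results. Because each subset is calibrated independently, the per-cluster fits could be dispatched to separate workers, which is the parallel-processing analogy drawn in the abstract.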


Big Data in Land Change Science: Challenges and Implications of Advanced Technologies
