Stream A: Advanced Methods and Approaches in Environmental Computing

Benchmarking Apache Spark spatial libraries

Hector Muro Mauri, Wageningen University and Research
Rob Knapen, Wageningen University and Research
Arend Ligtenberg, Wageningen University and Research
Sander Janssen, Wageningen University and Research
Ioannis N. Athanasiadis, Wageningen University and Research

Keywords

Geospatial data, Big data, scalability, Apache Spark, Data modeling

Start Date

25-6-2018 2:00 PM

End Date

25-6-2018 3:20 PM

Abstract

Apache Spark is one of the most widely used and fast-evolving cluster-computing frameworks for big data. This research investigates the state of practice in the Apache Spark ecosystem for managing spatial data, with a specific focus on spatial vector data. Apache Spark is a relatively new platform, and the associated libraries for geospatial data extensions are still work in progress. In this work, three libraries for managing geospatial information in Apache Spark have been investigated, namely GeoSpark, GeoPySpark, and Magellan. First, we designed and performed a suite of functionality tests to explore how much can be done with each library. Then, we benchmarked the performance of the libraries for executing common spatial tasks using annoyingly big geospatial datasets. Finally, we compared the performance of the three libraries against a traditional Geographic Information System that uses a relational database for storage. Our findings about the maturity of the libraries and the scalability of solutions in Apache Spark are mixed: key functionalities are still missing, but the elapsed real time to respond to queries can be reduced by up to two orders of magnitude.
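
To make "elapsed real time" for a "common spatial task" concrete, the following is a minimal sketch of a wall-clock timing harness wrapped around a point-in-polygon count. It is not the authors' benchmark suite and does not use Spark or any of the three libraries: the ray-casting test, the function names (`point_in_polygon`, `time_task`, `count_points_in_square`), and the toy data are all hypothetical, chosen only to illustrate the measurement idea in plain Python.

```python
import time
from statistics import median


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count the crossing if it lies to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


def time_task(task, repeats=5):
    """Run task() several times; return (last result, median elapsed seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = task()
        timings.append(time.perf_counter() - start)
    return result, median(timings)


# Hypothetical toy data (an assumption for illustration): a unit square
# and a 20 x 20 grid of points that partly overlaps it.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
points = [(i / 10 - 0.45, j / 10 - 0.45) for i in range(20) for j in range(20)]


def count_points_in_square():
    """The 'spatial task' being timed: count grid points inside the square."""
    return sum(point_in_polygon(x, y, square) for x, y in points)


hits, seconds = time_task(count_points_in_square)
print(f"{hits} of {len(points)} points inside; median elapsed time {seconds:.6f} s")
```

Taking the median over several repeats is one common way to dampen run-to-run noise. A real comparison between a cluster framework and a relational database would also need to account for factors this sketch ignores, such as data loading, cluster start-up, and caching.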

Stream and Session

A1: Towards More Interoperable, Reusable and Scalable Environmental Software
