- Title
SparkNN: A Distributed In-Memory Data Partitioning for KNN Queries on Big Spatial Data.
- Authors
Al Aghbari, Zaher; Ismail, Tasneem; Kamel, Ibrahim
- Abstract
The increase in GPS-enabled devices and the proliferation of location-based applications have resulted in an abundance of geotagged (spatial) data. As a consequence, numerous applications have emerged that use this spatial data to provide different types of location-based services. However, the huge amount of available spatial data presents a challenge to the efficiency of these services. Although big data frameworks like Apache Spark have enabled the efficient processing of large amounts of data, they are designed for general (non-spatial) data: their built-in data partitioning mechanism does not take into account the spatial proximity of the data. Therefore, these frameworks cannot be readily used for spatial analytics such as efficiently answering spatial queries. To fill this gap, this paper proposes SparkNN, an in-memory partitioning and indexing system for answering spatial queries, such as K-nearest neighbor (KNN), on big spatial data. SparkNN is implemented on top of Apache Spark and consists of three layers to facilitate efficient spatial queries. The first layer is a spatial-aware partitioning layer, which divides the spatial data into several partitions, ensuring that the load across partitions is balanced and that data objects in close proximity are placed in the same, or neighboring, partitions. The second layer is a local indexing layer, which provides a spatial index inside each partition to speed up the search within that partition. The third layer is a global index, which is placed in the master node of Spark to route spatial queries to the relevant partitions. The efficiency of SparkNN was evaluated through extensive experiments with big spatial datasets. The results show that SparkNN significantly outperforms the state-of-the-art Spark system when evaluated on the same set of queries.
- Subjects
SPATIAL data infrastructures; GEOGRAPHIC information systems; QUERY (Information retrieval system); BIG data; LOCATION-based services
- Publication
Data Science Journal, 2020, Vol 19, p1
- ISSN
1683-1470
- Publication type
Article
- DOI
10.5334/dsj-2020-035
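
The three-layer idea in the abstract (spatial-aware partitioning, a local index per partition, and a global index that routes queries) can be illustrated with a small single-machine sketch. This is not the paper's implementation and does not use Spark: it assumes a uniform grid as the partitioner (which, unlike SparkNN, does not balance load), a plain linear scan in place of a real local spatial index, and a sorted list of grid cells in place of the global index. The names `cell_of`, `build_partitions`, `min_dist_to_cell`, and `knn` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Layer 1 (assumed uniform grid): map a 2D point to its grid cell."""
    x, y = point
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_partitions(points, cell_size):
    """Group points by grid cell so nearby points share a partition."""
    partitions = defaultdict(list)
    for p in points:
        partitions[cell_of(p, cell_size)].append(p)
    return dict(partitions)


def min_dist_to_cell(q, cell, cell_size):
    """Lower bound on the distance from q to any point inside the cell."""
    qx, qy = q
    x_lo, y_lo = cell[0] * cell_size, cell[1] * cell_size
    x_hi, y_hi = x_lo + cell_size, y_lo + cell_size
    dx = max(x_lo - qx, 0.0, qx - x_hi)
    dy = max(y_lo - qy, 0.0, qy - y_hi)
    return math.hypot(dx, dy)


def knn(q, k, partitions, cell_size):
    """Return the k points nearest to q, closest first."""
    # Layer 3 stand-in: visit partitions in order of their lower-bound distance.
    order = sorted(partitions, key=lambda c: min_dist_to_cell(q, c, cell_size))
    best = []  # max-heap of (-distance, point), holding at most k entries
    for cell in order:
        bound = min_dist_to_cell(q, cell, cell_size)
        # No remaining partition can improve the current k-th neighbor.
        if len(best) == k and bound > -best[0][0]:
            break
        # Layer 2 stand-in: linear scan instead of a local spatial index.
        for p in partitions[cell]:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return [p for _, p in sorted((-nd, p) for nd, p in best)]


points = [(0.5, 0.5), (1.5, 1.5), (9.0, 9.0), (2.2, 0.1), (5.0, 5.0)]
partitions = build_partitions(points, cell_size=2.0)
nearest = knn((0.6, 0.6), 2, partitions, cell_size=2.0)
```

Ordering partitions by a lower-bound distance lets the search stop as soon as no unvisited partition can contain a closer point, which is the benefit of routing a query only to the relevant partitions.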