We found a match
Your institution may have access to this item. Find your institution then sign in to continue.
- Title
Query-aware locality-sensitive hashing scheme for $$l_p$$ norm.
- Authors
Huang, Qiang; Feng, Jianlin; Fang, Qiong; Ng, Wilfred; Wang, Wei
- Abstract
The problem of c-Approximate Nearest Neighbor ( c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing schemes to tackle the c-ANN search problem. Traditionally, LSH functions are constructed in a query-oblivious manner, in the sense that buckets are partitioned before any query arrives. However, objects closer to a query may be partitioned into different buckets, which is undesirable. Due to the use of query-oblivious bucket partition, the state-of-the-art LSH schemes for external memory, namely C2LSH and LSB-Forest, only work with approximation ratio of integer $$c \ge 2$$ . In this paper, we introduce a novel concept of query-aware bucket partition which uses a given query as the 'anchor' for bucket partition. Accordingly, a query-aware LSH function under a specific $$l_p$$ norm with $$p \in (0,2]$$ is a random projection coupled with query-aware bucket partition, which removes random shift required by traditional query-oblivious LSH functions. The query-aware bucket partitioning strategy can be easily implemented so that query performance is guaranteed. For each $$l_p$$ norm $$(p \in (0,2])$$ , based on the corresponding p-stable distribution, we propose a novel LSH scheme named query-aware LSH (QALSH) for c-ANN search over external memory. Our theoretical studies show that QALSH enjoys a guarantee on query quality. The use of query-aware LSH function enables QALSH to work with any approximation ratio $$c > 1$$ . In addition, we propose a heuristic variant named QALSH $$^+$$ to improve the scalability of QALSH. Extensive experiments show that QALSH and QALSH $$^+$$ outperform the state-of-the-art schemes, especially in high-dimensional space. Specifically, by using a ratio $$c < 2$$ , QALSH can achieve much better query quality.
- Subjects
HASHING; NEAREST neighbor analysis (Statistics); QUERYING (Computer science)
- Publication
VLDB Journal International Journal on Very Large Data Bases, 2017, Vol 26, Issue 5, p683
- ISSN
1066-8888
- Publication type
Article
- DOI
10.1007/s00778-017-0472-7