We found a match
Your institution may have access to this item. Find your institution then sign in to continue.
- Title
Ranking queries on uncertain data.
- Authors
Hua, Ming; Pei, Jian; Lin, Xuemin
- Abstract
Uncertain data is inherent in a few important applications. It is far from trivial to extend ranking queries (also known as top- k queries), a popular type of queries on certain data, to uncertain data. In this paper, we cast ranking queries on uncertain data using three parameters: rank threshold k, probability threshold p, and answer set size threshold l. Systematically, we identify four types of ranking queries on uncertain data. First, a probability threshold top- k query computes the uncertain records taking a probability of at least p to be in the top- k list. Second, a top-( k, l) query returns the top- l uncertain records whose probabilities of being ranked among top- k are the largest. Third, the p-rank of an uncertain record is the smallest number k such that the record takes a probability of at least p to be ranked in the top- k list. A rank threshold top- k query retrieves the records whose p-ranks are at most k. Last, a top-( p, l) query returns the top- l uncertain records with the smallest p-ranks. To answer such ranking queries, we present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation-based algorithm. To answer top-( k, l) queries and top-( p, l) queries, we propose PRist+, a compact index. An efficient index construction algorithm and efficacious query answering methods are developed for PRist+. An empirical study using real and synthetic data sets verifies the effectiveness of the probabilistic ranking queries and the efficiency of our methods.
- Subjects
DATA; SQL; RANKING; PROBABILITY theory; APPROXIMATION theory
- Publication
VLDB Journal International Journal on Very Large Data Bases, 2011, Vol 20, Issue 1, p129
- ISSN
1066-8888
- Publication type
Article
- DOI
10.1007/s00778-010-0196-4