We found a match
Your institution may have access to this item. Find your institution then sign in to continue.
- Title
Efficient and effective similarity search over probabilistic data based on Earth Mover's Distance.
- Authors
Xu, Jia; Zhang, Zhenjie; Tung, Anthony; Yu, Ge
- Abstract
Advances in geographical tracking, multimedia processing, information extraction, and sensor networks have created a deluge of probabilistic data. While similarity search is an important tool to support the manipulation of probabilistic data, it raises new challenges to traditional relational databases. The problem stems from the limited effectiveness of the distance metrics employed by existing database systems. On the other hand, several more complicated distance operators have proven their values for better distinguishing ability in specific probabilistic domains. In this paper, we discuss the similarity search problem with respect to Earth Mover's Distance (EMD). EMD is the most successful distance metric for probability distribution comparison but is an expensive operator as it has cubic time complexity. We present a new database indexing approach to answer EMD-based similarity queries, including range queries and k-nearest neighbor queries on probabilistic data. Our solution utilizes primal-dual theory from linear programming and employs a group of B trees for effective candidate pruning. We also apply our filtering technique to the processing of continuous similarity queries, especially with applications to frame copy detection in real-time videos. Extensive experiments show that our proposals dramatically improve the usefulness and scalability of probabilistic data management.
- Subjects
DISTRIBUTION (Probability theory); PROBABILITY measures; DATABASES; TREE graphs; DYNAMIC programming; VECTOR analysis; PROGRAMMING languages
- Publication
VLDB Journal International Journal on Very Large Data Bases, 2012, Vol 21, Issue 4, p535
- ISSN
1066-8888
- Publication type
Article
- DOI
10.1007/s00778-011-0258-2