- Title
XCQ: A queriable XML compression system.
- Authors
Ng, Wilfred; Lam, Wai-Yeung; Wood, Peter; Levene, Mark
- Abstract
XML has already become the de facto standard for specifying and exchanging data on the Web. However, XML is by nature verbose and thus XML documents are usually large in size, a factor that hinders its practical usage, since it substantially increases the costs of storing, processing, and exchanging data. To tackle this problem, many XML-specific compression systems, such as XMill, XGrind, XMLPPM, and Millau, have recently been proposed. However, these systems usually suffer from one of two inadequacies: they either sacrifice performance in terms of compression ratio and execution time in order to support a limited range of queries, or perform full decompression prior to processing queries over compressed documents. In this paper, we address these problems by exploiting the information provided by a Document Type Definition (DTD) associated with an XML document. We show that a DTD is able to facilitate better compression as well as generate more usable compressed data to support querying. We present the architecture of XCQ, a compression and querying tool for handling XML data. XCQ is based on a novel technique we have developed called DTD Tree and SAX Event Stream Parsing (DSP). The documents compressed by XCQ are stored in Partitioned Path-Based Grouping (PPG) data streams, which are equipped with a Block Statistics Signature (BSS) indexing scheme. The indexed PPG data streams support the processing of XML queries that involve selection and aggregation, without the need for full decompression. To study the compression performance of XCQ, we carry out comprehensive experiments over a set of XML benchmark datasets.
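The abstract states that PPG data streams with BSS indexing let selection and aggregation queries run without full decompression, but it does not give the details. The sketch below is a rough, generic illustration of that idea only and is not the authors' implementation. It assumes that leaf values are grouped by their root-to-leaf path, that each path's stream is split into fixed-size blocks compressed independently, and that each block's "signature" stores its min, max, count, and sum. `BLOCK_SIZE`, `Block`, `group_by_path`, `build_blocks`, and `sum_greater_than` are hypothetical names, and zlib is a stand-in compressor.

```python
import json
import zlib
from collections import defaultdict
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical number of values per block (assumption)


@dataclass
class Block:
    data: bytes   # zlib-compressed JSON list of this block's values
    count: int    # assumed signature fields: count, sum, min, max
    total: float
    lo: float
    hi: float


def group_by_path(events):
    """Group (path, value) leaf events into one value stream per path."""
    streams = defaultdict(list)
    for path, value in events:
        streams[path].append(value)
    return streams


def build_blocks(values, block_size=BLOCK_SIZE):
    """Split a numeric stream into independently compressed blocks plus stats."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append(Block(
            data=zlib.compress(json.dumps(chunk).encode()),
            count=len(chunk), total=sum(chunk), lo=min(chunk), hi=max(chunk)))
    return blocks


def sum_greater_than(blocks, threshold):
    """SUM of values > threshold, decompressing only blocks that straddle it.

    Returns (result, number_of_blocks_decompressed).
    """
    result, decompressed = 0.0, 0
    for b in blocks:
        if b.hi <= threshold:   # no value can qualify: skip without decompressing
            continue
        if b.lo > threshold:    # every value qualifies: use the stored sum
            result += b.total
            continue
        decompressed += 1       # mixed block: decompress and scan its values
        for v in json.loads(zlib.decompress(b.data)):
            if v > threshold:
                result += v
    return result, decompressed
```

For instance, with prices stored as the blocks [5, 7, 9, 11], [20, 25, 30, 40], and [1, 2, 3, 4], a SUM over prices above 8 skips the third block, takes 115 directly from the second block's statistics, and decompresses only the first block, giving 135.0.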
- Subjects
XML (Extensible Markup Language); DOCUMENT markup languages; DATA compression; ELECTRONIC data interchange; WORLD Wide Web
- Publication
Knowledge & Information Systems, 2006, Vol 10, Issue 4, p421
- ISSN
0219-1377
- Publication type
Article
- DOI
10.1007/s10115-006-0012-z