RadegastXDB - Prototype of Native XML Database Management System: Technical Report
Petr Lukáš, Radim Bača, Michal Krátký

TL;DR
This paper presents RadegastXDB, a native XML database system that incorporates twig pattern query detection to improve the efficiency of processing structural XQueries, outperforming existing XML DBMSs especially on large datasets.
Contribution
Introduction of RadegastXDB, a prototype XML DBMS that integrates twig pattern query detection and advanced algorithms to enhance query processing performance.
Findings
RadegastXDB outperforms current XML DBMSs on structural queries.
State-of-the-art TPQ algorithms improve query speed on large datasets.
Efficient processing of queries with value predicates using the proposed techniques.
Abstract
Many advances in the processing of XML data have been proposed in the last two decades, including many approaches focused on the efficient processing of twig pattern queries (TPQs). However, integrating TPQs into an XQuery compiler is not a straightforward task, and current XML DBMSs process XQueries without any TPQ detection. In this paper, we demonstrate our prototype of a native XML DBMS called RadegastXDB, which uses TPQ detection to accelerate structural XQueries. Such detection allows us to utilize state-of-the-art TPQ processing algorithms. Our experiments show that, for structural queries, these algorithms and state-of-the-art XML indexing techniques make our prototype faster than all of the current XML DBMSs, especially for large data collections. We also show that the same techniques are efficient for processing queries with value predicates.

| DBMS (abbreviation) | URL |
|---|---|
| Oracle Berkeley DB 6.1.4 (B-DB) | www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index.html |
| Virtuoso 7.1 (VRT) | virtuoso.openlinksw.com |
| eXist-db 4.3.1 (E-DB) | exist-db.org |
| BaseX 9.0.2 (BX) | basex.org |
| MonetDB XQuery 4 (M-DB) | www.monetdb.org/XQuery |
| Commercial XML DBMS (CX) | |
| Commercial relational DBMS 1 and 2 (CR1, CR2) | |

| Collection | Size (MB) | XML nodes | Max. depth |
|---|---|---|---|
| XMark (f=1) | 111 | 2,048,193 | 14 |
| XMark (f=10) | 1,137 | 20,532,805 | 14 |
| SwissProt | 109 | 5,166,890 | 7 |
| TreeBank | 82 | 2,437,667 | 38 |
| DBLP | 127 | 3,736,406 | 6 |
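
Columns marked (S) are structural queries; columns marked (V) are queries with value predicates.

| Query | GTP (S) | CTJ (S) | FP-BJ (S) | B-DB (S) | VRT (S) | BX (S) | M-DB (S) | CR1 (S) | CR2 (S) | GTP (V) | FP-BJ (V) | B-DB (V) | VRT (V) | BX (V) | M-DB (V) | CR1 (V) | CR2 (V) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|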
| XM1 | 0.010 | 0.011 | 0.002 | 0.265 | 4.422 | 0.112 | 0.030 | 63.655 | DNF | 0.004 | 0.008 | 0.474 | 4.213 | 0.347 | 0.145 | 93.868 | DNF |
| XM2 | 0.041 | 0.041 | 0.016 | 0.870 | 4.979 | 0.358 | 0.094 | DNF | DNF | 0.023 | 0.023 | 0.193 | 4.156 | 0.981 | 0.111 | 9.710 | DNF |
| XM3 | 0.007 | 0.003 | 0.010 | 0.078 | 4.318 | 0.013 | 0.036 | 0.054 | 0.031 | 0.000 | 0.000 | 0.021 | 4.094 | 0.004 | 0.139 | 2.140 | DNF |
| XM4 | 0.041 | 0.041 | 0.016 | 0.740 | 4.922 | 0.137 | 0.062 | DNF | DNF | 0.010 | 0.010 | 0.084 | 3.984 | 2.508 | 0.134 | 3.510 | DNF |
| XM5 | 0.034 | 0.033 | 0.008 | 0.563 | 4.630 | 0.223 | 0.045 | 223.361 | DNF | 0.006 | 0.002 | 0.333 | 3.990 | 0.168 | 0.256 | DNF | DNF |
| XM1 | 0.108 | 0.105 | 0.025 | 2.427 | 58.573 | 0.972 | 0.106 | DNF | DNF | 0.031 | 0.068 | 9.120 | 50.057 | 3.479 | 0.665 | DNF | DNF |
| XM2 | 0.423 | 0.423 | 0.141 | 8.450 | 66.136 | 3.079 | 0.621 | DNF | DNF | 0.236 | 0.211 | 1.677 | 51.078 | 3.377 | 0.548 | DNF | DNF |
| XM3 | 0.071 | 0.032 | 0.078 | 0.250 | 59.500 | 0.087 | 0.063 | DNF | 0.250 | 0.006 | 0.000 | 0.021 | 50.141 | 0.005 | 0.528 | 16.494 | DNF |
| XM4 | 0.436 | 0.426 | 0.141 | 8.146 | 71.620 | 1.328 | 0.355 | DNF | DNF | 0.121 | 0.117 | 0.370 | 51.349 | 9.451 | 0.516 | 253.375 | DNF |
| XM5 | 0.338 | 0.337 | 0.074 | 5.468 | 66.354 | 1.857 | 0.195 | DNF | DNF | 0.041 | 0.041 | 5.292 | 51.588 | 1.547 | 1.073 | DNF | DNF |
| SP1 | 0.019 | 0.006 | 0.016 | 0.042 | 9.828 | 1.030 | 0.095 | DNF | 10.083 | 0.000 | 0.000 | 0.021 | 9.828 | 0.002 | 0.166 | 6.197 | DNF |
| SP2 | 0.210 | 0.199 | 0.100 | 3.943 | 15.156 | 1.708 | 0.170 | DNF | DNF | 0.002 | 0.002 | 3.500 | 10.224 | 1.103 | 0.254 | 0.434 | DNF |
| SP3 | 0.042 | 0.042 | 0.020 | 1.656 | 10.375 | 0.675 | 0.136 | DNF | DNF | 0.063 | 0.055 | 4.933 | 10.135 | 1.280 | DNF | DNF | DNF |
| SP4 | 0.021 | 0.012 | 0.016 | 0.062 | 9.719 | 0.574 | 0.052 | 19.716 | 0.297 | 0.000 | 0.000 | 1.026 | 9.776 | 1.521 | 0.573 | 1.303 | DNF |
| SP5 | 0.338 | 0.306 | 0.156 | 9.073 | 16.797 | 2.821 | 0.150 | DNF | DNF | 0.041 | 0.020 | 0.313 | 9.411 | 0.276 | 0.963 | 54.766 | DNF |
| TB1 | 0.012 | 0.002 | 0.012 | 0.469 | 6.073 | DNF | 0.048 | DNF | 130.187 | | | | | | | | |
| TB2 | 0.016 | 0.014 | 0.016 | 78.651 | 6.375 | DNF | 0.091 | DNF | DNF | | | | | | | | |
| TB3 | 0.012 | 0.012 | 0.016 | 0.766 | 5.255 | DNF | 0.059 | DNF | DNF | | | | | | | | |
| TB4 | 0.068 | 0.068 | 0.018 | 0.740 | 5.359 | DNF | 0.058 | DNF | 0.109 | | | | | | | | |
| TB5 | 0.099 | 0.100 | 0.043 | 6.136 | 6.672 | DNF | 0.088 | DNF | DNF | | | | | | | | |
| DB1 | 0.066 | 0.064 | 0.031 | 3.693 | 10.089 | 1.435 | 0.084 | DNF | DNF | 0.012 | 0.006 | 8.338 | 9.214 | 0.002 | 0.154 | 61.869 | DNF |
| DB2 | 0.011 | 0.011 | 0.010 | 0.047 | 8.672 | 0.511 | 0.035 | 14.162 | 0.141 | 0.000 | 0.000 | 0.922 | 8.870 | 0.028 | 0.208 | 26.061 | DNF |
| DB3 | 0.022 | 0.021 | 0.016 | 1.875 | 9.031 | 0.881 | 0.043 | DNF | DNF | 0.016 | 0.008 | 6.000 | 9.011 | 0.804 | 0.269 | 28.120 | DNF |
| DB4 | 0.188 | 0.184 | 0.059 | 5.031 | 10.036 | 1.741 | 0.227 | DNF | DNF | 0.010 | 0.012 | 3.521 | 9.078 | 0.005 | 0.208 | 4.603 | DNF |
| DB5 | 0.040 | 0.039 | 0.014 | 0.411 | 9.015 | 1.081 | 0.061 | DNF | 7.630 | 0.016 | 0.018 | 1.531 | 9.302 | 0.112 | 0.234 | 176.474 | DNF |
RadegastXDB – Prototype of Native XML Database Management System: Technical Report
Petr Lukáš
Radim Bača
Michal Krátký
Department of Computer Science
Faculty of Electrical Engineering and Computer Science
VSB – Technical University of Ostrava
1 Introduction
Many advances in the processing of XML data have been proposed in the last two decades. Especially between 2000 and 2010, many approaches focused on the efficient processing of XQueries modeled by twig pattern queries (TPQs) (e.g., [19, 2, 6, 16, 7, 15, 14, 17]). In general, there are two major groups of TPQ processing algorithms: binary structural joins [2, 1, 16, 10] and holistic twig joins [6, 7, 14, 4], where the latter group is considered the state of the art. However, most current XML database management systems (DBMSs) do not utilize holistic twig joins, since they are not capable of detecting TPQs in XQueries. Instead, they rely on rather naive techniques such as nested loops or traditional relational merge join algorithms. In other words, they ignore most of the advances in XML query processing introduced in the last two decades and, therefore, perform poorly even on simple structural queries over large data collections.
A TPQ is a rooted labeled tree in which each node corresponds to one location step in an XQuery. A sample TPQ is illustrated in Figure 1b; it corresponds to the XQuery in Figure 1a. The single-lined and double-lined edges represent the parent-child (PC) and ancestor-descendant (AD) structural relationships, which correspond to the child and descendant axes, respectively. (For the sake of simplicity, we consider only these two XPath axes, as is common for most XML query processing approaches.) We call the nodes of a TPQ query nodes and denote them by the ‘#’ character. Additionally, the circled query nodes represent output query nodes (also called extraction points [12]), which correspond to the last location steps in the ‘for’ clauses. In a nutshell, processing a TPQ means finding all mappings from the TPQ to an XML document such that the query nodes are mapped to XML nodes of the corresponding name and these XML nodes satisfy the relationships specified by the query edges. For more details about TPQ processing, we refer to [3].
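To make the notion of a TPQ mapping concrete, the following is a minimal sketch in Python of a naive recursive matcher. It is not the algorithm used in RadegastXDB and it is neither a binary structural join nor a holistic twig join; it only illustrates the semantics described above. The names `QueryNode`, `candidates`, `matches`, and `evaluate`, as well as the sample document and query, are hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing, so query nodes can be dict keys
class QueryNode:
    """Hypothetical TPQ node: an element name plus edges to child query nodes."""
    name: str
    axis: str = "AD"      # edge from the parent query node: "PC" or "AD"
    output: bool = False  # True for an output query node (extraction point)
    children: list = field(default_factory=list)


def candidates(xml_node, axis):
    """XML nodes reachable from xml_node via the child (PC) or descendant (AD) axis."""
    if axis == "PC":
        return list(xml_node)
    it = xml_node.iter()
    next(it)              # Element.iter() yields the node itself first; skip it
    return list(it)


def matches(q, x):
    """All mappings of the query subtree rooted at q, with q mapped to XML node x."""
    if x.tag != q.name:
        return []
    partial = [{}]
    for qc in q.children:
        extended = []
        for xc in candidates(x, qc.axis):
            for sub in matches(qc, xc):
                extended.extend({**m, **sub} for m in partial)
        partial = extended
        if not partial:   # one query branch has no match, so x does not qualify
            return []
    return [{q: x, **m} for m in partial]


def evaluate(q_root, doc_root):
    """Try every XML node as the image of the root query node."""
    return [m for x in doc_root.iter() for m in matches(q_root, x)]


# Assumed sample data: persons, only one of whom has an age somewhere below.
doc = ET.fromstring(
    "<site><people>"
    "<person><name>Ann</name><profile><age>30</age></profile></person>"
    "<person><name>Bob</name></person>"
    "</people></site>"
)

# TPQ for //person[.//age]/name: 'name' is a PC edge and the output node,
# 'age' is an AD edge acting as a structural filter.
name_q = QueryNode("name", axis="PC", output=True)
age_q = QueryNode("age", axis="AD")
person_q = QueryNode("person", children=[name_q, age_q])

results = evaluate(person_q, doc)
print([m[name_q].text for m in results])  # prints ['Ann']
```

This sketch revisits the same XML subtrees many times, which is the kind of cost that nested-loop evaluation incurs and that dedicated TPQ algorithms aim to avoid.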
References
- [1] S. Al-Khalifa and H. Jagadish. Multi-level operator combination in XML query processing. In Proceedings of the eleventh international conference on Information and knowledge management, pages 134–141. ACM, 2002.
- [2] S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. M. Patel, D. Srivastava, and Y. Wu. Structural joins: A primitive for efficient XML query pattern matching. In Proceedings 18th International Conference on Data Engineering, pages 141–152. IEEE, 2002.
- [3] R. Bača, M. Krátký, I. Holubová, M. Nečaský, T. Skopal, M. Svoboda, and S. Sakr. Structural XML query processing. ACM Computing Surveys (CSUR), 50(5):64, 2017.
- [4] R. Bača, M. Krátký, T. W. Ling, and J. Lu. Optimal and efficient generalized twig pattern processing: a combination of preorder and postorder filterings. The VLDB Journal, 22(3):369–393, 2013.
- [5] R. Bača, P. Lukáš, and M. Krátký. Cost-based holistic twig joins. Information Systems, 52:21–33, 2015.
- [6] N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 310–321. ACM, 2002.
- [7] S. Chen, H.-G. Li, J. Tatemura, W.-P. Hsiung, D. Agrawal, and K. S. Candan. Twig2Stack: bottom-up processing of generalized-tree-pattern queries over XML documents. In Proceedings of the 32nd international conference on Very large data bases, pages 283–294. VLDB Endowment, 2006.
- [8] Z. Chen, H. Jagadish, L. V. Lakshmanan, and S. Paparizos. From tree patterns to generalized tree patterns: On efficient evaluation of XQuery. In Proceedings of the 29th international conference on Very large data bases, Volume 29, pages 237–248. VLDB Endowment, 2003.
