Compiling XQuery to Native Machine Code
It has become a trend amongst modern high-performance databases to optimise queries by avoiding interpretation of the query language. They instead opt to compile queries to native machine code which can subsequently be executed directly by one or more CPUs and/or GPUs. Both Saxon and XSLTC have previously demonstrated, albeit by different means, compilation of XSLT and XQuery to Java bytecode which can then be executed by the Java Virtual Machine.
To the best of our knowledge, we are the first group to demonstrate the compilation of XQuery directly to native machine code. As part of a research effort to develop a new performant XQuery processor for our poly-store database (FusionDB), we are constructing an XQuery parser which emits LLVM IR (Intermediate Representation), and a JIT (just-in-time) compiler to produce native machine code which is then executed.
In this paper we review the current approaches to native query compilation, detail our challenges and progress in building a modern XQuery parser, and our use of LLVM for compiling XQuery to native machine code. We also consider how we might exploit LLVM's low-level IR optimizations to improve query performance.
Adam Retter has been a core contributor to the Open Source eXist-db Native XML Database for 15 years, he was also an invited expert to the W3C XQuery Working Group and helped standardise XQuery 1.0, 3.0, and 3.1. Adam founded the EXQuery project, and developed the RESTXQ framework for XQuery. Recently, Adam has been developing FusionDB a new multi-model NoSQL database which also supports XML natively.