Program
Declarative Amsterdam 2021
Nov 4: Tutorial day, Nov 5: SymposiumLocation: Congress center, Science Park 125, Amsterdam
Thu 4 November: Tutorial Day
XProc - A pipelining language
XProc is an XML based programming language for complex document processing. Documents flow through pipelines in which steps perform processing like conversion, validation, split, merge, report, etc. It’s an almost perfect fit for the kind of processing necessary in complex document engineering.
In 2016 a W3C community group started working on XProc 3.0 to replace the never very popular 1.0 version (the 2.0 proposal never made it). Main goals were to make the language much more usable, understandable and concise, update the underlying standards (most notably XPath) and allow processing of non-XML documents as well.
The XProc 3.0 core specification has been stable for over a year now. There is one functioning processor (MorganaXProc-IIIse by Achim Berndzen) and one in the making (XML Calabash 3.0 by Norman Tovey-Walsh). There is a book (XProc 3.0: A Programmer Reference by Erik Siegel) that describes the language.
This tutorial covers the basics of XProc 3.0. Participants that are in for the hands-on exercises: please download MorganaXproc-IIIse and try to flight-test it.
For more information and, important, instructions for preparing for the tutorial, visit the tutorial's GitHub pages.
Erik Siegel (http://www.xatapult.nl/) works as a content engineer, XML specialist and technical writer. His main customers are in publishing and standardization. He is a member of the XProc 3.0 editorial committee.
SaxonJS Tutorial
SaxonJS is an XSLT 3.0 processor written in JavaScript and XSLT. It offers all of the traditional declarative features of XSLT in any modern browser and, on the server side, in Node.js. This tutorial will explain how to setup and use SaxonJS. We’ll cover the interactive extensions that make SaxonJS a powerful platform for developing browser-based applications. We’ll also explore how to use it on Node.js for traditional server-side automation tasks. Participants will be guided through a series of hands-on sessions where they will experience first hand how easy and fun it is to build applications with SaxonJS.
Norm Tovey-Walsh is a Senior Software Developer at Saxonica. He has also been an active participant in international standards efforts at both the W3C and OASIS. At the W3C, Norm was chair of the XML Processing Model Working Group, co-chair of the XML Core Working Group, and an editor in the XQuery and XSLT Working Groups. He served for several years as an elected member of the Technical Architecture Group. At OASIS, he was chair of the DocBook Technical Committee for many years and is the author of DocBook: The Definitive Guide. Norm has spent more than twenty years developing commercial and open source software.
Debbie Lockett joined Saxonica back in early 2014 in the days of Saxon 9.6; when XPath 3.0 and XQuery 3.0 were brand new, and XSLT 3.0 was approaching "last call working draft" status. She had no idea what any of these things meant, and has learned everything she knows about software development and XML technologies while at Saxonica. Debbie previously worked as a post-doctoral researcher in Mathematics at the University of Leeds, writing papers on symmetries of infinite relational structures, and once taught an undergraduate course to a class of 200 students. Debbie has worked on SaxonJS since its inception in 2016, and is now a lead developer.
Hands-on ixml
We choose which representations of our data to use, JSON, CSV, XML, or whatever, depending on habit, convenience, or the context we want to use that data in. On the other hand, having an interoperable generic toolchain such as that provided by XML to process data is of immense value. How do we resolve the conflicting requirements of convenience, habit, and context, and still enable a generic toolchain? Invisible XML (ixml) is a method for treating non-XML documents as if they were XML, enabling authors to write documents and data in a format they prefer while providing XML for processes that are more effective with XML content. For example, it can turn CSS code like
body {color: blue; font-weight: bold}
into XML like
<css>
<rule>
<simple-selector name="body"/>
<block>
<property>
<name>color</name>
<value>blue</value>
</property>
<property>
<name>font-weight</name>
<value>bold</value>
</property>
</block>
</rule>
</css>
or
<css>
<rule>
<selector>body</selector>
<block>
<property name="color" value="blue"/>
<property name="font-weight" value="bold"/>
</block>
</rule>
</css>
depending on choice. More details at invisiblexml.org. This tutorial provides a hands-on introduction to ixml: how to specify how documents are transformed into XML, and what choices you have.
Fri 5 November: Symposium
RumbleDB: Data independence for large, messy datasets
We introduce Rumble, a query execution engine for large, heterogeneous, and nested collections of JSON objects built on top of Apache Spark. While data sets of this type are more and more wide-spread, most existing tools are built around a tabular data model, creating an impedance mismatch for both the engine and the query interface. In contrast, Rumble uses JSONiq, a standardized language specifically designed for querying JSON documents. The key challenge in the design and implementation of Rumble is mapping the recursive structure of JSON documents and JSONiq queries onto Spark's execution primitives based on tabular data frames. Our solution is to translate a JSONiq expression into a tree of iterators that dynamically switch between local and distributed execution modes depending on the nesting level. By overcoming the impedance mismatch in the engine, Rumble frees the user from solving the same problem for every single query, thus increasing their productivity considerably. As we show in extensive experiments, Rumble is able to scale to large and complex data sets in the terabyte range with a similar or better performance than other engines. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables.
Abbreviations used in this presentation:
CLI: Command-Line Interface
CSV: Comma-Separated Values
DAG: Directed Acyclic Graph (a graph with no directed cycles, also known as dependency graph)
ETL: Extract-Transform-Load, to import data in a database
FLWOR: for-let-where-orderby-return (pronounced as 'flower')
HDFS: Hadoop Distributed File System, it is an open-source framework for storing very large datasets on a cluster.
HTTP: Hypertext Transfer Protocol, the basis of the Web
JSON: JavaScript Object Notation
RDD: Resilient Distributed Dataset, Spark's data primitive
ROOT: CERN's native format for high-energy physics data.
S3: Simple Storage Service, Amazon's cloud storage service
SQL: Structured english Query Language
UDF: User-Defined Function
Ghislain Fourny is a senior scientist at ETH Zurich with a focus on databases and game theory. He holds a Master of Science in Computer Science and a Doctorate of Science from ETH Zürich. Ghislain teaches Big Data courses for computer scientists as well as non-computer scientists. His research interests cover query languages for large-scale, heterogeneous, nested datasets, as well as rebooting game theory with a non-Nashian form of free choice. Ghislain was a member of the W3C XML Query working group from 2011 to 2014 and is a co-designer of the JSONiq query language and of the Rumble engine.
Features of a modern XML Resolver
XML Resolvers are a core extension feature in XML parsers and other applications in the XML stack. They allow you to transparently satisfy requests for DTDs, schemas, stylesheet modules, etc., with local copies of those resources. This offers improvements in both performance and security. XML Resolver 3.0, available in Java and (soon!) C#, provides full support for the XML Catalogs standard and a broad range of features designed to make deploying and using catalog-based resolution faster and easier. This talk will highlight the new features of the resolver including:
- Dynamic catalog construction with caching.
- Automatically loading catalogs from extension modules (jar files or assemblies).
- Improved support for resources distributed in extension modules.
- Handling http: and https: entries transparently.
- Validation of catalog files.
- Namespace-based resource discovery by indirection through RDDL documents.
Norm Tovey-Walsh is a Senior Software Developer at Saxonica. He has also been an active participant in international standards efforts at both the W3C and OASIS. At the W3C, Norm was chair of the XML Processing Model Working Group, co-chair of the XML Core Working Group, and an editor in the XQuery and XSLT Working Groups. He served for several years as an elected member of the Technical Architecture Group. At OASIS, he was chair of the DocBook Technical Committee for many years and is the author of DocBook: The Definitive Guide. Norm has spent more than twenty years developing commercial and open source software.
Roaster - declarative routing for eXist-db
Declarative approach to routing requests in eXist-db
Juri Leino is a software gardener from Berlin with over 15 years of experience in web development. In most recent years he has joined the exist-db project as a core developer focussing on the XQuery runtime. Next to consulting for exist-solutions and jinntec he also maintains and develops node-exist and gulp-exist and created XQuery libraries like xbow, exist-jwt and dicey.
Extracting Microcontent from DITA Topics
Chris Despopoulos is an old hand at technical writing. He is currently Publications Manager at Turbonomic Inc. In this role, he works with a small team that uses Git to manage DITA source, and a number of home-grown processes that exploit DITA to manage docs as code, harvest content from source code (for the API docs), produce release notes, integrate with markdown and other formats, and rebrand the pubs product for a matrix of Agile teams and projects. Tired of waiting for the “experts” to do it, he designed and implemented 4D Pubs, a single-page app for online help… Static site, dynamic client.
Functional, Declarative Audio Applications
Audio software, and particularly digital signal processing, is an application domain where the imperative, object oriented programming model dominates. In part, this can be justified by the realtime constraints that underly the domain, and that C/C++ has historically dominated the high performance native software landscape. But this is not without cost: the high barrier to entry prevents developers from trying to write audio software, and the industry spends far more time than needed to deliver new products.
In this talk, we'll look at some of the complications that come from writing low level native audio software in C/C++ with an imperative, object oriented model. Then we'll reframe the conversation to show why a functional, declarative approach may be fundamentally more fitting for the problems we want to tackle when writing new audio software.
Finally, I'll introduce Elementary Audio: a new JavaScript runtime for writing realtime, native audio applications with a functional, declarative API. We'll see how Elementary applies the declarative model to audio software, and then finish with a detailed example of a small drum synthesis library written in Elementary.
SchemaCom - An XML Schema Comparator
Ihe Onwuka has been working with XML since 2003 and is a System Engineer with LS Technologies assisting in the development and architecting of large complex data models for the US federal government. He is a great believer in functional programming and declarative technologies in general. Hobby wise he enjoys street dance choreography and had a long rugby career during which he played in 4 of the 5 continents and only retired because none of the professional clubs came in with an offer big enough to entice him to carry on.
Aparecium, an XQuery / XSLT parser library for invisible XML
'Invisible XML' ('ixml') is a method for treating non-XML information as if it were XML; it was proposed by Steven Pemberton in 2013. The basic idea is straightforward: a context-free grammar is used to describe the structure of the information, annotations in the grammar specify how the raw parse tree of a sentence in the language described by the grammar is to be represented into XML, and an ixml parser uses the grammar to parse the non-XML document into an XML form. This allows all the tools of the XML toolbox to be applied to the data: XQuery and XSLT for general processing, XForms for creating user interfaces to the data, XML schema languages for validation, and so on.
Aparecium is an ixml parser written in XQuery and XSLT, as a library of functions callable from XQuery and XSLT. (The name is a reference to a spell in the Harry Potter novels, which makes invisible writing visible.) When used to parse external resources, Aparecium can be thought of as a replacement for the standard doc() function which can read non-XML data and deliver it as XML; it can also be used to parse strings which obey a context-free grammar, such as CSS style specifications, XSLT pattern expressions, SVG path expressions, and so on. The latter makes Aparecium useful for handling XML formats which use micro-grammars for some portions of documents.
For simplicity, Aparecium is implemented as a pipeline of processes. First the extended BNF notation allowed in ixml grammars is translated to an equivalent unextended BNF. This grammar is then used by an Earley parser to parse the input; the result is a large set of 'Earley items' describing various aspects of the parse. From the set of Earley items, Aparecium then constructs a 'parse-forest grammar' describing the set of parse trees in the input. As a final step, a parse tree is extracted from the parse-forest grammar and returned to the caller. Alternate interfaces may be used to specify that the parse-forest grammar should be returned, instead; this may be helpful in cases of ambiguity, since it allows the caller to study the ambiguity and in some cases to extract the preferred parse tree.
In some cases the caller will have the grammar in the non-XML form described in the ixml specification; in others, the grammar will be available as an XML document; sometimes the caller will have a URI for the grammar. The input may similarly be available either as a string or as a URI. Aparecium provides distinct calls for each of these situations, to simplify the use of Aparecium in constructing applications.
The talk will briefly describe the current status of Aparecium implementation and (the gods willing) show a simple demo; it will conclude with a discussion of some next steps in the work on Aparecium and in the development of broader support for invisible XML.
C. M. Sperberg-McQueen is the founder of Black Mesa Technologies LLC, a consultancy specializing in the use of descriptive markup to help memory institutions preserve cultural heritage information. He co-edited the XML 1.0 specification, the Guidelines of the Text Encoding Initiative, and the XML Schema Definition Language (XSDL) 1.1 specification.
Declarative is a Feminist Issue
Betsy Haibel is a San Francisco-based engineering leader with over a decade of experience. She writes fiction and non-fiction in English and a variety of programming languages, and prior to the pandemic co-organized the Learn Ruby in DC meetup.