Program

Declarative Amsterdam 2021

Nov 4: Tutorial day, Nov 5: Symposium

Location: Congress center, Science Park 125, Amsterdam

Thu 4 November: Tutorial Day

XProc - A pipelining language

Erik Siegel, Xatapult Content Engineering

XProc is an XML-based programming language for complex document processing. Documents flow through pipelines in which steps perform processing such as conversion, validation, splitting, merging and reporting. It’s an almost perfect fit for the kind of processing necessary in complex document engineering.

In 2016 a W3C community group started working on XProc 3.0 to replace the never very popular 1.0 version (the 2.0 proposal never made it). Main goals were to make the language much more usable, understandable and concise, update the underlying standards (most notably XPath) and allow processing of non-XML documents as well.

The XProc 3.0 core specification has been stable for over a year now. There is one functioning processor (MorganaXProc-IIIse by Achim Berndzen) and one in the making (XML Calabash 3.0 by Norman Tovey-Walsh). There is a book (XProc 3.0: A Programmer Reference by Erik Siegel) that describes the language.

This tutorial covers the basics of XProc 3.0. Participants who want to join the hands-on exercises: please download MorganaXProc-IIIse and try it out beforehand.

For more information and, importantly, instructions for preparing for the tutorial, visit the tutorial's GitHub pages.

Erik Siegel (http://www.xatapult.nl/) works as a content engineer, XML specialist and technical writer. His main customers are in publishing and standardization. He is a member of the XProc 3.0 editorial committee.

SaxonJS Tutorial

Norm Tovey-Walsh and Debbie Lockett, Saxonica

SaxonJS is an XSLT 3.0 processor written in JavaScript and XSLT. It offers all of the traditional declarative features of XSLT in any modern browser and, on the server side, in Node.js. This tutorial will explain how to set up and use SaxonJS. We’ll cover the interactive extensions that make SaxonJS a powerful platform for developing browser-based applications. We’ll also explore how to use it on Node.js for traditional server-side automation tasks. Participants will be guided through a series of hands-on sessions where they will experience first-hand how easy and fun it is to build applications with SaxonJS.

Norm Tovey-Walsh is a Senior Software Developer at Saxonica. He has also been an active participant in international standards efforts at both the W3C and OASIS. At the W3C, Norm was chair of the XML Processing Model Working Group, co-chair of the XML Core Working Group, and an editor in the XQuery and XSLT Working Groups. He served for several years as an elected member of the Technical Architecture Group. At OASIS, he was chair of the DocBook Technical Committee for many years and is the author of DocBook: The Definitive Guide. Norm has spent more than twenty years developing commercial and open source software.

Debbie Lockett joined Saxonica in early 2014, in the days of Saxon 9.6, when XPath 3.0 and XQuery 3.0 were brand new and XSLT 3.0 was approaching "last call working draft" status. She had no idea what any of these things meant, and has learned everything she knows about software development and XML technologies while at Saxonica. Debbie previously worked as a post-doctoral researcher in Mathematics at the University of Leeds, writing papers on symmetries of infinite relational structures, and once taught an undergraduate course to a class of 200 students. Debbie has worked on SaxonJS since its inception in 2016, and is now a lead developer.

Hands-on ixml

Steven Pemberton, CWI

We choose which representations of our data to use, JSON, CSV, XML, or whatever, depending on habit, convenience, or the context we want to use that data in. On the other hand, having an interoperable generic toolchain such as that provided by XML to process data is of immense value. How do we resolve the conflicting requirements of convenience, habit, and context, and still enable a generic toolchain? Invisible XML (ixml) is a method for treating non-XML documents as if they were XML, enabling authors to write documents and data in a format they prefer while providing XML for processes that are more effective with XML content. For example, it can turn CSS code like

body {color: blue; font-weight: bold}

into XML like

<css>
  <rule>
    <simple-selector name="body"/>
    <block>
      <property>
        <name>color</name>
        <value>blue</value>
      </property>
      <property>
        <name>font-weight</name>
        <value>bold</value>
      </property>
    </block>
  </rule>
</css>

depending on choice. More details at invisiblexml.org. This tutorial provides a hands-on introduction to ixml: how to specify how documents are transformed into XML, and what choices you have.
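To make the CSS example above concrete, here is a minimal Python sketch that produces the same XML shape by hand. It is not an ixml processor and uses no ixml grammar: the function name css_rule_to_xml and the regular expression are assumptions made for illustration, and only simple rules of the form selector {name: value; ...} are handled. With ixml, a grammar would take the place of this hand-written code.

```python
import re
import xml.etree.ElementTree as ET


def css_rule_to_xml(css: str) -> ET.Element:
    """Parse rules of the form `selector {name: value; ...}` into a <css> tree."""
    root = ET.Element("css")
    # Each match is one rule: a selector followed by a brace-delimited block.
    for selector, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        rule = ET.SubElement(root, "rule")
        ET.SubElement(rule, "simple-selector", name=selector.strip())
        block = ET.SubElement(rule, "block")
        for declaration in body.split(";"):
            if not declaration.strip():
                continue  # skip the empty piece after a trailing semicolon
            name, _, value = declaration.partition(":")
            prop = ET.SubElement(block, "property")
            ET.SubElement(prop, "name").text = name.strip()
            ET.SubElement(prop, "value").text = value.strip()
    return root


tree = css_rule_to_xml("body {color: blue; font-weight: bold}")
ET.indent(tree)  # pretty-printing helper, available from Python 3.9
print(ET.tostring(tree, encoding="unicode"))
```

Every new input format would need another parser like this one; describing the format once in a grammar and letting a generic processor build the XML is the alternative that the tutorial explores.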

Steven Pemberton is a researcher affiliated with CWI Amsterdam, the Dutch national research centre for mathematics and informatics. His research is in interaction, and how the underlying software architecture can support users. He co-designed the ABC programming language that formed the basis for Python. Involved with the Web from the beginning, he organised two workshops at the first Web Conference in 1994. For the best part of a decade he chaired the W3C HTML working group, and has co-authored many web standards, including HTML, XHTML, CSS, XForms and RDFa. He now chairs the XForms group at W3C.

Fri 5 November: Symposium

RumbleDB: Data independence for large, messy datasets

Ghislain Fourny, ETH Zürich

We introduce Rumble, a query execution engine for large, heterogeneous, and nested collections of JSON objects built on top of Apache Spark. While data sets of this type are more and more widespread, most existing tools are built around a tabular data model, creating an impedance mismatch for both the engine and the query interface. In contrast, Rumble uses JSONiq, a standardized language specifically designed for querying JSON documents. The key challenge in the design and implementation of Rumble is mapping the recursive structure of JSON documents and JSONiq queries onto Spark's execution primitives based on tabular data frames. Our solution is to translate a JSONiq expression into a tree of iterators that dynamically switch between local and distributed execution modes depending on the nesting level. By overcoming the impedance mismatch in the engine, Rumble frees the user from solving the same problem for every single query, thus increasing their productivity considerably. As we show in extensive experiments, Rumble is able to scale to large and complex data sets in the terabyte range with similar or better performance than other engines. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does for highly structured tables.

Abbreviations used in this presentation:

CLI: Command-Line Interface
CSV: Comma-Separated Values
DAG: Directed Acyclic Graph (a graph with no directed cycles, also known as dependency graph)
ETL: Extract-Transform-Load, to import data into a database
FLWOR: for-let-where-orderby-return (pronounced as 'flower')
HDFS: Hadoop Distributed File System, an open-source framework for storing very large datasets on a cluster
HTTP: Hypertext Transfer Protocol, the basis of the Web
JSON: JavaScript Object Notation
RDD: Resilient Distributed Dataset, Spark's data primitive
ROOT: CERN's native format for high-energy physics data
S3: Simple Storage Service, Amazon's cloud storage service
SQL: Structured Query Language
UDF: User-Defined Function

Ghislain Fourny is a senior scientist at ETH Zürich with a focus on databases and game theory. He holds a Master of Science in Computer Science and a Doctorate of Science from ETH Zürich. Ghislain teaches Big Data courses for computer scientists as well as non-computer scientists. His research interests cover query languages for large-scale, heterogeneous, nested datasets, as well as rebooting game theory with a non-Nashian form of free choice. Ghislain was a member of the W3C XML Query working group from 2011 to 2014 and is a co-designer of the JSONiq query language and of the Rumble engine.

Features of a modern XML Resolver

Norm Tovey-Walsh, Saxonica

XML Resolvers are a core extension feature in XML parsers and other applications in the XML stack. They allow you to transparently satisfy requests for DTDs, schemas, stylesheet modules, etc., with local copies of those resources. This offers improvements in both performance and security. XML Resolver 3.0, available in Java and (soon!) C#, provides full support for the XML Catalogs standard and a broad range of features designed to make deploying and using catalog-based resolution faster and easier. This talk will highlight the new features of the resolver including:

Dynamic catalog construction with caching.
Automatically loading catalogs from extension modules (jar files or assemblies).
Improved support for resources distributed in extension modules.
Handling http: and https: entries transparently.
Validation of catalog files.
Namespace-based resource discovery by indirection through RDDL documents.
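As a rough illustration of what catalog-based resolution means, the following Python sketch reads a small OASIS XML Catalog and maps requested identifiers to local copies. It is not the API of XML Resolver 3.0: the functions load_catalog and resolve, the example identifiers and the local file paths are all hypothetical, and only the system and uri catalog entries are handled.

```python
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs standard, in ElementTree's {uri}tag form.
CATALOG_NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical catalog: identifiers and local paths are made up for illustration.
CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://example.org/dtd/sample.dtd"
          uri="file:///opt/catalog/sample.dtd"/>
  <uri name="http://example.org/xslt/common.xsl"
       uri="file:///opt/catalog/common.xsl"/>
</catalog>
"""


def load_catalog(xml_text):
    """Return lookup tables for the `system` and `uri` entries of a catalog."""
    root = ET.fromstring(xml_text)
    system = {e.get("systemId"): e.get("uri") for e in root.iter(CATALOG_NS + "system")}
    uri = {e.get("name"): e.get("uri") for e in root.iter(CATALOG_NS + "uri")}
    return {"system": system, "uri": uri}


def resolve(catalog, kind, identifier):
    """Return the local copy if the catalog has one, else the identifier unchanged."""
    return catalog[kind].get(identifier, identifier)


catalog = load_catalog(CATALOG_XML)
print(resolve(catalog, "system", "http://example.org/dtd/sample.dtd"))
print(resolve(catalog, "uri", "http://example.org/xslt/unknown.xsl"))
```

A request that matches a catalog entry is answered from the local copy, which is where the performance and security benefits come from; anything else falls through to the original identifier.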

Roaster - declarative routing for eXist-db

Juri Leino, eXist-solutions

Declarative approach to routing requests in eXist-db

A brief introduction into the status quo, followed by a presentation of a new approach to declaratively design APIs and route requests with examples for several use cases.

Introduction

I will explain the basics of routing in general.

After that, we will have a look at the status quo of routing requests in eXist-db.

In particular, we will look at REST, RESTXQ and controller.xq, and at their pros and cons.

- REST
  - has some quirks
  - does not encourage RESTful interface best practices
  - can be hard to secure

- RESTXQ
  - route handlers can be somewhere in a package
  - can lead to duplicate code for multiple output formats
  - parameter, authentication and error handling is left to the user

- controller.xq
  - can be hard to secure
  - parameter, authentication and error handling is left to the user
  - complex controllers can get hard to read
  - can only pass strings as parameters to handlers

History

Because of these limitations of all the routing options in eXist-db, I made several attempts to come up with a better solution that maps a route to a function.

In 2019 I had a series of small breakthroughs and a working prototype roughly modelled on the Express router known from Node.js.

By mid-2020 Wolfgang Meier expressed the need for a better routing option to use with TEI-publisher. I showed him what I had and he ran with it.

He had the brilliant idea to implement the OpenAPI standard and thus created a router where you first create the documentation. You declare which routes exist and what they expect and return.

In this configuration you also set things like headers, mime-types and more.
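The idea of declaring routes first and mapping each one to a function can be sketched in a few lines of Python. This is an illustration of the general approach only, not Roaster's implementation (which is written for eXist-db in XQuery): the API declaration, the handler functions and the use of an operationId value to name the handler are assumptions made for the example.

```python
import re

# Hypothetical OpenAPI-style declaration; paths and operationId values are illustrative.
API = {
    "paths": {
        "/documents": {"get": {"operationId": "list_documents"}},
        "/documents/{id}": {"get": {"operationId": "get_document"}},
    }
}


def list_documents(params):
    return {"status": 200, "body": ["a", "b"]}


def get_document(params):
    return {"status": 200, "body": {"id": params["id"]}}


# The declaration refers to handlers by name; this table connects names to functions.
HANDLERS = {"list_documents": list_documents, "get_document": get_document}


def compile_path(template):
    """Turn '/documents/{id}' into a regex with one named group per parameter."""
    # re.split with a capture group alternates literal text and parameter names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def route(method, path):
    """Find the declared operation for a request and call its handler."""
    for template, operations in API["paths"].items():
        match = compile_path(template).match(path)
        if match and method.lower() in operations:
            handler = HANDLERS[operations[method.lower()]["operationId"]]
            return handler(match.groupdict())
    return {"status": 404, "body": "no route declared"}


print(route("GET", "/documents/42"))
```

Because the declaration is plain data, the same file can drive documentation, parameter checking and a test page, while the handlers stay free of routing logic.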

This ongoing collaboration is now part of e-editiones, the same society that governs TEI-publisher.

Hands-On

First we will have a look at an example JSON file that declares a simple API of an eXist-db package.

1. Using a test page created from our declaration

2. Looking at the JSON file itself

Then we will create a new route that can output different formats such as HTML, XML, JSON and CSV.

I will show how to set arbitrary headers per route, in a handler function, dynamically, for caching, and also by using a middleware for all routes.

To round things off, I will show how to secure routes with cookies and basic auth, how to handle authorisation of requests and how to use a custom authentication method.

What's next?

What are our medium- and long-term goals, and how can you contribute?

Juri Leino is a software gardener from Berlin with over 15 years of experience in web development. In recent years he has joined the eXist-db project as a core developer focussing on the XQuery runtime. Besides consulting for exist-solutions and jinntec, he also maintains and develops node-exist and gulp-exist and has created XQuery libraries like xbow, exist-jwt and dicey.

Extracting Microcontent from DITA Topics

Chris Despopoulos

We transform DITA to HTML in the browser via a Single Page App (SPA). Our product GUI is also an SPA, so why not put our content directly in the GUI? This talk shows the advantages of dynamic transforms, and how we use that technique to extract subsets from the single source and display them in the product.

Chris Despopoulos is an old hand at technical writing. He is currently Publications Manager at Turbonomic Inc. In this role, he works with a small team that uses Git to manage DITA source, and a number of home-grown processes that exploit DITA to manage docs as code, harvest content from source code (for the API docs), produce release notes, integrate with markdown and other formats, and rebrand the pubs product for a matrix of Agile teams and projects. Tired of waiting for the “experts” to do it, he designed and implemented 4D Pubs, a single-page app for online help… Static site, dynamic client.

Functional, Declarative Audio Applications

Nick Thompson

Audio software, and particularly digital signal processing, is an application domain where the imperative, object-oriented programming model dominates. In part, this can be justified by the realtime constraints that underlie the domain, and by the fact that C/C++ has historically dominated the high-performance native software landscape. But this is not without cost: the high barrier to entry prevents developers from trying to write audio software, and the industry spends far more time than needed to deliver new products.

In this talk, we'll look at some of the complications that come from writing low-level native audio software in C/C++ with an imperative, object-oriented model. Then we'll reframe the conversation to show why a functional, declarative approach may be fundamentally more fitting for the problems we want to tackle when writing new audio software.

Finally, I'll introduce Elementary Audio: a new JavaScript runtime for writing realtime, native audio applications with a functional, declarative API. We'll see how Elementary applies the declarative model to audio software, and then finish with a detailed example of a small drum synthesis library written in Elementary.

Nick Thompson is an audio software developer, contractor, and consultant. He is the owner of a small audio plugin company, Creative Intent, and the author of Elementary Audio and React-JUCE. Nick's interest lies in tools that enable and promote creativity and simplicity, both in music making and in software development.

SchemaCom - An XML Schema Comparator

Ihe Onwuka

People working with large XML vocabularies often face the task of upgrading to a new version [1]. Ideally such upgrades are guided by a specification of how to map the components of an XML instance to the new version of the vocabulary. SchemaCom was created to assist in situations where such a specification is not available. It highlights the differences (and similarities) between the constituent content models in the respective vocabularies. This information can then guide the analysis necessary to specify the missing mappings, and can be applied without loss of generality between schemas representing different vocabularies (as opposed to different versions of the same one). A distinguishing feature is the delivery of the user interface as an XForm.

Ihe Onwuka has been working with XML since 2003 and is a System Engineer with LS Technologies, assisting in the development and architecting of large complex data models for the US federal government. He is a great believer in functional programming and declarative technologies in general. Outside work he enjoys street dance choreography and had a long rugby career during which he played in 4 of the 5 continents and only retired because none of the professional clubs came in with an offer big enough to entice him to carry on.

Aparecium, an XQuery / XSLT parser library for invisible XML

C. M. Sperberg-McQueen, Black Mesa Technologies

'Invisible XML' ('ixml') is a method for treating non-XML information as if it were XML; it was proposed by Steven Pemberton in 2013. The basic idea is straightforward: a context-free grammar is used to describe the structure of the information, annotations in the grammar specify how the raw parse tree of a sentence in the language described by the grammar is to be represented in XML, and an ixml parser uses the grammar to parse the non-XML document into an XML form. This allows all the tools of the XML toolbox to be applied to the data: XQuery and XSLT for general processing, XForms for creating user interfaces to the data, XML schema languages for validation, and so on.

Aparecium is an ixml parser written in XQuery and XSLT, as a library of functions callable from XQuery and XSLT. (The name is a reference to a spell in the Harry Potter novels, which makes invisible writing visible.) When used to parse external resources, Aparecium can be thought of as a replacement for the standard doc() function which can read non-XML data and deliver it as XML; it can also be used to parse strings which obey a context-free grammar, such as CSS style specifications, XSLT pattern expressions, SVG path expressions, and so on. The latter makes Aparecium useful for handling XML formats which use micro-grammars for some portions of documents.

For simplicity, Aparecium is implemented as a pipeline of processes. First the extended BNF notation allowed in ixml grammars is translated to an equivalent unextended BNF. This grammar is then used by an Earley parser to parse the input; the result is a large set of 'Earley items' describing various aspects of the parse. From the set of Earley items, Aparecium then constructs a 'parse-forest grammar' describing the set of parse trees in the input. As a final step, a parse tree is extracted from the parse-forest grammar and returned to the caller. Alternate interfaces may be used to specify that the parse-forest grammar should be returned, instead; this may be helpful in cases of ambiguity, since it allows the caller to study the ambiguity and in some cases to extract the preferred parse tree.
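The Earley step of such a pipeline can be sketched as a small recognizer in Python. This is a textbook-style illustration, not Aparecium's code (which is written in XQuery and XSLT): the function earley_recognize and the toy grammar are hypothetical, the grammar is assumed to be already in unextended BNF with no empty right-hand sides, and the sketch only answers whether the input is a sentence of the grammar. It does not build the parse-forest grammar or extract a tree.

```python
def earley_recognize(grammar, start, text):
    """Return True if `text` is a sentence of `grammar`.

    `grammar` maps a nonterminal to a list of right-hand sides (tuples of symbols);
    any symbol that is not a key of `grammar` is a one-character terminal.
    Assumes no empty right-hand sides, which keeps the completer simple.
    """
    # An Earley item is (lhs, rhs, dot position, origin index).
    chart = [set() for _ in range(len(text) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(len(text) + 1):
        worklist = list(chart[i])
        while worklist:
            lhs, rhs, dot, origin = worklist.pop()
            new_items = []
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predictor: expand the nonterminal after the dot.
                new_items = [(rhs[dot], alt, 0, i) for alt in grammar[rhs[dot]]]
            elif dot < len(rhs):
                # Scanner: match a terminal against the next input character.
                if i < len(text) and text[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance every item that was waiting for `lhs`.
                new_items = [
                    (l, r, d + 1, o)
                    for (l, r, d, o) in chart[origin]
                    if d < len(r) and r[d] == lhs
                ]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    worklist.append(item)
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for (lhs, rhs, dot, origin) in chart[len(text)]
    )


# Toy grammar for sums of single digits with parentheses (illustrative only).
GRAMMAR = {
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("digit",), ("(", "expr", ")")],
    "digit": [("1",), ("2",), ("3",)],
}

print(earley_recognize(GRAMMAR, "expr", "1+(2+3)"))
print(earley_recognize(GRAMMAR, "expr", "1+"))
```

The sets stored in chart correspond to the 'Earley items' mentioned above; a full parser would keep them after recognition and derive the set of parse trees from them.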

In some cases the caller will have the grammar in the non-XML form described in the ixml specification; in others, the grammar will be available as an XML document; sometimes the caller will have a URI for the grammar. The input may similarly be available either as a string or as a URI. Aparecium provides distinct calls for each of these situations, to simplify the use of Aparecium in constructing applications.

The talk will briefly describe the current status of Aparecium implementation and (the gods willing) show a simple demo; it will conclude with a discussion of some next steps in the work on Aparecium and in the development of broader support for invisible XML.

C. M. Sperberg-McQueen is the founder of Black Mesa Technologies LLC, a consultancy specializing in the use of descriptive markup to help memory institutions preserve cultural heritage information. He co-edited the XML 1.0 specification, the Guidelines of the Text Encoding Initiative, and the XML Schema Definition Language (XSDL) 1.1 specification.

Declarative is a Feminist Issue

Betsy Haibel, Director of Software Engineering, LTSE

Front-end web development is rooted in two declarative languages (HTML and CSS) and one imperative language (JavaScript) that can be written in a functional style. Front-end web development is also noted for contentious and ever-shifting gender dynamics – one year HTML and CSS are “for girls” and “not real programming,” another year it's JavaScript that's looked down upon. In this talk, we'll look at the history of front-end development through the twin lenses of gender and declarativity. Along the way, we'll see how gendered programming trends boosted the adoption of popular frameworks – and led to the quiet death of others. We'll get real about the social forces that have affected the credibility and “approachability” of declarative methods in front-end, and talk about how these same forces might play out in other declarative projects.

Betsy Haibel is a San Francisco-based engineering leader with over a decade of experience. She writes fiction and non-fiction in English and a variety of programming languages, and prior to the pandemic co-organized the Learn Ruby in DC meetup.