Element classification: a bottom-up perspective

Diederik Gerth van Wijk

XML is top-down oriented. In a DTD, an element type declaration specifies what content the element may contain, but there is no way to constrain the context in which that element may be included. In document-oriented languages, where mixed content is typical, it is worth looking up: by examining how an element may occur in its potential parent elements, one can classify elements. This raises questions such as: is it "suspicious" if an element may occur in the mixed content of some elements and in the element content of others? Is it suspicious if an element that may occur in mixed content has element content itself? What kind of elements occur in a repeatable OR group? What elements can serve as word boundaries? What does a SEQ group "mean"? Two sequence paradoxes will be mentioned, and by examining both the content and the parents of the "paragraph next door" element, we will find that what should be the second most used element in any document instance is actually missing from most industry-standard markup languages.
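To make the bottom-up view concrete, here is a minimal Python sketch. It does not parse a real DTD; it assumes a hypothetical, hand-simplified table of content models in which each element is marked as having "mixed", "element" or "empty" content, together with the child elements it allows. The sketch inverts these top-down declarations into a map of parent contexts and flags the two "suspicious" patterns asked about above. All element names are illustrative only.

```python
from collections import defaultdict

# Hypothetical, hand-simplified content models (not a real DTD):
# element -> (kind of its own content, child elements it allows).
MODELS = {
    "section": ("element", ["title", "para", "list"]),
    "title":   ("mixed",   ["emph"]),
    "para":    ("mixed",   ["emph", "list"]),
    "list":    ("element", ["item"]),
    "item":    ("element", ["para"]),
    "emph":    ("mixed",   []),
}


def parent_contexts(models):
    """Invert the top-down models: for each child element, map every
    potential parent to the kind of content ("mixed"/"element") that
    parent has, i.e. the context in which the child may occur."""
    contexts = defaultdict(dict)
    for parent, (kind, children) in models.items():
        for child in children:
            contexts[child][parent] = kind
    return contexts


def suspicious(models):
    """Flag the two bottom-up patterns questioned in the abstract."""
    findings = []
    for child, parents in sorted(parent_contexts(models).items()):
        kinds = set(parents.values())
        # Pattern 1: allowed in mixed content here, element content there.
        if {"mixed", "element"} <= kinds:
            findings.append((child, "occurs in both mixed and element content"))
        # Pattern 2: may occur in mixed content, yet has element content itself.
        if "mixed" in kinds and models[child][0] == "element":
            findings.append((child, "occurs in mixed content but has element content itself"))
    return findings


for name, reason in suspicious(MODELS):
    print(f"{name}: {reason}")
```

With these sample models only `list` is flagged, on both counts: it may appear in the mixed content of `para` as well as in the element content of `section`, and it has element content itself.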

Presentation, 8 November 2022

This year Diederik Gerth van Wijk celebrates the fact that, 45 years ago, while studying economics at Erasmus University Rotterdam, he wrote his first computer program. As an assistant in the department of computer science he had to write user manuals, which prompted him to invent his own markup system. His master's thesis was on using “Word Lists for Intentional Text Processing”, and after graduating he started working as a programmer at one of the electronic publishing houses of Wolters Kluwer. For 15 years he was responsible for maintaining and developing the DTDs of the law publishing division, during which time he joined the Dutch standardisation committee for SGML; he also became editor of <!ELEMENT, the Dutch User Group’s journal. In 2007 he left Wolters Kluwer and became an independent consultant and programmer. In recent years he has been reflecting on his sins and wondering whether it is time to create a theoretical foundation for the art of content modelling.