Development of language solutions based on TEI and ODD

Eduard Drenth, Fryske Akademy

The Text Encoding Initiative (TEI) is a vast, long-standing and widely used encoding standard that covers many areas of the humanities. Its qualities include high-quality documentation with many examples, and active discussion of the rationale behind the available elements and attributes and their intended use. The TEI presents itself as a set of guidelines that try to cover as many areas and use cases in the humanities as possible, and it is designed to be customized for specific situations. Customization is achieved via an ODD ("One Document Does it all"). An ODD offers a mechanism to override, restrict, eliminate and extend (parts of) the guidelines in a documented way. ODD can be seen as a powerful abstraction layer from which validation, documentation and also processing models can be generated. A nice but complex feature of ODDs is that they can be chained, which lets you keep each ODD focused and promotes reuse. In my work on corpora and dictionaries at the Fryske Akademy, ODD is the basis from which I generate XSD, configuration, SQL, Java, bind.xml etc. In my presentation I will show how we benefit from ODD in, for example, editing and publishing solutions; our goal is to enhance tool development and interoperability through standardization.
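
To make the customization mechanism concrete, here is a minimal sketch of what an ODD schema specification can look like. It is an illustration only, not the Fryske Akademy's actual customization: the identifier fa_example, the file name base.compiled.odd and the chosen value list are assumptions made up for this example.

```xml
<!-- Hypothetical ODD fragment; ident, source file and values are illustrative. -->
<!-- Chaining: @source points at a previously compiled ODD (assumed file name),
     so the module references below resolve against that customization
     instead of against the full TEI Guidelines. -->
<schemaSpec ident="fa_example" start="TEI" source="base.compiled.odd">

  <!-- Select which TEI modules are part of this schema -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="dictionaries"/>

  <!-- Eliminate: remove an element that this project does not use -->
  <elementSpec ident="superEntry" module="dictionaries" mode="delete"/>

  <!-- Restrict/override: require @type on entry and close its value list -->
  <elementSpec ident="entry" module="dictionaries" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req">
        <valList type="closed" mode="replace">
          <valItem ident="main"/>
          <valItem ident="xref"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

From a specification of this kind, the standard TEI tooling can derive a schema and matching documentation. The further artifacts mentioned above (SQL, Java, bind.xml) would come from additional, project-specific generation steps that are not shown here.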

Outline of the presentation:

1. ODD - explanation and background - chaining - generation
2. Usages - corpora - dictionaries - lexicons - interoperability
3. Editing - oXygen: validation and customizing author mode
4. The future - processing model and TEI Publisher

Presentation, 9 October 2020

Eduard Drenth lives in the north of the Netherlands together with his wife and youngest son. He has been in ICT for over 20 years. His main expertise is in Java, EE, databases and XML technologies. Over the last four years he has mainly been working for language researchers and lexicographers on corpus linguistics, lexicons, dictionaries and digital editions at the Fryske Akademy. TEI and Universal Dependencies are the most important standards in the data layer. The Akademy can be seen as maintainer of Frisian language data and provider of (web) services. Both researchers and the general public are served using the same data and services. Since the Akademy is small, it is important to limit the number of technologies and to have well-documented, reusable libraries based on stable build processes. That's where ODD comes in...