Banking with ixml and XForms
Abstract
My bank offers me my bank records as CSV, which is easy enough to convert to XML for input to my XForms banking application. But they don't offer my credit card records digitally, only as a screen display. To solve this I use ixml to convert that text to XML, as input to the application: a real-world application of ixml.
A History of ixml
The year 2024 has turned out to be a year of ixml. Looking at the graph of conference papers referencing the language, we see an explosion this year:
The number of ixml talks over the years
That first mention in 2004 wasn't actually a talk about ixml, but a keynote on the design of notations [notations] where I said, in passing: "Parsing is quite easy. It would be fairly easy to add a generalised part to the XML pipeline that parsed unmarked-up text, and produced XML as a parse tree: it's just a different sort of transform. We could have our cake and eat it!"
The first real paper was in 2013 [ixml1], followed by iterations based on user experience in 2016 [ixml2], and 2017 [ixml3]. In 2020 a working group was formed [wg], which led in 2022 to the formal specification being published [spec]. At the time of writing there are now 5 or 6 serious implementations, with others in development [impl].
The purpose of ixml
Most textual data has an implicit structure. For instance dates, like 8/11/2024, URLs such as http://cwi.nl/~steven/Talks/2024/11-08-banking/ and bibliographic references, such as Steven Pemberton, Banking with ixml and XForms, Proc. Declarative Amsterdam 2024, all have a structure obvious to the human reader, but opaque to programs processing the data.
Using ixml turns data with implicit structure such as these into data with explicit structure, such as
<date>
<day>8</day>
<month>11</month>
<year>2024</year>
</date>
To achieve this, you describe the format, such as this simple example for dates:
date: day, -"/", month, -"/", year.
day: digit, digit?.
month: digit, digit?.
year: digit, digit, digit, digit.
-digit: ["0"-"9"].
which is then fed, together with the data it describes, through the ixml processor to give you a structured version of the data. Diagrammatically, it looks like this.
The ixml process
Since both description and document are in textual format, they both get processed separately by the ixml processor to produce the equivalent structured versions.
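To make the input and output of this step concrete, here is a minimal Python sketch that hand-codes the same transformation for the date example. It is not ixml, and it says nothing about how an ixml processor works internally; the regular expression and the function name date_to_xml are assumptions made purely for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written counterpart of the date grammar: one or two digits for the
# day and the month, exactly four for the year, separated by slashes.
DATE = re.compile(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})")


def date_to_xml(text: str) -> str:
    """Return the explicit XML structure for a d/m/yyyy date string."""
    match = DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a date: {text!r}")
    date = ET.Element("date")
    # The slashes are dropped, just as the grammar marks them with "-".
    for name, value in zip(("day", "month", "year"), match.groups()):
        ET.SubElement(date, name).text = value
    return ET.tostring(date, encoding="unicode")


print(date_to_xml("8/11/2024"))
```

The point of ixml is that the five-line description replaces code like this: the grammar is both the documentation of the format and the converter.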
Aims
One of the principal aims of invisible markup in general is to draw attention to the abstract document that underlies any data representation. Once you have that abstract document after parsing the textual representation, there are various purposes you can put it to, which need not involve transcription to XML only to be reparsed by an XML processor. For instance in the diagram above, the description document may be converted directly to data structures suitable for the parser, rather than converted to XML.
However, ixml does specifically open up the XML pipeline to more than just XML, and one of the perceived targets was XForms.
XForms is a declarative, XML-based programming language [XForms]. Being declarative rather than procedural has proven to make life much easier for the programmer: it is like spreadsheets, but generalised. Experience over the years has demonstrated huge productivity gains for large projects, typically tenfold but sometimes better, such as 10 people in 1 year instead of 30 people in 5 years, with the best case to date being 1 person in 3 years against 70 people in 10 years [CityHCR]. However, XForms expects its data in XML, and many potential sources of data aren't in XML: ixml is a way of making those data sources accessible to XForms.
However, there is another side to XML and XForms: XForms programs are written directly in XML. But some people prefer to write programs and other documents in textual form rather than marked-up form. This is why we have such formats as Markdown, to make document entry easier for the user. It can be faster, easier, and more human-oriented to write programs in text, and let the computer convert it to XML. It is future work to define a textual version for XForms programs.
How ixml is being used
It is educational to see how ixml is being put to use. Just drawing from reports in emails or conference papers, we see:
- Generating structured bibliographies from OCR'd text
- Extracting references to other laws in the text of laws [laws]
- Generating art [art]
- Generating crochet patterns [crochet]
- Analysing late era Roman Trials [trials]
- Extracting data from VINs - vehicle identification numbers [vin]
- Aerospace supply communication messages [aero]
- Programming Nuclear Magnetic Resonance machines [nmr]
- Syntax highlighting code examples in a book [code]
- Banking [accts]
Banking
My bank has stopped sending paper statements, and offers online statements instead, either in PDF for printing yourself, or CSV for digital storage. While the bank does offer some online search facilities, they are slow and inconvenient. I needed something similar to how Quicken used to work, and so I wrote an application in XForms.
To write the app in XForms I first had to get the data into a usable form.
Since the bank only supplied PDF or CSV, CSV had to be the source of the data:
"Date","Name / Description","Account","Counterparty","Code","Debit/credit","Amount (EUR)","Transaction type","Notifications"
"20161230","The Movies Art House AMSTERDAM","NL80INGB1234567890","","BA","Debit","11,00","Payment terminal","Card sequence no.: 009 29/12/2016"
"20161229","DIJKMAN B.V. MUZIEK AMSTERDAM","NL80INGB1234567890","","BA","Debit","99,00","Payment terminal","Card sequence no.: 009 28/12/2016"
"20161227","CCV*FOODHALLEN AMSTERD AMSTERDAM","NL80INGB1234567890","","BA","Debit","6,25","Payment terminal","Card sequence no.: 009 24/12/2016"
In the first instance I used the streaming editor sed, which resulted in largely opaque code:
# Opening tag: take the year and account number from the first data line
head -n 2 "$1" | tail -1 | sed 's/"\(....\)[^,]*,[^,]*,"NL..INGB\([^"]*\)".*/<bank year="\1" acct="\2">/'
# One <entry> per data line; the fields are rewritten left to right
tail --lines=+2 "$1" | sed '
s/^/<entry>/
s/&/\&amp;/g
s/"\(....\)\(..\)\(..\)",/<date>\1-\2-\3<\/date>/
s/"\([^"]*\)",/<type>other<\/type><name>\1<\/name>/
s/"\([^"]*\)",/<from>\1<\/from>/
s/"\([^"]*\)",/<to>\1<\/to>/
s/"\([^"]*\)",/<code>\1<\/code>/
s/"Debit","\([^,]*\),\([^"]*\)",/<amount>-\1.\2<\/amount>/
s/"Credit","\([^,]*\),\([^"]*\)",/<amount>\1.\2<\/amount>/
s/"\([^"]*\)",/<sort>\1<\/sort>/
s/"\([^"]*\)"/<description>\1<\/description>/
s/$/<\/entry>/
'
echo '</bank>'
Once ixml was available, the code was at least more descriptive and tractable:
bank: labels, entry*.
-labels: -'"Date","Name / Description","Account","Counterparty",',
         -'"Code","Debit/credit","Amount (EUR)","Transaction type",',
         -'"Notifications"', nl.
entry: date, type, name, from, to, code, amount, sort, description, nl.
type: +"other".
date: -'"', y, +"-", m, +"-", d, -'",'.
-y: digit, digit, digit, digit.
-m: digit, digit.
-d: digit, digit.
name: field, -",".
from: field, -",".
to: field, -",".
code: field, -",".
sort: field, -",".
description: field.
-field: -'"', c*, -'"'.
-c: ~['"'; #a; #d].
amount: neg; pos.
-neg: -'"Debit",', +"-", number, -",".
-pos: -'"Credit",', number, -",".
-number: -'"', euros, -",", +".", cents, -'"'.
-euros: digit+.
-cents: digit, digit.
-digit: ["0"-"9"].
-nl: (-#a; -#d)+.
Let's consider this code in detail. A bank statement consists of a line of the labels that are completely ignored, followed by any number of entries:
bank: labels, entry*.
-labels: -'"Date","Name / Description","Account","Counterparty",',
         -'"Code","Debit/credit","Amount (EUR)","Transaction type",',
         -'"Notifications"', nl.
By including these labels literally (rather than just skipping the line), if the bank ever changes the format, an immediate error will be given. nl is just a newline.
Each entry, which takes up a single line, has a number of fields:
entry: date, type, name, from, to, code, amount, sort, description, nl.
Of these, one extra field has been added, type, that the application will use. It matches nothing in the input, and appears in the output as a preset field:
type: +"other".
which will always appear as
<type>other</type>
which can later be changed in the application.
A date is a field with a string of numbers in the input, like
"20161230"
so hyphens are added at the right places:
date: -'"', y, +"-", m, +"-", d, -'",'.
-y: digit, digit, digit, digit.
-m: digit, digit.
-d: digit, digit.
which would give something like
<date>2016-12-30</date>
This matches the data type for a date in XML, and so does not need to be structured any more than that.
A number of the fields just contain unstructured character data:
name: field, -",".
from: field, -",".
to: field, -",".
code: field, -",".
sort: field, -",".
description: field.
-field: -'"', c*, -'"'.
-c: ~['"'; #a; #d].
A field is just zero or more characters surrounded by quotes, where a character is anything except a quote or an end-of-line character.
<name>The Movies Art House AMSTERDAM</name>
Finally, amounts are supplied in an odd way, using two fields, since banks don't believe in negative numbers, just positive amounts of debit or credit:
"Credit","11,00"
"Debit","6,25"
Note the European style of number representation, which gets treated suitably, by adding a minus sign before the debits, and replacing the commas with points:
amount: neg; pos.
-neg: -'"Debit",', +"-", number, -",".
-pos: -'"Credit",', number, -",".
-number: -'"', euros, -",", +".", cents, -'"'.
-euros: digit+.
-cents: digit, digit.
-digit: ["0"-"9"].
Which gives results like
<amount>11.00</amount>
or
<amount>-6.25</amount>
Processing the data with this description gives XML like this:
<bank>
<entry>
<date>2016-12-30</date>
<type>other</type>
<name>The Movies Art House AMSTERDAM</name>
<from>NL80INGB1234567890</from>
<to/>
<code>BA</code>
<amount>-11.00</amount>
<sort>Payment terminal</sort>
<description>Card sequence no.: 009 29/12/2016</description>
</entry>
<entry>
<date>2016-12-29</date>
<type>other</type>
<name>DIJKMAN B.V. MUZIEK AMSTERDAM</name>
<from>NL80INGB1234567890</from>
<to/>
<code>BA</code>
<amount>-99.00</amount>
<sort>Payment terminal</sort>
<description>Card sequence no.: 009 28/12/2016</description>
</entry>
<entry>
<date>2016-12-27</date>
<type>other</type>
<name>CCV*FOODHALLEN AMSTERD AMSTERDAM</name>
<from>NL80INGB1234567890</from>
<to/>
<code>BA</code>
<amount>-6.25</amount>
<sort>Payment terminal</sort>
<description>Card sequence no.: 009 24/12/2016</description>
</entry>
</bank>
This can then be used as input to the XForms banking application.
The XForms Banking App
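For comparison, the CSV conversion described in this section can also be sketched procedurally. The following Python is an illustrative equivalent of the grammar, not part of the application; the names LABELS, SAMPLE and bank_to_xml are hypothetical, and the sketch relies on Python's csv module, which accepts more than the grammar does (for instance doubled quotes inside a field).

```python
import csv
import io
import xml.etree.ElementTree as ET

# The header line as shown in the sample; as in the grammar, any change
# to it is treated as an error rather than silently skipped.
LABELS = ["Date", "Name / Description", "Account", "Counterparty", "Code",
          "Debit/credit", "Amount (EUR)", "Transaction type", "Notifications"]

SAMPLE = (
    '"Date","Name / Description","Account","Counterparty","Code",'
    '"Debit/credit","Amount (EUR)","Transaction type","Notifications"\n'
    '"20161230","The Movies Art House AMSTERDAM","NL80INGB1234567890","",'
    '"BA","Debit","11,00","Payment terminal",'
    '"Card sequence no.: 009 29/12/2016"\n'
)


def bank_to_xml(text: str) -> ET.Element:
    """Build a <bank> element from the bank's CSV export."""
    rows = csv.reader(io.StringIO(text))
    if next(rows) != LABELS:
        raise ValueError("unexpected header line: has the format changed?")
    bank = ET.Element("bank")
    for date, name, account, other, code, sign, amount, sort, notes in rows:
        if sign not in ("Debit", "Credit"):
            raise ValueError(f"unexpected debit/credit value: {sign!r}")
        # The decimal comma becomes a point; debits become negative.
        value = ("-" if sign == "Debit" else "") + amount.replace(",", ".")
        fields = [
            ("date", f"{date[:4]}-{date[4:6]}-{date[6:]}"),
            ("type", "other"),  # preset field, absent from the input
            ("name", name),
            ("from", account),
            ("to", other),
            ("code", code),
            ("amount", value),
            ("sort", sort),
            ("description", notes),
        ]
        entry = ET.SubElement(bank, "entry")
        for tag, content in fields:
            ET.SubElement(entry, tag).text = content
    return bank


entry = bank_to_xml(SAMPLE).find("entry")
print(entry.findtext("date"), entry.findtext("amount"))  # 2016-12-30 -11.00
```

Unlike the grammar, this version separates reading the format (delegated to the csv module) from rewriting the fields, so the description of the input is no longer visible in one place.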
Credit card account
While my bank will gladly give me downloadable statements for my current account, for some reason it doesn't for my credit card, even though it doesn't send paper statements for that either.
All it does is display my transactions on the screen.
The bank's display for a credit card account
To deal with this, what I do is select all this text from the screen, and copy it as text to a file. Then I can use ixml to transform it to structured data (note that the blank lines are not separators between entries).
Transactions current period
2024
Today
EUR

Kobo Software Ireland D02T380 IE
−5.49
22 October
EUR

THEATRE ROYAL HAYMARKE LONDON GBR
−214.62
10 October
EUR

SP THEPHONESHOPBE JETTE BEL
−1,199.00
Period of 5 Sept 2024 to 4 Oct 2024
Opening balance for this period:
0.00
Transaction total:
−289.33
Monthly repayment:
289.33
Closing balance for this period:
0.00
2024
4 October
EUR

AFLOSSING
289.33
3 October
EUR

OTT* NT AT HOME LONDON GBR
−11.39
2 October
EUR

Google Payment IE LTD Dublin IRL
−14.99
24 September
EUR
...
The ixml: top level
The ixml for this has at the top level the current period, followed by a number of earlier periods, all of which contain a number of days.
cc: current, period*.
current: -"Transactions current period", -#a, day*.
period: -"Period of ", from, -" to ", to, -#a,
        opening, total, repayment, closing, day*.
from: -date.
to: -date.
This is mostly uninteresting stuff, since the opening and closing balances are always zero, the repayment and total are always the same, and the repayment amount occurs in the transactions anyway.
opening: -"Opening balance for this period:", -#a, -amount, -#a.
closing: -"Closing balance for this period:", -#a, -amount, -#a.
repayment: -"Monthly repayment:", -#a, -amount, -#a.
total: -"Transaction total:", -#a, -amount, -#a.
The interesting detail is in the days. Each day has the date and a number of transactions (the currency is always EUR, which gets deleted):
day: date, -#a, (-"EUR", -#a)?, transaction+.
transaction: -#a, payee, -#a, amount, -#a.
payee: ~[#a]*.
About the only interesting thing about the transactions is that they do acknowledge that negative numbers exist, and (correctly) use the Unicode character at point #2212 as minus sign, which the ixml replaces with hyphen, which is the minus sign in XML.
They use commas to separate thousands, which get deleted, and point for the decimal separator:
amount: (-#2212, +"-")?, (digit; -",")+, ".", digit, digit.
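As a cross-check on what this rule does to the text, here is a small Python sketch of the same normalisation. It is a hand-written approximation for illustration, not the ixml processor's behaviour; the names AMOUNT and normalise_amount are hypothetical.

```python
import re

# Optional Unicode minus sign (U+2212), then digits and thousands commas,
# then a point and exactly two digits, mirroring the amount rule.
AMOUNT = re.compile(r"(\u2212)?([0-9,]+)\.([0-9]{2})")


def normalise_amount(text: str) -> str:
    """Turn a displayed amount into a plain decimal string."""
    match = AMOUNT.fullmatch(text)
    if match is None:
        raise ValueError(f"not an amount: {text!r}")
    sign = "-" if match.group(1) else ""         # U+2212 becomes a hyphen
    whole = match.group(2).replace(",", "")      # thousands commas deleted
    return f"{sign}{whole}.{match.group(3)}"


print(normalise_amount("\u22121,199.00"))  # -1199.00
print(normalise_amount("289.33"))          # 289.33
```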
Dates are a bit of a mess: sometimes the year is before, sometimes after, sometimes it's not there at all. Sometimes it's the year and the word "Today":
date: y, -#a, (d, -" ", m; "Today"); d, -" ", m, (-" ", y)?.
d: digit, digit?.
Month names are either written in full, or as 3 letters, with the exception of Sept...
m: "January"; "February"; "March"; "April"; "May"; "June";
   "July"; "August"; "September"; "October"; "November"; "December";
   "Jan"; "Feb"; "Mar"; "Apr"; "Jun"; "Jul"; "Aug"; "Sept"; "Oct"; "Nov"; "Dec".
y: digit, digit, digit, digit.
-digit: ["0"-"9"].
Processing the scraped text with this description gives XML like this:
<cc>
<current>
<day>
<date>
<y>2024</y>Today</date>
<transaction>
<payee>Kobo Software Ireland D02T380 IE</payee>
<amount>-5.49</amount>
</transaction>
</day>
<day>
<date>
<d>22</d>
<m>October</m>
</date>
<transaction>
<payee>THEATRE ROYAL HAYMARKE LONDON GBR</payee>
<amount>-214.62</amount>
</transaction>
</day>
</current>
<period>
<from>
<d>5</d>
<m>Sept</m>
<y>2024</y>
</from>
<to>
<d>4</d>
<m>Oct</m>
<y>2024</y>
</to>
<opening>0.00</opening>
<total>-289.33</total>
<repayment>289.33</repayment>
<closing>0.00</closing>
<day>
<date>
<y>2024</y>
<d>4</d>
<m>October</m>
</date>
<transaction>
<payee>AFLOSSING</payee>
<amount>289.33</amount>
</transaction>
</day>
<day>
<date>
<d>3</d>
<m>October</m>
</date>
<transaction>
<payee>OTT* NT AT HOME LONDON GBR</payee>
<amount>-11.39</amount>
</transaction>
</day>
</period>
</cc>
XForms Code
Once we have the data, the code to display it is simple, and very much reflects the structure of the data as described by the ixml: we repeat over the top level elements (either current or period), provide a heading, then repeat over the days, and in the days, repeat over the transactions:
<repeat ref="*">
   <label class="period">
      <output value="if(from, concat(from/y, '-', from/m, '-', from/d, ' to ',
                                     to/y, '-', to/m, '-', to/d),
                         'Current')"/>
   </label>
   <repeat ref="day">
      <output ref="date/d"/> <output ref="date/m"/>
      <repeat ref="transaction">
         <output class="amount" ref="amount"/> <output class="payee" ref="payee"/>
      </repeat>
   </repeat>
</repeat>
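For readers less familiar with XForms, the nested repeats correspond to three nested loops over the parsed data. The following Python sketch walks a cut-down copy of the XML shown earlier in the same order, reading the day and month from the date element; SAMPLE and render are hypothetical names, and the plain-text layout is only a stand-in for what the form displays.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the credit card XML, for illustration only.
SAMPLE = """<cc>
<current>
<day><date><d>22</d><m>October</m></date>
<transaction><payee>THEATRE ROYAL HAYMARKE LONDON GBR</payee><amount>-214.62</amount></transaction>
</day>
</current>
<period>
<from><d>5</d><m>Sept</m><y>2024</y></from>
<to><d>4</d><m>Oct</m><y>2024</y></to>
<day><date><d>3</d><m>October</m></date>
<transaction><payee>OTT* NT AT HOME LONDON GBR</payee><amount>-11.39</amount></transaction>
</day>
</period>
</cc>"""


def render(cc: ET.Element) -> list[str]:
    """Return one line per heading, day and transaction."""
    lines = []
    for top in cc:                      # outer repeat: current or period
        start, end = top.find("from"), top.find("to")
        if start is not None and end is not None:
            heading = "{}-{}-{} to {}-{}-{}".format(
                start.findtext("y"), start.findtext("m"), start.findtext("d"),
                end.findtext("y"), end.findtext("m"), end.findtext("d"))
        else:
            heading = "Current"         # the current period has no from/to
        lines.append(heading)
        for day in top.findall("day"):  # middle repeat: days
            lines.append("  {} {}".format(day.findtext("date/d", ""),
                                          day.findtext("date/m", "")))
            for item in day.findall("transaction"):  # inner repeat
                lines.append("    {} {}".format(item.findtext("amount"),
                                                item.findtext("payee")))
    return lines


print("\n".join(render(ET.fromstring(SAMPLE))))
```

The declarative version states the same traversal without any loop variables or string building, and is re-evaluated automatically when the data changes.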
We add a simple search facility:
<input ref="instance('search')/q" incremental="true">
<label>Search</label>
</input>
<trigger>
<label>×</label>
<setvalue ref="instance('search')/q" ev:event="DOMActivate"/>
</trigger>
and modify the top-level repeat:
<repeat ref="*[contains(., instance('search')/q)]">
and we already have a very useful application:
The XForms Credit Card App
Conclusion
Invisible markup is about creating an abstract structured document from textual data where the structure is implicit. Invisible XML makes an XML serialisation of that abstract document, that can then be used as a source of data for the XML pipeline. It should be noted that ixml is being added to XPath and XQuery as a function [xpath], which means that ixml will shortly automatically be available to XForms (and any other XML-based technology that uses XPath or XQuery).