August 13, 2009

Balisage 2009 - Stream and Scream

This is the promised Part 2 of the blog entry about Mike Kay’s XML/XSLT processing optimization talks from August 12th. His second talk, entitled XSLT Screaming in XSLT 2.1 - Saxon-EE 9.2, was actually impromptu. Kay gave us an unofficial preview of XSLT 2.1, which isn’t yet a public working draft from the W3C. Although only the minor version number is changing, we learned that the changes in 2.1 will be substantial.

Mike defined XSLT streaming as processing source documents without building a tree in memory, which makes it possible to handle much larger documents and reduces latency. Apparently, implementors haven’t taken advantage of streaming yet. The new XSLT specification will define a subset of the language that is streamable (presumably like the XProc spec does). The new XSLT instructions and attributes involved are listed below.
  • xsl:stream href="uri"
  • xsl:mode streamable="yes" name="stream1"
  • xsl:template match=... mode="stream1"
The specification will define exactly what is streamable within a template. For example: no sorting, no sideways navigation, only one downward selection, no paths such as ancestor::x/child::y, etc.
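
To make the streaming idea a bit more concrete, here is a minimal sketch in Python rather than XSLT, since the 2.1 syntax isn't public yet and I don't want to guess at it. It reads a document one element at a time and never holds the whole tree in memory. The element name transaction and the attribute amount are made up for illustration; nothing here comes from Kay's slides.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts_streaming(source):
    """Count <transaction> elements and sum their amount attributes
    without keeping the full document tree in memory."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    # so that finished children can be discarded as we go.
    _, root = next(context)

    count = 0
    total = 0
    for event, elem in context:
        if event == "end" and elem.tag == "transaction":
            total += int(elem.get("amount"))
            count += 1
            # Drop everything parsed so far; only forward (downward)
            # processing is possible after this point.
            root.clear()
    return count, total


# Hypothetical sample input (amounts are whole numbers for simplicity).
sample = io.BytesIO(
    b"<transactions>"
    b"<transaction amount='100'/>"
    b"<transaction amount='-30'/>"
    b"<transaction amount='5'/>"
    b"</transactions>"
)
print(sum_amounts_streaming(sample))  # (3, 75)
```

Once an element has been discarded, there is no going back to it, which is roughly why sorting and sideways or upward navigation fall outside the streamable subset.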

Other new XSLT instructions include:
  • xsl:iterate - syntax like xsl:for-each, but with the semantics of tail recursion. For example, a document listing all bank transactions could be generated with a running balance. You can pass parameters to the next iteration using xsl:next-iteration and xsl:with-param.
  • xsl:merge - merges multiple streamed input files; related instructions are xsl:merge-source, xsl:merge-input, xsl:merge-key and xsl:merge-action
  • xsl:copy-of and xsl:snapshot - retain ancestors and attributes
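
The semantics of the first two instructions in the list above can be sketched in Python (again, not XSLT, and only my reading of the talk). The first function carries a balance from one step to the next, which is the role I understood xsl:next-iteration to play; the second lazily merges inputs that are already sorted on a key, which is how I understood xsl:merge. The function names and sample data are hypothetical.

```python
import heapq
from operator import itemgetter


def running_balance(amounts, opening=0):
    """Yield (amount, balance) pairs, handing the updated balance
    on to the next step instead of recursing."""
    balance = opening
    for amount in amounts:
        balance += amount  # the value passed to the next iteration
        yield amount, balance


def merge_by_key(*sorted_inputs, key):
    """Lazily merge several inputs, each already sorted on `key`,
    into one sorted sequence without loading any input fully."""
    return heapq.merge(*sorted_inputs, key=key)


print(list(running_balance([100, -30, 5])))
# [(100, 100), (-30, 70), (5, 75)]

# Two hypothetical transaction logs, each sorted by date.
log_a = [("2009-08-01", "a"), ("2009-08-10", "a")]
log_b = [("2009-08-05", "b"), ("2009-08-12", "b")]
print([date for date, _ in merge_by_key(log_a, log_b, key=itemgetter(0))])
# ['2009-08-01', '2009-08-05', '2009-08-10', '2009-08-12']
```

Both functions are generators or iterators, so they consume their inputs one item at a time, which is the property that makes this style compatible with streaming.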
Specifically with regard to Saxon-EE 9.2, Kay highlighted the following functions and instructions:
  • saxon:stream() function -- corresponds to xsl:stream; mainly for documents larger than physical memory; uses lazy evaluation
  • saxon:iterate -- an alternative to recursion that some programmers may find easier to understand
  • saxon:mode streamable="yes", but presently with only a subset of the XSLT 2.1 use cases implemented
If anyone caught instructions or details I missed, feel free to add comments below.
