August 13, 2009

Balisage 2009 - Pull, Push, Stream and Scream

On August 12, Mike Kay (Saxonica) presented two back-to-back talks on optimizing XML/XSLT pipeline processing. The first talk, You pull, I’ll push: On the polarity of pipelines, [Submitted Paper] compared and contrasted the control flow in a pipeline, which can run either with the data flow ("push") or against it ("pull"). That is, in “push”, control and data flow in the same direction, whereas in “pull”, they flow in opposite directions. The component that owns the main loop pulls its input and then pushes its output. Kay discussed other combinations, such as the fully streamable case of a pull, pull, control, push, push pipeline. In branch-and-merge pipelines, pull is needed for multiple inputs, whereas push is needed for multiple outputs. Schema validation in Saxon is written in push style because it forks. This led Kay to say that there is no clear winner between push and pull; each is appropriate in different situations.
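To make the push/pull distinction concrete, here is a minimal sketch in Python. It is my own illustration, not code from Kay's paper or from Saxon, and every name in it (`pull_upper`, `PushUpper`, `Collector`, `Tee`, `merge_pull`) is hypothetical. In the pull version the consumer owns the loop and asks upstream for the next item; in the push version the producer owns the loop and calls downstream with each item. The `Tee` and `merge_pull` helpers hint at why forking suits push and merging suits pull.

```python
# Pull style: control runs against the data flow.
# The consumer iterates and each stage asks its upstream for the next item.
def pull_source(items):
    for item in items:
        yield item


def pull_upper(upstream):
    for item in upstream:
        yield item.upper()


def merge_pull(left, right):
    """Merge two inputs by pulling from each in turn."""
    for a, b in zip(left, right):
        yield a
        yield b


# Push style: control runs with the data flow.
# The producer iterates and each stage calls write() on its downstream.
class Collector:
    def __init__(self):
        self.items = []

    def write(self, item):
        self.items.append(item)


class PushUpper:
    def __init__(self, downstream):
        self.downstream = downstream

    def write(self, item):
        self.downstream.write(item.upper())


class Tee:
    """Fork one input to several outputs by pushing to each in turn."""

    def __init__(self, *downstreams):
        self.downstreams = downstreams

    def write(self, item):
        for downstream in self.downstreams:
            downstream.write(item)


def run_pull(events):
    # The caller (the final consumer) drives the whole pipeline.
    return list(pull_upper(pull_source(events)))


def run_push(events):
    # The loop here (the original producer) drives the whole pipeline.
    out = Collector()
    stage = PushUpper(out)
    for event in events:
        stage.write(event)
    return out.items


def run_fork(events):
    upper, raw = Collector(), Collector()
    tee = Tee(PushUpper(upper), raw)
    for event in events:
        tee.write(event)
    return upper.items, raw.items
```

Both `run_pull` and `run_push` compute the same result; they differ only in which end of the pipeline holds the loop. A pull-style stage cannot easily feed two consumers that each want to drive it, and a push-style stage cannot easily wait on two producers, which is one way to read Kay's point about branches and merges.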

Mike Kay’s paper discusses various combinations and approaches, such as the other “JSP” (Jackson Structured Programming), the concept of inversion, and coroutines, which involve multiple stacks in a single thread; two programs are each written as if they own the control loop. Kay relates these concepts to XSLT processors and concludes:
As the usage of XML increases and more and more users find themselves applying languages like XSLT and XQuery to multi-gigabyte datasets, a technology that can remove the problems caused by pipeline polarity clashes has great potential.
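For readers who have not met inversion or coroutines, a rough sketch using Python generators may help. This is only my own illustration of the general idea, not anything taken from the paper, and the names `pair_up` and `push_driver` are hypothetical. `pair_up` is written as if it owned the control loop and simply "read" two items at a time, yet it is actually driven from outside by a push-style loop that sends it one item at a time.

```python
def pair_up(sink):
    """Written in pull style: reads two items per iteration of its own loop."""
    while True:
        first = yield   # "read" one item; control returns to the driver here
        second = yield  # "read" the next item
        sink.append((first, second))


def push_driver(items, coroutine):
    """Written in push style: owns the real loop and sends items downstream."""
    next(coroutine)  # advance the coroutine to its first "read"
    for item in items:
        coroutine.send(item)
    coroutine.close()  # end of input; an unfinished pair is discarded


def demo(items):
    pairs = []
    push_driver(items, pair_up(pairs))
    return pairs
```

Each `yield` suspends `pair_up` with its local state intact, so neither side has to be rewritten as an explicit state machine. Doing that rewrite by hand, or having a compiler do it, is roughly what inversion means.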

[Will add his second talk here when I'm not so sleepy.]
