August 10, 2009

Balisage 2009 - Symposium on Processing XML Efficiently


Michael Kay kicked off the all-day International Symposium on Processing XML Efficiently by explaining that the morning would focus mainly on bottom-up hardware-based solutions up to pipeline processing, whereas the afternoon would be more top-down from the application programmer’s viewpoint.

The first presentation, by Michael Leventhal and Eric Lemoine of LSI Corporation, covered an XML chip that has been in development throughout this decade. The tokenizer developed by Tarari was originally based on the DOM, which didn’t yield vast improvements. By 2004, a new approach called RAX (Random Access XML, simultaneous XPath) proved highly acceleratable, running 40x faster than conventional DOM-based processing. By 2005-6, this work became XMachine, consisting of a Rava stack, XSLT processor, and XTM (XML Threat Management) for streaming large documents. LSI acquired Tarari in 2007. By 2008, the XML chip was selected for the HP SAP accelerator, yielding a 3x to 4x improvement for enterprise applications. XMachine 2.0 is the 2009 version. The key to this hardware solution is providing a software stack above the hardware; because the goal is to accelerate the overall software application, XMachine offers an optimized API. The main design decision for hardware acceleration, they contend, is throughput vs. latency. The field-programmable gate array (FPGA) was a game-changer for their work. LSI has obtained performance numbers of 4.9 Gbps for complex XML processing with documents as large as 4 GB. They believe their new design will accommodate 1 TB documents. According to their Submitted Paper:
Results show that the use of an XML co-processor can reduce CPU cycles per byte of XML processed by amounts ranging from a factor of 3 to a factor of 50 depending on the workload, while power consumption can be reduced by a factor of 7.


One of LSI’s competitors, DataPower, presented next. David Maze discussed the IBM DataPower XML XG4 processor card, the goal of which is to accelerate throughput and provide parallel processing with a card that can be switched to different processors. Their main use case is taking SOAP traffic in, checking access control, checking for threats, and ensuring that the output is safe for the consumer of the messages. IBM uses a post-processing engine (PPE) that performs XPath evaluation, schema validation, XML routing, and filtering in hardware (i.e., to reject dangerous XML). DataPower has achieved throughput (with schema validation) of up to 2 Gbps with up to 64K simultaneous documents and 128K simultaneous XPath expressions, using as little as 15 W of power. The XG4 has three kinds of output: a DOM-like tree for the SOAP header, a SAX-like stream for the SOAP body called TLA (sorry, it’s just another Three Letter Acronym to me ;-), and post-processed data (pass, fail, status). Their PPE approach enables application-specific optimizations and extensions for new standards. See their Submitted Paper (abstract only as of this blog post).
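
To make the header-as-tree, body-as-stream split a bit more concrete, here is a rough software-only sketch in Python. It is only an analogy and not how the XG4 card works internally: the function name `split_soap`, the sample message, and the choice of the SOAP 1.1 namespace are all my own assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace; the talk did not specify a version.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
HEADER = f"{{{SOAP_NS}}}Header"
BODY = f"{{{SOAP_NS}}}Body"


def split_soap(xml_bytes):
    """Return (header_element, body_events) for one SOAP message.

    The header is kept as a small in-memory tree (DOM-like), while the
    body is reported as a flat list of (event, tag) pairs (SAX-like) and
    its elements are cleared as soon as they end.
    """
    header_tree = None
    body_events = []
    in_body = False
    parser = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in parser:
        if elem.tag == HEADER and event == "end":
            # The header subtree is complete at this point; keep it whole.
            header_tree = elem
        elif elem.tag == BODY:
            # Entering or leaving the body toggles streaming mode.
            in_body = event == "start"
        elif in_body:
            body_events.append((event, elem.tag))
            if event == "end":
                # Drop the element's content so the body never accumulates.
                elem.clear()
    return header_tree, body_events


# Hypothetical sample message, made up for this example.
SAMPLE = (
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<soap:Header><Token>abc</Token></soap:Header>"
    b"<soap:Body><Order><Item/></Order></soap:Body>"
    b"</soap:Envelope>"
)

if __name__ == "__main__":
    header, events = split_soap(SAMPLE)
    print(header.find("Token").text)
    for event, tag in events:
        print(event, tag)
```

The design point this tries to mirror is that headers are usually small and need random access (for access control checks, say), whereas bodies can be large and are cheaper to handle as a one-pass stream.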
