Showing posts with label Conference. Show all posts

August 10, 2009

Balisage 2009 - Review and Summary: Processing XML Efficiently


Mike Kay summarized and reviewed the symposium and added his own observations. He was a little surprised that there was nothing about Binary XML and little about speeding up XSLT or XQuery. Kay made the following main points:

Performance against other objectives:

  • Saxon: first standard conformance, then usability, then performance
  • performance is not always the most important consideration
  • David Wheeler (who invented the subroutine): “Optimize the code that users actually write.”
  • Should you optimize for Joe User or the expert?

Performance metrics:

  • Response time / latency
  • throughput
  • resource cost
  • scalability
  • Jim Robinson: “Good enough for us” -- your own requirements, not the world’s best solution

Performance methodology:

  • measure
  • understand - figure out where bottlenecks are and what can be done to improve it
  • focus on critical components
  • improve - but if a change doesn’t help, don’t leave it in
  • repeat until OK - but what counts as OK?
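
The "measure" step above, together with the latency and throughput metrics, can be sketched in a few lines of Python. This is only an illustration and was not part of Kay's talk: the synthetic document, the helper names make_sample_xml and measure_parse, and the choice of the standard-library ElementTree parser are all assumptions made for the example.

```python
import time
import xml.etree.ElementTree as ET


def make_sample_xml(n_items):
    """Build a synthetic XML document (hypothetical test data)."""
    items = "".join(f"<item id='{i}'>value {i}</item>" for i in range(n_items))
    return f"<root>{items}</root>".encode("utf-8")


def measure_parse(data, repeats=5):
    """Parse `data` several times; return (best latency in s, throughput in bytes/s)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(data)  # the operation being measured
        timings.append(time.perf_counter() - start)
    best = min(timings)  # the fastest run is the least disturbed by other processes
    return best, len(data) / best


if __name__ == "__main__":
    doc = make_sample_xml(10_000)
    latency, throughput = measure_parse(doc)
    print(f"latency: {latency * 1000:.2f} ms, throughput: {throughput / 1e6:.1f} MB/s")
```

Numbers from a harness like this only mean something when compared against your own requirements, which is the "good enough for us" point made above.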

Where are the bottlenecks?

  • application-level glue
  • parsing vs. query?
  • validation?
  • conversion to/from XML?
  • serialization?
  • query/transformation?
  • too many phases in the pipeline?


What improvements can we expect?


  • faster parsing has been promised?
  • faster and more scalable transformation
  • optimized pipelines?
  • smarter users?
  • faster hardware

Balisage 2009 - Efficient Scripting

David Lee, Epocrates (with a writing assist from Norman Walsh, Mark Logic) discussed Efficient Scripting (Submitted Paper). Scripting is useful for splitting complex workflows into manageable tasks that are easier to develop and debug than with programming languages; the individual pieces can be glued together later. Lee devised 5 kinds of test cases (2 were baselines) for 4 scripting languages: DOS and bash (Cygwin implementation), xmlsh, and XProc (Calabash, Norm Walsh’s early XProc implementation). The 2 XML scripting languages performed significantly better than the other two. With approximately 600 XSLT and XQuery operations, xmlsh and XProc yielded a 50- to 75-fold improvement. With 3,000 operations, there was a 150-fold improvement. I believe Lee said there was a 150- to 200-fold improvement when there were approximately 30,000 operations. It should be noted that the chosen test cases focus on areas where scripting performs worst.

Balisage 2009 - Symposium on Processing XML Efficiently


Michael Kay kicked off the all-day International Symposium on Processing XML Efficiently by explaining that the morning would focus mainly on bottom-up hardware-based solutions up to pipeline processing, whereas the afternoon would be more top-down from the application programmer’s viewpoint.

The first presentation was by Michael Leventhal & Eric Lemoine, LSI Corporation, and covered an XML chip that has been in development throughout this decade. The tokenizer developed by Tarari was originally based on the DOM, which didn’t yield vast improvements. By 2004, a new approach called RAX (Random Access XML, simultaneous XPath) proved highly acceleratable, running 40x faster than the conventional DOM-based approach. By 2005-6, this work became XMachine, consisting of a Rava stack, XSLT processor, and XTM (XML Threat Management) for streaming large documents. LSI acquired Tarari in 2007. By 2008, the XML chip was selected for the HP SAP accelerator, yielding a 3x to 4x improvement for enterprise applications. XMachine 2.0 is the 2009 version. The key to this hardware solution is providing a software stack above the hardware, the goal being to accelerate the overall software application, so XMachine offers an optimized API. The main design decision for hardware acceleration, they contend, is throughput vs. latency. The field-programmable gate array (FPGA) was a game-changer for their work. LSI has obtained performance numbers of 4.9 Gbps for complex XML processing with documents as large as 4 GB. They believe their new design will accommodate 1 TB documents. According to their Submitted Paper:
Results show that the use of an XML co-processor can reduce CPU cycles per byte of XML processed by amounts ranging from a factor of 3 to a factor of 50 depending on the workload, while power consumption can be reduced by a factor of 7.


One of LSI’s competitors, DataPower, presented next. David Maze discussed the IBM DataPower XML XG4 processor card, the goal of which is to accelerate throughput and provide parallel processing with a card that can be switched to different processors. Their main use case is SOAP traffic in, access control checking, threat checking, and ensuring output is safe for the consumer of the messages. IBM uses a post-processing engine (PPE) that performs XPath evaluation, schema validation, XML routing, and filtering in hardware (i.e., to reject dangerous XML). DataPower has achieved throughput (and schema validation) up to 2 Gbps with up to 64K simultaneous documents and 128K simultaneous XPath expressions with as little as 15W of power. The XG4 has three kinds of output: a DOM-like tree for the SOAP header, a SAX-like stream for the SOAP body called TLA (sorry, it’s just another Three Letter Acronym to me ;-), and post-processed data (pass, fail, status). Their PPE approach enables application-specific optimizations and extensions for new standards. See their Submitted Paper (abstract only as of this blog).
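
For readers unfamiliar with the tree vs. stream distinction mentioned above, here is a small software-only sketch in Python. It does not reflect the XG4's actual output formats or API, which the talk did not detail; the sample SOAP message, the auth and item elements, and the helper names header_user and count_items are all made up for illustration. It uses the standard-library minidom (an in-memory tree you can navigate) for the header and a SAX handler (events delivered as the parser reads) for the body.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical SOAP 1.2 message; the header and body contents are invented.
SOAP = b"""<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Header><auth user="alice"/></env:Header>
  <env:Body><order><item>1</item><item>2</item><item>3</item></order></env:Body>
</env:Envelope>"""


def header_user(data):
    """Tree style: build the whole document in memory, then navigate to a node."""
    dom = minidom.parseString(data)
    auth = dom.getElementsByTagName("auth")[0]
    return auth.getAttribute("user")


class ItemCounter(xml.sax.ContentHandler):
    """Stream style: react to start-element events; nothing is kept in memory."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items(data):
    handler = ItemCounter()
    xml.sax.parseString(data, handler)
    return handler.count


if __name__ == "__main__":
    print(header_user(SOAP), count_items(SOAP))
```

The tree form suits small parts that need random access, such as a header, while the stream form suits a large body that only needs one pass.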

August 09, 2009

Balisage 2009 Blog - Introduction


Welcome to the beginning of my blogs for Balisage Conference 2009 in beautiful centre-ville Montreal. Check back daily for highlights of this technical XML conference. For each formal presentation referenced in this blog, the keywords Submitted Paper will link to the relevant publication in the conference proceedings. According to the proceedings:
Balisage is a peer-reviewed conference designed to meet the needs of markup theoreticians and practitioners who are pushing the boundaries of the field. It's all about the markup: how to create it; what it means; hierarchies and overlap; modeling; taxonomies; transformation; query, searching, and retrieval; presentation and accessibility; making systems that make markup dance (or dance faster to a different tune in a smaller space) — in short, changing the world and the web through the power of markup.

See also the Extreme Markup Community site.

Photo: Hotel Europa, Second Floor Entrance to Conference (credit: Ken Sall, c'est moi)

July 23, 2009

Good to go!

Well, I've booked my flight to Montreal for Balisage, but it was more costly than I'd hoped for. I guess dem's da breaks when you book less than 2 weeks in advance. Still, I'm really excited about visiting Montreal and attending this highly intellectual XML conference with so many thought-provoking topics. I'll be highlighting topics that interest me over the next 10 days.

July 18, 2009

Balisage 2009, here I come!

I managed to get the conference rate for the hotel 17 days late. Sure hope they honor my confirmation number. Registered for the Balisage 2009 conference, arriving Sat., Aug. 8 and leaving Fri. the 14th. Next, to book the flight. But I am a little worried about whether that passport renewal will get here in time. I sent it in prior to deciding about the conference and didn't think there was an immediate need for it, but now there is.

July 16, 2009

How do you say "Markup" in French?

Answer: Balisage (root: balise - beacon, buoy, sign). I'm strongly considering going to the 2009 Balisage: The Markup Conference in Montreal on August 11-14. For many years, this was known as "Extreme Markup", an XML geekfest (and I mean that in the nicest way) held yearly in Canada. This is an event that draws the greatest minds of the XML world to wax theoretical and practical. The lineup for 2009 is quite impressive. And how can you not want to visit historic, scenic Montreal? (Last time I was there I was 10 years old and I have the pictures to prove it.) Now if only my passport renewal arrives in time....