August 19, 2009

Balisage 2009 - Best Practices - or Not?


{If it could talk, this blog entry would be begging for your comments via the "comments" link below the article.}

On August 13th, the Balisage Conference 2009 hosted a spirited panel discussion featuring David Chesnutt, Chet Ensign, Betty Harvey (Electronic Commerce Connection), Laura Kelly (National Library of Medicine), and Mary McRae (OASIS). Harvey shared her "Top 10 Mistakes in DTDs" from 1998, most of which are still applicable today. We learned a bit about the standardization process at OASIS from Mary McRae, who differentiated between codified rules, which are enforced; informative guidelines, which are essentially only recommendations; and the very informal oral exchange of practices (which she emphasized should be recorded in written form).

McRae mentioned the OASIS Technical Committee Process which, among many other things, defines the OASIS Naming Guidelines in two parts: Part 1: Filenames, URIs, Namespaces and Part 2: Metadata and Versioning. For example, OASIS requires an RDDL-like file at the URI given for a namespace. This is in accord with the W3C’s Architecture of the World Wide Web “good practice” for namespace documents:
The owner of an XML namespace name SHOULD make available material intended for people to read and material optimized for software agents in order to meet the needs of those who will use the namespace vocabulary.
McRae mentioned Adoption Technical Committees, I believe in reference to DITA Help, DITA Localization, UDDI, SAML, and OpenDocument Format (ODF). She also indicated that all OASIS specifications are published in DocBook, Word, OpenOffice, DITA, or XHTML.

An issue that generated divergent opinions was the question of which version of an XML Schema is the normative one -- the version stored in a separate file that can be validated and directly used, or the version that has been pasted into a Word document or other word processing format? Of course, the two should be identical and ideally pulled from the same source (i.e., the schema file could be imported into the document), but this isn’t always the case. The separate file has the advantage of lending itself to code review, whereas the version embedded in a normative specification gives the impression that it too is normative.

Does imposing best practices on developers restrict creativity and productive competition? Are Naming and Design Rules (NDRs) inherently evil? Again, this question elicited differing opinions, although it seems that the majority who offered comments were against such impositions.

Laura Kelly's experience at the US National Library of Medicine taught her that the scope of any project is smaller than we like to admit, so perhaps draconian rules are counter-productive. When it comes to XML encoding, focus on giving the users what they should be asking for -- how to mark up data so it will work best in their systems. Developers really only want to know "have I tagged my data correctly?" and "what does the data look like?"

David R. Chesnutt offered some lessons learned from his SGML work with the Model Editions Partnership (MEP) circa 1997. This project used a "subset of the SGML markup system developed by the Text Encoding Initiative (TEI)".

{Post comments below.}

August 16, 2009

Airport Blues - You Can't Miss It

After Balisage 2009, I took a taxi to the Montreal airport because I didn't leave in time for the shuttle. (True Confession: I didn't want to schlep 3 bags to another hotel to catch the shuttle.) Arrived 2.5 hours early, so I sat in the ticketing area and had a leisurely "bag" lunch courtesy of the conference caterers (if you can call Mediterranean pasta, Caesar salad and cheesecake cup a "bag" lunch -- thanks, Linda and Chris!). When I went to check in, I found out my plane was delayed an hour and would not meet up with my Philadelphia connection, so I was looking at arriving 2 hours later -- at 10 pm and it was only 2 pm! Not at all what I wanted, so they booked me on a flight to Toronto which was leaving in less than an hour. The catch was I'd have to reclaim my bags in Toronto and go through US Customs in the 1.25 hours between the 2 flights. Although risky (never been to the Toronto airport), it would get me home around 7 pm which sounded great.

Security check in Montreal was going really slowly. Didn't think I'd make the 3 pm flight. They wanted all my neatly packed electronics in see-thru pouches removed. What a pain! Trip to Toronto wasn't bad with 3 seats per aisle. Was able to read "Duel" by Richard Matheson, a gripping short story about road rage (and more). And then the fun began.

Here I am in (one of) Toronto's airports looking for my checked baggage. Getting off the plane, they told me "you can't miss it". Well, I beg to differ! Perhaps that's true when you pass it every day but not when you walk into a huge open area with signs everywhere and people hustling in every direction! I went completely through the area and started down another hallway and then stopped to ask a porter who didn't speak much English (thankfully, enough though). So I re-traced my steps and found the small sign that said "Baggage for US Customs" or something like that. Fortunately found my bag without much difficulty. Then I came to the long line for customs 20 seconds too late to miss a gaggle of at least 30 Japanese tourists who all proceeded to crowd in front of me. (There was a second security check somewhere in Toronto.)

Then I'm told my gate for the connection is on the other side of the airport and I need to catch a shuttle; this is just 15 minutes before boarding time. I'm given directions to get to the shuttle (down these stairs, turn left, go down another flight of stairs, then outside -- you can't miss it ;-). I manage to find it just a minute before it was ready to leave. The large shuttle is moving at what seems like a fast clip, making wide swings that almost send me flying with my laptop and medical device. It dawns on me that I can't recall the exact gate number (which wasn't written on my ticket) and I'm not sure if the shuttle makes multiple stops but there was no way to ask the driver who was behind glass. When we arrive at the stop, everyone got off so I figured it was one stop fits all, so I got off. I found a departure/arrival screen and found my flight. I have just enough time to buy bottled water since by now I am dehydrated from rushing around. I arrive at my gate in time to find out my seat wasn't exactly assigned so I had to take a window seat next to a woman who was fatter than me and seemed annoyed that she had to get up so I could scrunch myself into the small prop plane space. She refused to let me share the arm rest and had this blanket (!) on her lap which kept flapping onto my leg. Very uncomfortable!

I had planned to read the original version of "Nightmare at 20,000 Feet" also by Matheson (later re-worked for the original Twilight Zone with a young and handsome William Shatner and then the TZ movie). Thought it would be cool to read it on a plane. Turns out my window seat was right near the propeller just like the main character in the story. As we prepared for take off, the stewardess announced we'd be cruising at 21,000 feet. How cool! (Un)fortunately it was not dark and there was no gremlin or banshee on the wing of the plane. I checked. More than once.

So we arrived safely in Baltimore but I was one hour ahead of my airport shuttle reservation and my cell phone (useless in Canada) was completely dead. I was also far from where the shuttle would be. A helpful information desk lady pointed me in the right direction and gave me the airport shuttle phone number (which of course had been conveniently stored in my dead-as-a-doornail cell phone). When I reached the shuttle area, I figured what the heck -- I have no change anyway, so let's see if I can convince the next shuttle to take me. In this I was successful. And I made it home an hour ahead of my original schedule! Whoopie! Party at my house! You can't miss it!

August 14, 2009

Balisage 2009 - XForms and Genericode at NARA


On Thursday, August 13th, Quyen L. Nguyen (National Archives and Records Administration) and Betty Harvey (Electronic Commerce Connection) presented their paper entitled Agile Business Objects Management Application for Electronic Records Archive Transfer Process [Submitted Paper]. The U.S. National Archives and Records Administration (NARA) processes a very high volume of documents from most government agencies in its Electronic Records Archive (ERA), designed for long-term preservation of and access to digital objects. Quyen Nguyen explained that ERA has many challenges including dealing with a number of different media types (data types) and an ever-increasing volume of submissions. Archival Business Object requirements include the standard CRUD capabilities plus Versioning and Searching. NARA made the decision years ago to store documents in XML and transform to PDF. XML is used at business, communication and storage levels.

Management of Authority Lists (aka controlled vocabularies, aka code lists) is a big issue for NARA and other agencies. List changes should not require coding changes or re-compiling. NARA submission forms have fields that are conditionally optional or required, with inter-dependency between fields. Depending on state, some fields need to be insensitive to input and other fields need to be displayed or hidden.

The traditional HTML with JSP approach is too subject to code list changes. Schema changes cause a recompile of JSP code. It is harder to programmatically determine when to validate data input. Use of xsd:enumeration and annotations is inadequate for representing complex, multi-column code lists.

In contrast, according to Betty Harvey, XForms offer NARA many benefits such as modularity, reuse, separate evolvability, consistency of error messages, data integrity, performance, easier data exchange as XML, etc. The ERA team has implemented a comprehensive XForms solution by leveraging genericode from the OASIS Code List Representation Technical Committee. Their solution, which includes an Orbeon XForms server, provides an intuitive archive submission authoring system. The verbose genericode files are processed into smaller “fat free” versions via a custom XSLT. User form interactions can control dynamic form changes, such as which code list to display, fields that appear or are hidden, and so on. XForms Bind is pretty powerful and permits variables in XPath expressions.
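To make the "list changes should not require coding changes" point concrete, here is a minimal Python sketch of a code list that lives in data rather than in code. The XML layout, the element and attribute names, and the functions load_code_list and is_valid are all hypothetical; this is deliberately NOT the genericode schema, and it is not how the ERA team's XForms solution works.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified multi-column code list (not the genericode schema).
# In practice this would be a file that can change without touching the code.
CODE_LIST_XML = """
<codelist name="media-types">
  <row code="PDF" label="Portable Document Format" category="document"/>
  <row code="TIFF" label="Tagged Image File Format" category="image"/>
  <row code="WAV" label="Waveform Audio" category="audio"/>
</codelist>
"""

def load_code_list(xml_text):
    """Parse the code list into a dict keyed by code, keeping every column."""
    root = ET.fromstring(xml_text)
    return {row.get("code"): dict(row.attrib) for row in root.iter("row")}

def is_valid(code, code_list):
    """A submitted value is valid only if it appears in the current list."""
    return code in code_list

codes = load_code_list(CODE_LIST_XML)
print(is_valid("PDF", codes))        # membership check against the loaded list
print(codes["TIFF"]["category"])     # extra columns travel with each code
```

Adding a row to the XML changes what the form accepts with no recompile, which is the property the traditional JSP approach lacked.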

Use of genericode in particular and, to a lesser extent, XForms are of interest to me personally so I know I’ll be reading their paper. Betty Harvey has also made her XForm Controls examples available. She also prepared her slides in XML using XSLT to conform to the W3C Slidy presentation library.

August 13, 2009

Balisage 2009 - Stream and Scream

This is the promised Part 2 of the blog entry about Mike Kay’s XML/XSLT processing optimization talks from August 12th. His second talk, entitled XSLT Screaming in XSLT 2.1 - Saxon-EE 9.2, was actually impromptu. Kay gave us an unofficial preview of XSLT 2.1, which isn’t yet a public working draft from the W3C. Despite the change being only in the minor version number, we learned that the changes in 2.1 will be substantial.

Mike defined XSLT streaming as processing source documents without building a tree in memory, making it possible to handle much larger documents and reducing latency. Apparently, implementors haven’t taken advantage of streaming yet. The new XSLT specification will define a subset of the language that is streamable (presumably like the XProc spec does). Boldface is used to highlight new XSLT instructions or attributes below.
  • xsl:stream href="uri"
  • xsl:mode streamable="yes" name="stream1"
  • xsl:template match=... mode="stream1"
Exactly what is streamable within a template will be defined. For example, no sorting, no sideways navigation, only one downward selection, no ancestor::x/child::y, etc.
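For readers who have not met streaming outside XSLT, the same idea can be approximated in Python with ElementTree's iterparse, which hands over elements as the parser reaches them instead of waiting for the whole tree. This is only an illustration of the concept, not XSLT 2.1 or Saxon behavior; the function name count_and_sum and the ledger/transaction vocabulary are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

def count_and_sum(source, tag="transaction"):
    """Visit each matching element once, in document order, then discard it."""
    count = 0
    total = 0.0
    # "end" events fire as soon as an element's closing tag has been read.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            total += float(elem.get("amount"))
            # Drop the element's attributes and children so they can be freed.
            # (The emptied element itself stays attached to its parent.)
            elem.clear()
    return count, total

sample = io.BytesIO(
    b"<ledger>"
    b"<transaction amount='10.5'/>"
    b"<transaction amount='-4.5'/>"
    b"<transaction amount='2'/>"
    b"</ledger>"
)
print(count_and_sum(sample))
```

Note how the restrictions listed above show up naturally: the loop only ever sees the current element, so sorting or looking sideways at siblings would require buffering.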

Other new XSLT instructions include:
  • xsl:iterate - syntax like xsl:for-each, but with semantics of tail recursion. For example, if you are producing a document with all bank transactions, it could be generated with a running total of the balance. You can pass parameters to the next iteration using xsl:next-iteration and xsl:with-param.
  • xsl:merge - merge multiple streamed input files; also
  • xsl:merge-source, xsl:merge-input, xsl:merge-key, xsl:merge-action
  • xsl:copy-of and xsl:snapshot - retains ancestors and attributes
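The bank-transaction example for xsl:iterate boils down to carrying one value from each item to the next. A rough Python analogue, with amounts in whole cents and a hypothetical function name with_running_balance, might look like this; it illustrates the running-total idea only and makes no claim about XSLT 2.1 syntax.

```python
def with_running_balance(amounts, opening=0):
    """Pair each transaction amount with the balance after applying it."""
    rows = []
    balance = opening  # plays the role of the parameter passed to the next iteration
    for amount in amounts:
        balance += amount
        rows.append((amount, balance))
    return rows

for amount, balance in with_running_balance([100, -30, 45]):
    print(amount, balance)
```

Each pass needs only the current item and the carried balance, which is why this pattern fits a single forward pass over the input.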
Specifically with regard to Saxon-EE 9.2, Kay highlighted the following functions and instructions:
  • saxon:stream() function -- xsl:stream, mainly for documents larger than physical memory; lazy evaluation
  • saxon:iterate -- helpful as an alternative to recursion that some programmers can understand more easily
  • saxon:mode streamable="yes", but presently with only a subset of the XSLT 2.1 use cases implemented
If anyone caught instructions or details I missed, feel free to add comments below.

Balisage 2009 - Pull, Push, Stream and Scream


On August 12, Mike Kay (Saxonica) presented two back-to-back topics related to XML/XSLT pipeline processing optimization. The first talk, You pull, I’ll push: On the polarity of pipelines, [Submitted Paper] compared and contrasted the control flow in the pipeline, which can run either with the data flow ("push") or against it ("pull"). That is, in “push”, control flow and data flow in the same direction, whereas in “pull”, control flow and data flow in opposite directions. In the main loop, data is pulled on input and then pushed. Kay discussed other combinations, such as the fully streamable case of pull, pull, control, push, push pipelines. In branch and merge pipelines, pull is needed for multiple inputs, whereas push is needed for multiple outputs. Schema validation in Saxon is written in push style because it forks. This led Kay to say there is no clear winner between push and pull; each is appropriate in different situations.
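The push/pull distinction is easier to see in a few lines of code. Below is a minimal Python sketch, not taken from Kay's paper or from Saxon: generators stand in for pull stages (the consumer asks upstream for the next item) and plain callbacks stand in for push stages (the producer calls downstream). All function names are invented for illustration.

```python
# Pull style: the consumer drives; each stage asks its upstream for the next item.
def pull_source(items):
    for item in items:
        yield item

def pull_upper(upstream):
    for item in upstream:
        yield item.upper()

# Push style: the producer drives; each stage calls its downstream with each item.
def push_source(items, downstream):
    for item in items:
        downstream(item)

def push_upper(downstream):
    def receive(item):
        downstream(item.upper())
    return receive

def push_tee(*downstreams):
    """Fork one input to several outputs, which is natural in push style."""
    def receive(item):
        for downstream in downstreams:
            downstream(item)
    return receive

pulled = list(pull_upper(pull_source(["a", "b"])))

pushed, log = [], []
push_source(["a", "b"], push_tee(push_upper(pushed.append), log.append))

print(pulled, pushed, log)
```

The tee shows why forking favors push: the producer simply calls every downstream in turn. Feeding one generator to two independent consumers would need buffering, since the consumers would otherwise compete for the same items.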

Mike Kay’s paper discusses various combinations and approaches such as the other “JSP” (Jackson Structured Programming), the concept of inversion, and coroutines, which involve multiple stacks in a single thread; 2 programs are written as if they each own the control loop. Kay relates these concepts to XSLT processors and concludes:
As the usage of XML increases and more and more users find themselves applying languages like XSLT and XQuery to multi-gigabyte datasets, a technology that can remove the problems caused by pipeline polarity clashes has great potential.


[Will add his second talk here when I'm not so sleepy.]

August 12, 2009

Balisage 2009 - GODDAGS and EARMARKS, Just Ducky And We Love It


Fabio Vitali both figuratively and literally gave an animated talk addressing the problem of overlapping markup and the problem of modeling documents as trees. The title of his presentation, Towards markup support for full GODDAGs and beyond: the EARMARK approach, does little to convey how entertaining he made the subject. Let's just say he didn't duck and run for cover.

The fact that Vitali used a song by my all-time favorite band, the Fab Four, certainly got my attention. And I love it! His case study was a karaoke application which he postulated poses interesting markup challenges. First, the selected song requires pronoun changes based on the gender of the singer. Lines are displayed twice for a one-line lookahead. Chord changes do not exactly match line changes. And the final challenge is embedded fun facts that popup at appropriate points in the song.

Vitali's paper discusses his approach to these challenges -- EARMARK (Extreme Annotational RDF Markup), an OWL ontology with RDF triples. See his Submitted Paper and also his EARMARK site. See also the earlier work by C. M. Sperberg-McQueen and Claus Huitfeldt, GODDAG: A Data Structure for Overlapping Hierarchies.

Balisage 2009 - Streamability of XProc Pipelines


Norm Walsh (Mark Logic) gave a talk on streamability of XProc pipelines. XProc lets users define a sequence of atomic operations to apply to a series of documents, using control structures similar to conditionals, iteration, and exception handlers. XProc: An XML Pipeline Language is presently a W3C Candidate Recommendation that is near and dear to Norm since he’s been working on it for a while. He hinted it should become a Recommendation this fall or certainly by Christmas. As per W3C policy, there must be 2 implementations before a specification is finalized. One of those implementations, called XML Calabash, is by Walsh himself and is built on Saxon 9.

Streaming would provide a sliding window in a single pass with output beginning before all input has been seen. Little is said about streaming in the spec, but it is clear it could improve end-to-end performance in certain situations and would be essential for processing documents larger than physical memory. Although there are no explicit requirements for steps to be streaming in the spec, implementations will add value by enabling this.

Norm indicated that certain XProc instructions such as p:count are streamable, whereas others such as p:exec, p:http-request, p:validate-with-relaxng, p:validate-with-schematron, p:validate-with-xml-schema, p:xquery, and p:xslt cannot be streamable. His paper discusses data collected by XML Calabash between 21 Dec 2008 and 11 Jul 2009 representing more than 294,000 pipeline runs. (His implementation has an opt-out, phone home feature so he can collect certain usage data.) In his Submitted Paper, Walsh concluded:
The preliminary analysis performed when this paper was proposed suggested that less than half “real world” pipelines would benefit from a streaming implementation.
The data above seems to indicate that the benefits may be considerably larger than that. Although it is clear that there are pipelines for which streaming wouldn't offer significant advantages, it's equally clear that for essentially any set of pipelines of a given length, there are pipelines which would be almost entirely streamable.
Perhaps the most interesting aspect of this analysis is the fact that as pipeline runs grow longer, they appear to become more and more amenable to streaming. That is to say, it appears that a pipeline that runs to 300 steps is, on average, more likely to benefit from streaming than one that's only 100 steps long. We have not yet had a chance to investigate why this is the case.

Balisage 2009 - Beer and Demo


Q: What is the preferred accompaniment for demos at a technical conference?
A: Why, beer and free food, of course! (Save your divergent opinions about that statement, guys!)

On August 11th, Mark Logic was generous enough to provide great quantities of liquid and solid nourishment at the Brewtopia pub in Montreal. The demo format was simple: 5 minutes each to plug in and go. Over a dozen eager folks braved the cramped space and a hot room, not to mention an increasingly rowdy audience (funny how beer contributes to that). The contestants and the names of their demos follow:
  • Micah Dubinko: Zero to App in 5 minutes
  • Michael Sokolov: Biblical Studies
  • Bruce Bauman: Conceptual Models to XML Schema
  • Josh Lubell: Quality of Design
  • Mohamed Zergaoui: XML Prague and XProc Designer
  • Uche Ogbuji: Freemix
  • Markos Z(?): XQuery in the Browser
  • David Lee: One-Line Web Server
  • Quinn Dombrowski: Visualizing Bulgarian Dialect Data
  • Betty Harvey: Archival Description (NARA)
  • Steve Newcomb: IEML Parser
  • John Snelson: Higher Order Functions in XQuery 1.1
[Also, Vyacheslav Zholudev volunteered to demo Presentational OMDoc but unfortunately couldn't get a working laptop in the allotted time.]

Betty Harvey (Electronic Commerce Connection, Inc.) and Quinn Dombrowski received identical cheers twice in succession (as measured by the highly scientific decibel meter) so they were declared co-winners, splitting the cash prize. By sheer coincidence, they were the only two female demonstrators. Draw your own conclusions. Everyone who participated was awarded a Mark Logic t-shirt.

Thanks to Mark Logic, especially host Norm Walsh and his colleagues, for a fun and "educational" evening. Thanks also to the Brewtopia wait staff who had to wend their way through the tightly packed crowd all night.

Balisage 2009 - Spicy XML Data Services Platform

According to Uche Ogbuji, Akara is an integration platform for data services over the web providing pipelines for managing and processing data in whatever form (one format to another). In his own words:
Akara is an open-source XML/Web mashup platform supporting XML processing in an environment of RESTful data services. It includes “Web triggers”, which build on REST architecture to support orchestration of Web events. This is a powerful system for integrating services and components across the Web in a declarative way, so that perhaps a Web request could access information from a service running on Amazon EC2 to analyze information gathered from social networks, run through a remote spam detector service. Akara is designed from ground up to support such rich interactions, using the latest conventions and standards of the Web 2.0 era. It's also designed for performance, modern processor conventions and architectures, and for ready integration with other tools and components.

I have to admit, Uche Ogbuji's talk entitled Akara - Spicy Bean Fritters and an XML Data Services Platform was difficult for me to follow. Probably to many in the audience who have closely followed his earlier 4Suite work, this was exactly the right degree of spiciness, but it gave me indigestion. The pace was very fast (partly because he thought he had less time than actually allotted) and the slides were replete with acronyms and terminology that he assumed everyone understood (which may be the case, but still...).

Akara is built upon a mature foundation, namely the 4Suite code base including a port of the test suite. Uche said Akara is being used in some production environments although he also mentioned it was technically alpha code.

See his (late-breaking news) Submitted Paper.

Balisage 2009 - Those Pesky Namespaces!


Liam Quin (W3C's XML Activity Lead) gave a highly spirited talk on the pros and cons of XML Namespaces, as well as several approaches to simplifying namespace specification. He mentioned solutions proposed by Tim Bray, Micah Dubinko, and Ian Hickson. Quin's own solution was to store namespace declarations in a special namespace file that is processed by XSLT and applied to the files that reference it so they can in turn be validated with normal namespace syntax.

Apparently Liam must be into multimedia presentations. In addition to sporting a very colorful hat, he read a passage from an old book about railroads and showed a video clip of a damsel in distress tied to the tracks. Unfortunately, he ran out of time before we learned the fate of the damsel.

See his Submitted Paper.

August 11, 2009

Balisage 2009 - XML in the Browser: The Next Decade

Alex Milowski (Appolux) reminded us of the earliest demo on XML in a browser -- Netscape’s 1999 XML book demo from XTech ’99 in which you could sort by author, title, or ISBN using a combination of XML, HTML, CSS and JavaScript. At that time, Netscape also had an IRS demo with a table of contents in a sidebar controlling which page is presented (a la JavaDocs). While this might seem like old hat to us in 2009, Milowski ran the demos in recent browsers. The book demo worked in Firefox 3.x, Safari, Android, and iPhone, but failed in Internet Explorer 6, 7, and even 8. The IRS demo was less successful across the board with the exception of Firefox.

He defined Intrinsic Vocabulary as any markup that a browser can natively process with some well-defined non-trivial semantic without the aid of additional constructs. HTML is an example, but XML is not. He is particularly interested in intrinsic support for HTML5, SVG, and MathML, so he created a Firefox extension called XML Application Launcher. After you install the add-on, you can view Alex’s Balisage paper directly as XML in the browser using the Balisage DocBook subset rendered with a popup table of contents with load-on-demand pages. I tried it tonight and it works just fine! On the Google Code page, he wrote:
The main idea of this extension is that you can write your own applications, distribute them, and use this tool to launch them based on media type, XML namespace, content matching, or some combination of those three. Eventually the extension will have access to a registry of applications for XML vocabularies so that when an unknown type is encountered it can query for supporting applications.
Milowski concluded his talk with these points:
  • We must have HTML5, SVG, and MathML.
  • Embrace the idea of intrinsic vocabularies.
  • Replicate the browser extension model.
  • Support open-source and make it easy to use.
  • Don’t wait for someone else to implement it.
See the Submitted Paper.

Balisage 2009 - Opening Remarks and Sponsors


The Balisage 2009 Conference Committee -- B. Tommie Usdin (chair), Deborah A. Lapeyre, James David Mason, Steven R. Newcomb, C. M. Sperberg-McQueen -- opened the 4-day XML conference. One well-received announcement was their determination to make Balisage conference proceedings persistent. Unlike some other XML conferences which shall remain nameless (but not blameless), you’ll always be able to find all papers in the series. An ISBN has been assigned to each volume (2 per year), the entire series has an ISSN, and each individual paper has its own DOI (digital object identifier). How cool! Thank you, Mulberry Technologies!

The co-chairs acknowledged the two main sponsors: Mark Logic and the FLWOR Foundation. The FLWOR Foundation is dedicated to providing middleware and clients to simplify the use of XQuery. They have 3 open source projects under an Apache license: Zorba, an XQuery processor supporting XQuery 1.1, the update facility, scripting and REST extensions; XQIB (XQuery in the browser), a browser plugin for Internet Explorer which allows execution of client-side XQuery to navigate and update the DOM; and an Eclipse plugin (XQVT?).

Of course, we all know Mark Logic because they've given us: MarkLogic Server, a native XML database that implements XQuery for the CRUD functionality with full-text and structured search; MarkLogic Application Services, which includes Application Builder and provides an intuitive, browser-based user interface for creating applications without writing XQuery code; and MarkMail.org, a public email search site built using Mark Logic App Builder which currently archives over 40 million searchable emails.

And Mark Logic is now also known for sponsoring a “Beer and Demo” (more about that later).

And let’s not forget those cool ergonomic pens donated by Patrick Russo (sp?). Don’t confuse them with a tuning fork or wishbone ;-)

Balisage, Come to Me!


Kicking off the conference today (after the logistics, of course), James Mason surprised Tommie Usdin with a song about Balisage. Not sure of the official title (could be simply "Balisage"), but it was set to the tune of Bali Ha'i from South Pacific. As can be seen from the photo, conference chair Usdin was delightfully surprised. Nice job, James et al!

Check back here for the lyrics...

August 10, 2009

Balisage 2009 - Review and Summary: Processing XML Efficiently


Mike Kay summarized and reviewed the symposium and added his own observations. He was a little surprised that there was nothing about Binary XML and little about speeding up XSLT or XQuery. Kay made the following main points:

Performance against other objectives:

  • Saxon: first standard conformance, then usability, then performance
  • performance is not always the most important consideration
  • David Wheeler (invented the subroutine) - “Optimize the code that users actually write”.
  • Should you optimize for Joe User or the expert?

Performance metrics:

  • Response time / latency
  • throughput
  • resource cost
  • scalability
  • Jim Robinson: “Good enough for us” -- your own requirements, not the world’s best solution

Performance methodology:

  • measure
  • understand - figure out where bottlenecks are and what can be done to improve it
  • focus on critical components
  • improve - but if not, don’t leave in the things that didn’t help
  • repeat until ok - what counts as ok
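As a small illustration of the "measure, then understand" steps, here is a hedged Python sketch that times three stages of a toy XML job and reports which one took longest. The helper name timed, the stage labels, and the synthetic document are all assumptions made for the example; real measurements would use representative data and repeated runs.

```python
import time
import xml.etree.ElementTree as ET

def timed(label, func, *args, timings):
    """Run func(*args), record its wall-clock duration under label, return its result."""
    start = time.perf_counter()
    result = func(*args)
    timings[label] = time.perf_counter() - start
    return result

# Synthetic document: 10,000 empty <item> elements under one root.
doc = "<root>" + "<item n='1'/>" * 10000 + "</root>"

timings = {}
tree = timed("parse", ET.fromstring, doc, timings=timings)
count = timed("query", lambda t: len(t.findall("item")), tree, timings=timings)
out = timed("serialize", ET.tostring, tree, timings=timings)

# The stage with the largest share of the time is the first candidate to improve.
bottleneck = max(timings, key=timings.get)
print(count, bottleneck, timings)
```

Which stage comes out slowest will vary by machine and data, which is exactly why the list starts with "measure" rather than with a guess.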

Where are the bottlenecks?

  • application-level glue
  • parsing vs. query?
  • validation?
  • conversion to/from XML?
  • serialization?
  • query/transformation?
  • too many phases in the pipeline?


What improvements can we expect?


  • faster parsing has been promised?
  • faster and more scalable transformation
  • optimized pipelines?
  • smarter users?
  • faster hardware

Balisage 2009 - Efficient Scripting

David Lee, Epocrates (with a writing assist from Norman Walsh, Mark Logic) discussed Efficient Scripting (Submitted Paper). Scripting is useful for splitting complex workflows into manageable tasks that are easier to debug and easier to develop than with programming languages; the piece-part solutions can be glued together later. Lee devised 5 kinds of test cases (2 were baselines) for 4 scripting languages: DOS and bash (Cygwin implementation), xmlsh and xproc (calabash is Norm Walsh’s early XProc implementation). The 2 XML scripting languages performed significantly better than the other two. With approximately 600 XSLT and XQuery operations, xmlsh and xproc yielded a 50- to 75-fold improvement. With 3,000 operations, there was a 150-fold improvement. I believe Lee said there was a 150- to 200-fold improvement when there were approximately 30,000 operations. It should be noted that the chosen test cases focus on areas where scripting performs worst.

Balisage 2009 - Symposium on Processing XML Efficiently


Michael Kay kicked off the all-day International Symposium on Processing XML Efficiently by explaining that the morning would focus mainly on bottom-up hardware-based solutions up to pipeline processing, whereas the afternoon would be more top-down from the application programmer’s viewpoint.

The first presentation was by Michael Leventhal & Eric Lemoine, LSI Corporation, and covered an XML chip in development through this decade. The tokenizer developed by Tarari was originally based on the DOM, which didn’t yield vast improvements. By 2004, a new approach called RAX (Random Access XML, simultaneous XPath) proved highly acceleratable, with 40x faster performance than conventional DOM-based processing. By 2005-6, this work became XMachine, consisting of a Rava stack, XSLT processor, and XTM (XML Threat Management) for streaming large documents. LSI acquired Tarari in 2007. By 2008, the XML chip was selected for HP SAP accelerator, yielding 3x to 4x improvement for enterprise applications. XMachine 2.0 is the 2009 version. The key to this hardware solution is providing a software stack above the hardware, the goal being to accelerate the overall software applications, so XMachine offers an optimized API. The main design decision for hardware acceleration, they contend, is throughput vs. latency. The field-programmable gate array (FPGA) was a game-changer for their work. LSI has obtained performance numbers of 4.9 Gbps for complex XML processing with documents as large as 4 GB. They believe their new design will accommodate 1 TB documents. According to their Submitted Paper:
Results show that the use of an XML co-processor can reduce CPU cycles per byte of XML processed by amounts ranging from a factor of 3 to a factor of 50 depending on the workload, while power consumption can be reduced by a factor of 7.


One of LSI’s competitors, DataPower, presented next. David Maze discussed the IBM DataPower XML XG4 processor card, the goal of which is to accelerate throughput and provide parallel processing with a card that can be switched to different processors. Their main use case is SOAP traffic in, access control checking, threat checking, and ensuring output is safe for the consumer of the messages. IBM uses a post-processing engine (PPE) that performs XPath evaluation, schema validation, XML routing, and filtering in hardware (i.e., to reject dangerous XML). DataPower has achieved throughput (and schema validation) up to 2 Gbps with up to 64K simultaneous documents and 128K simultaneous XPath expressions with as little as 15W of power. The XG4 has three kinds of output: a DOM-like tree for the SOAP header, a SAX-like stream for the SOAP body called TLA (sorry, it’s just another Three Letter Acronym to me ;-), and post-processed data (pass, fail, status). Their PPE approach enables application-specific optimizations and extensions for new standards. See their Submitted Paper (abstract only as of this blog).

August 09, 2009

Balisage 2009 Blog - Introduction


Welcome to the beginning of my blogs for Balisage Conference 2009 in beautiful centre-ville Montreal. Check back daily for highlights of this technical XML conference. For each formal presentation referenced in this blog, the keywords Submitted Paper will link to the relevant publication in the conference proceedings. According to the proceedings:
Balisage is a peer-reviewed conference designed to meet the needs of markup theoreticians and practitioners who are pushing the boundaries of the field. It's all about the markup: how to create it; what it means; hierarchies and overlap; modeling; taxonomies; transformation; query, searching, and retrieval; presentation and accessibility; making systems that make markup dance (or dance faster to a different tune in a smaller space) — in short, changing the world and the web through the power of markup.

See also the Extreme Markup Community site.

Photo: Hotel Europa, Second Floor Entrance to Conference (credit: Ken Sall, c'est moi)

August 07, 2009

Montreal Tomorrow


I should be packing. I should be going to sleep early. But instead I'm messing around with my Mac. Leaving at O'Dark Thirty for Montreal. Really looking forward to getting out of town and into a different culture. Here's hoping that Balisage Conference 2009 will be as good as it sounds!

XML is Just a Three-Letter Word


Well, although XML has been around officially for 11.5 years now, it is still a controversial subject. The XML Wikipedia article is undergoing a major re-write, led by Tim Bray and Michael Kay with a cast of thousands. Would you believe, a handful? Check out the huge XML Category on Wikipedia. A bunch of XML geeks including Yours Truly will be headed for Balisage Conference 2009 in Montreal next week.

August 03, 2009

McCartney Mania, Redux!

The more you look, the more you find! More links to satiate McCartney fans:

Washington Post review of the concert: He Hopes You Have Enjoyed the Show (Paul McCartney, Still a Performer Without Peer).

Another review with photos: Click Click: Paul McCartney @ FedEx Field

Flickr photos from the McCartney FedEx concert (all rights reserved, sadly).

Plenty of YouTube videos, here and here.

Here's a major find: Lyrics to every McCartney, Wings, and Beatles song!

August 02, 2009

McCartney Mania!


On August 1, 2009, I was fortunate enough to see Paul McCartney up close (well, not that close) at FedEx Field in Landover, MD. What a fabulous Washington area show! This was only my second time(*) seeing Sir Paul, the first being in Oct. 2005. [Thanks to Donna W. and Bob M. respectively.]

The show kicked off soon after dark with a hard rockin' version of Drive My Car, the first of 22 Beatles songs. Naturally the 2.5-hour show included a small sampling of Wings tunes (Jet, Let Me Roll It, My Love, Mrs Vanderbilt [ho, hey ho], Band on the Run). There were a bunch of post-Wings performances (Only Mama Knows, Flaming Pie, Highway, Here Today, Dance Tonight, Calico Skies, Sing the Changes). Highway and Sing the Changes are from the recent Electric Arguments album by The Fireman. With the exception of Dance Tonight and Here Today, this period is not the high point of Paul's career, IMHO. [Not everyone would agree with this statement, right, Frank M?]

Relatively early in the evening, Paul added a fan favorite not played in NYC, dedicated to our First Lady, Michelle. (Hey, Paul, ya gotta work on the pronunciation of "Barack" or he won't invite you to the White House.) This was soon followed by a dedication of Here Today to John, about whom the song was written (from the Tug of War album from 1982). Later in the set, he dedicated Something to George, playing a ukulele that Harrison had given him.

When Paul introduced Blackbird, a solo from The Beatles (The White Album), he said it was written in response to the 1960s Civil Rights movement, a total surprise to yours truly. Turns out this is connected to Paul's book of poetry (song lyrics) called Blackbird Singing from 2002.

Both Frank M. and John T. noticed a bit of Jimi Hendrix, which turned out to be Foxey Lady, added as an instrumental portion to Let Me Roll It. Seems like they do this often, judging from past set lists.

Another high point of the evening was the auditory and visual spectacular of fireworks during the #2 single Live and Let Die. While this is certainly nothing new to Paul's gigs, it is definitely impressive live, especially when you are relatively close to the stage. The sounds of exploding pyrotechnics and the acrid smell of smoke were quite powerful.

Understandably for someone who has a whopping 22 solo/Wings studio albums (plus a half dozen live albums and a number of other projects), there were lots of albums from which no songs at all were performed. Hard to believe, though, that there was nothing from the first McCartney album, my favorite second only to Band on the Run.

Maybe it is my personal bias, but as excited as people were about Paul's Wings and solo material, the loudest reaction seemed to come for the Beatles tunes, which constituted nearly 2/3 of the show. There were two high energy encores composed entirely of Beatles songs (EC 1: Day Tripper, Lady Madonna, I Saw Her Standing There; EC 2: Yesterday (solo), Helter Skelter, Get Back, Sgt. Pepper's LHCB (reprise), and The End). In fact, if you count the encores, the entire second half of the set list (songs 19 to 35) was nothing but Beatles classics! Yeah, yeah, yeah!

Check out the complete Washington set list, available with lyrics and videos (various sources) for every one of the 35 songs. YouTube has a bunch of videos of varying quality from the FedEx show.

PS - The photo is from 2004 in Prague. Free to use according to Wikipedia.

And now for something completely different: Left 4 Dead - The Beatles. And you thought The Beatles Rock Band was going to be the best Beatles videogame?

(*) Never mind that I've seen 1964 The Tribute (Beatles tribute band) 10 times!