Showing posts with label balisageConference. Show all posts

August 19, 2009

Balisage 2009 - Best Practices - or Not?


{If it could talk, this blog entry would be begging for your comments via the "comments" link below the article.}

On August 13th, the Balisage Conference 2009 hosted a spirited panel discussion featuring David Chesnutt; Chet Ensign; Betty Harvey (Electronic Commerce Connection); Laura Kelly (National Library of Medicine); and Mary McRae (OASIS). Harvey shared her "Top 10 Mistakes in DTDs" from 1998, most of which are still applicable today. We learned a bit about the standardization process at OASIS from Mary McRae, who differentiated between codified rules, which are enforced; informative guidelines, which are essentially only recommendations; and the very informal oral exchange of practices (which she emphasized should be recorded in written form).

McRae mentioned the OASIS Technical Committee Process which, among many other things, defines the OASIS Naming Guidelines in two parts: Part 1: Filenames, URIs, Namespaces and Part 2: Metadata and Versioning. For example, OASIS requires an RDDL-like file at the URI given for a namespace. This is in accord with the W3C’s Architecture of the World Wide Web “good practice” for namespace documents:
The owner of an XML namespace name SHOULD make available material intended for people to read and material optimized for software agents in order to meet the needs of those who will use the namespace vocabulary.
McRae mentioned Adoption Technical Committees, I believe in reference to DITA Help, DITA Localization, UDDI, SAML, and OpenDocument Format (ODF). She also indicated that all OASIS specifications are published in one of DocBook, Word, OpenOffice, DITA, or XHTML.

An issue that generated divergent opinions was the question of which version of an XML Schema is the normative one -- the version stored in a separate file that can be validated and directly used, or the version that has been pasted into a Word document or other word processing format? Of course, the two should be identical and ideally pulled from the same source (i.e., the schema file could be imported into the document), but this isn’t always the case. The separate file has the advantage of lending itself to code review, whereas a version embedded in a normative specification gives the impression that it too is normative.

Does imposing best practices on developers restrict creativity and productive competition? Are Naming and Design Rules (NDRs) inherently evil? Again, this question elicited differing opinions, although it seems that the majority of those who offered comments were against such impositions.

Laura Kelly's experience at the US National Library of Medicine taught her that the scope of any project is smaller than we like to admit, so perhaps draconian rules are counter-productive. When it comes to XML encoding, focus on giving the users what they should be asking for -- how to mark up data so it will work best in their systems. Developers really only want to know "have I tagged my data correctly?" and "what does the data look like?"

David R. Chesnutt offered some lessons learned from his SGML work with the Model Editions Partnership (MEP) circa 1997. This project used a "subset of the SGML markup system developed by the Text Encoding Initiative (TEI)".

{Post comments below.}

August 14, 2009

Balisage 2009 - XForms and Genericode at NARA


On Thursday, August 13th, Quyen L. Nguyen (National Archives and Records Administration) and Betty Harvey (Electronic Commerce Connection) presented their paper entitled Agile Business Objects Management Application for Electronic Records Archive Transfer Process [Submitted Paper]. The U.S. National Archives and Records Administration (NARA) processes a very high volume of documents from most government agencies in its Electronic Records Archive (ERA), designed for long-term preservation of and access to digital objects. Quyen Nguyen explained that ERA has many challenges, including dealing with a number of different media types (data types) and an ever-increasing volume of submissions. Archival Business Object requirements include the standard CRUD capabilities plus Versioning and Searching. NARA made the decision years ago to store documents in XML and transform to PDF. XML is used at business, communication and storage levels.

Management of Authority Lists (aka controlled vocabularies, aka code lists) is a big issue for NARA and other agencies. List changes should not require coding changes or re-compiling. NARA submission forms have fields that are conditionally optional or required, with inter-dependency between fields. Depending on state, some fields need to be insensitive to input and other fields need to be displayed or hidden.
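The interdependent fields described above can be pictured by keeping the rules as data rather than as form code. The following is only a minimal Python sketch under stated assumptions: the field names, the `record_type` values, and the `RULES` table are all hypothetical and are not taken from NARA's forms.

```python
# Hypothetical form rules: every field name and condition here is invented for illustration.
RULES = {
    "transfer_date": {"required": lambda f: f.get("record_type") == "permanent"},
    "disposal_authority": {"relevant": lambda f: f.get("record_type") == "temporary"},
    "agency_code": {"required": lambda f: True},
}


def evaluate(form):
    """Return {field: {"relevant": bool, "required": bool}} for the current form state."""
    state = {}
    for field, rule in RULES.items():
        # A field with no "relevant" rule is always shown.
        relevant = rule.get("relevant", lambda f: True)(form)
        # A hidden field is never required.
        required = relevant and rule.get("required", lambda f: False)(form)
        state[field] = {"relevant": relevant, "required": required}
    return state


def missing_fields(form):
    """List required fields that have not been filled in yet."""
    return sorted(
        name for name, s in evaluate(form).items()
        if s["required"] and not form.get(name)
    )
```

Changing which fields are shown or required then means editing the `RULES` table, not the code that walks it.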

The traditional HTML with JSP approach is too subject to code list changes. Schema changes cause a recompile of JSP code. It is harder to programmatically determine when to validate data input. Use of xsd:enumeration and annotations is inadequate for representing complex, multi-column code lists.

In contrast, according to Betty Harvey, XForms offer NARA many benefits such as modularity, reuse, separate evolvability, consistency of error messages, data integrity, performance, easier data exchange as XML, etc. The ERA team has implemented a comprehensive XForms solution by leveraging genericode from the OASIS Code List Representation Technical Committee. Their solution, which includes an Orbeon XForms server, provides an intuitive archive submission authoring system. The verbose genericode files are processed into smaller “fat free” versions via a custom XSLT. User form interactions can control dynamic form changes, such as which code list to display, fields that appear or are hidden, and so on. XForms Bind is pretty powerful and permits variables in XPath expressions.
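To give a feel for the "fat free" step, here is a rough Python sketch that trims a genericode-style file down to bare code/label pairs. It swaps Python in for the custom XSLT the team actually used, and it rests on several assumptions: the sample document is a heavily simplified imitation of genericode (the real schema has more required parts), the column ids `code` and `name` are invented, and the compact `<codes>` output format is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, genericode-style sample. The namespace is the genericode 1.0 one
# as best recalled; the list content is invented for illustration.
VERBOSE = """\
<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification><ShortName>MediaTypes</ShortName><Version>1</Version></Identification>
  <ColumnSet>
    <Column Id="code" Use="required"><ShortName>Code</ShortName></Column>
    <Column Id="name" Use="required"><ShortName>Name</ShortName></Column>
  </ColumnSet>
  <SimpleCodeList>
    <Row>
      <Value ColumnRef="code"><SimpleValue>TXT</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>Textual records</SimpleValue></Value>
    </Row>
    <Row>
      <Value ColumnRef="code"><SimpleValue>IMG</SimpleValue></Value>
      <Value ColumnRef="name"><SimpleValue>Still images</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</gc:CodeList>
"""


def slim(xml_text):
    """Reduce each Row to one <code value="...">label</code> element."""
    root = ET.fromstring(xml_text)
    out = ET.Element("codes")  # hypothetical compact format
    for row in root.iter("Row"):
        # Map each cell's ColumnRef to its SimpleValue text.
        cells = {v.get("ColumnRef"): v.findtext("SimpleValue") for v in row.findall("Value")}
        ET.SubElement(out, "code", value=cells["code"]).text = cells["name"]
    return out


compact = ET.tostring(slim(VERBOSE), encoding="unicode")
```

A form only needs the compact version to populate a drop-down, while the verbose file remains the authoritative list.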

Genericode in particular and, to a lesser extent, XForms are of interest to me personally, so I know I’ll be reading their paper. Betty Harvey has also made her XForm Controls examples available. She also prepared her slides in XML using XSLT to conform to the W3C Slidy presentation library.

August 13, 2009

Balisage 2009 - Stream and Scream

This is the promised Part 2 of the blog entry about Mike Kay’s XML/XSLT processing optimization talks from August 12th. His second talk, entitled XSLT Screaming in XSLT 2.1 - Saxon-EE 9.2, was actually impromptu. Kay gave us an unofficial preview of XSLT 2.1, which isn’t yet a public working draft from the W3C. Despite the change being only in the minor version number, we learned that the changes in 2.1 will be substantial.

Mike defined XSLT streaming as processing source documents without building a tree in memory, making it possible to handle much larger documents and reducing latency. Apparently, implementors haven’t taken advantage of streaming yet. The new XSLT specification will define a subset of the language that is streamable (presumably like the XProc spec does). Boldface is used to highlight new XSLT instructions or attributes below.
  • xsl:stream href="uri"
  • xsl:mode streamable="yes" name="stream1"
  • xsl:template match=... mode="stream1"
Exactly what is streamable within a template will be defined. For example, no sorting, no sideways navigation, only one downward selection, no ancestor::x/child::y, etc.
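The general idea of streaming, as opposed to the XSLT 2.1 syntax itself, can be shown with a short Python analogy. This sketch uses the standard library's `iterparse` to make one forward pass over a document and discard each record once it has been handled; the `<ledger>`, `<transaction>`, and `<amount>` element names are invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a document that would normally be far too large to load whole.
SAMPLE = (
    b"<ledger>"
    b"<transaction><amount>10.5</amount></transaction>"
    b"<transaction><amount>4.5</amount></transaction>"
    b"</ledger>"
)


def stream_total(source):
    """Sum <amount> values in a single forward pass over the input."""
    total = 0.0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "transaction":
            total += float(elem.findtext("amount"))
            elem.clear()  # drop the finished record's children and text
    return total


total = stream_total(io.BytesIO(SAMPLE))
```

Note how the restrictions in the list above show up naturally: the loop only ever looks downward into the record it has just finished, and it never sorts or revisits earlier records.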

Other new XSLT instructions include:
  • xsl:iterate - syntax like xsl:for-each, but with the semantics of tail recursion. For example, if you are producing a document with all bank transactions, it could be generated with a running balance. You can pass parameters to the next iteration using xsl:next-iteration and xsl:with-param.
  • xsl:merge - merge multiple streamed input files; also
  • xsl:merge-source, xsl:merge-input, xsl:merge-key, xsl:merge-action
  • xsl:copy-of and xsl:snapshot - retains ancestors and attributes
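As a loose Python analogy for the iterate and merge instructions listed above (not their actual XSLT semantics), the sketch below merges two already-sorted transaction feeds lazily and carries a running balance from one item to the next. The feeds and their contents are hypothetical.

```python
import heapq
from operator import itemgetter

# Two hypothetical, already date-sorted transaction feeds: (date, amount).
checking = [("2009-08-01", 100.0), ("2009-08-05", -30.0)]
savings = [("2009-08-03", 50.0), ("2009-08-09", -20.0)]


def statement(*feeds, opening=0.0):
    """Yield (date, amount, balance), carrying the balance forward item by item."""
    balance = opening
    # heapq.merge interleaves sorted inputs lazily instead of loading them whole.
    for date, amount in heapq.merge(*feeds, key=itemgetter(0)):
        balance += amount  # plays the role of the parameter passed to the next iteration
        yield date, amount, balance


rows = list(statement(checking, savings))
```

The `balance` variable is the only state kept between items, which is what makes this style friendly to streaming.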
Specifically with regard to Saxon-EE 9.2, Kay highlighted the following functions and instructions:
  • saxon:stream() function -- xsl:stream, mainly for documents larger than physical memory; lazy evaluation
  • saxon:iterate -- helpful as an alternative to recursion that some programmers can understand more easily
  • saxon:mode streamable="yes", but presently with only a subset of the XSLT 2.1 use cases implemented
If anyone caught instructions or details I missed, feel free to add comments below.

Balisage 2009 - Pull, Push, Stream and Scream


On August 12, Mike Kay (Saxonica) presented two back-to-back topics related to XML/XSLT pipeline processing optimization. The first talk, You pull, I’ll push: On the polarity of pipelines, [Submitted Paper] compared and contrasted the control flow in the pipeline, which can run either with the data flow ("push") or against it ("pull"). That is, in “push”, control flow and data flow in the same direction, whereas in “pull”, control flow and data flow in opposite directions. In the main loop, data is pulled on input and then pushed. Kay discussed other combinations, such as the fully streamable case of pull, pull, control, push, push pipelines. In branch and merge pipelines, pull is needed for multiple inputs, whereas push is needed for multiple outputs. Schema validation in Saxon is written in push style because it forks. This led Kay to say there is no clear winner between push and pull; each is appropriate in different situations.
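The push/pull distinction is easy to see in a few lines of Python. This is a minimal sketch of the two styles, not code from Saxon or from Kay's paper; the uppercasing stage and all names are invented for illustration.

```python
def pull_upper(source):
    """Pull style: the consumer drives, and each stage asks upstream for the next item."""
    for item in source:
        yield item.upper()


def pull_pipeline(items):
    return list(pull_upper(iter(items)))  # list() is the consumer in control


class PushUpper:
    """Push style: the producer drives, and each stage hands items downstream."""

    def __init__(self, downstream):
        self.downstream = downstream

    def write(self, item):
        self.downstream(item.upper())


def push_pipeline(items):
    out = []
    stage = PushUpper(out.append)
    for item in items:  # the producer is in control
        stage.write(item)
    return out


def push_fork(items):
    """Push makes fan-out easy: one stage can feed several outputs at once."""
    texts, lengths = [], []

    def tee(item):
        texts.append(item)
        lengths.append(len(item))

    stage = PushUpper(tee)
    for item in items:
        stage.write(item)
    return texts, lengths
```

`push_fork` hints at why a component that forks is more naturally written in push style: a pull-style stage has exactly one caller asking it for data, so feeding two consumers at their own pace needs buffering or some other workaround.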

Mike Kay’s paper discusses various combinations and approaches such as the other “JSP” (Jackson Structured Programming), the concept of inversion, and coroutines, which involve multiple stacks in a single thread; two programs are written as if they each own the control loop. Kay relates these concepts to XSLT processors and concludes:
As the usage of XML increases and more and more users find themselves applying languages like XSLT and XQuery to multi-gigabyte datasets, a technology that can remove the problems caused by pipeline polarity clashes has great potential.


[Will add his second talk here when I'm not so sleepy.]

August 12, 2009

Balisage 2009 - GODDAGS and EARMARKS, Just Ducky And We Love It


Fabio Vitali both figuratively and literally gave an animated talk addressing the problem of overlapping markup and the problem of modeling documents as trees. The title of his presentation, Towards markup support for full GODDAGs and beyond: the EARMARK approach, does little to convey how entertaining he made the subject. Let's just say he didn't duck and run for cover.

The fact that Vitali used a song by my all-time favorite band, the Fab Four, certainly got my attention. And I love it! His case study was a karaoke application which he postulated poses interesting markup challenges. First, the selected song requires pronoun changes based on the gender of the singer. Lines are displayed twice for a one-line lookahead. Chord changes do not exactly match line changes. And the final challenge is embedded fun facts that popup at appropriate points in the song.
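Why chords that do not line up with lyric lines are a problem for tree-shaped markup can be shown with plain character ranges. The sketch below is a simplification using stand-off spans over a placeholder text; it is not EARMARK's actual OWL/RDF model, and the text, span labels, and offsets are all invented.

```python
from itertools import combinations
from typing import NamedTuple


class Span(NamedTuple):
    kind: str
    label: str
    start: int
    end: int  # exclusive


# Placeholder "lyrics": two lines of three words each.
TEXT = "one two three four five six"

SPANS = [
    Span("line", "1", 0, 13),    # "one two three"
    Span("line", "2", 14, 27),   # "four five six"
    Span("chord", "C", 0, 7),    # "one two"
    Span("chord", "G", 8, 18),   # "three four" straddles the line break
]


def crosses(a, b):
    """True when two spans partially overlap, so neither can nest inside the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def clashes(spans):
    """Return label pairs that could not both be elements in a single XML tree."""
    return [(a.label, b.label) for a, b in combinations(spans, 2) if crosses(a, b)]


problem_pairs = clashes(SPANS)
```

Chord C nests happily inside line 1, but chord G crosses both lines, which is exactly the kind of overlap a single element hierarchy cannot express.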

Vitali's paper discusses his approach to these challenges -- EARMARK (Extreme Annotational RDF Markup), an OWL ontology with RDF triples. See his Submitted Paper and also his EARMARK site. See also the earlier work by C. M. Sperberg-McQueen and Claus Huitfeldt, GODDAG: A Data Structure for Overlapping Hierarchies.

Balisage 2009 - Streamability of XProc Pipelines


Norm Walsh (Mark Logic) gave a talk on streamability of XProc pipelines. XProc lets users define a sequence of atomic operations to apply to a series of documents, using control structures similar to conditionals, iteration, and exception handlers. XProc: An XML Pipeline Language is presently a W3C Candidate Recommendation that is near and dear to Norm since he’s been working on it for a while. He hinted it should become a Recommendation this fall or certainly by Christmas. As per W3C policy, there must be 2 implementations before a specification is finalized. One of those implementations is by Walsh himself: XML Calabash, which is built on Saxon 9.

Streaming would provide a sliding window in a single pass with output beginning before all input has been seen. Little is said about streaming in the spec, but it is clear it could improve end-to-end performance in certain situations and would be essential for processing documents larger than physical memory. Although there are no explicit requirements for steps to be streaming in the spec, implementations will add value by enabling this.
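The "sliding window in a single pass" idea can be sketched in Python with a bounded buffer. This is only an analogy for the behavior described, not anything from XProc or XML Calabash; the `numbers` source and the log messages are invented to show when output starts relative to input.

```python
from collections import deque


def sliding_window(items, size):
    """Yield each full window as soon as it exists, holding at most `size` items."""
    window = deque(maxlen=size)  # the oldest item falls out automatically
    for item in items:
        window.append(item)
        if len(window) == size:
            yield tuple(window)


def numbers(log):
    """A source that records each read so the interleaving is visible."""
    for n in range(1, 6):
        log.append(f"read {n}")
        yield n


log = []
for w in sliding_window(numbers(log), 3):
    log.append(f"emit {w}")
```

Inspecting `log` afterwards shows the first window being emitted before items 4 and 5 have been read, and memory use is bounded by the window size rather than by the input size.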

Norm indicated that certain XProc instructions such as p:count are streamable, whereas others such as p:exec, p:http-request, p:validate-with-relaxng, p:validate-with-schematron, p:validate-with-xml-schema, p:xquery, and p:xslt cannot be streamable. His paper discusses data collected by XML Calabash between 21 Dec 2008 and 11 Jul 2009 representing more than 294,000 pipeline runs. (His implementation has an opt-out "phone home" feature so he can collect certain usage data.) In his Submitted Paper, Walsh concluded:
The preliminary analysis performed when this paper was proposed suggested that less than half “real world” pipelines would benefit from a streaming implementation.
The data above seems to indicate that the benefits may be considerably larger than that. Although it is clear that there are pipelines for which streaming wouldn't offer significant advantages, it's equally clear that for essentially any set of pipelines of a given length, there are pipelines which would be almost entirely streamable.
Perhaps the most interesting aspect of this analysis is the fact that as pipeline runs grow longer, they appear to become more and more amenable to streaming. That is to say, it appears that a pipeline that runs to 300 steps is, on average, more likely to benefit from streaming than one that's only 100 steps long. We have not yet had a chance to investigate why this is the case.

Balisage 2009 - Beer and Demo


Q: What is the preferred accompaniment for demos at a technical conference?
A: Why, beer and free food, of course! (Save your divergent opinions about that statement, guys!)

On August 11th, Mark Logic was generous enough to provide great quantities of liquid and solid nourishment at the Brewtopia pub in Montreal. The demo format was simple: 5 minutes each to plug in and go. Over a dozen eager folks braved the cramped space and a hot room, not to mention an increasingly rowdy audience (funny how beer contributes to that). The contestants and the names of their demos follow:
  • Micah Dubinko: Zero to App in 5 minutes
  • Michael Sokolov: Biblical Studies
  • Bruce Bauman: Conceptual Models to XML Schema
  • Josh Lubell: Quality of Design
  • Mohamed Zergaoui: XML Prague and XProc Designer
  • Uche Ogbuji: Freemix
  • Markos Z(?): XQuery in the Browser
  • David Lee: One-Line Web Server
  • Quinn Dombrowski: Visualizing Bulgarian Dialect Data
  • Betty Harvey: Archival Description (NARA)
  • Steve Newcomb: IEML Parser
  • John Snelson: Higher Order Functions in XQuery 1.1
[Also, Vyacheslav Zholudev volunteered to demo Presentational OMDoc but unfortunately couldn't get a working laptop in the allotted time.]

Betty Harvey (Electronic Commerce Connection, Inc.) and Quinn Dombrowski received identical cheers twice in succession (as measured by the highly scientific decibel meter) so they were declared co-winners, splitting the cash prize. By sheer coincidence, they were the only two female demonstrators. Draw your own conclusions. Everyone who participated was awarded a Mark Logic t-shirt.

Thanks to Mark Logic, especially host Norm Walsh and his colleagues, for a fun and "educational" evening. Thanks also to the Brewtopia wait staff who had to wend their way through the tightly packed crowd all night.

Balisage 2009 - Spicy XML Data Services Platform

According to Uche Ogbuji, Akara is an integration platform for data services over the web providing pipelines for managing and processing data in whatever form (one format to another). In his own words:
Akara is an open-source XML/Web mashup platform supporting XML processing in an environment of RESTful data services. It includes “Web triggers”, which build on REST architecture to support orchestration of Web events. This is a powerful system for integrating services and components across the Web in a declarative way, so that perhaps a Web request could access information from a service running on Amazon EC2 to analyze information gathered from social networks, run through a remote spam detector service. Akara is designed from ground up to support such rich interactions, using the latest conventions and standards of the Web 2.0 era. It's also designed for performance, modern processor conventions and architectures, and for ready integration with other tools and components.

I have to admit, Uche Ogbuji's talk entitled Akara - Spicy Bean Fritters and an XML Data Services Platform was difficult for me to follow. For many in the audience who have closely followed his earlier 4Suite work, this was probably exactly the right degree of spiciness, but it gave me indigestion. The pace was very fast (partly because he thought he had less time than was actually allotted) and the slides were replete with acronyms and terminology that he assumed everyone understood (which may be the case, but still...).

Akara is built upon a mature foundation, namely the 4Suite code base including a port of the test suite. Uche said Akara is being used in some production environments although he also mentioned it was technically alpha code.

See his (late-breaking news) Submitted Paper.

Balisage 2009 - Those Pesky Namespaces!


Liam Quin (W3C's XML Activity Lead) gave a highly spirited talk on the pros and cons of XML Namespaces, as well as several approaches to simplifying namespace specification. He mentioned solutions proposed by Tim Bray, Micah Dubinko, and Ian Hickson. Quin's own solution was to store namespace declarations in a special namespace file that is processed by XSLT and applied to the files that reference it so they can in turn be validated with normal namespace syntax.
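The general flavor of keeping namespace information outside the document can be illustrated in Python. This is a very rough analogy and not Quin's actual design, which per the talk uses a namespace file processed by XSLT; see his paper for the real proposal. The `NAMESPACE_MAP` table and its element-name-to-URI layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical external mapping from bare element names to namespace URIs.
NAMESPACE_MAP = {
    "html": "http://www.w3.org/1999/xhtml",
    "p": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
    "circle": "http://www.w3.org/2000/svg",
}


def apply_namespaces(xml_text, mapping):
    """Parse a namespace-free document and qualify each known element name."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        uri = mapping.get(elem.tag)
        if uri:
            elem.tag = f"{{{uri}}}{elem.tag}"  # ElementTree's {uri}local notation
    return root


plain = "<html><p/><svg><circle/></svg></html>"
qualified = apply_namespaces(plain, NAMESPACE_MAP)
```

After the rewrite, the tree carries ordinary namespace-qualified names, so downstream tools that expect normal namespace syntax can process it.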

Apparently Liam must be into multimedia presentations. In addition to sporting a very colorful hat, he read a passage from an old book about railroads and showed a video clip of a damsel in distress tied to the tracks. Unfortunately, he ran out of time before we learned the fate of the damsel.

See his Submitted Paper.

August 11, 2009

Balisage 2009 - XML in the Browser: The Next Decade

Alex Milowski (Appolux) reminded us of the earliest demo on XML in a browser -- Netscape’s 1999 XML book demo from XTech ’99 in which you could sort by author, title, or ISBN using a combination of XML, HTML, CSS and JavaScript. At that time, Netscape also had an IRS demo with a table of contents in a sidebar controlling which page is presented (a la JavaDocs). While this might seem like old hat to us in 2009, Milowski ran the demos in recent browsers. The book demo worked in Firefox 3.x, Safari, Android, and iPhone, but failed in Internet Explorer 6, 7, and even 8. The IRS demo was less successful across the board with the exception of Firefox.

He defined Intrinsic Vocabulary as any markup that a browser can natively process with some well-defined non-trivial semantic without the aid of additional constructs. HTML is an example, but XML is not. He is particularly interested in intrinsic support for HTML5, SVG, and MathML, so he created a Firefox extension called XML Application Launcher. After you install the add-on, you can view Alex’s Balisage paper directly as XML in the browser using the Balisage DocBook subset rendered with a popup table of contents with load-on-demand pages. I tried it tonight and it works just fine! On the Google Code page, he wrote:
The main idea of this extension is that you can write your own applications, distribute them, and use this tool to launch them based on media type, XML namespace, content matching, or some combination of those three. Eventually the extension will have access to a registry of applications for XML vocabularies so that when an unknown type is encountered it can query for supporting applications.
Milowski concluded his talk with these points:
  • We must have HTML5, SVG, and MathML.
  • Embrace the idea of intrinsic vocabularies.
  • Replicate the browser extension model.
  • Support open-source and make it easy to use.
  • Don’t wait for someone else to implement it.
See the Submitted Paper.

Balisage 2009 - Opening Remarks and Sponsors


The Balisage 2009 Conference Committee -- B. Tommie Usdin (chair), Deborah A. Lapeyre, James David Mason, Steven R. Newcomb, C. M. Sperberg-McQueen -- opened the 4-day XML conference. One well-received announcement was their determination to make Balisage conference proceedings persistent. Unlike some other XML conferences which shall remain nameless (but not blameless), you’ll always be able to find all papers in the series. An ISBN has been assigned to each volume (2 per year), the entire series has an ISSN, and each individual paper has its own DOI (digital object identifier). How cool! Thank you, Mulberry Technologies!

The co-chairs acknowledged the two main sponsors: Mark Logic and the FLWOR Foundation. The FLWOR Foundation is dedicated to providing middleware and clients to simplify the use of XQuery. They have 3 open source projects under an Apache license: Zorba (XQuery processor, XQuery 1.1, update facility, scripting and REST extensions); XQIB (XQuery in the browser), a browser plugin for Internet Explorer which allows execution of client-side XQuery to navigate and update the DOM; and an Eclipse plugin (XQVT?).

Of course, we all know Mark Logic because they've given us: MarkLogic Server, a native XML database that implements XQuery for the CRUD functionality with full-text and structured search; MarkLogic Application Services, which includes Application Builder and provides an intuitive, browser-based user interface for creating applications without writing XQuery code; and MarkMail.org, a public email search site built using Mark Logic App Builder which currently archives over 40 million searchable emails.

And Mark Logic is now also known for sponsoring a “Beer and Demo” (more about that later).

And let’s not forget those cool ergonomic pens donated by Patrick Russo (sp?). Don’t confuse them with a tuning fork or wishbone ;-)

Balisage, Come to Me!


Kicking off the conference today (after the logistics, of course), James Mason surprised Tommie Usdin with a song about Balisage. Not sure of the official title (could be simply "Balisage"), but it was set to the tune of Bali Ha'i from South Pacific. As can be seen from the photo, conference chair Usdin was delightfully surprised. Nice job, James et al!

Check back here for the lyrics...