Apologies to my Business Intelligence readers: this post is pretty esoteric and refers to some strange goings-on in the document standards world.

Apologies also to Rick Jelliffe, any resemblance to Schematron is entirely unintended.

After the drama of the initial Office 2007 SP2 ODF support was initiated by Rob Weir, related arguments spilled out all over the place. I got involved in a few places, but then left it alone whilst I got on with some productive work. There are only so many ways you can say “it does not exist in the standard – admit it, fix it and move on”.

Needing something to accompany a coffee yesterday, I found that the argument was still raging on Alex Brown’s blog, so I settled down for some light entertainment.

Once again, Rob Weir was arguing with Alex about some interpretation of the ODF specification and calling Alex names.  Then ‘Marbux’  weighed in with possibly the two longest comments in internet history.

This highly esoteric, intellectual argument between Dr. Alex Brown (the Convenor of ISO/IEC JTC1 SC34 Working Group 1), Mr. Robert Weir (the Chair of the ODF Technical Committee at OASIS) and Mr. Paul E. Merrell J.D. AKA ‘Marbux’ (a Juris Doctor) was essentially (and I simplify enormously) about whether something specified in the schema as required for a document instance to be valid/conformant was not actually required, because Rob said it wasn’t and Alex and Marbux said it was. Rob had his reasons for this, but they were completely lost on me. I mostly understood Alex’s reasoning, but all this came at the cost of a Red Bull chaser after the coffee.

Note: I was always under the impression that it was good practice for something that was invalid against the schema to be deemed non-conformant. Rob shows just what an absolute noob I am to assume that.

Here are the selected highlights (I know it looks unexpurgated, but take my word for it):

Rob:

“Alex, where do you read that validity of manifest.xml is required for conformance? That is news to me. Section 1.5 defines document conformance and says that conformance requires validity with respect to the OpenDocument schema. Section 1.4 defines that schema to be that described in chapters 1-16. This is confirmed in Appendix A. The optional packing format, including manifest.xml, is described in chapter 17, which the astute reader will observe is not between 1 and 16. Therefor it is not part of the OpenDocument schema to which validity is required for conformance.”

Alex:

“Err, you’re reading the wrong version of the spec. The manifest invalidity problem only occurs for the two apps (OOo/Sun plugin) which claim to be using so-called "ODF 1.2"; so for the results here, the conformance section 1.5 of ODF 1.1, that you quote, is not relevant.”

Rob:

“You misread the standard. But rather than admit and accept that, you are now trying to redeem your post by a back-up argument, which tries to apply your interpretation and prediction of what Part 3 of ODF 1.2 will say when it is eventually approved. But this text has not even reached Committee Draft stage in OASIS. It is merely an Editor’s Draft at this point. Are you really going to persist in making bold claims like this based on an Editor’s Draft, which has not been approved, or in fact even reviewed by the ODF TC, especially when contradicted by the actually approved ODF 1.0 (in ISO and OASIS versions) and ODF 1.1?”

Alex:

“No, Rob – you are the one misrepresenting ODF.
The ODF 1.1 standard states: "[t]he normative XML Schema for OpenDocument Manifest files is embedded within this specification." (s17.7.1)
How is it anything other than non-conformant, for an XML document to be invalid to a normative schema?”

Rob:

“Alex, as you know, normative clauses include those that state recommendations as well as requirements. So ‘shall’ is normative, but so is ‘should’ and so is ‘may’. Normative refers to all provisions, not merely requirements. Please refer to ISO Directives Part 2, definitions 3.8 and 3.12. So normative does not imply "required for conformance". The conformance clause defines conformance and that clause clearly defines it in terms of the schema excluding chapter 17.”

Alex:

“I’m sorry, but you have veered off into the surreal now.
If you’re maintaining that "the normative XML Schema for OpenDocument Manifest files" is in fact NOT REALLY "the normative XML Schema for OpenDocument Manifest files", then we’re going to have to disagree.”

Rob:

“Alex, read more carefully. I never said that schema was not normative. What I said is that "normative" is not the same as "required for conformance", as you had been asserting. But now I think you may be confused on this as well.”

Joe Sixpack Developer Trying To Implement ODF:

“WTF!!! This is completely crazy – it would be easier and quicker to send someone round to every user’s desk and type it out in the app of their choice instead.”

This reveals a real practical problem, should I wish to implement ODF.  I would need to consult the Oracle of Westford (Rob Weir) about every detail of the entire implementation, in order to be on fully safe ground.

Now if Alex Brown and various other highly qualified commenters can misread the ODF specification so heinously, needing the expert advice from the ODF Technical Committee Chair himself to put them right, what chance do lowly developers have?

It appears from this there is no chance that any developer will be able to interpret the specification properly, since only Rob Weir (and perhaps the folks working on OpenOffice) has the intellectual capacity to navigate this ball of semantic string. 

Having to cross-reference this file format equivalent of the newest, most advanced physics paper on the planet every time you write a line of code is going to lead to madness.

This leads to a huge scalability problem, with developers worldwide having to get an opinion from Rob Weir on every implementation detail – although he could save a few hours a day by giving up the blog posts and comments, it would not be anywhere near enough to cope with the deluge of requests.

So, the solution to this is clear – we need to implement Rob Weir as a web service, a Weirotron if you like. That way, everyone can query the Weirotron and get back the definitive answer to any ODF question, without having to deal with the obviously labyrinthine spec that has bamboozled so many leading XML experts.

In addition, the Weirotron could help solve those pesky interoperability issues that stem from areas where ODF relies on the OpenOffice source code, like Formulas, and areas where the spec is a bit light, so to speak, like Change Tracking. I’m sure that this would really assist Microsoft and many other struggling developers in implementing support for ODF (or rather the Weir-approved cod-ODF) correctly, with the inherent blessing of the ODF TC Chair.

Business Intelligence is a term that covers a multitude of sins.  It is also a term which is extremely open to interpretation, depending on your viewpoint, technology mastery, user skillset and information environment.

Creating new terms, especially acronyms, is what the technology industry does best. They delight in it, but it does serve some purpose other than the amusement of marketing folks and analysts.

To go back to an old paradigm, creating labels or categories is an essential part of the market.  Not just the BI market, but any village market, or supermarket. 

Categories help consumers navigate quickly to the types of products they are interested in, like finding the right aisle to browse by looking up at the hanging signs in the supermarket, or the area in the village market where the fruit vendors gather.  Labels give more information, such as pricing, size etc and then it is down to the product packaging and the rest of the marketing the consumer has been exposed to in terms of advertising, brand awareness and so on.

Business intelligence is a pretty long aisle.  At one end, the labels are pretty narrow but at the other, very very wide, to accommodate the zeros after the currency symbol and ‘days to implement’ information.

The problem is the long aisle – vendors need to break that aisle up into manageable (walkable) segments to help the customer navigate quickly to the solution they need.

The other problem is that in this case, the supermarket is not in charge of the category names, not even the vendors or analysts are – it’s a free for all.

This means chaos for the poor consumer, all capering around in the aisle like some kind of Brownian motion.

Thinking about this, after being bombarded with a panoply of BI terms lately, I thought of INCOTERMS, which is a standard set of terms used in international sales contracts.  These terms are strictly defined and published by an independent organization so that both parties in the transaction know exactly where they stand.

According to Boris Evelson of Forrester, Business Intelligence is “a set of processes and technologies that transform raw, meaningless data into useful and actionable information”.

Not sure about that one myself – who acquires and stores meaningless information? Other than maybe Twitter.  Other suggestions most welcome.  It might help show the possible technologies Forrester are referring to.

This certainly excludes my product, since we work with data that, theoretically, people are already making decisions from. They just need to slice and dice it differently.

I can’t complain too much about Forrester though, at least they have Report Mining in their buzzword bonanza (courtesy of Curt Monash), our little niche.

The concept of transforming raw data is easier to work with (in the Forrester BI definition sense anyway) as it could refer to something like a web log, which is pretty difficult to gain any insight from by looking at it in a text editor, unless you have an eidetic memory and the ability to group and summarize the various keys and measures in your head.
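For those without the eidetic memory, the grouping and summarizing can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the log lines are made up, they are assumed to be in the Common Log Format, and the `summarize` function and its `key` parameter are hypothetical names rather than part of any product.

```python
import re
from collections import defaultdict

# Hypothetical sample lines in Common Log Format; not taken from any real server.
SAMPLE_LOG = """\
192.0.2.1 - - [10/Jun/2009:09:15:02 +0100] "GET /index.html HTTP/1.1" 200 5120
192.0.2.7 - - [10/Jun/2009:09:15:09 +0100] "GET /reports/sales.csv HTTP/1.1" 200 20480
192.0.2.1 - - [10/Jun/2009:09:16:40 +0100] "GET /index.html HTTP/1.1" 304 -
192.0.2.9 - - [10/Jun/2009:09:17:11 +0100] "GET /missing.html HTTP/1.1" 404 312
192.0.2.7 - - [10/Jun/2009:09:18:55 +0100] "GET /index.html HTTP/1.1" 200 5120
"""

# One named group per field we might want to use as a key or a measure.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize(log_text, key="path"):
    """Group log lines by one key and total two measures: hits and bytes."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like Common Log Format
        row = totals[match.group(key)]
        row["hits"] += 1
        size = match.group("size")
        row["bytes"] += 0 if size == "-" else int(size)  # "-" means no body sent
    return dict(totals)


if __name__ == "__main__":
    # Print one summary row per requested path.
    for path, measures in sorted(summarize(SAMPLE_LOG).items()):
        print(path, measures["hits"], measures["bytes"])
```

Passing `key="status"` or `key="host"` regroups the same lines by a different key, which is about as small as "slice and dice" gets.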

Now, as often is the case when you start writing about a topic, the research you do unearths people who have written pretty much the same thing before you.

Going back to definitions, a define:Business Intelligence search on Google turns up Todd Fox’s decent definition of BI: “A generic term to describe leveraging the organizations’ internal and external information assets for making better business decisions.” That leads to Todd’s own attempts from a Data Warehousing perspective, which, in turn, were prompted by James Taylor’s post on the confusion around the term analytics (in the context of BI). Even Kimball got involved, with the “Slowly Changing Vocabulary” section in one of his books.

This at least tells me I’m on the right track, if not entirely original.

In 1989 Gartner’s Howard Dresner defined BI as “a set of concepts and methods to improve business decision making by using fact-based support systems”.

More definitions can be found from Larry English and probably ad infinitum, or at the least, ad nauseam.

The depressing thing here is that we have only got as far as the “umbrella term”, which is how BI is becoming popularly known.

<Aside> A Dutch student at the University of Amsterdam even wrote a paper titled “Business Intelligence – The Umbrella Term” complete with an umbrella graphic on the cover page.  (It’s a doc file, so I won’t include the link.  Google it if you’re interested)</Aside>

When we start to address even Forrester’s BI buzzword hoard, never mind the others out there, it begins to lead to a total breakdown in the tried and tested categorization mechanism.

To revisit the source of the proliferation, it appears that analysts (likely as a proxy for large vendors) and vendors themselves are the main culprits.  The analysts, by virtue of some level of independence and a cross-vendor view can be seen to be the arbiters of the terms.  The problem here is that the analysts often use slightly different terms or at least different meanings for the same terms.

Naturally, both vendors and analysts want to proliferate and blur terms to aid in differentiation, or to try to give the perception of innovation and progress.

This is very seldom the case, though, as new terms are often just fracturing or rehashing existing categories and terms.

However, in some cases, drilling down into more narrow categories or updating terms due to changes in technologies or approaches is not necessarily a bad thing, if the terms/categories still aid in establishing a contract of understanding between vendor and consumer.

If we want to accommodate this, the ability to establish a common understanding, based on input from across the board – analysts and vendors, would be beneficial to all.  The problem is, you need a real independent organization that can accommodate the horse-trading, as well as maintaining an authoritative definition of terms which is acceptable to all parties.

Some amusing aspects of this I can foresee would be “Active Data Warehouse” – would you then have to create a new term, “Passive Data Warehouse”, to group the applications that did not fit the criteria of “Active”? I imagine a semantic arms race that would have to be kept in check – IBMCognoStrategyFire pushes for a “Smart ETL” category, which forces the other ETL vendors into the “Dumb ETL” pigeonhole. Dealing with this is what standards bodies do.

This is more musing than actually being stupid enough to think this is ever going to happen.  I do have sympathy with the poor customer trying to navigate the shelves of the BI supermarket though.  As someone just trying to keep a lazy eye on the machinations of the industry, it can be overwhelming.

Here’s a short quiz.

What BI term does this refer to?

“centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format.”

Was this your first guess? This?

No.  Much earlier.

The more things change, the more they stay the same.

Maybe we could just provide a thesaurus, so when someone is puzzling over the latest buzzword, they can look it up and say ahh, I know what that is, we tried to implement something like that back in the early nineties.

 

UPDATE: Read this excellent article from Colin White – I didn’t see it before I wrote this – I promise!

The catalyst for Colin’s article (Stephen Few) can be found here and follow-up by Ted Cuzillo here.

It seems like the rancour around the Open XML / ODF soap opera still simmers away.

Even though the volcano has erupted and the ash has cooled, it won’t be forgotten.

For those who don’t know the tale and want a recap:

Sun buy the commercial product StarOffice, then give it away for free. When that still doesn’t arouse too much interest, they decide to open source it instead in a move to reduce MSOffice dominance.  It is called OpenOffice.org, or OpenOffice ->

IBM and Sun dream up strategy of beating MS Office with government mandate stick via standardisation route.  Open Source folk find it an intoxicating bandwagon too->

OpenOffice file format is morphed into a standard and hurried through standardisation->

IBM put coin and feet on the street into strategy and start to make progress with some government bodies.  An IBM services bonanza beckons.  Sun wonders (as usual) how are THEY going to make money->

Microsoft, either reacting to this strategy, or in parallel, being forced by the European Union, also morph their Office suite into a standard and hurry it through standardisation->

(note – slightly more fuss/interest around this than the ODF process)

IBM/Sun/Open Source folks call foul and IBM throws its toys out of the pram regarding ISO’s value.

During this debate, there are numerous mentions of corruption.  Some of it is well documented and looks like the usual lobbying that one might expect.  IBM + friends went at it hammer and tongs, so did Microsoft.

However, these nebulous claims still remain. I am reminded of Tim Bray every day I look out of my window, since the UK office of OpenText is visible from my home, a company which Tim co-founded.

One would have thought even a plain speaker like Tim would be able to voice his concerns, which should naturally have been weighed up in the context of the ‘theatre’ (in the war sense, and others) and must therefore be so heinous that he fears legal action (from Microsoft, one assumes) and has decided that discretion is the better part of valour.

I have an idea.  Anyone with any of these obviously very juicy stories can put them up on WikiLeaks.  I suppose you could always use Groklaw, but I can’t figure out if it’s one of those ‘ironic’ parody-style sites or not. It smacks of those religious site parodies, which are pretty much indistinguishable from the real thing.

I took a quick look on WikiLeaks and the only things I could find were all from Stefan Krempl, just links to published articles on heise.de.  Not very juicy.

So if anyone has any top notch dirt, I encourage them to use WikiLeaks.  I would welcome some new material to read, as the flywheel still seems to keep spinning on the same old energy.  Maybe we’ve found the perpetual motion machine.

Come on people, it’s catharsis time.  You’ll feel better. 

In a more positive vein, it looks like everyone is going to play nicely in the new ISO maintenance structures. I only said it looks like.

In the spirit of full disclosure, the company I work for are Microsoft partners (as we are IBM partners) and have implemented Open XML spreadsheet reading and writing in our products.  We are Excel-centric, so most of our customers have never heard of ODF.  Still, maybe once they get hold of Office 2007 SP2 and the ODF support, they’ll start clamouring for it.