I possess an accent which seems to fox most voice recognition systems. I suppose the closest thing to it that most non-British people will have heard is the accent of the Arctic Monkeys, since Sheffield is only about 15 miles away from where I grew up. Generally, voice recognition systems are optimized for American voices, so I am at a disadvantage right off the bat. That said, my accent, diction, intonation and whatever other elements combine to entice such systems into creating ridiculous transcriptions, far more so than for my fellow countrymen.

The main source of these transcriptions is the Microsoft voicemail system, powered by Exchange / Office Communicator. I often need to contact people at Microsoft, whether product team members, in relation to Monarch and Monarch Data Pump and their interaction with Office, or people on the interoperability side, through my work within ISO/IEC JTC1 SC34 Working Group 4.

A couple of years back, I was sent the transcription of one of my voicemails by a contact at Microsoft, as he found it amusing.  This has continued as a tradition with more contacts there, over the years.

Unfortunately, I don’t have the actual content of these voicemails, but rest assured, it bears no resemblance to the voicemail transcription system’s version of events.

It seems to have settled on a food related theme here, for no apparent reason.

“Hey ****** it’s Gareth just love let you know that some track who cherry. Job begin from the I’m consoles console kidney stones Julie Weiss seafood see walnuts the community they’ll be. Doing any case full you try and hook up with my sequel on. You non Jerry fish and and should not very seems like he’s out the offices about it so I will give you a heads up. That. Hey we call it a. He mentioned I tried that section. Well please correct. Anyway. I’ll speak to you soon bye bye.”

“I got your inspection yourself I think this is so I can. Thank you — only so another chat about the recent my seafood see what we — and — hey Ryan Montoya from the columns like something on your cell phone — for the Y7 check your good weekend. Speak to you soon bye bye.”

What’s with the seafood? Who is “Ryan Montoya from the columns”? Is that like ‘the projects’?

It also seems to think that I call under assumed names every time, which I most certainly do not.

“Hey ******* it’s John Anderson calling just ringing to let you know. Alright I’ll speak to you later bye bye.”

“Hey ******* it’s Scott Smith. Give me a call I’m at 8 to 9 tonight chat — I have sounds and I will talk to you later — but I just wanted to check if there’s any YAM — issue than anything on called full sign now maybe I’m sure — I’ve talked for — at talk access the full Doc acts so. Anyway I’ll try and get you like 10 or 12 logic tomorrow on then bye bye.”

“Hi ******* it’s Doug I just thought give you a call on the off John this is a I’d be very soon seems to me a call I can phone bye bye. Hi me like cellular the busy day had and interest in CD it’s Davidson — the funny going over all I — need to be chief. A lot and I’ll speak to you soon bye bye.”

“Hey ******* it’s Just. Not — hi this is all give me a call 90 because I’m stuck in traffic minutes 10:15 PM and a bomb any other big blocks and right on 30 and I think I’m probably gonna I’m on the status bar now — anyway — just so I’m a little chat but — now that I need sing Allen so I’ll speak to you like and you can always full with me the — note that absolutely fascinating transcription but — zero.”

Er, no bomb. Hope the NSA wasn’t listening. Spooky that it noticed when I criticized its “absolutely fascinating transcription”.

“Hey ******** you got this so I’d better give usable last with the voice mail transcription.– Interesting regarding the sudden all the Columbia look into this that would be blocking I called up laws — with message something to a — haven’t replace — I’d come see you go I saw — a lot — anyway that was about to go so I just now — it just an email from strolled these seems a lot of problems he without things about that. Anyway I’ll speak to you soon bye bye.”

Again, it comes out of its fug when being talked about. Not sure how Columbia got in there. The NSA almost certainly has a file on me by now, if their voice recognition software is anything like this.

“Hey ******* sister it’s joint get in touch with them. Actually see the from I got hold of an experienced a Angeline Love — anyway we’ll have a chat that’s it and reinstall anyway — good evening not in school. Hundred bye bye.”

I would like to clarify in no uncertain terms that I have never got hold of an experienced Angeline Love. Nor do I refer to senior Microsoft staff as “sister”.  Especially when they are male.

“Hey ******** it’s got the phone because message off.  — So I — will probably try again — tomorrow — shopping — yeah I’m not sure that you — are shipping didn on home trapeze — or not many more experienced you recall I appreciate troubles. So anyway that’s about it.”

I’m pretty sure you can’t get a home trapeze, even on those American shopping channels.  Maybe I can get Microsoft to pay me to talk into the system all day long and see what other interesting business ideas come out of it.

“My name is Gareth. Telling the boys and shopper internal costing unsure I guess we’re playing right next one — that you sent debriefing from ****** — extension two three oh should be self so I’m I’m trying to get on your cell moment rolling I’m — bye bye. Hello.”

Got the name right, but I’m pretty sure my contact knew my name after a couple of years. Plus, I don’t usually end voicemails with “hello”.

“Thanks ******** a Scott but this is all give me a call at about 40 years.So I’m just getting a chance — together and tax money like you probably not tonight I’m I got your message — you know your voice mail sit back inside along the venturing stuff — I’ll try you like to delete chop chop.  Parking lot bye bye.”

I distinctly recall never asking anyone to give me a call in 40 years.

“Hi ******** hopefully as laughter events it’s and IK series new package explorer which is a pretty know likes.– Just a wishing you could look for next week probably going out to a demo to be very silly so — could look in the lines done and I don’t have a good weekend anyway. I’ll speak to you soon bye bye.”

At last! “new package explorer” is actually accurate and refers to Wouter Van Vugt’s Package Explorer utility. My days of “going out to a demo to be very silly” are but a fond and distant memory.

“Hey ******** it’s Gareth. Chad discuss about — what full day — digital cool of calling technically awful stuff — dot com and run some — probably — some of them thanks and so. Anyway I’ll speak to you bye bye bye.”

This is actually a recent one, so I am 100% sure that I didn’t say this.  One wonders whether it has been added to some kind of Exchange voice recognition dictionary as a high frequency phrase within Microsoft. 

By the way, technicallyawfulstuff.com is still available.

 

As a member of ISO/IEC JTC1 SC34 Working Group 4 (which Mr Norbert Bollow of the Swiss mirror committee somewhat bizarrely refers to as "so-called") and as someone directly implicated in his recent blog post, I thought it might be useful to help him understand the situation more clearly.

I have been heavily involved in spreadsheets over the last 14 years working at Datawatch. For the last 9 years, I have been in charge of the Monarch and Monarch Data Pump products, which have interacted heavily with spreadsheets, from both an input and an output perspective. We supported Lotus 1-2-3 as well as Excel (and still support the older versions; the file format specifications are not available for later ones), stretching back to Excel 2.1 and Lotus versions well before that.

Datawatch is both a Microsoft and IBM Partner.

The Monarch product is primarily used in conjunction with Excel, with approximately 95% of our users reading, writing, appending and updating Excel spreadsheets, both the older binary formats and the new OOXML format.

We have a large user base of around 500,000 users worldwide and, in the 18 years since the first release of Monarch, have gained fairly detailed knowledge of how people use and abuse spreadsheets in various ways, as well as of how many other vendors import and export spreadsheets from their applications.

Sowing the seeds – versioning

Consequent to the lack of a versioning scheme in ECMA376-1, applications created to consume and create OOXML documents were unable to distinguish between ECMA376-1 and future versions. This should have been addressed in the original specification, and certainly at the BRM. It was not, which casts doubt on the unimpeachable sagacity which some seem to attribute to decisions made at the BRM. This sacred cow status, especially surrounding ISO8601 dates, is not a healthy thing; it should be subject to scrutiny and review, especially with far more time available to analyze the ramifications of changes than was available at the BRM.

Sowing the seeds – anyone for spreadsheets?

Another aspect of the BRM and indeed much of the process around OOXML is the lack of spreadsheet experts involved.  Practically all those involved in the process are really XML and document specialists.  Their background, depending on age, is almost always SGML and XML, not VisiCalc, Lotus 1-2-3, Quattro Pro, Excel, Gnumeric and Calc.

Spreadsheets, then, became second-class citizens in the process, with few people showing them the care and attention showered upon the word processing aspects of the specification.

The XML experts came from the viewpoint of XML Schema, which, many may be surprised to learn, does not itself fully implement ISO8601 dates. It wisely uses a tightly defined subset of ISO8601. Many advocated that ISO8601 dates should be used within OOXML documents. This approach is eminently sensible on the face of it, since documents are much simpler to consume with XML technology if the date data can be easily processed by common XML tools.

However, spreadsheets have paid very little attention to XML: their file formats have historically been extremely terse and efficient, and they have design goals distinctly different from those of word processors. One thing that spreadsheets always do is store and process dates as serial date values. Almost every single spreadsheet file in existence contains serial date values.
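For readers who have not met the convention, a serial date is simply a count of days from an epoch, with the fractional part representing the time of day. Here is a minimal sketch of the idea in Python, assuming the common 1900 date system and ignoring, for now, the leap year quirk discussed in the next section (the epoch of 30 December 1899 is chosen so that the arithmetic matches the usual serial numbers for modern dates):

```python
from datetime import datetime, timedelta

# Epoch chosen so that serial numbers for modern dates match the usual
# 1900 date system values; the 1900 leap year quirk is ignored here.
EPOCH = datetime(1899, 12, 30)

def to_serial(dt: datetime) -> float:
    """Days since the epoch; the fractional part is the time of day."""
    delta = dt - EPOCH
    return delta.days + delta.seconds / 86400

def from_serial(serial: float) -> datetime:
    return EPOCH + timedelta(days=serial)

print(to_serial(datetime(2009, 5, 11, 18, 0)))  # 39944.75
print(from_serial(39944.75))                    # 2009-05-11 18:00:00
```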

The Leap Year Bug – or not

Mr Bollow refers to the reintroduction of the leap year bug which was introduced by Lotus 1-2-3 and replicated in Microsoft Excel. The very fact that I say "introduced by Lotus 1-2-3" gives you an idea of how venerable this bug actually is.

This bug has existed for an exceedingly long time, and anyone who deals with spreadsheets is well aware of it. In fact, it has really ceased to be a bug and become expected behaviour. I can’t claim this is a good thing, but that is the way that it is.

Now, when you start to deviate from expected behaviour that has existed for decades, you will run into problems. The number of spreadsheet-consuming and spreadsheet-producing applications is gigantic, and changing the ground rules is not an option if you want any sort of interoperability. The leap year bug is not an intrinsic issue with serial dates themselves, but an application issue introduced by Lotus 1-2-3 which has become accepted practice.
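For the curious, the bug is that Lotus 1-2-3 treated 1900 as a leap year, so serial 60 corresponds to the non-existent 29 February 1900, and every later serial number is shifted by one day relative to a strictly correct count. A minimal sketch of the compatible behaviour, continuing the assumptions of the previous snippet:

```python
from datetime import date, timedelta

BASE = date(1899, 12, 31)  # serial 1 = 1900-01-01 in the 1900 date system

def serial_to_date_lotus_compatible(serial: int) -> date:
    """Replicates the Lotus 1-2-3 fiction that 1900 was a leap year."""
    if serial < 60:
        return BASE + timedelta(days=serial)
    if serial == 60:
        # The phantom 29 February 1900; applications typically normalise
        # it to 28 February or 1 March 1900.
        return date(1900, 2, 28)
    # From serial 61 onwards, compensate for the phantom day.
    return BASE + timedelta(days=serial - 1)

print(serial_to_date_lotus_compatible(59))     # 1900-02-28
print(serial_to_date_lotus_compatible(61))     # 1900-03-01
print(serial_to_date_lotus_compatible(39944))  # 2009-05-11
```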

If Mr Bollow is saying that ISO8601 dates must be used everywhere without exception, then I am assuming he also advocates throwing Unix time over the side in favour of an ISO8601 implementation.  Good luck with that. Oh and OpenFormula too. (More later).

Well, maybe semi-reintroduced

In addition, Mr Bollow fails to make the distinction between the two forms of OOXML: Transitional, which is meant to help support “legacy” information and provide a transition vehicle, and the purer Strict form of the standard.

Serial dates for spreadsheet cell values were not allowed in the Strict form and have not been reintroduced in the Strict form by Working Group 4. As far as I am aware, there is absolutely no intention to do so. In addition, many thought that serial dates were still allowed in the Transitional form, and it came as some shock to many when I originally pointed out, back in February, that they were not.

Another important point is that this only affects spreadsheet cell values; ISO8601 functionality is not being excised by these changes. In fact, ISO8601 dates are still allowed in spreadsheet cell values in the Transitional form. Personally, I don’t think this is wise: to fully avoid data loss issues, ISO8601 dates in spreadsheet cell values should not be allowed in Transitional, but should be the only allowed date values in Strict.

There are many places where the ISO8601 date specification is used and will continue to be used in spreadsheets in the Transitional form, such as the WEEKNUM function, which has arguments to specify ISO 8601 week numbering.

Can’t we just leave it as it is and let users/vendors sort out the mess?

I have heard this argument, and it immediately marks out anyone who makes it as a complete dilettante with respect to the workings of finance.

In contrast to word processing documents, dates are far more pervasive within spreadsheets and, in general, far more critical. Most financial analysis is date-based, reporting is always date-based and calculations of financial instruments are mostly date-based. It is safe to say that an extremely high proportion of spreadsheets contain dates and that the integrity of those dates is critical.

In ECMA376-1 all dates in spreadsheets were treated as serial dates, so any reading and writing of dates used this format. With the enforcement of ISO 8601 dates in the current specification (§18.17.4, §18.17.4.1, §18.17.6.7) and the primary example (§18.3.1.96) featuring the use of ISO 8601 dates and the newly introduced d cell type value (from §18.18.11 ST_CellType), all conforming applications must write dates in SpreadsheetML cells in ISO 8601 format.

This semantic change (no schema-enforced change exists) means that all existing applications fail to open IS29500 Transitional spreadsheet documents correctly. The observed behaviour of applications ranges from an inability to open the document to silent data loss.

This is a huge problem: if you combine the inability to distinguish instance documents of two versions of the specification with a semantic change, you have a recipe for disaster. To simplify, imagine the chaos that would ensue if you silently changed the currency you use for accounting, but didn’t tell any of your finance staff when.

The silent data loss encountered is made even more problematic in that the value may be parsed as an incorrect date, instead of a null value or other failure. This means that there is little to alert the user to problems. Dependent formulas will not fail with divide-by-zero errors; they only go wrong where additional logic takes date boundaries into account. Visual recognition of the failure will usually be required.
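To make the failure mode concrete, here is a sketch of how a consumer written against ECMA376-1, expecting a numeric serial value and ignoring the unfamiliar cell type, can turn an ISO 8601 cell value into a plausible-looking but wrong date rather than an obvious error. The cell values and the lenient parsing logic are purely illustrative, not taken from any particular product:

```python
import re
from datetime import date, timedelta

# Illustrative cell values only; real SpreadsheetML carries these inside <c><v>...</v></c>.
ecma376_value = "39944"                # serial date for 2009-05-11
is29500_value = "2009-05-11T00:00:00"  # ISO 8601 string, cell type set to "d"

EPOCH = date(1899, 12, 30)  # same convention as the earlier sketches

def legacy_read_date(raw: str) -> date:
    """A lenient ECMA376-1 era reader: take the leading digits as a serial date."""
    match = re.match(r"\d+", raw)
    if match is None:
        raise ValueError("not a serial date")
    return EPOCH + timedelta(days=int(match.group()))

print(legacy_read_date(ecma376_value))  # 2009-05-11, as intended
print(legacy_read_date(is29500_value))  # 1905-07-01: plausible but wrong, and no error raised
```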

Some other scenarios to consider are when using spreadsheet files that are linked to other spreadsheets, or products that perform lookups to spreadsheets. This means that patching of older applications would have to be absolute and in sync across all organizations involved. For example, a spreadsheet in one country could be used as a lookup for data by spreadsheets in another country. This would mean that all branches, divisions and subsidiaries of an organization may need to ensure that all application software that consumed spreadsheets is patched in sync to avoid data loss.

The applications I tested are easily available ones, but we also need to consider larger applications such as ERP and BI software where spreadsheets are used bi-directionally. This means that bad data in the local spreadsheet could be propagated to an enterprise-wide system. Patching such enterprise-wide systems is an extremely costly undertaking. Even patching Office suites is a very costly undertaking, in terms of testing and rollout, even if there is no additional software cost from the application vendor.

But how many ECMA376-1 consuming applications can there be?

Another argument is that this is a storm in a teacup: the primacy and purity of ISO8601 dates is more important than pandering to the handful of applications out there that might encounter a later version and produce errors.

There are many of them out there: obviously Office 2007, which is fairly popular, I’ve heard, but also a huge number by smaller vendors, such as SAP, Oracle, Lawson and IBM (e.g. support for Excel 2007 in Cognos), people like that. As mentioned before, most ERP (Enterprise Resource Planning) and BI (Business Intelligence) vendors deal with XLSX files, some of them bi-directionally, meaning bad data could be propagated globally throughout these enterprise-level systems.

But these are just the tip of the iceberg: there are many bespoke in-house systems, especially in the financial space, that rely heavily on Excel files.

But won’t vendors have been slow to adopt the new formats, I hear you cry. The answer, in this case, is no. The benefit of the OOXML format is that it has been much easier and quicker for vendors to implement support for it than for the old binary Excel formats, which were horrible. The documentation was obviously much better, and the XML nature meant it was far easier to implement support on different platforms, which is key for enterprise vendors that run on a huge gamut of different OS and technology platforms.

In addition, there has been a lot of pressure from users to support the new formats, as the size of spreadsheets was greatly expanded with the introduction of OOXML.  Previously, Excel spreadsheets were limited to 65,536 rows.  Enough for any spreadsheet, you may say, but in my experience, they always wanted more.

We frequently had product enhancement requests to allow Monarch to export many hundreds of thousands of rows into spreadsheets, using tricks such as populating one sheet, moving on to the next when the limit was reached and so on.
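For anyone wondering what that trick looks like in practice, here is a minimal sketch of spilling rows across worksheets once the 65,536-row limit of the old format is reached; it is illustrative only, not Monarch’s actual implementation:

```python
# A sketch of the sheet-spilling trick; illustrative only.
ROW_LIMIT = 65536  # worksheet row limit of the pre-OOXML binary Excel format

def split_into_sheets(rows):
    """Yield (sheet_name, chunk) pairs, starting a new sheet at each row limit."""
    sheet_number = 1
    for start in range(0, len(rows), ROW_LIMIT):
        yield f"Export_{sheet_number}", rows[start:start + ROW_LIMIT]
        sheet_number += 1

rows = [("row", i) for i in range(150000)]
for name, chunk in split_into_sheets(rows):
    print(name, len(chunk))  # Export_1 65536, Export_2 65536, Export_3 18928
```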

I wonder about the wisdom of million row spreadsheets, but users will always seek to push the envelope.

Behaviour of existing applications when encountering ISO 8601 Dates

Some testing (earlier this year) was performed on easily available applications, to see what using ISO8601 dates in an instance document would look like. The implementations below were given a file containing no changes introduced in IS29500 except for ISO 8601 dates, with the t attribute of the cell set to "d". Only Datawatch Monarch 10 works without error, and then only under the (unlikely) condition that the ISO 8601 date string includes just the date portion; all other tested implementations fail.

Office 2007 SP1

Warning dialog appears “Excel found unreadable content in <>. Do you want to recover the contents of this workbook …” On clicking Yes, the file is loaded but all dates are removed.

Office 2007 SP2 Beta

No warning dialog appears, dates are silently corrupted, but still exist within the file as valid, but incorrect dates.

OpenOffice 3.0.1 Calc

Similar behaviour to Office 2007 SP2 Beta

NeoOffice Mac

Similar behaviour to Office 2007 SP2 Beta

Apple iWorks 09 Numbers

Similar behaviour to Office 2007 SP2 Beta

Apple iPhone

Similar behaviour to Office 2007 SP2 Beta

Excel Mobile 6.1

Similar behaviour to Office 2007 SP2 Beta

Datawatch Monarch V9 / Monarch Data Pump V9

File cannot be opened

Datawatch Monarch V10 / Monarch Data Pump V10

File can be opened correctly if only the date portion of an ISO 8601 date string exists. If it is the long form, an error message warning of corrupt data appears, informing the user that the values will be imported as nulls. The problem can be rectified by changing the field type from date to character. Note that Monarch is often used in lights-out operation and Data Pump is only used in lights-out operation.

Cleaning up the mess

So, the “so-called” Working Group 4 were faced with the following set of problems:

  1. For an existing ECMA 376-1 consuming application, there was no way to distinguish a later version, so the application would happily read and process any future instance, unaware of any changes (especially purely semantic ones!)
  2. For an existing ECMA 376-1 consuming application, there was no way to distinguish between a document of conformance class strict versus one of conformance class transitional.
  3. Changes to implement ISO 8601 dates in SpreadsheetML had not been thought out well at all in the BRM process.
  4. Changes to implement ISO 8601 dates per se had not been thought out well at all in the BRM process (i.e. no subsetting as per the XML Schema spec)
  5. Many assumed that serial dates were still allowed in the transitional form, which one could easily assume given the lack of strong typing: the cell value, which is the target container for dates, is a string, not a date, with an optional attribute to indicate an ISO8601 date.  In addition, there is a large amount of text in the OOXML specification referring to serial dates.
  6. The catastrophic silent data loss issue proven to exist in many applications designed for ECMA376-1.

We all know the various stories about the financial catastrophes that can occur with errors in spreadsheets.  Compounding this enormously at the file format level would not be popular amongst organisations such as EUSPRIG or indeed, anyone using spreadsheets at all, which is just about everyone.

So what did Working Group 4 decide to do about this?

Let’s take a look at the Scope statement for IS29500:

"ISO/IEC 29500 defines a set of XML vocabularies for representing word-processing documents, spreadsheets and presentations. On the one hand, the goal of ISO/IEC 29500 is to be capable of faithfully representing the preexisting corpus of word-processing documents, spreadsheets and presentations that had been produced by the Microsoft Office applications (from Microsoft Office 97 to Microsoft Office 2008, inclusive) at the date of the creation of ISO/IEC 29500. It also specifies requirements for Office Open XML consumers and producers. On the other hand, the goal is to facilitate extensibility and interoperability by enabling implementations by multiple vendors and on multiple platforms."

(For anyone that was wondering, Office 2008 was the Mac version.)

Although the scope statement only references the corpus produced by Microsoft Office, the same consideration certainly applies to the huge corpus of documents produced by applications other than Office but consumable by Office too.

  1. Since the Transitional form is meant to help deal with the transition of legacy documents, it was decided to make best efforts to provide compatibility with ECMA376-1 in the Transitional form of OOXML, so that existing applications worked properly.  This involved clarifying or reintroducing, depending on your point of view, the use of serial dates for SpreadsheetML cell values.
  2. Since the Strict form is the ideal form of the specification (ISO8601 dates only etc), where applications should strive to end up over time, it was decided to change the namespace, so that applications designed for ECMA376-1 would not be able to read Strict documents, avoiding data loss issues (a minimal sketch of this kind of namespace gatekeeping follows this list).  In the absence of an existing versioning system, this was the only way to prevent existing applications from processing future version files that would likely create compatibility issues.
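As promised above, here is a minimal sketch of what the namespace change enables. The Transitional namespace URI shown is the one an ECMA376-1 era consumer would typically have been written against (treated as an assumption here), and the Strict URI is a purely hypothetical placeholder; neither is quoted from the specification:

```python
import xml.etree.ElementTree as ET

# The namespace an ECMA376-1 era consumer would have been written against
# (an assumption for illustration, not a quotation from the specification).
KNOWN_TRANSITIONAL_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

def can_consume(sheet_xml: str) -> bool:
    """A legacy consumer refuses any worksheet whose root namespace it does not recognise."""
    root = ET.fromstring(sheet_xml)
    namespace = root.tag.partition("}")[0].lstrip("{")
    return namespace == KNOWN_TRANSITIONAL_NS

transitional = '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"/>'
strict = '<worksheet xmlns="http://example.org/ooxml/strict/spreadsheetml"/>'  # hypothetical URI

print(can_consume(transitional))  # True: processed exactly as before
print(can_consume(strict))        # False: refused outright instead of silently mangling dates
```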

In addition, some members of Working Group 4 are determined to consider the implementation details of ISO8601 dates in spreadsheets, and more widely, possibly using a subsetting approach like that found in XML Schema.

There certainly needs to be a definition of which forms of ISO 8601 elements should be used: for example, possibly specifying that "Complete Representation in Extended Format" should be used for dates and times, with separators explicitly defined, and so on. Other considerations might be the expansion of the range of valid years, to less than zero and greater than 9999. ISO 8601 allows for a fair degree of ambiguity, so narrowing down the allowable forms would make implementers’ lives much easier.

There is also a wealth of other aspects of ISO8601 that would need to be excluded, such as recurring time intervals.
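To illustrate why a profile matters, consider how many syntactically valid ISO 8601 strings can denote the very same calendar date. The sketch below lists a few equivalent forms of 11 May 2009 and then applies an illustrative profile, in the spirit of XML Schema's xs:date, that accepts only the complete extended calendar representation:

```python
import re
from datetime import date

# All of these are legitimate ISO 8601 representations of 11 May 2009.
equivalent_forms = [
    "2009-05-11",  # complete representation, extended format
    "20090511",    # complete representation, basic format
    "2009-131",    # ordinal date (day 131 of 2009)
    "2009-W20-1",  # week date (Monday of week 20)
]

# An illustrative profile: only the complete extended calendar form is allowed.
PROFILE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def accepts(value: str) -> bool:
    if not PROFILE.match(value):
        return False
    year, month, day = map(int, value.split("-"))
    try:
        date(year, month, day)  # reject impossible dates such as 2009-02-30
        return True
    except ValueError:
        return False

for form in equivalent_forms:
    print(form, accepts(form))  # only "2009-05-11" passes the profile
```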

In the final analysis, the venerable leap year bug, now somewhat strangely elevated to accepted behaviour, is far less dangerous than the silent data loss that disallowing serial dates in spreadsheet cells could cause.

Further reading

Rob Weir (Co-Chair – OASIS ODF TC, Member – OASIS ODF Adoption TC, Member – OASIS ODF Interoperability and Conformance TC, Member – INCITS V1, Chief ODF Architect, IBM)

A leap back

This is an interesting post, but there are a few issues that I need to address here:

“If you guessed “Microsoft”, you may advance to the head of the class.”

Alas Rob, it was Lotus that thrust this onto the world when they were the dominant spreadsheet and the minnow Excel had to play ball!

“The “legacy reasons” argument is entirely bogus. Microsoft could have easily have defined the XML format to require correct dates and managed the compatibility issues when loading/saving files in Excel. A file format is not required to be identical to an application’s internal representation.”

That may well be true, but I would imagine that would cause a large technical burden when managing backward compatibility with fixes such as the Compatibility Pack, as well as for the tens of thousands of developers reading and writing BIFF8 (the older Excel native binary format) who likely consumed, processed and exported serial dates. 

Spreadsheets historically did not have date engines that could deal natively with ISO8601 dates, and I doubt any do now. They could, of course, parse them in and out, but it is not a trivial amount of work to put in the plumbing, and why take the performance hit? Serial dates are great for date diffing and grouping, which are among the most common operations: how old is this debt, which transactions are in this quarter, and so on.
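As a small illustration of why serial dates are so convenient for this kind of work (a sketch only, not a claim about any particular spreadsheet engine), ageing a debt and grouping by quarter reduce to plain arithmetic on the serial numbers, with no string parsing in the inner loop:

```python
from datetime import date, timedelta

EPOCH = date(1899, 12, 30)  # same convention as the earlier sketches

def quarter(serial: int) -> str:
    d = EPOCH + timedelta(days=serial)
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

today_serial = 39944                     # 2009-05-11
invoice_serials = [39820, 39891, 39930]  # serial dates of three hypothetical invoices

# Ageing a debt is a single subtraction per row.
print([today_serial - s for s in invoice_serials])  # [124, 53, 14]

# Grouping by quarter needs one cheap conversion per row, not string parsing.
print([quarter(s) for s in invoice_serials])  # ['2009-Q1', '2009-Q1', '2009-Q2']
```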

In addition, this argument cuts both ways: applications could convert serial dates into ISO 8601 dates if they so wished. Anyway, as of today, we have to clean up the mess as best we can.

Allowing serial dates in OOXML also makes it easier to interoperate with the forthcoming OpenFormula specification, which reasonably eschews ISO8601 dates in favour of serial dates and datetimes as input.  BTW OpenFormula looks excellent and I must commend the work of Dave Wheeler and the rest of the OpenFormula SC.

As per the latest OpenFormula draft of May 9, 2009:

“A Date is a subtype of number; the number is the number of days from a particular date called the epoch. Thus, a date when presented as a general-purpose number is also called a serial number.” …

“A DateTime is also a subtype of number, and for purposes of formulas it is simply the date plus the time of day.”

I do hope Mr Bollow is pursuing the OpenFormula SC with the same vigour for their anti-ISO8601 activities, maybe we can convince him together!

Joel Spolsky (former Excel Program Manager) 

My first BillG review 

This explains the infamous leap year issue that Lotus created and Excel had to stomach.

The problem is that the horse has bolted; we now have to figure out how to do the best with what we have.

Jesper Lund Stocholm (SC34/WG4 Member, Danish Standards)

Versioning

Discussion of ISO 8601 dates in Spreadsheets

Alex Brown (SC34/WG1 Convenor, SC34/WG4 Member, British Standards)

ISO 8601 date discussion at Copenhagen WG4 meeting.

Apologies to my Business Intelligence readers: this post is pretty esoteric and refers to some strange goings-on in the document standards world.

Apologies also to Rick Jelliffe, any resemblance to Schematron is entirely unintended.

After the drama over the initial Office 2007 SP2 ODF support was initiated by Rob Weir, related arguments spilled out all over the place. I got involved in a few places, but then left it alone whilst I got on with some productive work. There are only so many ways you can say "it does not exist in the standard – admit it, fix it and move on".

Needing something to accompany a coffee yesterday, I found that the argument was still raging on Alex Brown’s blog, so I settled down for some light entertainment.

Once again, Rob Weir was arguing with Alex about some interpretation of the ODF specification and calling Alex names.  Then ‘Marbux’  weighed in with possibly the two longest comments in internet history.

This highly esoteric, intellectual argument between Dr. Alex Brown (the Convenor of ISO/IEC JTC1 SC34 Working Group 1), Mr. Robert Weir (the Chair of the ODF Technical Committee at OASIS) and Mr. Paul E. Merrell J.D. AKA ‘Marbux’ (a Juris Doctor) was essentially (and I simplify enormously) about whether something specified in the schema as required for a document instance to be valid/conformant was not actually required, because Rob said it wasn’t and Alex and Marbux said it was. Rob had his reasons for this, but they were completely lost on me. I mostly understood Alex’s reasoning, but all this came at the cost of a Red Bull chaser after the coffee.

Note: I was always under the impression that it was good practice for something that is invalid against the schema to be deemed non-conformant. Rob shows just what an absolute noob I am to assume that.

Here are the selected highlights (I know it looks unexpurgated, but take my word for it):

Rob:

“Alex, where do you read that validity of manifest.xml is required for conformance? That is news to me. Section 1.5 defines document conformance and says that conformance requires validity with respect to the OpenDocument schema. Section 1.4 defines that schema to be that described in chapters 1-16. This is confirmed in Appendix A. The optional packing format, including manifest.xml, is described in chapter 17, which the astute reader will observe is not between 1 and 16. Therefor it is not part of the OpenDocument schema to which validity is required for conformance.”

Alex:

“Err, you’re reading the wrong version of the spec. The manifest invalidity problem only occurs for the two apps (OOo/Sun plugin) which claim to be using so-called "ODF 1.2"; so for the results here, the conformance section 1.5 of ODF 1.1, that you quote, is not relevant.”

Rob:

“You misread the standard. But rather than admit and accept that, you are now trying to redeem your post by a back-up argument, which tries to apply your interpretation and prediction of what Part 3 of ODF 1.2 will say when it is eventually approved. But this text has not even reached Committee Draft stage in OASIS. It is merely an Editor’s Draft at this point. Are you really going to persist in making bold claims like this based on an Editor’s Draft, which has not been approved, or in fact even reviewed by the ODF TC, especially when contradicted by the actually approved ODF 1.0 (in ISO and OASIS versions) and ODF 1.1?”

Alex:

“No, Rob – you are the one misrepresenting ODF.
The ODF 1.1 standard states: "[t]he normative XML Schema for OpenDocument Manifest files is embedded within this specification." (s17.7.1)
How is it anything other than non-conformant, for an XML document to be invalid to a normative schema?”

Rob:

“Alex, as you know, normative clauses include those that state recommendations as well as requirements. So ‘shall’ is normative, but so is ‘should’ and so is ‘may’. Normative refers to all provisions, not merely requirements. Please refer to ISO Directives Part 2, definitions 3.8 and 3.12. So normative does not imply "required for conformance". The conformance clause defines conformance and that clause clearly defines it in terms of the schema excluding chapter 17.”

Alex:

“I’m sorry, but you have veered off into the surreal now.
If you’re maintaining that "the normative XML Schema for OpenDocument Manifest files" is in fact NOT REALLY "the normative XML Schema for OpenDocument Manifest files", then we’re going to have to disagree.”

Rob:

“Alex, read more carefully. I never said that schema was not normative. What I said is that "normative" is not the same as "required for conformance", as you had been asserting. But now I think you may be confused on this as well.”

Joe Sixpack Developer Trying To Implement ODF:

“WTF!!! This is completely crazy – it would be easier and quicker to send someone round to every user’s desk and type it out in the app of their choice instead”

This reveals a real practical problem, should I wish to implement ODF.  I would need to consult the Oracle of Westford (Rob Weir) about every detail of the entire implementation, in order to be on fully safe ground.

Now if Alex Brown and various other highly qualified commenters can misread the ODF specification so heinously, needing the expert advice from the ODF Technical Committee Chair himself to put them right, what chance do lowly developers have?

It appears from this there is no chance that any developer will be able to interpret the specification properly, since only Rob Weir (and perhaps the folks working on OpenOffice) has the intellectual capacity to navigate this ball of semantic string. 

Having to cross-reference this file format equivalent of the newest, most advanced physics paper on the planet every time you write a line of code is going to lead to madness.

This leads to a huge scalability problem, with developers worldwide having to get an opinion from Rob Weir on every implementation detail – although he could save a few hours a day by giving up the blog posts and comments, it would not be anywhere near enough to cope with the deluge of requests.

So, the solution to this is clear – we need to implement Rob Weir as a web service, a Weirotron if you like. That way, everyone can query the Weirotron and get back the definitive answer to any ODF question, without having to deal with the obviously labyrinthine spec that has bamboozled so many leading XML experts.

In addition, the Weirotron could help solve those pesky interoperability issues that stem from the areas where ODF relies on the OpenOffice source code, like formulas, and the areas where the spec is a bit light, so to speak, like change tracking. I’m sure that this would really assist Microsoft and many other struggling developers in implementing support for ODF (or rather the Weir-approved cod-ODF) correctly, with the inherent blessing of the ODF TC Chair.

Update: The articles are once again behind the IBM paywall.  Going to assume that is the policy from now on until something official from IBM says otherwise.  Lame.

I hate the way that when newspapers have to publish an apology, they cram it into a tiny space somewhere on the latter pages. So, a new blog post, rather than a footnote to the old one, is deserved. Although this is not really an apology, the spirit of redress is the same.

Most (if not all!) of the credit should go to @SethGrimes, who was the first to blog about this, unbeknownst to me, and who also approached IBM directly.

It’s an excellent decision and made with some alacrity, given IBM’s size and doubtless spools of internal red tape.

So now, those interested in the Godfather of Business Intelligence (Hans Peter Luhn) and the (often unsung) pioneers of Data Warehousing (Barry Devlin & Paul Murphy) can read their seminal articles without let or hindrance from IBM.

Yesterday Jos Van Dongen (@JosVanDongen) discovered that H.P. Luhn’s seminal paper on Business Intelligence, dating from 1958, was no longer accessible from IBM. (Edit: See also the ur-post from Seth Grimes on this – not the first time I’ve been a johnny-come-lately to a topic!) Not only that, but any attempt to read Barry Devlin’s work on Data Warehousing was also thwarted by the ominous-sounding “IBM Journal of R & D | IP Filtering Page”.

“The IBM Journals are now only available online for a fee,” barked the page.

Instead of the prescient words of Hans Peter, one is now greeted by the announcement that you have to pay to read the words of wisdom of the godfather of BI, who happened to work at IBM.

I know these are tough economic times, and IBM need to extract every last cent from their assets too, but the benefits of associating IBM with BI giants such as Luhn and Devlin far outweigh the meagre revenue they will gain from those who are forced to subscribe just for the few BI-related IBM articles.

They should be shouting about their BI bona fides, not locking them up in a subscription to a journal that most people have never even heard of and are unlikely to spring $1000 for.

Maybe the best way is to have some kind of ‘Heritage’ collection, featuring the superstars of the IBM back catalogue, made available for free. These might even be promoted to improve IBM’s image as an innovator and not a staid old behemoth, associated with mainframe monopoly and expensive services engagements.

The other issue is the multitude of links out there from a wide variety of people including analysts, business intelligence practitioners, academics, students, even Wikipedia.  The Wikipedia definition of the term ‘Business Intelligence’ even includes a link to the paper. Over time, these links will either get removed, leaving Luhn’s work unread, just a name in a history of BI, or just serve to annoy those who come across them whilst researching and reading about BI, wondering why IBM is nickel and diming them.

Just in the small Twitter business intelligence community, there are quite a few people who have linked to Luhn’s paper:

@SethGrimes

@CurtMonash

@Claudia_Imhoff

@TimoElliott

There is even an IDC report hosted by IBM and a history of BI on the Cognos site by the French Museum of Informatics.

There are thousands more links back to this paper; after all, he is the godfather of Business Intelligence, not just any old IBM researcher.

Edited April 28, 2009 5:44:29 pm GMT – preferred Mark’s suggestion of ‘godfather’.

Edited May 11, 2009 5:06:30 GMT – Link to Seth Grimes’ earlier post on this topic.

With thanks to Barney Finucane (Twitter:bfinucane) for the inspiration.

A tweet from Barney led me to daydream about a fantasy scenario with Larry Ellison and Richard Stallman on Ellison’s yacht.

Larry summons Stallman, after being informed that RMS is the guru of Open Source software, a boatload of which Larry has just acquired.

“So Dick, they tell me you are the go-to guy for this Open Source stuff.  I’ve got the goddamn stuff coming out of my ears now.  I need a 20% growth rate next fiscal on licenses, lay out your business plan for me.”

“Larry, I don’t think you have quite grasped the concept of Open Source, it is about freedom and letting everyone have choice”

“Yeah Dick, that’s good spin, go on, where’s the main growth potential”

“Larry, the point of Open Source is that it is free, you don’t pay for it”

“Dick, you’re not making any sense, or do you mean that you don’t pay for it up front, kind of a loss leader sort of thing, then we come in with the maintenance and support hammer and grind the customer to a pulp.  A sort of market share grab, is that the deal?”

“No Larry, you develop software and give it away, including the source code”

“Dick, now you’re really starting to worry me, you look like a bit of a hippy. Have you dropped acid or something?”

“Larry, you should read some of my writing, about how software has no owners, it is free, the documentation is free, it is more reliable than proprietary softw…”

Gunshot rings out.

“Kenny,” (gestures with smoking pistol to robust-looking bodyguard) “throw this crazy-ass hippy in the ocean, will you?”

Anything-As-A-Service Paranoia

February 11, 2009

There is a lot of talk about how <insert letter(s) here>AAS, especially in BI, is going to dominate 2009, mainly due to low startup costs and the hope of expert analysis of any organization’s complex business model by outsiders.

This is all well and good, but as a cynic and a slightly paranoid one at that, I can see certain risks that others with a more sunny disposition may not entertain.

I’m not alone though, and in good company at that. For example Larry Ellison (“It’s insane”), Richard Stallman (“It’s worse than stupidity”), Bill McDermott (“It just won’t work”). Admittedly they have their own agendas, but they give good quote.

5 nines?

The top tier providers do have a pretty good record here, but there is still the odd outage or two, even for Google Apps and Salesforce.  I know that it is fairly rare for internal IT to be more reliable, but you can be more granular.  For example, if you have a critical campaign or similar event, then you can step up the level of investment in disaster recovery with more hardware/software/staffing etc for the critical event and then ramp down again.  In addition, some of these stats don’t take into account an internal IT’s PLANNED downtime, which when done correctly should have very minimal impact on the business.  With SaaS, you’re in the pool with everyone else, no special treatment, no DEFCON 1 SLA on demand.  Same as disaster recovery – no 80/20 option of just getting something up and running or a small amount of data to be going on with while the whole thing is fixed, it’s all or nothing.

And what happens if you do suffer problems with business continuity? In most cases you can get your money back for a specific period (or a portion of it).  Some of the stories I have heard regarding downtime have ended up with much larger business impact costs than a month of SaaS payments, that’s for sure.

Who can you trust?

I started drafting this post even before the Satyam business (Yes, I know that’s a long time ago, but I’ve been busy).  The answer is you can’t really trust anyone, but you just have to make an informed decision and live with the compromise.

If you are in the UK, then Sage would certainly be a name you could trust, but their recent security faux-pas with their Sage Live beta would likely make any consumers of a future service from them think twice. 

A third party can certainly lose your data.

This is not so much about losing the data forever, in some kind of data disaster where it cannot be retrieved from backups; it’s about losing it outside the realm of who should be allowed to see it. This happens all the time, as shown by the British Government’s suppliers, unknown small outfits like PA Consulting, EDS, Atos Origin, etc etc. I could go on and on, but you can read about countless others here.

This can lead to it falling into the hands of those you don’t want to have the data, but in a passive way.  As we know, august organizations like SAP have allegedly filched data in a less passive way as well.

Another very recent case, where they actually did lose it completely, was magnolia.com: not really a business-critical service, but it certainly affected those users who had invested their IP in it for up to 3 years.

Your data can be easily converted into cash. For someone else.

For data that has been lost or stolen, there is almost certainly a ready market for that data if it is in the hands of a less ethical organization.  Of course, it requires an unethical organization to purchase and use the data, but I don’t think they are in short supply either, especially if the data can severely hurt a competitor or dramatically help the business.  In these lean times, it may be the case that the moral high bar is lowered even more.

This may be the unethical company itself or, far more likely, some disgruntled employee who wants to make a quick buck.

New territory in the Sarbox / Gramm-Leach-Bliley world.

Data bureaux are nothing new; industry has been outsourcing data processing for years, but this has mainly been in administrative areas such as payroll, or transactional ones such as SWIFT. This stuff is pretty tedious and not easy to get any kind of edge over your competitors with.

Salesforce.com are the SaaS darlings, but they have already had their data loss moment. And that’s only the one that was public. One might say that the information held on Salesforce.com is not that critical, but it certainly might be very useful to your competitors. However, you’re not likely to get hauled over the coals in the court of Sarbox for a competitor poaching your deals.

Once you start handing over key financial data to a third party, then the CEO and CFO are signing off on their deeds too, since you are responsible for the data, not the third party.

You probably need to think about buying insurance for this eventuality.

Another consideration is where in the world your data is stored, in the nebulous cloud, as not all geographic locations are equal, as regards privacy.

Under new management.

To use Salesforce as an example, they have Cognos as a customer. I don’t know if that’s still true, but let’s say it is. Now, our old friends SAP decide to buy Salesforce.com. Since SAP are allegedly no strangers to a bit of data voyeurism, it would not be beyond the realm of the imagination (hypothetically, of course) that they may let the Business Objects folks (sorry, SAP Business Objects) take a sly peek.

On the more mundane side, should a higher-quality vendor divest a SaaS business to a smaller, less blue-chip organization, you have a review and a possible migration on your hands. See the Satyam debacle for the sort of ructions that switching an outsourcer creates, especially in the context of a disastrous event.

Who pays the integration costs?

The fly in the ointment in this nirvana of throwing the problem over the side and getting low capital outlay, useful BI within weeks and so on, is the dirty old job of integration. It’s generally one of the most painful aspects of the BI stack even when working within the organization, and dealing with the issues of feeding an external provider makes it even hairier.

In the case of Salesforce or other outsourced data, it’s far less of a problem, since, theoretically, the outsourcer can just easily suck out that data using clean, documented APIs. However, there are costs involved in moving the data to two sites: the customer’s usual operational use and the outsourcer’s BI use. That could mean bandwidth or other charges for data exporting, or, when the SaaS fraternity wake up, a new licence and premium for providing your data to external entities. Kind of like the oil companies keeping the price of diesel high (in the UK anyway), so those folks trying to save money by buying a car with better economy end up paying roughly the same anyway.

So what’s the mood?

I observed a very interesting straw poll at the 2009 Gartner BI Conference in The Hague. At a large session, Donald Feinberg of Gartner asked the audience how many were considering SaaS BI. The show of hands was either non-existent or maybe just one. The reason: trust. I imagine the attendees at this type of conference are more at the larger end of the enterprise spectrum, so there may be more interest in the lower leagues.

Gartner BI Summit Part 2

January 29, 2009

As promised in the Mini-Summary, which was written in some haste to appease those who weren’t enjoying the 24-hour party city that isn’t The Hague, a little (in fact, rather a lot) more on what went on at the Gartner BI Summit.  In the Mini-Summary, I covered the keynote, in somewhat light detail.  It was probably enough to give a flavour. 

I’ll outline the sessions that I attended so you know I wasn’t in the Hague for the crack. It’ll serve as an aide-memoire for me too.  It was great to meet up with some of the folks I met on Twitter and also others that I first met at the conference. On with the summit.

Tuesday

Data Integration Architecture and Technology: Optimizing Data Access, Alignment and Delivery – Ted Friedman – Gartner

This is an area of interest to me, as one of the products I look after is firmly in this space. A very good presentation containing plenty of survey-based facts, and a case study on Pfizer, who have a ‘data integration shared services’ organization.  I suppose this is a DI version of the Business Intelligence Competency Centre.

ETL is still by far the largest area of DI, with replication, federation and EAI following. In addition, standardization of DI tools/architecture within organizations is still some way off.

The high-level message was that Data Integration is an absolute foundation of any data operations, whether BI or otherwise. Without good DI, you just end up with the old GIGO scenario. Not too much new for me, as was to be expected, but Ted did put the kibosh on BI as a service, reflecting my own personal view that in most cases these data environments are ‘too diverse’ to lend themselves easily to the SaaS model, hamstrung as they are by the data integration piece of the puzzle. Narrow, specialized solutions can work, as can simple data environments. However, as was pressed home later in the conference, that’s not the main reason BIaaS will not be as popular as many are projecting.

Innovation and Standardization: The Yin and Yang of BI – Timo Elliott / Donald McCormick of SAP Business Objects

This session started with Timo mashing up some Obama data in Xcelsius and was generally designed to show that SAP Business Objects still has some innovation to show, even now that it is part of the Walldorf Borg. The main highlight (from their point of view) was Polestar. I took a very quick look at the site, but was diverted by the typos “quentity” and “dectect”, as well as noting it was not tested on IE8, so I left it for another day. Looks interesting though.

SAP generously conceded that less than 50% of business data exists in SAP.  I am assuming they mean within organizations running SAP.  Even then, that’s probably an underestimation.  To that end SAP are introducing federation capabilities.

The Role of BI in Challenging Economic Conditions – Panel Discussion

The panel consisted of some large customers from around Europe. They were giving their views on how the climate affected their BI activities. Key points here include reducing the number of tools and vendors in the BI stack and squeezing licence costs, whether by forcing prices down via negotiation, redeploying seldom-used licences or other BI ROI audit activities. Naturally, I imagine some licences will become available as headcount shrinks this year.

The customers were focusing their BI efforts more on costs than on revenue and margins, which were previously the focus. In this uncertain environment, the speed of decision making is critical, and some of the selection criteria for BI tools and initiatives have changed a lot. One of the customers noted that they used to talk about the look and feel and get down to details such as fonts; now it’s “how much, how fast to implement?”

BI is going to be more tactical for the short term, with small-scope projects targeted at answering key questions quickly.

Emerging Technologies and BI: Impact and Adoption for Powerful Applications – Kurt Schlegel – Gartner

This session looked at the macro trends in BI, which were as follows:

  • Interactive Visualization (didn’t DI-Diver do this back in the late 90’s?)
  • In-Memory Analytics
  • BI Integrated Search (they showed Cognos here, but strange there was no mention of MOSS 2007 / BDC which does this quite nicely)
  • SaaS (showed a good example where the SaaS provider had a ton of industry information that could be leveraged for decision making, rather than just some generic solution shipping in-house data back and forth)
  • SOA / Mashups
  • Predictive Modelling
  • Social Software

None of this was new to me, but there were some good case studies to illustrate them and the SaaS example was the most realistic I’d seen from a business benefits point of view.

 

Wednesday

Using Corporate Performance Management to understand the drivers of Profitability and Deliver Success – Nigel Rayner – Gartner

This was an area I wasn’t too familiar with, but Nigel Rayner did an extremely good job of pitching the information and delivery so as not to overwhelm novices, yet not oversimplify and thus bore the teeth off seasoned practitioners.

Kicked off with increasing CEO turnover, then how the market measures CEO performance. Most organizations don’t have a handle on what actually drives profitability, which is where CPM can help with profitability modelling and optimization. The whale curve and Activity Based Costing were discussed.

A key point that was made is that BI is very often separate from financial systems and CPM links the two together.

Driving Business Performance in a Turbulent Economy with Microsoft BI – Kristina Kerr – Microsoft , Ran Segoli – Falck Healthcare

MS BI case study, focusing on the cost-effectiveness and speed to implement of Microsoft BI.  I have had a lot of exposure to the stack and other case studies, so didn’t make notes.  Sorry.

Does BI = Better Decision Making? – Gareth Herschel – Gartner

Really enjoyed this session, for the main reason that it was a welcome step back from BI per se to look at decision making in general. He looked more at the theory of decision-making first, then linked that to BI.

The first area was predicting (root cause) or managing events: if this can be done effectively, the increased speed of detection can allow more time to make appropriate decisions, especially as the more time you have, the more options you have available. This ties in to CEP (complex event processing) and BAM (business activity monitoring). In addition, data mining can assist in predicting events and scenarios.

This is a discipline that must be constantly reviewed, as what happens when prediction and analysis disagree with reality? Either the situation has changed, or you didn’t understand it correctly in the first place.

He went through four key approaches to decision making, rating each on how explicable versus instinctive it is and on the experience required.

  • Rational/Analytical
  • Emotional
  • Recognition Primed
  • Thin-slicing (“Blink”)

This fed into information delivery methods. These would be selective displays such as dashboards and alerts/traffic lights, or holistic displays such as visualization, which are more ‘decision-primed’ than data-centric displays such as tabular representations.

It was clear that he saw visualization and very narrow, selective displays as the best way to aid decision-making.

In my opinion, all that’s fine and dandy, if you’re measuring and delivering the right targeted data 100% of the time, otherwise it is very easy to be blindsided.

Would certainly seek him out at other Gartner events for some thought-provoking content.

Gareth made some good book recommendations:

The Art of the Long View

Sources of Power

Various Dan Gilbert stuff on Emotional Decision Making – This is his TED talk.

 

The Impact of Open Source in BI, DW and DI – Andy Bitterer & Donald Feinberg – Gartner

A very good session, surprising at least one of the open source advocates in the audience with its upbeat message. A highlight was Donald Feinberg’s prediction that Unix is dead and the funeral is in 30 years. This is in response to Unix ceding to Linux in the DBMS world. It appears Gartner have relaxed their usual criteria in order to give OSS a chance to be evaluated based on support subscription revenue.

Feinberg also strongly recommended that anyone using Open Source get a support subscription; to do otherwise would be tantamount to lunacy.

On to the BI side of OSS: market penetration is low, with less than 2% of Gartner-surveyed customers using it. However, it is a growth area, with small ISVs using it as an OEM strategy for their BI requirements.

The functionality gaps are getting smaller between commercial and OSS, with Reporting, Analysis, Dashboarding and Data Mining all now being offered, but still no Interactive Visualization, Predictive Modelling, Mobile play, Search or Performance Management.

On the DI side, other than the Talend/Bitterer argument, it’s not hotting up too quickly.  DI is mostly limited to straight ETL of fairly meagre data volumes, daily batches of around 100K records.

Functionality gaps here are in the following areas: Metadata management, Federation/EII, Replication/Sync, Changed Data Capture, Unstructured Content, Native App Connectivity and Data Profiling/Quality.

An overarching issue for adoption in all areas is the lack of skills.

An interesting scenario that was floated was the creation of an open source stack vendor, namely Sun, snapping up some of the OSS BI players. 

The Right Information to the Right Decision-Makers — Metadata Does It – Mark Beyer – Gartner

This was a useful presentation for me, as I am familiar with metadata, but not the systems and processes used to manage it.  So the definition of metadata as per Gartner is data ABOUT data, not data about data.  Metadata describes and improves data, unlocking the value of data.

I knew some classic metadata SNAFUs, such as projects where measurements from country-separated teams were mixed between metric and imperial, leading to untold costs.

Some others that Mark mentioned were very amusing, such as the data members of Gender.  I can’t recall the exact figures, but one government organization had 21. 

On to why metadata matters in decision making – it can be an indicator of data quality, it can indicate data latency and can provide a taxonomy of how to combine data from different areas.

In addition, metadata can help provide a business context of the data, in addition to mapping importance, user base and various other elements to give an idea of how critical data may be and the effects of improving that data or the impact of any changes in the generation or handling of the data.

Obviously SOX and Basel II also put increased pressure on managing metadata for the purposes of compliance, governance and lineage.

I think the takeaway for me was this set of key questions that metadata should seek to answer (a rough sketch of a record answering them follows the list).

  • What are the physical attributes of the data (type, categorization, etc.)?
  • Where does it come from?
  • Who uses it?
  • Why do they use it?
  • How often do they use it?
  • What is its quality level?
  • How much is it worth?
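As a minimal sketch only, and assuming nothing about any particular repository product, a metadata record answering those questions might look something like this; the field names and example values are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical metadata record covering the questions above; the fields
# and example values are illustrative, not any real repository schema.

@dataclass
class MetadataRecord:
    name: str
    data_type: str                 # physical attributes
    category: str
    source_system: str             # where does it come from?
    consumers: list = field(default_factory=list)  # who uses it?
    purpose: str = ""              # why do they use it?
    access_frequency: str = ""     # how often do they use it?
    quality_score: float = 0.0     # what is its quality level?
    estimated_value: str = ""      # how much is it worth?

revenue = MetadataRecord(
    name="net_revenue",
    data_type="decimal(18,2)",
    category="financial measure",
    source_system="ERP general ledger",
    consumers=["finance", "executive dashboard"],
    purpose="monthly close and CPM reporting",
    access_frequency="daily",
    quality_score=0.97,
    estimated_value="high: feeds statutory reporting",
)
```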

Comparing the Mega Vendors: How Do IBM, Microsoft, Oracle and SAP Stack Up? – Andy Bitterer, Donald Feinberg, James Richardson, Neil Chandler & Bill Hostmann as moderator

Stupidly, I ran out of paper, so had to take some notes on the phone.  I don’t like doing that as it looks like you’re texting people, or Twittering.  So, I limited myself to the bare minimum.

PerformancePoint is weak with respect to the competition.  I guess it’s even weaker now they’ve ditched planning.

Donald Feinberg is not a fan of SaaS BI.  A view I agree with, partly due to the data integration issues in the real world, as highlighted by Ted Friedman earlier in the week.  So, Donald decided to do a straw poll on who would be interested in, or is implementing, SaaS BI.  I think there might have been one person, but possibly zero.  There goes a bunch of predictions for 2009.  The reason for this reticence was one of trust: they just don’t want to throw all this over the firewall.

Another straw poll was on consolidation to a single vendor: most are doing this, and very few said they were going to buy from a pure-play vendor.  I suppose you have to take into account the self-selecting group of end users at a Gartner BI summit though.

Thursday

BI Professional – Caught in a Cube? – Robert Vos – Haselhoff

Entertaining presentation, but I was suffering with a bad cold and insufficient coffee, so didn’t get the full benefit.  He did help me wake up fully for the next presentation, so can’t have been all bad.  No talk of vendors and technology per se here, more stepping back and looking at strategy, organizational elements and making BI work from a people perspective.

Building a BI and PM Strategy Workshop – Andy Bitterer & James Richardson

This was an interactive session, like a mock exam for BI folks, where a bunch of people were randomly put into groups and asked to design a BI strategy.  The results were pretty good, and Andy Bitterer’s wish that they didn’t start naming vendors was fulfilled.  However, I did note a tendency for people to think details first, rather than strategy first.  I also found it slightly strange that the CEO did not tend to come up as a contender for involvement.  I saw more of this in Nigel Rayner’s CPM presentation, with CPM giving the CEO insight into profitability, so it seems to me to make absolute sense to have CEO involvement in the BI strategy, since the BI goals need to be aligned with the business goals.  Some others did pick up on the alignment, but still saw it as sitting within the CIO’s remit.  All in all a pretty good showing, but the IT and ‘the business’ lines were still visible, if somewhat hazier than before.

Data Quality: Your Decision Insurance – Andy Bitterer & Ted Friedman

I took a LOT of notes in this session, so I’ll try and boil it down.  Typical situation is a bunch of folks in the boardroom, all claiming different numbers.  This leads to risky decision-making if unnoticed and a huge time sink reconciling when it is noticed.

Once again, there is a turf aspect involved, with data being considered IT’s problem, so they should be responsible for data quality.  However, IT is not the customer for the data, so don’t really feel the pain that the business feels from data quality issues.  In addition, IT don’t know the business rules or have the domain expertise.  It’s not a pure technology problem, but IT need to be involved to make it work.

There were some examples of the costs of bad data quality, leading on to working out the ROI for investment.  With Sarbox et al., there is of course a new cost for the CEO/CFO: the one of going to jail if the numbers are wrong.

Another aspect of the ROI was the target level of data quality: it may be that 80% is enough, especially when the move to 100% is astronomically expensive.  The return on incremental improvements needs to be assessed.

So, who’s on the hook for DQ then?  Data stewards, who are seen as people that take care of the data rather than owning it (the organisation owns it).  They should know the content and be part of the business function, rather than IT.

An example of exposing data quality within an organisation was a DQ ‘scorecard’.  This gives an idea of the quality in terms of completeness, duplication, audited accuracy and so on.  A problem that I see with this is a kind of data quality hubris versus data quality cynicism: if it works well, the scorecard can give the right level of confidence to the decision makers, but if not, it could lead to overconfidence and less auditing.

So, operationally the key elements are (a toy sketch follows the list):

  • Data Profiling / Monitoring – e.g. how many records are complete.
  • Cleansing – de-duping & grouping
  • Standardization – rationalizing SSN formats, phone nos etc
  • Identification & Matching (not 100% sure here, I see some of this in cleansing)
  • Enrichment – bringing in external data sources, e.g. D&B to add more value to the data.
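Purely as a toy sketch of the first three elements above, and with invented records and rules rather than anything from the session, the steps might look like this:

```python
import re

# Toy illustration of profiling, standardization and de-duping on some
# made-up contact records; the rules and field names are invented.

records = [
    {"name": "ACME Ltd",   "phone": "020 7946 0123",    "ssn": "123-45-6789"},
    {"name": "Acme Ltd.",  "phone": "+44 20 7946 0123", "ssn": "123456789"},
    {"name": "Widgets Co", "phone": "",                 "ssn": None},
]

def profile(rows):
    """Profiling / monitoring: how many records are complete?"""
    complete = sum(1 for r in rows if all(r.values()))
    return {"total": len(rows), "complete": complete}

def standardize(row):
    """Standardization: rationalize name, phone and SSN formats."""
    row = dict(row)
    row["name"] = re.sub(r"[^\w\s]", "", row["name"]).strip().lower()
    row["phone"] = re.sub(r"\D", "", row["phone"] or "")
    if row["ssn"]:
        digits = re.sub(r"\D", "", row["ssn"])
        row["ssn"] = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
    return row

def dedupe(rows):
    """Cleansing: de-dupe on the standardized name."""
    seen, out = set(), []
    for r in rows:
        if r["name"] not in seen:
            seen.add(r["name"])
            out.append(r)
    return out

print(profile(records))                            # {'total': 3, 'complete': 2}
clean = dedupe([standardize(r) for r in records])  # the two ACME variants collapse to one
```

Enrichment would then be a matter of joining the cleaned records to an external source such as D&B, which is beyond a sketch like this.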

Ideally DQ should be delivered as services, which are then reusable and repeatable, used by many different data sources.  An SOA model, although SOA is supposed to be dead, isn’t it?  Who knows, maybe the term has died – the technology and approach certainly live on.

Lastly DQ ‘Firewalls’ were discussed.  This is a set of controls used to stop people/systems from poisoning the well.  Inbound data is analyzed and given the elbow if it isn’t up to snuff. It even incorporates a ‘grass’ element, where DQ criminals are identified and notified.
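A hedged, minimal sketch of that gatekeeping idea follows; the required fields and rules are invented, and a real DQ firewall would sit in the integration layer rather than in a few lines of script.

```python
# Toy 'DQ firewall': inspect inbound records against simple rules, reject
# the ones that aren't up to snuff, and remember which source sent them
# (the 'grass' element). All field names and rules are invented.

REQUIRED_FIELDS = {"customer_id", "order_date", "amount"}

def check(record: dict) -> list:
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if record.get("amount") is not None and record["amount"] < 0:
        problems.append("negative amount")
    return problems

def firewall(batch, source_system):
    accepted, offenders = [], {}
    for record in batch:
        problems = check(record)
        if problems:
            offenders.setdefault(source_system, []).append(problems)
        else:
            accepted.append(record)
    return accepted, offenders
```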

Market Players: Magic Quadrant Power Session – Andy Bitterer, Donald Feinberg, Gareth Herschel, James Richardson, Mark Beyer, Neil Chandler & Ted Friedman

The conference was starting to take its toll by this point: a flu-like cold and no more tablets left.  Add that to a few pretty late nights, notably with folks from Sybase, Kognitio, the BeyeNetwork, end-users and even Gartner (not analysts, I hasten to add), and the writing is on the wall.  Deciphering my handwriting is like translating hieroglyphics written by a 3-year-old.

So, the summary of this session is ultra-short.

  • BI MQ: SAP/BO moved down a little, which was counterintuitive to some.
  • DI MQ: Data services / SOA capability is key.  Tools need to supply, and potentially consume, metadata to play well in a real-world environment.  Currently 54 vendors are ‘eligible’ for this MQ.
  • DQ MQ: The pace of convergence between DI and DQ is increasing and will become critical.  Acquisitions will increase as DI vendors have to fill out their feature sets.

Overcoming The BIg Discrepancy: How You Can Do Better With BI – Nigel Rayner

I made a herculean effort to stay conscious in this session, mainly because I had enjoyed Nigel’s CPM session and he proved also to be a very nice chap when we chatted over a cup of coffee earlier in the week.  In addition, I had paid for the 3rd day, so was going to extract every drop of value 😉

Nigel kicked off with “the downturn”, of course.  The message was do not hit the panic button.  BI and PM will play a key role in navigating the downturn:

  • Restoring business trust
  • Understanding why, what and where to cut
  • Identifying opportunities of business growth, or which parts of the business to protect

There was some reality also, in that it is unlikely that “Big BI” projects will be approved in the short term and you will need to do more with what you already have.

The plan of attack is the 3 I’s – Initiatives, Investments and Individual Actions

Initiatives

  • BI/PM Competency Centre
  • BI and PM Platform Standards
  • Enterprisewide Metrics Framework
  • Inject Analytics into Business Processes

Investments

Prioritization of investments is critical.  Targeted short-term, cost-effective investments are the order of the day.  Some suggestions include:

  • Data Quality
  • Data Mining
  • Interactive Dashboards
  • CPM Suites

There was a mention of ‘Spreadsheet hell’ being addressed by CPM.

Individual Actions

  • Take advantage of key skills as companies undertake knee-jerk cost-cutting, AKA get good laid-off people on the cheap.
  • Redeploy key employees to tactical, short term roles rather than RIF-ing them.
  • Respect “conspicuous frugality” but don’t be defined by it.
  • Learn from others (e.g. BI award winners, case studies, social networks)
  • Evangelize BI

Then, it was a mad rush for the taxi to the station.

For more detailed coverage of the event, check out Timo Elliott’s blog post.

Not going to be a long post, this, but wanted to get a few things down. Will edit/append later with some info from other sessions. Look at my Twitter feed also for some snippets.

Keynote: ‘The BIg Discrepancy: Strategic BI, but no BI Strategy’

BI Analyst Techno with Andy Bitterer and Nigel Rayner AKA Star Schema.  BI-related refrains to the sound of banging techno with an ‘Every Breath You Take’ melody.

Once again, BI is #1 on the CIO agenda.  This has been the case since 2006, but things are not much further along.  This is more due to human factors than BI tools.  Many organizations don’t appear to have a strategy for BI and there are still problems in the following areas:

  • Governance
  • Standards
  • Trust
  • Skills
  • Definitions

Still a lot of silo thinking and a proliferation of tools.  Adding that to internal politics leads to a heady mix.

A straw poll revealed only 15 hands from customers that had a formal BI strategy.  Some BI from Gartner is needed on how many customers were in the keynote; even an approximation would help.

Another key point was the ability of BI to support change, as well as the effects of making changes to variables.  BI must be able to adapt quickly to external conditions, such as being able to optimize for cost reduction instead of revenue growth, for example.

I was left with the impression that the biggest problem is not the tools, but the bad craftsmen (in the nicest possible sense).

Another idea was that BI Competency Centres would help BI initiatives succeed, since they would likely be tasked with explicitly addressing the problem areas above.

The next musing was why IT so often sells BI to the business, when it really should be the business users driving it.  One possible problem with business users creating BI requirements is that they may not know what is or isn’t possible, resorting to the comfort of reports when asked to define their own requirements.  I suppose this is another plus point for a BI competency centre, which could serve the function of business user training and/or demonstration of the techniques and technologies available.

As a follow-on to this, the point was made that IT building BI systems in isolation, away from business users will very likely lead to failure.

This all reminds me of the work I did on ‘Expert Systems’ shells back in the early 90s, with the Knowledge Engineer (read IT BI person) and Domain Expert (read business user) working in conjunction. This was a pre-requisite of the approach, not a nice-to-have, as it seems to be with BI.

Unfortunately the web seems to have failed me for any really good references and solid examples of this, certainly none as detailed, iterative or collaborative as the processes we were using at IBIES back in 1992.

From Engelmore & Feigenbaum:

‘A knowledge engineer interviews and observes a human expert or a group of experts and learns what the experts know, and how they reason with their knowledge. The engineer then translates the knowledge into a computer-usable language, and designs an inference engine, a reasoning structure, that uses the knowledge appropriately’
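Purely to illustrate what that division of labour produces, here is a deliberately tiny, hypothetical forward-chaining sketch: the rules stand in for knowledge elicited from a domain expert, and the loop stands in for the inference engine.  The rules themselves are invented.

```python
# Tiny forward-chaining sketch: the 'knowledge' is a list of if-then rules
# elicited from a domain expert; the 'inference engine' applies them until
# no new facts appear. The rules here are invented examples.

rules = [
    ({"margin_falling", "volume_stable"}, "prices_under_pressure"),
    ({"prices_under_pressure", "competitor_discounting"}, "review_pricing"),
]

def infer(facts: set) -> set:
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"margin_falling", "volume_stable", "competitor_discounting"}))
# -> includes 'prices_under_pressure' and 'review_pricing'
```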

I digress.

Then they went through the 2009 Predicts, which are available here.

I’ll probably add updates later, just wanted to get some info on the keynote down.

Business Intelligence is a term that covers a multitude of sins.  It is also a term which is extremely open to interpretation, depending on your viewpoint, technology mastery, user skillset and information environment.

Creating new terms, especially acronyms, is what the technology industry does best; they delight in it.  But it does serve some purpose beyond the amusement of marketing folks and analysts.

To go back to an old paradigm, creating labels or categories is an essential part of the market.  Not just the BI market, but any village market, or supermarket. 

Categories help consumers navigate quickly to the types of products they are interested in, like finding the right aisle to browse by looking up at the hanging signs in the supermarket, or the area in the village market where the fruit vendors gather.  Labels give more information, such as pricing, size etc and then it is down to the product packaging and the rest of the marketing the consumer has been exposed to in terms of advertising, brand awareness and so on.

Business intelligence is a pretty long aisle.  At one end, the labels are pretty narrow but at the other, very very wide, to accommodate the zeros after the currency symbol and ‘days to implement’ information.

The problem is the long aisle – vendors need to break that aisle up into manageable (walkable) segments to help the customer navigate quickly to the solution they need.

The other problem is that in this case, the supermarket is not in charge of the category names, not even the vendors or analysts are – it’s a free for all.

This means chaos for the poor consumer, all capering around in the aisle like some kind of Brownian motion.

Thinking about this, after being bombarded with a panoply of BI terms lately, I thought of INCOTERMS, which is a standard set of terms used in international sales contracts.  These terms are strictly defined and published by an independent organization so that both parties in the transaction know exactly where they stand.

According to Boris Evelson of Forrester, Business Intelligence is “a set of processes and technologies that transform raw, meaningless data into useful and actionable information”.

Not sure about that one myself – who acquires and stores meaningless data? Other than maybe Twitter.  Other suggestions most welcome.  It might help show the possible technologies Forrester are referring to.

This certainly excludes my product, since we work with data that, theoretically, people are probably already making decisions from.  They just need to slice and dice it differently.

I can’t complain too much about Forrester though; at least they have Report Mining in their buzzword bonanza (courtesy of Curt Monash), our little niche.

The concept of transforming raw data is easier to work with (in the Forrester BI definition sense anyway) as it could refer to something like a web log, which is pretty difficult to gain any insight from by looking at it in a text editor, unless you have an eidetic memory and the ability to group and summarize the various keys and measures in your head.
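As a minimal, made-up sketch of that transformation, assuming a common-log-style format rather than any specific server’s output, a few lines are enough to turn opaque log lines into something you can reason about:

```python
from collections import Counter

# Made-up web log lines in a common-log-ish format; grouping and summarizing
# them is what turns the raw text into something decision-worthy.

log_lines = [
    '10.0.0.1 - - [16/Mar/2009:10:01:02] "GET /pricing HTTP/1.1" 200 5120',
    '10.0.0.2 - - [16/Mar/2009:10:01:05] "GET /docs HTTP/1.1" 404 312',
    '10.0.0.1 - - [16/Mar/2009:10:02:11] "GET /pricing HTTP/1.1" 200 5120',
]

hits_by_page, hits_by_status = Counter(), Counter()
for line in log_lines:
    request = line.split('"')[1]            # e.g. 'GET /pricing HTTP/1.1'
    status = line.split('"')[2].split()[0]  # e.g. '200'
    hits_by_page[request.split()[1]] += 1
    hits_by_status[status] += 1

print(hits_by_page.most_common())  # [('/pricing', 2), ('/docs', 1)]
print(hits_by_status)              # Counter({'200': 2, '404': 1})
```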

Now, as is often the case when you start writing about a topic, the research you do unearths people who have written pretty much the same thing before you.

Going back to definitions, finding Todd Fox’s decent definition of BI – “A generic term to describe leveraging the organizations’ internal and external information assets for making better business decisions.” – from a define:Business Intelligence search on Google leads to Todd’s own attempts from a Data Warehousing perspective, which, in turn, were prompted by James Taylor’s post on the confusion around the term analytics (in the context of BI).  In addition, even Kimball got involved, with his “Slowly Changing Vocabulary” section in one of his books.

This at least tells me I’m on the right track, if not entirely original.

In 1989, Gartner’s Howard Dresner defined BI as “a set of concepts and methods to improve business decision making by using fact-based support systems”.

More definitions can be found from Larry English and probably ad infinitum, or at the least, ad nauseam.

The depressing thing here is that we have only got as far as “umbrella term” becoming the popular understanding of what BI is.

<Aside> A Dutch student at the University of Amsterdam even wrote a paper titled “Business Intelligence – The Umbrella Term” complete with an umbrella graphic on the cover page.  (It’s a doc file, so I won’t include the link.  Google it if you’re interested)</Aside>

When we start to address even Forrester’s BI buzzword hoard, never mind the others out there, it begins to lead to a total breakdown in the tried and tested categorization mechanism.

To revisit the source of the proliferation, it appears that analysts (likely as a proxy for large vendors) and vendors themselves are the main culprits.  The analysts, by virtue of some level of independence and a cross-vendor view can be seen to be the arbiters of the terms.  The problem here is that the analysts often use slightly different terms or at least different meanings for the same terms.

Naturally, both vendors and analysts want to proliferate and blur terms to aid differentiation, or to try to give the perception of innovation and progress.  This is very seldom the case though, as new terms are often just a fracturing or rehashing of existing categories and terms.

However, in some cases, drilling down into more narrow categories or updating terms due to changes in technologies or approaches is not necessarily a bad thing, if the terms/categories still aid in establishing a contract of understanding between vendor and consumer.

If we want to accommodate this, the ability to establish a common understanding, based on input from across the board (analysts and vendors alike), would be beneficial to all.  The problem is, you need a genuinely independent organization that can accommodate the horse-trading, as well as maintain an authoritative definition of terms acceptable to all parties.

Some amusing aspects of this I can foresee: would “Active Data Warehouse” force the creation of a new term, “Passive Data Warehouse”, to group the applications that did not fit the criteria of “Active”?  I imagine a semantic arms race that would have to be kept in check – IBMCognoStrategyFire pushes for a “Smart ETL” category, which forces the other ETL vendors into the “Dumb ETL” pigeonhole.  Dealing with this is what standards bodies do.

This is more musing than actually being stupid enough to think this is ever going to happen.  I do have sympathy with the poor customer trying to navigate the shelves of the BI supermarket though.  As someone just trying to keep a lazy eye on the machinations of the industry, it can be overwhelming.

Here’s a short quiz.

What BI term does this refer to?

“centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format.”

Was this your first guess? This?

No.  Much earlier.

The more things change, the more they stay the same.

Maybe we could just provide a thesaurus, so when someone is puzzling over the latest buzzword, they can look it up and say ahh, I know what that is, we tried to implement something like that back in the early nineties.

 

UPDATE: Read this excellent article from Colin White – I didn’t see it before I wrote this – I promise!

The catalyst for Colin’s article (Stephen Few) can be found here and follow-up by Ted Cuzillo here.