Well-formed mark-up?

There’s an interesting debate going on in the W3C HTML working group about whether well-formed HTML should matter in the specification process for HTML5. Intuitively, well-formedness feels like a valuable goal to me, but when it comes to explaining why it matters I’m finding it hard.

Which of the following is “better”:

normal<b>bold<i>bolditalic</b>italic</i>normal

or

normal<b>bold<i>bolditalic</i></b><i>italic</i>normal

The first is shorter (and works in all the popular web browsers) while the second is well-formed. So well-formedness isn’t about being smaller. It’s also not about performance: it turns out that browser parsers often process certain non-well-formed mark-up faster than the well-formed equivalent.
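To make “well-formed” concrete for the two snippets above, here is a minimal sketch in Python of a nesting check: every closing tag must match the most recently opened tag. The names NestingChecker and is_well_nested are my own, made up for illustration. It uses the standard library’s html.parser only to split the mark-up into tags, and it ignores void elements such as <br>, so treat it as a toy rather than a validator.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Hypothetical helper: tracks open tags on a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []          # tags opened and not yet closed
        self.well_nested = True  # flips to False on the first mismatch

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # A closing tag must match the most recently opened tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.well_nested = False


def is_well_nested(markup):
    """Return True if every tag closes in reverse order of opening."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Also require that nothing is left open at the end.
    return checker.well_nested and not checker.stack


print(is_well_nested("normal<b>bold<i>bolditalic</b>italic</i>normal"))
print(is_well_nested("normal<b>bold<i>bolditalic</i></b><i>italic</i>normal"))
```

The first call reports False because </b> arrives while <i> is still open; the second reports True. Note that the check says nothing about how a browser renders either snippet, which is rather the point of the debate.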

Browsers have to parse both alternatives, and the HTML5 process is about ensuring that they do so in a predictable and interoperable way. Given that, should the spec give any extra weight to well-formed documents? After all, it doesn’t prevent you from choosing to be well-formed if you want to.

The analogy I’ve been considering is indentation in C++ source code: few people would write C++ without a sensible indentation strategy to keep the code readable. Yet the C++ spec doesn’t need to say anything about indentation. It’s a best practice, not a formal part of the language definition. Could writing well-formed HTML likewise be a best practice that’s not a formal part of the language definition?