Microformats: Moving along with the evolution of the Web
This article addresses some of the common arguments (or misconceptions) about microformats that my colleagues have raised on IRC, IM, and blogs, along with the alternatives I have heard at conferences and around water coolers.
How many Web sites, picked at random, can you point to that have valid and proper markup? Consider the complexity that goes into producing a proper Web document (i.e., (X)(HT)ML). Microformats fits well into this equation: it can solve a large portion of the common problems, namely the "semantic web", using widely adopted standards on existing documents.
However, we have the opportunity to bootstrap our existing data with microformats. We can, for instance, use (h)GRDDL to transform a microformatted (X)HTML document into RDF(a). If and when the uppercase "Semantic Web" is here (fully functional, with all the bells and whistles, and with "usable", "human friendly" applications and tools for both publishers and end-users), we can push our existing data forward. Going from HTML (semantically unmarked data) to RDF (which actually solves a different problem) is pretty much impossible unless we do a lot of heavy text-mining and run heuristic algorithms: not necessarily the best way to tackle the problem right now, if you ask me.
Natural language processing
In the case of NLP, this is still quite complex today because we are bound by the limitations of a) our knowledge of how to solve NP problems and b) digital computing. When quantum computing becomes available at a level where we can ask a computer, for instance, to summarise an article, we will no longer need to worry as much as we do now about machines struggling to understand the data. The problem remains complex because machines need to be fed extra information for context. Understanding humour, sarcasm, irony, and a variety of emotions, to name a few, is necessary for a machine to get to the real context of the information and to go beyond its literal meaning.
Not a revolution
Microformats is not a new language and it is not meant to replace existing formats. It simply allows us to have an (X)HTML document that can be multi-purpose by reusing existing formats.
If we have to use a className for an article title, which would make more sense: a) "p-name", b) "my_title", c) "foo", or d) something else? The fact is that a class name is required (for the sake of the argument), and the benefit is that we have a way to identify that article title in the document. How is that in any way a problem for humans or machines? Is "my_title" any more useful to a human than "p-name"? We are killing two birds with one stone here by reusing an existing standard, whether it is vCard, Atom, geo, ISO 8601, etc.
Microformats is not tied to any particular markup (one of its goals), other than a few exceptions as far as parsing is concerned; even those are always subject to change in light of a better way of solving a particular problem. So you can take un-semantic markup and it can still contain microformats, from which a script can retrieve the identifiable components.
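To make the point about markup independence concrete, here is a minimal sketch in Python using only the standard library. It is not how any particular microformats parser works; the class ClassTextExtractor, the helper extract, and both sample fragments are made up for illustration. It only shows that a script can find the "p-name" component whether it sits on a heading or on a presentational element.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given class name.

    Hypothetical helper for illustration; matches nested inside another
    match are folded into the outer one.
    """

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img",
            "input", "link", "meta", "source", "track", "wbr"}

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []    # text parts of the element being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract(html, class_name):
    """Return the text of all elements in `html` marked with `class_name`."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


# The same class name on a heading and on a presentational <font> element.
semantic = '<h1 class="p-name">Microformats and the Web</h1>'
unsemantic = '<font size="5" class="big p-name">Microformats <b>and</b> the Web</font>'

print(extract(semantic, "p-name"))    # ['Microformats and the Web']
print(extract(unsemantic, "p-name"))  # ['Microformats and the Web']
```

The element name is never inspected beyond skipping void elements; only the class attribute decides what gets picked up, which is the property the paragraph above relies on.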
Elemental and compound formats
Microformats also allows us to reuse a single instance of data for multiple formats. Several class name patterns can be combined to represent the data in different contexts (e.g., "fn" and "summary" for hCard and hCalendar). How is keeping track of a vCard file and an Atom/RSS file separately any better for the user? The point is to define the data once and have the opportunity to reuse it: a common good practice in computing disciplines.
Keep in mind that microformats is not intended to solve all our problems. In fact, even if it tried, it couldn't. But it gets us 80% of the way, at least for the problems it sees fit to solve. (X)HTML has its limitations. The point is to solve our most common patterns and problems. Always keep in perspective the amount of work that really goes into implementation in real-world cases, and weigh the pros and cons of what is to be solved.
What about code bloat, people ask. There is a compromise to be made here because, in some cases, microformats parsing only works if the markup follows the specified hierarchy of name patterns. This of course means using a structure that may need more containers than we would otherwise use. This is not necessarily a bad thing, depending on how minimal one wishes one's (X)HTML to be. Keep in mind that using minimal (unique) markup forces us to write redundant or hard-to-scale code. Since common patterns occur, it makes sense to define templates on a granular level in order to reuse them in various places.
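As a rough illustration of "define it once, reuse it", the sketch below reads one fragment in which a single element carries both "fn" and "summary", then builds an hCard view and an hCalendar view from the same text. The PropertyCollector class, the view helper, the two property sets, and the sample markup are all assumptions made for this example; a real parser follows the published parsing rules and handles far more cases.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Map each class name to the text of the elements that carry it.

    Illustrative only: assumes well-nested markup without void elements
    such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.open = []        # one (class names, text parts) pair per open element
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.open.append((classes, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self.open:
            parts.append(data)

    def handle_endtag(self, tag):
        if not self.open:
            return
        classes, parts = self.open.pop()
        text = " ".join("".join(parts).split())  # collapse whitespace
        for name in classes:
            self.properties.setdefault(name, []).append(text)


def view(properties, wanted):
    """Keep only the properties that one format cares about."""
    return {name: properties[name] for name in sorted(wanted) if name in properties}


# A subset of property names from each format, chosen for the example.
HCARD_PROPERTIES = {"fn", "org", "tel"}
HCALENDAR_PROPERTIES = {"summary", "location", "dtstart"}

markup = """
<div class="vcard vevent">
  <span class="fn summary">Microformats Meetup</span>
  <span class="org">Web Standards Group</span>
  <span class="location">Community Hall</span>
</div>
"""

collector = PropertyCollector()
collector.feed(markup)
collector.close()

hcard = view(collector.properties, HCARD_PROPERTIES)
hcalendar = view(collector.properties, HCALENDAR_PROPERTIES)

print(hcard)      # {'fn': ['Microformats Meetup'], 'org': ['Web Standards Group']}
print(hcalendar)  # {'location': ['Community Hall'], 'summary': ['Microformats Meetup']}
```

The text "Microformats Meetup" is written once in the document and shows up in both views, so there is no second file to keep in sync.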
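The hierarchy requirement can be sketched as follows, again in Python with the standard library only. This is a simplified illustration, not the official hCard parsing algorithm: HCardReader, ROOT, PROPERTIES, and the sample markup are assumptions for the example, and the code expects well-nested markup without void elements. It shows why the extra container matters: a property name is only honoured inside a "vcard" root.

```python
from html.parser import HTMLParser

ROOT = "vcard"                      # root class name of an hCard
PROPERTIES = {"fn", "org", "tel"}   # a few hCard properties, for illustration


class HCardReader(HTMLParser):
    """Read properties only when they are nested inside a root container."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # nesting depth of open elements
        self.root_depth = None   # depth at which the current root opened
        self.capture = None      # (property name, depth, text parts) being read
        self.cards = []

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if self.root_depth is None:
            # Outside any card: only a root container is of interest.
            if ROOT in classes:
                self.root_depth = self.depth
                self.cards.append({})
            return
        if self.capture is None:
            # Inside a card: start reading the first known property, if any.
            for name in classes:
                if name in PROPERTIES:
                    self.capture = (name, self.depth, [])
                    break

    def handle_data(self, data):
        if self.capture is not None:
            self.capture[2].append(data)

    def handle_endtag(self, tag):
        if self.capture is not None and self.capture[1] == self.depth:
            name, _, parts = self.capture
            self.cards[-1][name] = "".join(parts).strip()
            self.capture = None
        if self.root_depth == self.depth:
            self.root_depth = None   # the card's container has closed
        self.depth -= 1


markup = """
<p class="fn">Not inside a card</p>
<div class="vcard">
  <div class="fn">Jane Doe</div>
  <div class="org">Example Org</div>
</div>
"""

reader = HCardReader()
reader.feed(markup)
reader.close()
print(reader.cards)  # [{'fn': 'Jane Doe', 'org': 'Example Org'}]
```

The stray "fn" on the paragraph is ignored because it has no "vcard" ancestor. That one wrapping element is the "bloat" in question, and it is also what lets a script tell a contact's name apart from any other use of the same class name.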
Another common worry, objection, or criticism has to do with making data easily parsable by "bad" people. This will always be the case for any format that is not encrypted and is accessible by a script. Having said that, technologies can always be used for good and for bad, and in the end they are designed to aid some human need. The whole premise of the Web, and the reason it took off as it did, is that we made data easily accessible to humans; textual markup for the win. Microformats emphasises marking up only the information that is visible to humans. If information is not intended to be retrievable, then it would make more sense not to make it public in the first place. If an email address is intended to be read (without having to look at the source code), then that email address can optionally be marked up and made available to scripts. In any case, the information is already there and can be retrieved by a fully committed script regardless of any tricks to fool email harvesters. We are also able to set licensing terms (e.g., Creative Commons) on the data that we share; this helps readers understand their rights over the information and what they are allowed to do with it.
Solving the i18n issues is beyond the scope of what microformats is trying to do. Simply put, if a standard (e.g., vCard's "tel") can address localisation better, then it would be easier for microformats to embrace it.
This is a big can of worms on its own, but I will just say this, which I believe puts things back in perspective: accessibility applies to everyone. Essentially, we want to reach as many humans as possible and continue to improve on that.
Microformats is part of the natural evolution of the Web. Simply put, there is a need for it, and the solution is reasonable enough to adopt.
I think it's a bit of a cop out to say "i18n is beyond the scope of what uF is trying to do." Likewise, "[t]his will always be the case for any format that is not encrypted..."
If uF wants to gain the trust of the community, it has to at least address these issues or point developers to resources that deal with these issues rather than shrug them off because they fall out of scope. It's like saying, "Google SERPs are full of spam but they're all about indexing content, not filtering spam."
@Ara Pehlivanian: My bad if I seemed to be brushing it off, but localisation, accessibility, and actual use cases are indeed explored by the community. Some solutions to the problem, however, are not so practical.
It is like throwing out HTML because some elements are ambiguously defined or because there are not enough elements to cover the details of all languages, not to mention adaptability and need. The fact that XHTML can assist us with that does not really make it a good solution either. Most people can't write an XML Schema (bad assumption on my part).
If there is a slight problem with localisation of tel, we don't have to reject all of hCard. We can choose to leave that bit out and still benefit from the rest until a better solution is identified. I don't think we have to boil it down to all or nothing.