XML is taking over the world! XML is ubiquitous! XML wins! The work is over. Or is the work just starting? Are we at Extreme Markup Languages riding the wave of XML success? Or riding for a fall as we learn that our comfortable world is not the whole world? Have we created the success of XML, or are we just along for the ride?
Keywords: Markup Languages
First of all, I want to tell you all how delighted I am that we are all here in Montreal and at Extreme Markup Languages 2007!
I go to a lot of conferences. Extreme is one of my favorites, because at Extreme I can count on learning something unexpected. I will learn a new approach, or a new technique, or about a new problem, or a new tool. I expect to be pushed into thinking about some things I haven’t thought about before, and I expect my opinion on at least one thing to change during the week. I expect to listen to some people I think are barking up the wrong tree, and perhaps one or two who are simply barking mad. And I expect that a few of the talks will simply leave me bewildered. I do NOT expect to be bored at Extreme.
Learning a thing or two, and seeing some old friends, and meeting some new people is more than enough to expect of a week. But actually, I’m looking for more than that at Extreme this year. In a sense I’m looking for validation. No, I don’t mean validation of my individual contribution, or validation of my worth to society, or any of the touchy-feely popular-psychology “validation” things people write best-selling trade books about. If I don’t have a clear notion of my personal worth by this stage in my life, a room full of conference attendees — as much as I like and respect you all — won’t help.
But … let me tell you about a conversation I had recently with my “little” brother.
We were talking about who was doing what. About vacations, and papers in journals, and books published, and trips to China, and how to keep the rabbits from eating all the lettuces in the garden. And there was a short pause in the conversation. And my brother asked, “So now that XML is over, what are you going to do?” And I said, sounding charming and articulate as always, “Huh?” Well, he said, “You had fun with XML, but now what? I never read anything interesting about XML anymore. It’s OVER.” “It’s NOT over,” I said. “It’s ubiquitous. XML is behind the scenes in practically all new computing devices, probably even the cell phone in your pocket. And it is being used by many of the publishers who are providing content in both print and on the Web, and it is being used by scholars, including libraries, archives, linguists, historians. A week doesn’t go by that I don’t hear about some organization, some industry, or some application that is moving to XML.”
“Yeah. I know that. But they’re all doing the same thing, more or less. Plastic is ubiquitous, too. And it seems likely that there are materials researchers working on developing cool new plastics at this very moment. But in a very real sense plastics are dead. We know about plastics, and developing new ones, and new uses for plastics, is no longer seriously cool; it is just normal development now. Just as writing a new tag (That’s what you do in XML, right?) is normal development, not something seriously cool. Right? And besides, how long can you either make a living or feel good about yourself selling refrigerators to Eskimos? After a while they will all HAVE refrigerators. They are introducing kids to XML in grade school now; there won’t be a need for XML Consultants to bring XML to the great unwashed for much longer, everyone will know about it.”
Well, at about that time the baby started to fuss, or the fish on the grill was ready, or some such. But it left me wondering. Is he right? Is XML over? Am I just a bit slow in figuring it out? (I have to admit, I was a bit slow to figure out that HTML was going to fly. It was such a tiny tag set — and the people promulgating it had such unreasonably high hopes for it. … Well, I was wrong. Of course.)
XML. Is it over? Are we down to the normal use stage? Are we getting to the time when any well-educated person is computer-literate? And computer-literate includes understanding XML? I wish it were so, but I don’t quite believe that it is. And I am not certain that it will be so. Not just that it won’t be so in the near future; I am not sure it will be so at all. Or that it should be. (I have just told you that predicting the future is NOT among my strengths — and here I am doing it. How Extreme!)
I think there is a lot going on in XML right now, and I think there will be interesting developments in XML for quite a while. I am looking for Extreme in general, and for the papers I hear and conversations I participate in — and also perhaps those I overhear — to validate my position that XML is not over. … On the other hand, perhaps it is.
I have a second brother. No, really. I do have two brothers, and both of these conversations really happened. Not exactly; I don’t record chats with my family. But more or less. Actually, I do have an email message from my brother-the-scientist. The subject line reads “Is DOI DOA?” Now some of you are saying “Huh?”
So, let me start a few months earlier. My brother had noticed that when he read a journal article online and looked at the references, some of them included something he didn’t recognize, called a DOI. He asked me what a DOI was and what it was good for. I reminded him how much he likes it when he clicks on a reference and immediately sees the article that was referenced. (In fact, he likes it so much he has forgotten that this is a relatively new capability, and he gets cranky when it doesn’t work.) I explained to him that DOIs are the technology that makes that work. So, he said, “DOI is another name for URL? Right?” Wrong. DOIs are unique identifiers for digital objects, such as journal articles, that remain stable regardless of the location or owner of the object. So, if a publisher reorganizes their web site, pointers to the location of an object would break, but pointers to the object’s DOI will still work. If a set of digital objects, such as, for example, a journal, including its entire back-run, is sold to another publisher, searching for the articles in the selling publisher’s digital archive will no longer work, but pointers to the article’s DOI will still work. How? Because DOIs are “resolved” by DOI resolution services. There, the DOI is associated with a location, and that location can be changed as needed. So, when that journal, or reference book, or table, or sidebar, or map, or other digital object changes location or owner, the new location or owner is registered with the DOI resolution authority and links to the object continue to work.
So, what does this mean? It means that when a user “clicks” on a reference to an object that is identified with a DOI, completely invisible to the user, a query is sent to a DOI resolver, which returns a location for the object, which (assuming that a lot of other things, mostly related to money, are in place) takes the reader to the object. My brother likes this scheme and understands that this is a far better idea than relying on the URLs of these things, probably because he knows that some of the files of piles of URLs he has collected from time to time become obsolete very quickly.
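The indirection described above can be illustrated with a small sketch. This is a toy, in-memory model and not how any real resolution service is implemented; the class name `DoiResolver`, the DOI value, and both publisher URLs are made up for illustration.

```python
class DoiResolver:
    """Toy stand-in for a DOI resolution service (hypothetical, in-memory)."""

    def __init__(self):
        # Maps each DOI to the object's *current* location.
        self._locations = {}

    def register(self, doi, location):
        """Register a new DOI, or update the location of an existing one."""
        self._locations[doi] = location

    def resolve(self, doi):
        """Return the current location for a DOI (KeyError if unregistered)."""
        return self._locations[doi]


resolver = DoiResolver()
doi = "10.9999/example.123"  # made-up DOI, for illustration only

# The original publisher registers the article.
resolver.register(doi, "http://old-publisher.example/articles/123")
before_sale = resolver.resolve(doi)

# The journal is sold; only the registry entry changes.
resolver.register(doi, "http://new-publisher.example/journal/123")
after_sale = resolver.resolve(doi)

# The citation still carries the same DOI, yet now leads to the new location.
print(before_sale)
print(after_sale)
```

The point of the design is that the citation never stores a location at all, so nothing in the citing article has to change when the object moves.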
So, the next time he was out hunting for a bunch of papers he wanted to read (I imagine him in his boots, wading through a swamp with a butterfly net), instead of copying out the whole citation, he copied the DOI of every article he wanted to read. And then he gave the list of DOIs to an associate, with instructions to go get all of these articles and print them for easy reading. But the associate couldn’t figure out how. So, my brother calls his friendly neighborhood science librarian who has helped him find things in the past. And HE doesn’t know how to use these DOIs.
So, the email. “Is DOI DOA?” (Dead on Arrival?) No, just being used in an unexpected way. It didn’t occur to me that my brother would want to surface the DOIs like that. And, more important, that isn’t really the way the system was set up for them to be used. As it happens, I pointed him to a DOI resolver with a web interface, and he got what he needed by pasting all of those DOIs into it one by one. But he was pretty unimpressed by this technology; it may not be DOA, but it certainly isn’t friendly. DOIs, he concluded, are pretty much worthless.
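Pasting a list of DOIs into a resolver one by one is the kind of chore a few lines of script could take over. The sketch below only builds the resolver links and fetches nothing. It assumes the public proxy at `https://doi.org/`, which is not named in the talk, and the DOIs and the helper name `resolver_urls` are made-up placeholders.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy; any resolver with a URL interface would do.
RESOLVER_BASE = "https://doi.org/"


def resolver_urls(dois):
    """Turn a list of DOI strings into resolver links, skipping blank entries."""
    urls = []
    for doi in dois:
        doi = doi.strip()
        if doi:
            # Percent-encode unusual characters but keep the "/" in the DOI.
            urls.append(RESOLVER_BASE + quote(doi, safe="/"))
    return urls


# Made-up DOIs standing in for a hand-copied list.
dois = ["10.9999/example.123", " 10.9999/example.456 ", ""]

for url in resolver_urls(dois):
    print(url)
```

Whether each link actually delivers the article still depends on the reader having access to it.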
Well, I know why he thinks that. But he’s wrong. DOIs, and the whole DOI registration and resolution scheme, are not worthless. This is, however, a technology that should be kept subterranean. He should, in my opinion, never have seen it at all. And if he had not seen it, he would have continued to use it, happily albeit unconsciously. DOIs would have been things the publishers of journals (and other publishers) used, but readers never saw. Not unlike a lot of infrastructure — important, sometimes sophisticated, and completely invisible to the normal user.
It seems to me that XML may be heading underground. Or perhaps, that XML should be heading underground. I don’t see the “average computer user” showing any more interest in seeing the markup layer than they used to. I don’t see the masses of web developers, or of people who create web pages (which isn’t the same group at all in my mind), heading toward creating their HTML as XHTML, much less dealing with the markup explicitly. I don’t see people clamoring to do word processing or spreadsheets or presentations with explicit markup, or even caring what markup is behind the scenes of the GUI-based packages they use. So, while I think my little brother is onto something, it isn’t that XML is over; it’s that XML is hiding from him more and more successfully.
The interesting issues in the world of markup (which these days is pretty much the world of XML) are the sorts of things we will be talking about here this week, but aren’t likely to make it into the newspapers. And that’s fine.
By the way, did you notice that I just said that markup and XML are pretty much the same thing? I have mixed feelings about that. First, because not only do I have clients who still use SGML, I am not ashamed to say so. But, more important, I don’t think XML is the markup specification to end all markup specifications. I think that it is entirely possible — no, I think it is inevitable — that there will be a successor to XML. And I think it is well within the realm of the possible that some of the things we will talk about here at Extreme 2007 will influence the shape of that specification.
Yesterday was the Workshop on Markup of Overlapping Structures — by the way, for those of you who weren’t there, it was a serious Geek-fest. Lots of hands flying through the air, and people lining up at the microphone to discuss what they had just heard, and contrasting points of view. With luck we will have a lot more of that lively respectful discussion throughout the conference.
I bring yesterday’s discussion up partly, I guess, because it was a lot of fun. But more, because of something Henry Thompson said. Henry said (approximately), “The use of XML to mark up human-authored text is not what built the XML industry; that’s machine-authored XML that no human being interacts with in any way. The XML for data mining and web services.” He went on to make a point about overlap, but I would like to consider his statement in a larger context, or perhaps simply to quote him out of context. I think Henry put his finger on something very important, and perhaps he pointed to the reason behind my little brother’s dismissal of XML as an interesting area for discussion. If, as many people do, you consider what the majority of users do with something to be what it is, and should be, for, then XML is for some applications I consider really, really boring. And, in that same light, the topics we are discussing at Extreme are, by and large, unimportant. The vast majority of XML users are simply not interested in most of what we will discuss here. And they shouldn’t be.
Liam Quin told me, yesterday, that he has been hearing, from a variety of people, what XML is used for. He has heard, for example, that 95% of the XML in the world is used for web services, with the obvious implication that XML should be simplified and stripped of all the clutter that is not used in web services. Similarly, he has heard, 95% of the XML on the web is RSS, and XML should be simplified … You get the idea. And it seems to me that, first of all, if anyone can believe that something they do with XML is what 95% of all XML users are doing, they live in a very small world. But, more important, they live in a world in which they are doing a relatively simple task, repeatedly, and with little need to stretch. These people are manufacturing XML widgets, or perhaps they are using XML in manufacturing plastic widgets. While I will argue that there simply can’t be four or five different uses for XML that are each 95% of the use of XML, I also maintain that it doesn’t matter!
I don’t care that many people are doing something that is of no interest to me. I am interested in the things that interest me, and it makes no difference if there are huge numbers of people interested in something else. I want to use XML in environments with human-created content and, in many cases, human-created markup. I am interested in the problems that arise because human-created documents are not regular. Human-created documents are, in some significant ways, lumpy, undisciplined, and difficult to work with. And in order to work with them I need capabilities and tools that people working with other sorts of content don’t need.
There are people, mostly in the marketing professions but some also generic managers (you know the type: have briefcase, can manage … anything). Anyway, there are people who believe that if they can’t sell their product to the largest possible market, it will fail. They don’t want to sell an XML editor to the (I’m making these numbers up from whole cloth — don’t trust them!) thousands of people who might use it to do literary analysis or historical annotation. They want to sell their product to the teeming masses; to the people who run telephone help lines — or telemarketing services. They don’t have the notion of “big enough to make enough money for me”; they have to be the biggest. Period. These are the people who announce that 95% of XML is this, that, or the other, and by that, they mean that capabilities needed only by the other 5% are simply clutter and should be removed.
Most of what we are going to talk about here at Extreme 2007 is relevant to human-authored XML. It is about text, not about engine performance data. It is about financial records and medical information and literary analysis and technical documentation. It is about messy, lumpy, human content. And how we work with that stuff, the problems it presents, and the opportunities for messing with it — ummm … I mean managing and massaging it. That means that most of what we talk about at Extreme is “minority stuff”. A lot of it is “edge-cases”. And the marketers who want to sell a computer and some XML tools to the hoi polloi will not be interested in what we discuss here, because most of what we are talking about is edge cases.
Further, I think XML will be going further and further underground. I think, from the end user’s point of view, XML should be, and will be, vanishing. I don’t think people like my brothers will see, or even be aware of, the XML that will soon underlie their word processors and DVD players. There will be invisible XML everywhere: medical records, banking, legislation, and at the grocery store. We will not only get far better at designing, creating, and manipulating XML; we as a community (Or is that as an industry?) will get far better at hiding the XML. DOIs will be invisible, as will most XML for most people.
As we get ready to start the real Extreme conference, I’d like you to listen to something a few of you may be familiar with. Please pay attention. This little exercise is intended to help us all limber up our imaginations, which we are going to use a lot at Extreme. [Audio clip played]
That, for those of you who don’t know it, was Stan Freberg, advertising advertising on the radio. In the 1960s, when he recorded this, it couldn’t have been done on television. Now, in 2007, we can see people scratching dinosaurs behind the ear on television, with the dino leaning in to the hand to appreciate the scratch; we could certainly see Lake Michigan filled with hot chocolate. But even now, that would be big-budget TV. Freberg was making low-budget radio, and it could still be done that way. A lot of what we will hear at Extreme will be in the “Stan Freberg” genre. No, I don’t mean that it will be funny, although I wouldn’t mind a little leavening from time to time. I mean that it will be in our minds or it won’t be at all. You are going to have to work as you listen to many of the presentations at Extreme.
You’ll need the imagination you needed to “see” Lake Michigan filled with hot chocolate to get the most out of Extreme.
As you listen to the various talks here at Extreme, remember to listen creatively. Think about where these ideas could go. Remember that many of these talks are about work in progress, and many of the speakers are researchers or markup geeks, not radio or television personalities. Listen to their messages, even when the delivery is, perhaps, a bit uneven and their ideas are, perhaps, incompletely articulated. Expect that some of the speakers will tell you things you didn’t know and some things you may have a hard time believing. I urge you to give them the benefit of the doubt at least long enough to be sure you understand what they are saying. As they say in theater circles, suspend your disbelief for a little while. Then, and only then, after you have given the speaker a sympathetic ear, engage all of your critical faculties. Ask questions and expect answers. But allow the answers to be surprising.
Enjoy Extreme Markup Languages 2007.