XML :
Updated: 7/24/2003; 8:54:22 PM.

 


 
 

Tuesday, March 18, 2003

Secrets of the XML gods

It's the season for confession. First Tim Bray reveals a dirty secret: "a lot of input data these days is XML...in most cases, I use the perl regexp engine to read and process it." Then Sean McGrath fesses up to his Python habit: "I know I should be invoking a WF [well-formed] parser on the content.xml string but gee Ma, I've got work to do."

Text is code, code is data, data is text. Around and around we go. If the XML gods are resorting to Perl and Python hackery to shred documents, are we just spinning our wheels? I don't think so. But this is, perhaps, an unusual case. Normally, as we climb the ladder of abstraction, we are happy to lose sight of the rungs below. I cannot usefully manipulate the blocks and sectors of my disk, or the assembly code my software compiles down to. I can, however, make excellent use of the text stream underlying XML abstractions. So, which way to regard a document becomes a kind of Necker cube puzzle. The bad news: it's confusing. The good news: it's useful.
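To make the Necker cube concrete, here is a minimal Python sketch of the two views. The sample document and function names are invented for illustration, not taken from Tim's or Sean's code. The regex view sees only a text stream; the parser view sees a tree and checks well-formedness along the way.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
DOC = """<feed>
  <item id="1"><title>Text is code</title></item>
  <item id="2"><title>Code is data</title></item>
  <!-- <item id="3"><title>Commented out</title></item> -->
</feed>"""


def titles_by_regex(text):
    """Treat the document as a text stream and pattern-match it."""
    return re.findall(r"<title>(.*?)</title>", text)


def titles_by_parser(text):
    """Treat the document as a tree; the parser rejects ill-formed input."""
    root = ET.fromstring(text)
    return [title.text for title in root.iter("title")]


regex_titles = titles_by_regex(DOC)
parser_titles = titles_by_parser(DOC)
```

The regex version is quick and needs no parser, but it also picks up the title inside the comment, because it knows nothing about XML structure. The parser version skips it. Both views are useful; the trick is remembering which one you are looking through.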


10:06:19 AM    

Monday, March 17, 2003

Querying XML in databases

Tom Dyson points out that XPath support for PostgreSQL is in the works. Here's the example he gives:

SELECT
  article_id, xpath_string(article_xml,'/beatles/@beatle_id') AS beatle_id
FROM
  t_articles
WHERE
  xpath_bool(article_xml,'/beatles/beatle[@alive="yes"]');
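
For readers without a PostgreSQL build handy, the two XPath expressions can be tried out in Python's standard library, whose ElementTree module supports a subset of XPath. This is only a sketch of what the expressions select, not of how the PostgreSQL functions work. The sample document and the name attribute are my assumptions; only beatles, beatle, beatle_id, and alive come from the query above.

```python
import xml.etree.ElementTree as ET

# Hypothetical article document; the "name" attribute is invented.
ARTICLE_XML = """<beatles beatle_id="fab4">
  <beatle name="Paul" alive="yes"/>
  <beatle name="John" alive="no"/>
</beatles>"""

root = ET.fromstring(ARTICLE_XML)

# Rough equivalent of /beatles/@beatle_id (root is the <beatles> element).
beatle_id = root.get("beatle_id")

# Rough equivalent of /beatles/beatle[@alive="yes"], written relative to
# the root because ElementTree does not accept absolute paths on elements.
living = root.findall("beatle[@alive='yes']")
has_living = len(living) > 0
living_names = [beatle.get("name") for beatle in living]
```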

Meanwhile, over at John Merrells' weblog, you can try out Sleepycat XML DB -- more straightforwardly, in fact, than if you go in through the front door. I couldn't quite get it working on OS X; next time I'll go the Linux route.

Even as the XML query language debate [via Collaxa's Take] continues, it's clear that XPath is the foundation for querying XML documents stored in databases. Here's another example, from the Berkeley DB XML FAQ:

void DatingService::addProfile
  (const std::string &username, DbXml::XmlDocument &profile)
{
  XmlResults results
   (profiles_.queryWithXPath(0,"/client[@username='"+username+"']"));
  if(results.size()==0)
  {
   profiles_.putDocument(0,profile);
  }
}

What's still unclear is how declarative updates will work. There are lots of approaches: updategrams, XUpdate, XQuery update. SQL wasn't built in a day, so it shouldn't be surprising that this stuff is taking a long time to cook.


10:05:37 AM    

Tuesday, February 11, 2003

Feeling Chad's pain

In his column this week, Chad Dickerson fesses up to the dirty secret of XML content management. The blurb reads: "XML isn't a panacea, especially if the semantic integrity of data hasn't been maintained properly." You can say that again. Chad, I feel your pain! I've been involved in electronic publishing of one form or another for almost 20 years. Nobody ever admits how hard it is to push the boulder up the hill of entropy, and how hard it is to keep it from rolling back down the other side.

Five years ago I wrote my book using what would soon be called XHTML. I had a DTD, and scripts to validate the stuff and transform it into various deliverables. If I had to do the same today, I'd probably do it roughly the same way: edit in emacs, transform using an expat-bound scripting language -- then Perl, now more likely Python. The better tools that were right around the corner are still...right around the corner.
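
For the curious, here is roughly what "transform using an expat-bound scripting language" looks like in today's Python. It is a minimal sketch, not the scripts I used for the book: the sample markup and the to_plain_text function are invented for illustration. Note that expat checks well-formedness only; validating against a DTD would take a separate tool.

```python
import xml.parsers.expat

# Hypothetical XHTML-like fragment, invented for illustration.
SAMPLE = "<html><body><h1>Chapter 1</h1><p>Groupware is <em>hard</em>.</p></body></html>"

BLOCK_ELEMENTS = ("h1", "p")


def to_plain_text(xhtml):
    """Flatten markup to text, ending each block element with a newline.

    Raises xml.parsers.expat.ExpatError if the input is not well-formed.
    """
    pieces = []

    def end_element(name):
        if name in BLOCK_ELEMENTS:
            pieces.append("\n")

    def character_data(data):
        pieces.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.Parse(xhtml, True)  # True marks this as the final chunk
    return "".join(pieces)


plain = to_plain_text(SAMPLE)
```

Each deliverable is then just another set of handlers over the same event stream.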

To be sure, there are richer XML script-language bindings nowadays. Next time I'm in need of such a thing, I'll give Python's minidom a try. It's a lightweight DOM that can be used in elegant ways, as Mark Pilgrim has shown. There have been rumors of an X# from Microsoft, and Adam Bosworth continues to evangelize intrinsic programming-language support for XML.
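
Here is a small taste of minidom, as a sketch only. The book and chapter markup is invented for illustration; the calls are from the xml.dom.minidom module in Python's standard library.

```python
from xml.dom import minidom

# Hypothetical document, invented for illustration.
doc = minidom.parseString(
    "<book><chapter title='Intro'/><chapter title='Tools'/></book>"
)

# Read: collect the title attribute of every chapter element.
titles = [
    chapter.getAttribute("title")
    for chapter in doc.getElementsByTagName("chapter")
]

# Write: append a new chapter element to the document root.
new_chapter = doc.createElement("chapter")
new_chapter.setAttribute("title", "Futures")
doc.documentElement.appendChild(new_chapter)

chapter_count = len(doc.getElementsByTagName("chapter"))
xml_out = doc.documentElement.toxml()
```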

These are ideas whose time has come. And not a moment too soon. Chad writes:

At InfoWorld, we started our data migration project with high hopes, approaching our mother lode of XML data with the tools that any self-respecting 21st century developer would use: Java and XSL. It was all in XML -- how could we lose? In the end, we shuffled away from the XML scrap heap with heavy hearts and a mountain of one-off Perl scripts that got the data migration job done. We prevailed, but ultimately it was what you hear some football coaches call winning ugly.

You're being too hard on yourself, Chad. You didn't have the luxury of single-handed control of the archive. People had to evolve it, and there weren't -- and aren't yet -- tools sufficiently deployable and usable to enable the necessary delegation of control. That's exactly why I'm jazzed about Office 11's XML support, and the forthcoming XDocs. But these won't be slam-dunks either. It's going to take a really long time for all this stuff to get cooked.

And you know what? Even then, migrations will require elbow grease. As a longtime content wrangler, I'm guilty of assuming that any structure is mappable to any other structure. And that's true. But it's never as trivial as we like to imagine. Transformation is work. We'll always need to do some of it programmatically. The good news is that XSLT, which I've made my peace with and learned to use somewhat productively, was only a first draft of the kind of program-language bindings that are in the pipeline. Things will improve. There will always be times when we need to "win ugly." Over time, though, it'll get less ugly.
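
As a reminder of what the easy part of a migration looks like, here is a minimal Python sketch that maps one structure onto another by renaming elements. Everything in it is hypothetical: the legacy story markup, the target names, and the RENAME table are invented for illustration and have nothing to do with InfoWorld's actual archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy document, invented for illustration.
OLD = (
    "<story><hed>Winning ugly</hed><byline>A. Writer</byline>"
    "<graf>First.</graf><graf>Second.</graf></story>"
)

# Hypothetical mapping from legacy element names to target names.
RENAME = {"story": "article", "hed": "title", "byline": "author", "graf": "p"}


def migrate(elem):
    """Recursively copy an element tree, renaming tags via RENAME.

    Tags missing from the table pass through unchanged.
    """
    new = ET.Element(RENAME.get(elem.tag, elem.tag), dict(elem.attrib))
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        new.append(migrate(child))
    return new


migrated = migrate(ET.fromstring(OLD))
result = ET.tostring(migrated, encoding="unicode")
```

The renaming is the trivial ten percent. The elbow grease goes into the cases the table can't express: elements whose meaning drifted over the years, or content that was never tagged consistently in the first place.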


12:37:45 PM    

Monday, November 18, 2002

XML for the rest of us
We've known for many years that most of our vital information lives in documents, not databases. XML was supposed to help us capture the implicit structure of ordinary business documents (memos, expense reports) and make it explicit. Sets of such documents would then form a kind of virtual database. The cost to search, correlate, and recombine the XML-ized data would fall dramatically, and its value would soar. It was a great idea, but until the tools used to create memos and expense reports became deeply XML-aware, it was stillborn. XML did, of course, thrive in another and equally important way. It became the exchange format of enterprise databases and the lingua franca of Web services. Now Office 11 wants to erase the differences between XML documents written and read by people using desktop applications, and XML documents produced and consumed by databases and Web services. This is a really big deal. [Full story at InfoWorld.com.]
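
A minimal Python sketch of the "virtual database" idea, assuming a set of expense reports that already carry explicit structure. The expense and item markup, the sample values, and the total_by_category function are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical expense reports, invented for illustration.
REPORTS = [
    "<expense employee='ann'>"
    "<item category='travel' amount='120.00'/>"
    "<item category='meals' amount='30.50'/>"
    "</expense>",
    "<expense employee='bob'>"
    "<item category='travel' amount='80.00'/>"
    "</expense>",
]


def total_by_category(docs):
    """Sum item amounts per category across a set of XML documents."""
    totals = {}
    for text in docs:
        for item in ET.fromstring(text).iter("item"):
            category = item.get("category")
            totals[category] = totals.get(category, 0.0) + float(item.get("amount"))
    return totals


totals = total_by_category(REPORTS)
```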
7:15:41 AM    

Friday, November 15, 2002

A conversation with Jean Paoli

Next week's issue of InfoWorld includes an article on the new XML capabilities of Office 11. While researching the story, I interviewed the architect of XML in Office 11, Microsoft's Jean Paoli, one of the primary co-creators of XML. Here are some of his remarks. [Full story at InfoWorld.com]

Note: That article is my weekly column, and is a companion to another story that should go live later today. Phil Wainewright posted his thoughts on the column:

It is about time we had users defining XML schema rather than having to rely on an expert going away and modeling it (inevitably an imperfect procedure). But the experience of user-defined document templates in Word and the difficulties of propagating and then maintaining a corporate standard even at this elementary presentation level of document structure doesn't augur so well for the rapid creation of consistent, sharable XML DTDs and schema across the enterprise. Putting XML directly into the hands of the users is an important and necessary step, but it is only the beginning of the journey. [Loosely Coupled]

Excellent points. It's a long road; this first step was way overdue; it creates a thicket of issues around usability and standards (e.g., the open-yet-proprietary WordML). But as Phil says, it's about time. Let's roll up our sleeves and get to work on this stuff.


12:50:39 PM    

Tuesday, October 29, 2002

The Xopus in-browser XML editor

Regarding my article this week in InfoWorld, which says:

Structured editing of schema-controlled XML data is a hard challenge to meet. Tools that would make the task easy and natural are nowhere in sight. [InfoWorld.com]

Sjoerd Visscher notes:

Maybe Jon Udell still hasn't seen Xopus. I can't blame him. We (Q42) don't have a marketing department, and our Xopus site looks like we don't want you to use it. But Xopus does seem to be what Jon Udell is looking for.

I guess I wasn't paying attention. The Xopus demo is, indeed, an eye-opener. It runs in the browser without plug-in support, toggles between WYSIWYG and XML modes, enforces the schema, and has multilingual support in both the UI and the document. It also includes a competent table editor. The developers of this open-source project have even built a prototype of the MSIE ContentEditable feature for Mozilla, in advance of official support in Mozilla for that feature. Impressive!


9:41:37 AM    

Friday, September 20, 2002

XML scripting

Here's a nifty XML scripting idea, by way of Collaxa's Blog (thanks, Edwin):

Several companies have been collaborating to create a technology we call native XML scripting to do just that. BEA is the first to include native XML scripting in their products, but you will be seeing a lot more of it. In fact, ECMA plans to include it in future versions of the popular ECMAScript language (a.k.a. JavaScript). [BEA Dev2Dev]

Cool. Six months ago, I noted that Adam Bosworth was thinking about this idea:

We need a language that can natively support XML as a data type and yet can gracefully integrate with the world of objects (Java or otherwise) and can take advantage of the self-describing nature of XML by supporting querying of its own variables. This language as used by humans will look like a programming language, not an XML grammar. This is the language we will use to convert from one XML format to another. This is the language we will use to synthesize complex XML documents for multiple sources and Web services. This is the language we will use to mediate between the world of XML messages and the world of Java or C# processes. [XML Magazine]

I guess he wasn't just idly speculating!


3:35:18 PM    

© Copyright 2003 Jon Udell.


