InfoWorld

  Monday, January 19, 2004 

XPath query tips

My new query page invites you to try writing your own queries, and a few adventurous souls have been doing just that. As I've mentioned before, I'm no world-class expert on this subject, but as I build up a corpus of searchable data on the one hand, and a set of canned and modifiable queries on the other, I'm learning. Indeed, one of my goals for the query page is to serve as a tutorial and playground, a place where folks (me included) can get ideas about what kinds of XHTML elements they might include in their own content, and how those elements could interact with XPath queries.

In the spirit of exploration and learning, here's a first installment of the tutorial. First, some background. The XPath expressions used in this search engine are embedded in an XSLT stylesheet. The stylesheet includes two XSLT templates. Here's the one that counts the number of results:

<xsl:template match="/">
<div>Results:
<xsl:value-of select="count(__QUERY__)"/>
</div>
<xsl:apply-templates />
<br clear="all"/>
<p>Entries searched: <xsl:value-of 
       select="count(//item)" /></p>
<p>Date of oldest entry searched: <xsl:value-of 
       select="//item[position()=last()]/date" /></p>
<p>Date of newest entry searched: <xsl:value-of 
       select="//item[position()=1]/date" /></p>
</xsl:template>

And here's the one that reduces the whole XML file to just matching elements:

<xsl:template match="__QUERY__" >
<p><b>
<a>
<xsl:attribute name="href">
http://weblog.infoworld.com/udell/<xsl:value-of 
    select="ancestor::item/date" />.html#<xsl:value-of 
    select="ancestor::item/@num"/>
</xsl:attribute>
<xsl:value-of select="ancestor::item/title" />
</a> (<xsl:value-of select="ancestor::item/date" />)
</b>
<div>
<xsl:copy-of select="."/>
<xsl:if test="local-name(.)='blockquote' and @cite != ''">
Source: <xsl:value-of select="@cite"/>
</xsl:if>
</div>
<hr align="left" width="20%" />
</p>
</xsl:template>

In my forthcoming O'Reilly Network column I'll publish the script that implements the search engine and discuss it in detail. But from the perspective of writing queries, here's what you need to know. First, the search script replaces __QUERY__, in both XSLT templates, with the text of an XPath pattern -- either a canned one, or one that you supply. Second, the XML file matched against the pattern has this simple structure:

<item num="a883">
<title>Server-based XPath search</title>
<date>2004/01/10</date>
<body>
<p>
...arbitrary XHTML content...
</p>
</body>
</item>

Third, the pattern is used, in the XSLT transformation, in two different ways. The counting template uses it in an XSLT select (<xsl:value-of select="count(__QUERY__)"/>), but the data-reduction template uses it in an XSLT match (<xsl:template match="__QUERY__" >).
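To make the substitution step concrete, here's a minimal sketch in Python. It is not the actual search script: the abbreviated stylesheet, the function name build_stylesheet, and the escaping step are all assumptions made for illustration. It only shows the one query text landing in both the select and the match.

```python
from xml.sax.saxutils import escape

# Abbreviated stand-in for the stylesheet (an assumption, not the real file):
# it keeps only the two places where the query is used.
STYLESHEET_TEMPLATE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<div>Results: <xsl:value-of select="count(__QUERY__)"/></div>
<xsl:apply-templates />
</xsl:template>
<xsl:template match="__QUERY__">
<div><xsl:copy-of select="."/></div>
</xsl:template>
</xsl:stylesheet>"""


def build_stylesheet(query: str) -> str:
    """Return the stylesheet text with the query placed in both templates."""
    # The query lands inside double-quoted attribute values, so escape
    # &, <, > and the double quote to keep the stylesheet well-formed XML.
    safe_query = escape(query, {'"': "&quot;"})
    return STYLESHEET_TEMPLATE.replace("__QUERY__", safe_query)


# Hypothetical query: blockquotes that carry a non-empty cite attribute.
stylesheet = build_stylesheet('//blockquote[@cite != ""]')
print(stylesheet)
```

Because the same text is pasted into count(...) and into match="...", anything that is legal in only one of those two positions breaks the transformation.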

When I first wrote this entry, I used the term expression rather than pattern -- but really, the latter is correct. What's the difference between the two? Writing for MSDN Magazine, Aaron Skonnard explains:

Select does indeed expect an XPath expression, which is used to select a nodeset for further processing.
...
The match attribute, on the other hand, takes what's called a pattern. A pattern looks like an XPath expression because it shares the same syntax, but it's treated differently by the XSLT processor. A pattern is used for matching nodes in the tree against the specified criteria. [MSDN Magazine]

The XPath syntax you can use in a match pattern is more restrictive than the syntax you can use in a select expression. Since my XSLT stylesheet uses the syntax you supply in both contexts, your query is limited to the more restrictive flavor -- that is, it must be a pattern, not a full-blown expression.

Watching my search logs, I notice that the most common error is to supply something like this:

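count(//blockquote)

This fails because a pattern must be a location path. A function call such as count() can't stand alone as a pattern (only id() and key() may begin one), although functions can still appear inside a predicate. The counting template already wraps whatever you supply in count(), so //blockquote on its own does the job.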
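A rough pre-check for this one mistake might look like the following. It's a heuristic, not a pattern parser: the function name starts_with_disallowed_call is hypothetical, and it only inspects the start of the query, so it won't catch problems later in the text (for instance after a | that separates alternatives).

```python
import re

# Names that may legally open a pattern even though "(" follows them:
# the id() and key() functions, plus the XPath node tests.
ALLOWED_LEADING_NAMES = {
    "id", "key", "text", "node", "comment", "processing-instruction",
}

# A name followed by "(" at the very start of the query.
LEADING_CALL = re.compile(r"^\s*([A-Za-z_][\w.-]*)\s*\(")


def starts_with_disallowed_call(query: str) -> bool:
    """True if the query opens with a function call that a pattern can't start with."""
    match = LEADING_CALL.match(query)
    return bool(match) and match.group(1) not in ALLOWED_LEADING_NAMES


for query in ["count(//blockquote)", "//blockquote", "//p[count(a) > 2]"]:
    verdict = "rejected" if starts_with_disallowed_call(query) else "passes the pre-check"
    print(f"{query}: {verdict}")
```

Note that //p[count(a) > 2] passes: count() is fine inside a predicate, just not as the outermost construct.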

Why restrict the XPath syntax to only what's valid for the match attribute of an XSLT template? Because that's what my little search engine does. It matches and displays a subset of the elements contained in my blog.
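If you want to experiment with the match-and-display idea away from the query page, here's a small sketch using Python's standard xml.etree.ElementTree module. Several things are assumptions: the items wrapper element, the second sample entry, the example.com URL, and the search function itself. ElementTree supports only a subset of XPath and has no ancestor axis, so instead of ancestor::item the sketch walks each item and searches inside it. The link format is the one the data-reduction template builds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the item structure described above.
# The <items> wrapper is an assumption; the real root element isn't shown.
SAMPLE = """<items>
<item num="a883">
<title>Server-based XPath search</title>
<date>2004/01/10</date>
<body>
<p>Some text.</p>
<blockquote cite="http://example.com/source">A quoted passage.</blockquote>
</body>
</item>
<item num="a884">
<title>An entry with no quotations</title>
<date>2004/01/11</date>
<body><p>Nothing quoted here.</p></body>
</item>
</items>"""

BASE_URL = "http://weblog.infoworld.com/udell/"


def search(xml_text, query):
    """Return (link, title, date, element) for each element matching query.

    query is an ElementTree path evaluated relative to each item,
    for example ".//blockquote[@cite]".
    """
    results = []
    for item in ET.fromstring(xml_text).iter("item"):
        title = item.findtext("title")
        date = item.findtext("date")
        # Same link shape as the stylesheet: date, ".html#", then @num.
        link = f"{BASE_URL}{date}.html#{item.get('num')}"
        for element in item.findall(query):
            results.append((link, title, date, element))
    return results


hits = search(SAMPLE, ".//blockquote[@cite]")
print("Results:", len(hits))
for link, title, date, element in hits:
    print(f"{title} ({date}) {link}")
    print(ET.tostring(element, encoding="unicode").strip())
    print("Source:", element.get("cite"))
```

The query here is written relative to an item (.//blockquote[@cite]) rather than from the document root, which is a consequence of walking the items by hand.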

 

What RSS users want: consistent one-click subscription

Saturday's Scripting News asked an important question: What do users want from RSS? The context of the question is the upcoming RSS Winterfest. Dave Winer adds:

I thought we should try to put the focus on people who use the technology, to let them set the agenda for the developers.

Amen. Over the weekend I received a draft of the RSS Winterfest agenda along with a request for feedback. Here's mine: focus on users. In an October posting from BloggerCon I present video testimony from several of them who make it painfully clear that the most basic publishing and subscribing tasks aren't yet nearly simple enough.

Here's more testimony from the comments attached to Dave's posting:

One message: MAKE IT SIMPLE. I've given up on trying to get RSS. My latest attempt was with Friendster: I pasted in the "coffee cup" and ended up with string of text in my sidebar. I was lost and gave up. I'm fed up with trying to get RSS. I don't want to understand RSS. I'm not interested in learning it. I just want ONE button to press that gives me RSS. Like Technorati gives me a simple list. I don't want to look under it's hood to learn the mechanics of how it works. RSS sales folk speak a different language from customers. RSS designers/instruction writers need work in tandem with people who write plain English and talk the language of the customer. [Ingrid Jones]
Like others, I'd say one-click subscription is a must-have. Not only does this make it easier for users, it makes it easier to sell RSS to web site owners as a replacement/enhancement for email newsletters. Managing newsletters is a huge PITA - spam filters and the general unreliability of SMTP for large scale broadcasts has led to a Rube Goldberg nightmare. (NOTE: I'm not talking about spam, so no flames please. I'm talking about opt-in.) There are a LOT of companies that would jump on RSS if enough end users adopted it and it did away with the need for cumbersome email delivery technologies. [Derek Scruggs]
For average users RSS is just too cumbersome. What is needed to make is simpler to subscribe is something analog to the mailto tag. The user would just click on the XML or RSS icon, the RSS reader would pop up and would ask the user if he wants to add this feed to his subscription list. A simple click on OK would add the feed and the reader would confirm it and quit. The user would be back on the web site right where he was before. [Christoph Jaggi]

Clearly the current approach -- linking the orange XML icon to an XML file, whose address must be captured and pasted into a feedreader -- isn't working for many users (or would-be users). There has been lots of discussion about creating a standard one-click subscription method. Dare Obasanjo reviews some of the issues here. Phil Ringnalda reviews some current solutions here. On purely technical grounds, I'm frankly not sure which of three approaches -- a feed:// URI scheme, a MIME type, or a local HTTP listener -- is the "right" one. Dare writes:

With all these varying approaches, it means that any website that wants to provide a link that allows one click subscription to an RSS feed needs to support almost a dozen different techniques and thus create a dozen different hyperlinks on their site. This isn't an exaggeration, this is exactly what Feedster does when one wants to subscribe to the results of a search. If memory serves correctly, Feedster uses the QuickSub javascript module to present these dozen links in a drop down list. [Dare Obasanjo]

I checked, and that's exactly what Feedster is doing. Yes, it's preposterous. Nevertheless, I've decided to try the same method myself until the market converges on a single approach. Which convergence, it seems to me, can't happen until users of feedreaders reach some critical mass. Which, in turn, won't happen if feed publishers and feed readers continue to violate users' expectations of how something as fundamental as subscribing to a feed should work.

 

