InfoWorld

  Thursday, June 23, 2005 

Collaborative filtering with del.icio.us

There are currently 6550 del.icio.us folk with whom I share common bookmarks. As nobody will be surprised to see, my link affinity with that population displays the now-familiar long tail:
[Chart: del.icio.us affinity]
There's a recommendation engine lurking in there somewhere, and I've decided to try to flush it out. The prototype is a two-stroke engine. First, it captures the set of del.icio.us users on the steep part of the curve -- the ones with whom I have the most link affinity. Then it reads all their RSS feeds, coalesces the links, and applies another filter to select just the links above a threshold of commonality.
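To make the two stages concrete, here is a minimal sketch, not the actual prototype. It assumes the bookmarks have already been fetched and parsed into plain Python lists, so there is no RSS or network code. The names `top_affinity_users`, `recommend`, `min_shared`, and `min_common` are hypothetical, and dropping links I have already bookmarked is an extra assumption rather than something described above.

```python
from collections import Counter


def top_affinity_users(my_links, links_by_user, min_shared):
    """Stage 1: keep users who share at least `min_shared` bookmarks with me.

    `links_by_user` maps a user name to a list of bookmarked URLs
    (assumed to be already extracted from that user's feed).
    """
    mine = set(my_links)
    affinity = {
        user: len(mine & set(links)) for user, links in links_by_user.items()
    }
    keep = [user for user, shared in affinity.items() if shared >= min_shared]
    # Highest affinity first; ties broken by name for a stable order.
    return sorted(keep, key=lambda user: (-affinity[user], user))


def recommend(my_links, links_by_user, min_shared, min_common):
    """Stage 2: coalesce the neighbors' links and keep the common ones.

    A link is recommended when at least `min_common` high-affinity users
    have bookmarked it. Links I already have are skipped (an assumption).
    """
    mine = set(my_links)
    counts = Counter()
    for user in top_affinity_users(my_links, links_by_user, min_shared):
        # Use a set so each user counts at most once per link.
        counts.update(set(links_by_user[user]) - mine)
    common = [link for link, n in counts.items() if n >= min_common]
    # Most widely shared links first; ties broken alphabetically.
    return sorted(common, key=lambda link: (-counts[link], link))
```

Raising `min_shared` narrows the community to the steep part of the curve, while raising `min_common` trims the feed to links that several of those people agree on. The two knobs are independent, which is why both need tuning.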

I'm still playing with the two thresholds -- personal affinity and link commonality -- but the first cut at a synthesized recommendation feed looks like a promising way to identify an implicit community of interest and tap into its emergent group mind.

Once things settle down I'll publish the code, but meanwhile here's another kind of recommendation for you. Greg Wilson (disclosure: friend of mine) has extended the Pragmatic Bookshelf with a wonderfully pragmatic volume entitled Data Crunching. I'd read it a while ago, but I thought of it again while working on this new del.icio.us hack.

My recommender uses a mixture of shell scripting, Python scripting, regular-expression pattern matching, and XML parsing. Greg's book gently introduces these techniques as well as others: XSLT transformation, packing and unpacking binary data, basic SQL. When, why, and how to combine these methods is something we don't teach often enough or well enough. We've needed a book like this forever; I'm delighted that it has finally arrived.

 

