|
A parable about data provenance
Earlier this week Lorcan Dempsey, who is VP and chief strategist with the
Online Computer Library Center (OCLC), blogged about
an enhancement to an OCLC service that searches the Library of
Congress Name Authority File. The new version uses fuzzy matching,
which means that a search for 'John', the common misspelling of my name, will
find me as well as my alter ego Jon G. Udell.
This reminded me that for years, in various online venues, I've
seen my book, Practical Internet Groupware, attributed to Jon
G. Udell, author of The economics of the American
newspaper. It turns out that's because the authoritative record at
the Library of Congress has had it wrong all this time. Lorcan kindly
referred the matter to an OCLC colleague who made the correction and
reported it to the LC. So at some point my book as seen in WorldCat will be
correctly attributed, and eventually that change should propagate to
the libraries that subscribe to WorldCat.
How, in general, can authors resolve such problems? The OCLC advises:
We get lots of comments from authors via the Comments button on
FirstSearch record displays as well as through the general
oclc@oclc.org email address. In addition, the general
Contacts page
on the OCLC web site contains
links to forms that can be used to request changes to bibliographic and
authority records.
The Library of Congress gets similar comments via a
feature on the record displays in their online catalog that allows users
to submit an Error Report Form. It's kind of hidden at the very
bottom of the display.
One caution, since catalogers work from title pages and other
information in the material being cataloged, we often have to ask for
proof before making a change. Proof may be a faxed copy of the
title-page or its verso, etc.
We are often the best authorities for
information about ourselves, and we often encounter errors that we
could easily fix. Why don't we? Because the connection between the
authoritative source of a fact and its erroneous manifestation is
rarely explicit.
Given that the infosphere is becoming a web of syndicated facts, we'll want
to make those connections explicit. As a best practice, data provenance
should be accessible at the point of display and use.
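To make that concrete, here is a minimal sketch of a fact that carries its provenance to the point of display. Everything in it is an assumption for illustration: the `SourcedFact` type, the `render` function, and the placeholder correction URL are hypothetical, not part of any OCLC or Library of Congress service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedFact:
    """A displayed value bundled with where it came from (hypothetical type)."""
    value: str
    source_name: str      # the authority that asserted the value
    correction_url: str   # where a reader could report an error


def render(fact: SourcedFact) -> str:
    """Show the value together with its provenance at the point of display."""
    return (
        f"{fact.value} "
        f"[source: {fact.source_name}; report errors: {fact.correction_url}]"
    )


author = SourcedFact(
    value="Jon Udell",
    source_name="Library of Congress Name Authority File",
    correction_url="https://example.org/error-report",  # placeholder, not a real form
)
print(render(author))
```

The point of the design is only that the source and the route to a correction travel with the value, so whoever sees a wrong attribution can also see who asserted it and where to report it.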
|