Reality Check | Ephraim Schwartz » Search fatigue

April 10, 2007 | Comments: (0)

Search fatigue

Improving search has less to do with goosing numbers than honing the relevancy of results

A year ago March, Neil Holloway, president of Microsoft's Europe, Middle East, and Africa operations, boasted about how good Microsoft's search engine would become in six months.

"What we're saying is that in six months' time we'll be more relevant in the U.S. market place than Google. The quality of our search and the relevance of our search from a solution perspective to the consumer will be more relevant," Holloway said.

This got him into a lot of hot water, at least in the blogosphere, where he was roundly lambasted for making the claim. He went on John Battelle's Searchblog to explain what he really meant to say.

"[W]e are committed to investing in R&D aimed at providing a search service, initially in the US in six months, that performs better than the current industry wide standard of one in two urls being connected to the subject of the original query. I also said that our aim is to perform as good, or better, in that respect than Google. This is a long term goal. I did not put a date to it as this is work in progress," he posted.

Yes, you did put a date to it, Mr. Holloway. See your quote directly above.

Nevertheless, a year later I decided to perform a search or two on Google, MSN Search, and Yahoo to see what kind of results I would get.

First, I searched for the "Gospel of Judas," a recent archeological discovery of a lost Gospel written about A.D. 185.

Google gave me 1,390,000 returns; Yahoo, 1,800,000; and MSN Search, fewer than 500,000.

That's quantity, but what about the quality -- or relevancy, as it is called -- of the search results?

Next I searched on the phrase "search fatigue" (quotes included this time), a problem in online searching that is only now being recognized, mostly by reference librarians who see it every day. More about that later.

Here, Google returned 1,890 results; Yahoo, 431; MSN, 329.

Only Google returned multiple first-page results on search fatigue as it applies to online searching. MSN and Yahoo each had a single relevant result on the second page and nothing on the first.

As you can see, one year later Microsoft loses out to Google both on quantity and quality of results. Why the largest software company in the world can't do better is beyond my ken.

However, the truth is that all of the current consumer search engines are inadequate. Only the most dedicated researcher would wade through 431 results, let alone a million or more. The rest of us will definitely suffer from search fatigue.

Is there a cure?

I suggest that you get a copy of the March 2007 print edition of American Libraries magazine and read Jeffrey Beall's article, "Search Fatigue: Finding a Cure for the Database Blues." Beall is a catalog librarian and assistant professor at the Auraria Library, University of Colorado at Denver and Health Sciences Center. Beall offers some search-fatigue antibiotics.

Search fatigue, according to Beall, is a feeling of dissatisfaction when search results do not return the desired information.

"The root cause of search fatigue," Beall told me, "is a lack of rich metadata and a system that can exploit the metadata."

For example, metadata-enabled searching, as you find in a library when it searches through its databases for resources, "allows for precise author, title, and subject searches," Beall says. In other words, it looks only in the fields you request, rather than searching through the entire document. If you name the author, it looks only in the author field of each document, thus returning only relevant hits.
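To make the difference concrete, here is a minimal sketch of field-restricted searching next to plain keyword matching. It is an illustration only, not how any library system or search engine is actually built; the `Record` class, the function names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical catalog record with three metadata fields plus free text.
    author: str
    title: str
    subject: str
    body: str


def full_text_search(records, term):
    """Match the term anywhere in the record, as a plain keyword search would."""
    term = term.lower()
    return [
        r for r in records
        if any(term in value.lower()
               for value in (r.author, r.title, r.subject, r.body))
    ]


def field_search(records, field, term):
    """Match the term only inside the named metadata field."""
    term = term.lower()
    return [r for r in records if term in getattr(r, field).lower()]


# Made-up sample data: only the first record is actually written by Smith.
records = [
    Record("Jane Smith", "Cataloging basics", "Libraries", "An introduction."),
    Record("Ann Jones", "Reply to Smith", "Libraries", "A response to Jane Smith."),
]

print(len(full_text_search(records, "smith")))         # 2
print(len(field_search(records, "author", "smith")))   # 1
```

The keyword search returns both records because "Smith" appears somewhere in each; the author-field search returns only the record Smith wrote.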

Beall says libraries also use "controlled" vocabularies, meaning that if someone searches for information on cannibalism, they will also receive resources that use the synonym "anthropophagy."
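A controlled vocabulary can be pictured as a table that maps every variant term to one preferred heading. The sketch below assumes a tiny hand-built table and a made-up catalog; the names `PREFERRED`, `heading_for`, and `search_by_subject` are hypothetical and only illustrate the idea Beall describes.

```python
# Hypothetical controlled vocabulary: each variant maps to one preferred heading.
PREFERRED = {
    "cannibalism": "cannibalism",
    "anthropophagy": "cannibalism",
}


def heading_for(term):
    """Return the preferred heading for a term, or the term itself if unlisted."""
    term = term.lower()
    return PREFERRED.get(term, term)


def search_by_subject(catalog, query):
    """Return titles whose subject terms share the query's preferred heading."""
    wanted = heading_for(query)
    return [
        title for title, subjects in catalog.items()
        if any(heading_for(s) == wanted for s in subjects)
    ]


# Made-up catalog: titles mapped to the subject terms assigned to them.
catalog = {
    "Book A": ["Cannibalism"],
    "Book B": ["Anthropophagy"],
    "Book C": ["Agriculture"],
}

print(search_by_subject(catalog, "cannibalism"))  # ['Book A', 'Book B']
```

Searching on either term finds both books, because the lookup compares preferred headings rather than the literal words typed.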

After my talk with Beall, my advice to Microsoft and Holloway is to go to the folks sitting behind the library reference desk. They can probably tell you better than anyone else on the planet how to create a search engine that beats Google.

Posted by Ephraim Schwartz on April 10, 2007 03:00 AM


COMMENTS




Metadata can be very effective in the centrally-managed, closed, trusted environment where librarians, archivists, and taxonomists work, but it's a bit trickier outside the firewall on the public web — if search engines trusted metadata from site authors the first 1,000 hits for "search fatigue" or "gospel of judas" would all be porn and scam sites.

Posted by: David Megginson at April 10, 2007 11:36 AM

Dear Ephraim
I agree with the gist of your article and I also believe that Google is the best search engine and MSN has a long way to go. But I have to say that the analysis you reported to back your argument about the quality and quantity of search results is not scientific. The analysis is incomplete and nothing can be concluded from it.
Best

Posted by: Rohit Aggarwal at April 10, 2007 11:40 AM

I'm a Google user and my sense is that it's still the better SE by far, but I take issue with your methodology. How does the number of results for "search fatigue" measure relevancy?

Posted by: pablo at April 10, 2007 12:05 PM

To Pablo's comment--
I measured the relevancy of the search engines by seeing that Google returned two accurate hits on the first result page, while both MSN Search and Yahoo Search had nothing useful on the first page and one accurate response on the second results page. Isn't that a legitimate way to gauge relevancy?

Ephraim at InfoWorld

Posted by: Ephraim Schwartz at April 10, 2007 02:45 PM
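The first-page comparison described in this exchange corresponds roughly to what information retrieval people call precision at k: the share of the top k results that are relevant. The sketch below is one way to compute it, assuming ten results per page; the exact positions of the relevant hits are hypothetical, and only the per-page counts (two for Google on page one, none for MSN on page one, one on page two) come from the column.

```python
def precision_at_k(relevance, k):
    """Fraction of the top k results judged relevant.

    `relevance` is a ranked list of booleans, best-ranked result first.
    """
    top = relevance[:k]
    return sum(top) / len(top) if top else 0.0


PAGE_SIZE = 10  # assumption: ten results per results page

# Hypothetical rankings that reproduce the per-page counts from the column.
google = [True, True] + [False] * 18
msn = [False] * 10 + [True] + [False] * 9

print(precision_at_k(google, PAGE_SIZE))      # 0.2
print(precision_at_k(msn, PAGE_SIZE))         # 0.0
print(precision_at_k(msn, 2 * PAGE_SIZE))     # 0.05
```

A single query judged by a single person is still a very small sample, which is the commenters' objection; the measure itself is a standard one.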

Both of Mr. Beall's comments are more relevant to intranets (closed environments, as David Megginson accurately described them) than to the Web.

First: on the Web, metadata is what Semantic Web is all about. Although Semantic Web is a powerful concept, it may take a while - if ever - before it becomes relevant to an average user.

Second: controlled vocabularies have been a standard feature of full-text search engines for a long time. There is no need for quotes; controlled vocabulary is a well-known expression in information management science.

Internet search engines in general are far more advanced than full-text search engines used on intranets, which are the source of librarians' experience. Anyway, most of the designers of the first wave search engines from the late 80's and early 90's (those are technologies used in libraries) already work at Google, Microsoft, or Yahoo.

The reason Microsoft can't get it is simple - this is not cookie-cutter work.

Posted by: zack lukic at April 10, 2007 05:02 PM

There is a cure.. Cognition.com and CognitionSearch.com (the first real Meaning-based Linguistic Search). 20yrs in the making with a combined 300 person years of development. See article from Barbara Quint below. I am not just trying to get a plug here, I am a true believer!

NewsBreaks for Monday, April 2, 2007 -- Cognition Launches New Linguistic Search Engine
--> http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=35805 and the image http://newsbreaks.infotoday.com/images/?imageid=901


Cognition Launches New Linguistic Search Engine
by Barbara Quint
Posted On April 2, 2007


--------------------------------------------------------------------------------


Every searcher's fear is that a search will produce too little of what you want or too much of what you don't want. And, even if you get a nice collection of the right stuff, is it all the right stuff out there or does it omit things you need to see? In technical terms, does your search strategy balance precision and recall effectively? Linguistic and semantic search engines have long held out the promise of helping computers "understand" concepts, rather than just search for terms. Cognition Technologies (www.cognition.com) has launched CognitionSearch, a linguistic search engine that supports ontology, morphology, and synonymy, tapping one of the world's largest computational dictionaries. Initially, the company will market a vertical enterprise service for legal litigation support and for life science and health research. It also offers an open Web service (www.cognitionsearch.com) to demonstrate the technology as applied to MEDLINE and PubMed content, to judicial and legislative sources, and to political blog content.
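For readers unfamiliar with the two terms, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch, using made-up document IDs and a hypothetical `precision_recall` helper:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents returned, three documents truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5"]

p, r = precision_recall(retrieved, relevant)
print(p)            # 0.5
print(round(r, 2))  # 0.67
```

In this made-up case half of what came back is on target (too much of what you don't want), and one relevant document, d5, was missed entirely (too little of what you want).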

The complex technology behind CognitionSearch stems from some 20 years of research led by Cognition's chief technology officer, Dr. Kathleen Dahlgren, who also serves as an adjunct professor of linguistics at the University of California-Los Angeles. After leaving IBM in 1990, Dahlgren set up a new company called Intelligent Text Processing (ITP) that patented a natural language understanding system. ITP struggled and closed in 2002. In February 2003, Dahlgren and a team of linguists and lexicographers founded Cognition Technologies. In January 2006, Scott Jarus, former chief executive of j2 Global Communications, the company that produced the eFax service, joined the company as both an investor and CEO. Jarus hopes to supply the business and strategic knowledge needed to leverage the technology into the next big thing in the search arena.

Market Strategy
Cognition's two-pronged vertical-market strategy focuses on legal litigation support, particularly in the electronic data-discovery process, and on the life science industry, particularly the pharmaceutical and genetics fields. According to Jarus, the company chose the target areas "based on the realities of finances and marketing opportunities. We needed to find self-contained markets where research lay within boundaries and where we wouldn't have to spend a huge amount of money to penetrate and gain traction even for proof of concept. Launching a search engine on the Web involves an inordinate amount of money that we didn't have. So we looked to vertical markets as a proving ground." The product areas were chosen for their critical, unmet needs or "pain points," to quote Jarus, and the opportunity to meet a lucrative business need.

In the legal arena, Cognition has become the advanced search engine for the LexisNexis Concordance service, a litigation case-management software service used on more than 65,000 desktops. The CognitionSearch-powered Concordance is integrated through Everest Technologies' enhancement (www.everesttek.com) and built into IPRO Tech's IPRO View litigation support service (www.iprocorp.com). In the life science area, the first enterprise target is a large university medical school with whom the company plans to further augment its life science-specific terminology, including those driven by the Human Genome project.

How It Works
Cognition's patented technology combines formal linguistic algorithms with semantic representations to create a "naïve" semantics that speeds up the computational parsing. In building the tools for the service, Cognition uses 4 million semantic representations, 350,000 word stems, 376,000 word senses or concepts, 17,000 ambiguous word definitions, 100,000 phrases, 7,000 nodes for the ontology or tree structure of the taxonomy, and 50,000 thesaural concept groups. Enterprises employing the software are provided with tools to add their own specialized terminology, e.g., product name lists. In the case of very large term expansions, Cognition will supply a consulting service to augment the CognitionSearch service.

In the current launch of the CognitionSearch open Web service, the company selected three subject areas to showcase and demonstrate the technology: health (MEDLINE, PubMed, etc.), legal (U.S. Supreme Court cases, a million Enron emails, etc.), and politics (key political blogs). A glance at the home page clearly indicates plans to add content for government, the environment, and social networks. In the future, the company is also considering the introduction of a consumer-oriented search portal called CogHog. (The CogHog, an icon using a blue pig's head appearing on the CognitionSearch site, is known to Cognition staff as Phil Cogito. The Phil may actually be a gilt—i.e., a female pig with only one litter of piglets to her credit—since she appears to have returned to her maiden name after divorcing "Ergo Sum.")

The Advanced Search mode for CognitionSearch offers five basic search approaches: plain English search, linguistic Boolean search, quoted (or phrase) search, pattern search, and fuzzy search (a variation of the pattern search). The Advanced Search mode will seem familiar to professional searchers with a lot of experience dealing with database services. The complexity and field searching approaches may seem new and somewhat difficult to end users, however.

A Work in Progress
At this point, the CognitionSearch engine is clearly a beta test, a work in progress. Throughout a search, users can access Feedback mechanisms to comment on the service and to report their reactions. They can also use a pull-down menu to push the search into a different conceptual path if the terms chosen by the search engine stray off target. If you activate the Feedback feature at the end of a search, the pop-up window will repeat your search query and ask you to designate the specific results that elicited your reactions. At this point, one clear difficulty lies in the lack of an option to display results in a reverse chronological order for those interested in the most recent work in a field. Relevance ranking is the only display mode offered, though sophisticated users could conduct a series of Advanced Searches in some files using the date field. Mike Reid, vice president of sales and business development, indicated that they are aware of this problem and have it high on their list of changes.

Funding for the new service is coming from a Southern California venture capital group called the Tech Coast Angels, a group that Jarus has joined. At this point, Jarus is leading a venture capital round looking for $5 million to $10 million in investment funding.

I asked Steve Arnold, longtime expert observer of the search engine field (among other information industry concerns), what key factors would affect Cognition's chance of success. Technologically, as Arnold confirmed, the greatest problem linguistic or semantic search engines have had has always been scalability. The greater the volume of content, the slower the engines work until they just can't meet user expectations for real-time service. According to Dahlgren, "Our technology is incredibly scaleable and fast. We were able to index the whole 16 million records in MEDLINE in one day." According to Jarus, "The technology doesn't have a problem with scaling. The only challenge is when we have performance problems; then we need to add more machines, more hardware. It's a commoditized problem."

As for business factors, Arnold named three issues: funding, especially government contracts; penetration of vertical markets, especially the problem of countering the resistance of IT staff to solutions from strange, new vendors; and a solid exit strategy, e.g., licensing the technology or selling out the operation. Responding to these issues, Reid indicated that Cognition planned to pursue contracts with a number of federal agencies, including Homeland Security, Defense Intelligence Agency, CIA, etc. Dahlgren pointed out that "the government needs to do automated surveillance. They need disambiguation and synonymy."

As to a sound exit strategy, Jarus responded candidly: "The politically correct answer is that we are building a company with sustainability, but, if we were offered more money than we can imagine, we'd be happy to sell. Assuming we are successful in promoting our technology and gaining traction, the reality is that I'd have a hard time imagining that Cognition would remain independent. The big boys would recognize our value and want to bring it home. For example, another less obvious way to use this technology would be to match up SEO [search engine optimization] with queries and questions. SEO companies love better matches. There are plenty of opportunities out there, a whole raft of opportunities."


--------------------------------------------------------------------------------

Barbara Quint is contributing editor for NewsBreaks, editor-in-chief of Searcher, and a columnist for Information Today.

More….

Working with our first vertical partner, we have integrated our technology into Lexis-Nexis Concordance, a tool used by 65,000 attorneys. At LegalTech in NY in January, our booth was swamped. Our technology was demonstrated in 3 booths at the conference including the Lexis-Nexis booth. Called EverQuest from www.everesttek.com, it allows lawyers and their paralegals to search based on the meaning of a word. Doing this during the “discovery” phase in a lawsuit, they end up reviewing far fewer irrelevant documents and saving time and money. They also find documents with our engine missed by the pattern-matching Concordance engine. This reduces risks during many phases of a litigation process. We can actually demonstrate programmatically how our technology finds records missed by a search!

We have also indexed Medline. Doctors, medical school students and scientists in biotech companies now testing the beta version are giving us very positive feedback. They like that they don’t need to worry about ensuring they’ve thought of all biomedical synonyms. They also like entering their queries as sentences. Mostly, they like the quality of the results they get.

1. Although you can “play with” our software at www.cognitionsearch.com, to see it best (versus other engines) I should show you a demo and explain how we excel.

2. Recent Press – We’ve “launched” our company.

http://www.socaltech.com/interview_with_scott_jarus,_ceo_of_cognition/s-0008112.html

http://www.informationweek.com/research/showArticle.jhtml?articleID=198100241&cid=RSSfeed_IWK_News

http://home.businesswire.com/portal/site/google/index.jsp?ndmViewId=news_view&newsId=20070320005454&newsLang=en

http://blogs.zdnet.com/BTL/?p=4684

Scott Jarus, our CEO and a major investor, has won Ernst & Young’s Entrepreneur of the Year award. He brings true high quality leadership to the company. Our founder is Dr. Kathy Dahlgren - http://www.linguistics.ucla.edu/people/dahlgren/index.html. She is presenting in San Jose in June at the Semantic Technology Conference http://www.semantic-conference.com/ . Her focus is: http://www.semantic-conference.com/2007/sessions/o2.html.

Posted by: Brian Maser at April 10, 2007 09:01 PM

Man, I am reading these articles way later than everyone. I think that the American Libraries article by Beall should have been called "Metadata beats Keywords" since it didn't focus on search fatigue as much as it used the term to disguise an old (continuing) controversy over searching. He is also probably preaching to the choir by writing that piece for librarians.

Posted by: John Meier at June 14, 2007 09:30 AM
