Here is an observation that has often been made over the last few years and that was recently confirmed: Intelligence in search software is largely only as useful as it is predictable by the user.
But let me digress for a minute: A problem users (novices and power users alike) consistently mention when asked about their search experience is "too many results". I used to reduce this to the problem of improving relevancy, thinking that as soon as people find what they want in the first results, they will stop worrying about that number at the top of their result set. All the more so because the same people often regard features like stemming and inflection, automatic query expansion through synonyms, and latent semantic indexing (LSI) as useful and request them once they have been explained, although all of them actually increase the number of results.
But dismissing this feedback so easily is dangerous. So what is the real issue with "too many results"? I think we have to look at more psychological factors. The things I hear are along the lines of "I don't know whether important results are hidden in the deeper results" or "I don't trust the search engine to do what I meant". I think a plausible theory is that users see their search engine as a tool for their thought process, whose output is only useful insofar as they can fit it into a bigger picture, and that this is limited by how far they can comprehend what the tool actually does and doesn't do.
I think this problem is magnified in fields like law (relevant laws, previous cases), patents (prior art), medical records, etc. Generally it is more important in searches other than global web searches, but even there you don't see much movement towards more linguistically magical technologies. Of course, where recall is important (and it would be important in exactly these fields: law, patents, medical records, etc.) such technologies seem more and more appropriate. Still, given a choice, many users will choose the "less intelligent" search engine.
The challenge lies in constructing user interfaces that better explain what the search engine did (and didn't do) without overwhelming the user. A nice example of a specialty search engine that does this is Yahoo! Shopping's new SmartSort. Look at how they handle digital cameras: My first result is commented as "Pentax Optio S4 is a subcompact camera. It is ranked first because it has the highest Optical Zoom compared to the others in your top 10 results. This Digital Camera is more compact than Casio EXILIM EX Z4U." followed by "Casio EXILIM EX Z4U is a subcompact camera. It is ranked second because it has the least expensive price and the highest Optical Zoom compared to the others in your top 10 results. This Digital Camera is cheaper and is smaller than Fuji FinePix F700 which is displayed next."
How could this look in a more general search engine? "All following results are less popular than the ones already displayed" or "The previous results are considered authorities on the subject, the following results mention the subject in other contexts"? Probably too much text, but then it is only needed in cases of urgent curiosity. Maybe an "explain" link is enough, just as Nutch has.
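To make the idea of an "explain" link concrete, here is a minimal sketch of a scorer that keeps the reasons for its score alongside the score itself. The function names (explain_score, explain_text) and the raw term-count scoring are my own assumptions for illustration; this is not how Nutch or any real engine computes or explains its ranking.

```python
from collections import Counter


def explain_score(query, document):
    """Score a document by raw term counts and keep the per-term breakdown.

    Hypothetical toy scorer: whitespace tokenization, no stemming, no weighting.
    """
    doc_terms = Counter(document.lower().split())
    # One entry per distinct query term, so the user can see every contribution.
    breakdown = {term: doc_terms[term] for term in query.lower().split()}
    return sum(breakdown.values()), breakdown


def explain_text(query, document):
    """Turn the breakdown into the sentence an "explain" link could show."""
    score, breakdown = explain_score(query, document)
    parts = [f'"{term}" occurs {count} time(s)' for term, count in breakdown.items()]
    return f"score {score}: " + "; ".join(parts)


print(explain_text("digital camera", "A digital camera and a camera bag"))
```

The point is not the scoring itself but that the explanation is produced from exactly the same numbers as the ranking, so what the user reads cannot drift away from what the engine did.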
Another important point is that transparency empowers. Intelligence like LSI works only for the problem it was optimized for. The more "stupid" tools are, in practical use, more versatile (see also the argument behind World of Ends). Watch your thought process when formulating queries: I tend to mix descriptions of the subject with terms I expect on matching pages, often shortening longer product names to the parts I think are unique enough. And think of how you use a search engine for tasks other than document retrieval: spell checking, comparing the usage of word combinations, checking prevalent meanings by glancing at excerpts, even preparing an actual query by (manually!) inferring synonyms from result sets. None of this is possible if I have no real idea what the search engine does.
What does this mean in the context of precision and recall? Increase recall with methods that are transparent and predictable in the way they work; avoid magic. Increase precision with transparent, predictable tools (e.g. intuitive measures of popularity, or partitioning the result set along well-known boundaries).
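As one possible illustration of "transparent" recall, the sketch below expands a query with synonyms but records every term it added and why, so the interface can tell the user. The names expand_query and describe_expansion and the tiny synonym table are hypothetical; a real engine would draw its synonyms from a thesaurus or from statistics.

```python
def expand_query(terms, synonyms):
    """Expand query terms with synonyms and record where each addition came from."""
    expanded = list(terms)
    added = {}  # maps each added synonym to the original term that triggered it
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
                added[synonym] = term
    return expanded, added


def describe_expansion(added):
    """Build the note shown to the user above the result list."""
    if not added:
        return "Searched for exactly the terms you entered."
    notes = [f'"{synonym}" (synonym of "{source}")' for synonym, source in added.items()]
    return "Also searched for: " + ", ".join(notes)


# Assumed example synonym table, for illustration only.
terms, added = expand_query(["car", "rental"], {"car": ["automobile", "auto"]})
print(terms)
print(describe_expansion(added))
```

With such a note, a larger result count has a visible cause, and a user who did not want the expansion knows exactly which terms to exclude.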
Does this match your experiences with search? Is there research that points in a similar direction?