September 04, 2003

Search the Internet Archive

Quite amazing: Recall lets you search 11 billion web pages in the Internet Archive. Of course, you can restrict results by date, but it also plots graphs showing when your search terms appeared on the web.

Even better, it automatically (and remarkably effectively) categorizes the results and plots how your terms change over time within each of these categories. Cool.

Ranking is based purely on content; there doesn't seem to be any link analysis. The author claims that additional content analysis is done to improve relevance, although it is not really clear what that analysis involves. Also read her PowerPoint presentation of the technology. It includes some technical details on the hardware and on the performance of the (all POSIX C) code.

The presentation mentions a personalization feature, but I wasn't able to reproduce it. Apparently, the results of previous queries are supposed to influence later ones. I wonder whether I could seed the personalization with the contents of my blog.

I wasn't able to find any connection to Nutch, which is also hosted at the Internet Archive.

Give it a try!

Posted by seefeld at September 4, 2003 11:15
Comments

Hi! I also live in Bern! Nice to see other Swiss people blogging! :)

Feel free to visit my blog too. I update almost daily.

Posted by: Suha at September 5, 2003 12:59 AM

Nice to see more blogs in Bern!

I added you to my list of blogs in Bern (on the right side of the homepage)

Posted by: Bernhard Seefeld at September 5, 2003 11:17 AM