November 05, 2005

Jump Starting Recommendation Engines with Tagged Bookmarks

A couple of weeks ago, I came across an interesting post on tagging at topix. The quote on Raw Sugar in particular, about "value added search around the tagging done by individuals on their own data", got me thinking. One obvious example is the way My Web 2.0 is integrated into Yahoo! Search (try it out: they import del.icio.us bookmarks, and it's amazing how often I search for stuff I have already bookmarked).

But even more intriguing could be document analysis. Services like Findory work by finding documents similar to those the system has learned were interesting (grossly simplified), and the most important input for the set of known interesting documents is the user's previous click flow. While an unstructured bookmark collection would certainly serve as a starting vector, a tagged collection could be much more useful: my tagging reveals which aspect of the article interested me. I might have bookmarked this site on paper airplanes, but the tag "fun" reveals that the reason was more a meta aspect of the document than a vivid interest in aeronautics or handicrafts. Tagging an article on Apple with "innovation" emphasizes the portions on their methodology as what interested me (in contrast to, say, stories about users of Apple products in general).

How would that be implemented? All these personalization engines extract document features and look for repetitions among the pool of interesting documents (again, grossly simplified). Common document features are emphasized, rare ones dropped. By grouping documents by their tags, the features that the documents in each group have in common could be emphasized even more. Then we can check whether other users tagged the same group of documents, and again filter document features against their profiles for the tags they chose for these documents (with bonus points if the tags are actually the same, not just the tagged documents). From research in collaborative filtering, we know that it works astonishingly well as long as the prediction stays in the same domain: that we both read the Hitchhiker's Guide doesn't mean we like the same music. Tagged documents might be the element that lets us use collaborative filtering techniques effectively to extract relevant document features.
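To make the grouping step a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how Findory or any other service actually works: documents are reduced to plain lists of terms, the function name tag_profiles and the min_share threshold are made up for this example, and a feature's weight is simply the share of documents under a tag that contain it.

```python
from collections import Counter, defaultdict


def tag_profiles(bookmarks, min_share=0.5):
    """Build one feature profile per tag (hypothetical helper).

    bookmarks: iterable of (tags, terms) pairs, both iterables of strings.
    Returns {tag: {term: share of that tag's documents containing term}},
    keeping only terms whose share is at least min_share.
    """
    docs_per_tag = Counter()
    term_counts = defaultdict(Counter)
    for tags, terms in bookmarks:
        unique_terms = set(terms)  # count each term once per document
        for tag in set(tags):
            docs_per_tag[tag] += 1
            term_counts[tag].update(unique_terms)

    profiles = {}
    for tag, counts in term_counts.items():
        n = docs_per_tag[tag]
        # Emphasize common features, drop rare ones.
        profiles[tag] = {
            term: count / n
            for term, count in counts.items()
            if count / n >= min_share
        }
    return profiles


# Toy data, invented for illustration only.
bookmarks = [
    ({"fun"}, ["paper", "airplane", "fold", "silly"]),
    ({"fun"}, ["cartoon", "silly", "joke"]),
    ({"innovation"}, ["apple", "design", "process"]),
    ({"innovation"}, ["design", "process", "prototype"]),
]

profiles = tag_profiles(bookmarks, min_share=1.0)
print(profiles["fun"])         # features shared by every "fun" bookmark
print(profiles["innovation"])  # features shared by every "innovation" bookmark
```

With this toy data, the "fun" profile keeps only the term the two bookmarks share and drops "paper" and "airplane", which is exactly the effect described above. The collaborative step would then compare such per-tag profiles across users who tagged the same documents; that part is left out here.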

Maybe someone like Findory will try this? While the target audience probably isn't larger than the web 2.0 crowd, it would surely be interesting, and the lessons might be applicable to other loosely structured collections.

Update 2005-12-13: Yahoo buys del.icio.us. Of course.

Posted by seefeld at November 5, 2005 16:17