A Taxonomy of Social Search Approaches
Posted by Ofer, Product Architect on Jul 31, 2008
Delver has recently launched an alpha of its Social Search service. I mean, social-powered search. Um, make that Socially Connected Search. Wait - perhaps Social Graph Search? Or even social media search?…
Why is the terminology so difficult? The almighty Wikipedia (heck, even WP itself was branded as a type of social search back in 2006) defines social search as
“a type of web search method that determines the relevance of search results by considering the interactions or contributions of users… Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms.”
Aha, so that means stuff like del.icio.us and Flickr and Mahalo and Wikia search are really all the same type of service? Hmm.
Yes, a lot of services are titled “social search” these days, and a lot of them indeed are such. It’s just that search itself is a pretty complex process, and “…considering the interactions or contributions of users” can take many forms. Let’s try and put some order into social search approaches, and while at it, we’ll also pinpoint what it is about Delver’s approach that we think will shoot it to infinity and beyond.
So - web search is about: crawling, indexing, ranking, querying. Now, let’s see how we can put those humans into the loop:
- Crawling: products such as Mahalo employ humans to discover new content and add it to their index. This approach goes back to the days of Web directories (Yahoo!, dmoz). Others, such as Lijit or Eurekster, use humans to define relevant subsets of a machine-crawled index.
- Indexing: a search index maps keywords to documents, so when humans tag content items they do exactly that, describing the document by tags, as in del.icio.us or Flickr.
- Querying: most search engines analyze user query logs to suggest “query reformulations” (or even apply them automatically behind the scenes). Products such as ChaCha go even further and have a human analyze each and every query. I even met a person who positioned himself as a “human search interface”, selling a service of building queries for difficult needs…
- Ranking: well, that’s the Holy Grail. Let’s break that one up further now.
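Before we get to ranking, here is a minimal sketch of the Indexing point above: human tags acting as the keywords of a search index. The sample data and the function names (`build_tag_index`, `search`) are made up for illustration and do not describe how del.icio.us or Flickr actually store tags.

```python
from collections import defaultdict

def build_tag_index(taggings):
    """Build a tag -> documents map from (user, document, tag) triples."""
    index = defaultdict(set)
    for _user, doc, tag in taggings:
        index[tag.lower()].add(doc)
    return index

def search(index, query):
    """Return the documents that carry every tag in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical taggings: three users describing two photos.
taggings = [
    ("alice", "photo1.jpg", "sunset"),
    ("bob", "photo1.jpg", "beach"),
    ("carol", "photo2.jpg", "sunset"),
]
index = build_tag_index(taggings)
```

Here `search(index, "sunset beach")` intersects the two tag sets, so only a document that people have tagged with both words comes back.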
Attributes of Social Ranking
We all know that pigeons are not the only ones who can rank documents in web search; humans are fully capable of doing that too. The approaches vary from describing Google’s link-based PageRank as a kind of social search to manually building search indices per query. Sources for the human input range from direct explicit manipulation as in Wikia Search, through interpreting users’ indirect actions such as on Digg, and on to implicit inference from behavior patterns of large numbers of users, inspired by recommender systems.
To properly compare social search approaches, let us define two major attributes that best differentiate and cluster social ranking approaches, and through this prism we’ll look at existing products: Personalized vs. Aggregated, and Structure-based vs. Behavior-based.
Personalized methods tailor results to each individual user’s social footprint, whereas Aggregated methods have all users’ footprints contribute to a single central ranking value. Structure-based approaches take social context from explicit social graph structure, whereas Behavior-based approaches rely on implicit social hints, such as like-minded clicks and votes. The chart below shows how some social search players fit into this taxonomy (remember - the attributes refer to the social aspect of ranking only):

The “Link Analysis” and “Recommender” quadrants represent approaches (employed mainly by the large search players) that have prevailed over the past decade. Link Analysis, taking inspiration from scientific citation, uses human social input in the form of hyperlinks to determine the “global” importance of a given document. Query log analysis, borrowing ideas from recommender systems, has been employed by search engines to improve ranking, a recent example being Microsoft’s “BrowseRank” paper. Labeling itself as “community-powered search”, Collarity attempted to use collaborative filtering for direct ranking of web search results, but the company recently abandoned this direction to focus on advertising. Several other companies provide recommendation engines, a recent example being Baynote powering search at Expedia, but all of these offer site-search functionality only, and do not even attempt to scale the solution to the entire web.
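To make the Link Analysis idea concrete, here is a rough sketch of the textbook PageRank iteration on a toy link graph. It is the simplified published formulation, not Google’s production ranking; the damping factor of 0.85, the iteration count and the three-page graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its in-links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Toy graph: "a" and "b" both link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
```

Every link acts as a vote, and all votes feed one global score per page. That is what makes this an aggregated approach: the ranking comes out the same no matter who is searching.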
The two other quadrants form the new directions social search has been heading in recent years. Most of the activity we’ve seen so far has been in the quadrant we call “Crowdsourced ranking”, where users are asked to rank search results, directly or indirectly, and the input is used to produce a global ranking scheme, with all the scalability questions that entails.
At Delver, we believe that the information encoded in a well-structured social graph is one of the most valuable resources still untapped in the quest for better relevance. So far, the closest comparable source of value has been link analysis, as demonstrated by the strength of Google’s PageRank, but PageRank is an aggregated approach built on links between websites. To make results relevant for a specific searcher, and help that user clearly understand exactly why each result is relevant, ranking must take into account the graph describing that user, and this is exactly what we’re doing. Stay tuned to find out more on the challenges in doing that, and how we tackle them!
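To illustrate the Personalized vs. Aggregated distinction on a structure-based signal, here is a toy sketch. It is not Delver’s actual algorithm: the graph, the “endorsements” (who bookmarked or published a document), the two-hop limit and the 1/distance weighting are all assumptions made up for this example.

```python
from collections import deque

def distances_from(graph, start, max_depth=2):
    """Breadth-first search: hop distance from start to each reachable user."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if dist[user] == max_depth:
            continue  # do not look beyond max_depth hops
        for friend in graph.get(user, ()):
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist

def aggregated_rank(endorsements):
    """Aggregated: every endorsement counts the same for every searcher."""
    return sorted(endorsements, key=lambda doc: (-len(endorsements[doc]), doc))

def personalized_rank(endorsements, graph, searcher):
    """Personalized: endorsements count more the closer the endorser is."""
    dist = distances_from(graph, searcher)

    def score(doc):
        # Ignore the searcher (distance 0) and anyone outside the neighborhood.
        return sum(1.0 / dist[u] for u in endorsements[doc] if dist.get(u, 0) > 0)

    return sorted(endorsements, key=lambda doc: (-score(doc), doc))

# Hypothetical graph: ann - ben - cat form one cluster, dan - eve another.
graph = {
    "ann": ["ben"], "ben": ["ann", "cat"], "cat": ["ben"],
    "dan": ["eve"], "eve": ["dan"],
}
endorsements = {"doc_popular": {"dan", "eve", "cat"}, "doc_friend": {"ben"}}
```

In this toy data the aggregated ranking puts `doc_popular` first for everyone, while the personalized ranking for `ann` puts `doc_friend` first because her direct friend `ben` endorsed it. The same distances could also be used to tell the user why a result was ranked where it was.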