Novelty vs Popularity, or the Tyranny of the Masses

What I really want to read are novel web pages (e.g., that Not My Job photo). Why? Because being exposed to novelty causes brain expansion.

Digg/Reddit/Slashdot/Delicious/Metafilter/Kuro5hin all assume that many novel articles will get submitted. The second key assumption is that readers will moderate up novel articles from this pool. Then showing the articles with the most votes results in a front page filled with novel articles. How exciting!

But readers also tend to moderate up popular articles, i.e., articles that deal with topics that are popular at the moment, but not necessarily novel (e.g., some new Nintendo rumour).

Popular articles may receive many more votes than novel articles and, since front page space is limited, they crowd the novel ones out. While the novel articles languish, popular articles rise to the top (imagine cream sinking instead of rising).

I call this the “tyranny of the masses”. :) A PC way to put it is that these sites need to find a way to balance novelty and popularity.
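To make the crowding-out effect concrete, here is a toy sketch of a front page that simply shows the articles with the most votes. Everything in it is made up for illustration: the `Article` type, the `front_page` function, the vote counts, and the hand-assigned `novel` flag are all assumptions, not how any of these sites actually rank.

```python
from typing import NamedTuple


class Article(NamedTuple):
    title: str
    votes: int
    novel: bool  # hypothetical label, assigned by hand for this toy example


def front_page(articles, slots):
    """Return the `slots` articles with the most votes."""
    return sorted(articles, key=lambda a: a.votes, reverse=True)[:slots]


# Invented vote counts, purely for illustration.
articles = [
    Article("New Nintendo rumour", 950, False),
    Article("Another Nintendo rumour", 720, False),
    Article("Not My Job photo", 140, True),
    Article("Odd essay nobody has seen", 35, True),
]

page = front_page(articles, slots=2)
for a in page:
    print(a.votes, a.title)
```

With only two slots, both go to the rumours, and neither novel article makes the cut.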

I am 100% certain that an automatic algorithm can be created to sift the novel articles from the Internet. The ingredients I would put in the secret sauce are:

  1. Crawl all the RSS feeds on the web. Build up a database of articles. If the article is not linked in any RSS feed anywhere, assume it’s neither novel nor popular.
  2. Check if the article has been dugg, posted to reddit, etc.
  3. Check what Technorati has to say about it.
  4. Many of these sites provide RSS feeds. Use them to analyze the voting behaviour in depth, on a per-user basis.
  5. Scan the keywords in the articles.
  6. Scan the keywords in the comments (if there are any comments).
  7. Use the above to figure out which articles are only popular but not novel.
  8. Don’t display these articles.
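
Steps 5, 7 and 8 above could be sketched roughly as follows. This is only one possible reading, not a finished recipe: it assumes the crawl has already produced a list of articles with their text and vote counts, it treats a keyword as "popular" when it shows up in a large share of the articles, and it scores novelty as the fraction of an article's keywords that are not popular. The function names, the stopword list, and the thresholds (`min_share`, `min_votes`, `min_novelty`) are all hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "on", "new"}


def keywords(text):
    """Step 5: lowercased words with stopwords removed, as a set."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def trending_terms(articles, min_share=0.5):
    """Keywords that appear in at least `min_share` of all articles (assumed cutoff)."""
    counts = Counter()
    for art in articles:
        counts.update(keywords(art["text"]))
    cutoff = min_share * len(articles)
    return {term for term, n in counts.items() if n >= cutoff}


def novelty(article, trending):
    """Fraction of an article's keywords that are not trending."""
    kws = keywords(article["text"])
    if not kws:
        return 0.0
    return len(kws - trending) / len(kws)


def filter_front_page(articles, min_votes=100, min_novelty=0.6):
    """Steps 7 and 8: drop articles that are popular but not novel."""
    trending = trending_terms(articles)
    kept = []
    for art in articles:
        popular = art["votes"] >= min_votes
        if popular and novelty(art, trending) < min_novelty:
            continue  # popular but not novel: don't display it
        kept.append(art)
    return kept


# Invented sample data, purely for illustration.
articles = [
    {"title": "Nintendo rumour 1", "text": "Nintendo console rumour leaked", "votes": 900},
    {"title": "Nintendo rumour 2", "text": "Nintendo console rumour price", "votes": 700},
    {"title": "Not My Job photo", "text": "road painters line over dead branch photo", "votes": 150},
    {"title": "Odd essay", "text": "essay about lichen growing on old cameras", "votes": 12},
]

kept = filter_front_page(articles)
for art in kept:
    print(art["title"])
```

On this sample the two rumours share most of their keywords and have plenty of votes, so they are hidden, while the photo and the essay stay. The sketch leaves out steps 1 to 4 and 6 (crawling feeds, checking Digg/reddit/Technorati, per-user voting analysis, comment keywords), which would feed better signals into the same kind of filter.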
