Two Programmers Claim Reddit's Voting Algorithm Is Flawed
The claim seems trivial at first, especially if you don't read code. Reddit appears to be working just fine, so who cares about a typo among thousands of lines of code?
But Reddit is a massive distributor of traffic around the web. It had 90 million unique users visit its pages last month. Publishers (Business Insider included) benefit hugely when a post becomes popular on Reddit. A single hot link from Reddit can pour hundreds of thousands of readers into your site within hours. And those pageviews are easily monetized with ads.
So there is a lot at stake. People trust Reddit to get it right.
Ian Greenleaf, a San Diego-based programmer, claims in a blog post that the sorting mechanism Reddit uses to rank new posts can bury those posts if they initially receive a few negative votes. It's complicated, but basically Reddit's code, which is public so that developers can examine it, ranks posts on two factors: time, so that new posts are favored over old ones; and net votes, so that posts people like are favored over those they don't.
The problem, Greenleaf says, occurs when a new post gets a few negative votes before it gets any positive ones, pushing its net score below zero:
... imagine one submission made a year ago, and another submission made just now. The year-old submission received 2 upvotes, and today's submission received two downvotes. This is a small difference - perhaps today's submission got off to a bad start and will rebound shortly with several upvotes. But under this implementation, today's submission now has a negative hotness score and will rate lower than the submission from last year.
Greenleaf says that the formula condemns some posts to a "purgatory" in Reddit, where they never get seen by other redditors. "These posts are sad, alone, and afraid. And notably, they are sorted oldest first, just as I predicted."
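Greenleaf's example can be reproduced with a rough Python sketch of the ranking function. This is a simplified reconstruction, not Reddit's actual source: the function name, the two constants, and the fixed "today" date are assumptions chosen for illustration. The one feature that matters is that the sign of the net score multiplies the time term.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# Assumed constants for illustration; the exact values do not change the effect.
EPOCH_OFFSET = 1134028003   # reference point, in Unix seconds
SECONDS_PER_STEP = 45000    # how much time is worth one order of magnitude of votes


def hot(ups, downs, posted_at):
    """Sketch of a hotness score in which the vote sign multiplies the age term."""
    score = ups - downs
    order = log10(max(abs(score), 1))          # vote magnitude, log-scaled
    sign = (score > 0) - (score < 0)           # +1, 0, or -1
    seconds = posted_at.timestamp() - EPOCH_OFFSET
    return round(order + sign * seconds / SECONDS_PER_STEP, 7)


# An arbitrary fixed "today", so the example is reproducible.
now = datetime(2013, 12, 9, tzinfo=timezone.utc)
year_ago = now - timedelta(days=365)

old_liked = hot(2, 0, year_ago)       # year-old post with 2 upvotes
new_disliked = hot(0, 2, now)         # brand-new post with 2 downvotes
older_disliked = hot(0, 2, year_ago)  # year-old post with 2 downvotes

print(new_disliked < old_liked)        # the new post ranks below the year-old one
print(older_disliked > new_disliked)   # among negative posts, older ranks higher
```

Under these assumptions, a net-negative score flips the time term from a bonus into a penalty, so the newest disliked post gets the lowest score of all. That would also account for the "sorted oldest first" ordering Greenleaf describes among negative posts.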
Systems librarian Jonathan Rochkind has made the same claim:
So it turns out there's a significant typo, that keeps the algorithm from working right, in the several previously blogged descriptions of reddit's story-ranking algorithm.
... More oddly, this same significant typo is in the public version of reddit's code released on github.
On Hacker News, the claim has been rebutted by a user named "ketralnis," who appears to be a Reddit administrator. Ketralnis says that keeping disliked posts from showing up is more or less the point:
This comes up every 6 months or so, always with some sensational title like this.
... The thing is, the two most important pages are the front page (or a subreddit's own hot page) and the new page. The new page is sorted by date ignoring hotness, and if something has a negative score it's not going to show up on the front/hot page anyway. The two other main opportunities to get popular (rising and the organic box) don't really use hotness either.
So when it comes down to it, what happens below 0 is pretty moot. Smoothness around the real life dates and scores on the site is more important than smoothness around 0, where we don't really have listings that will display it anyway.
In summary, there don't exist listings in which the discontinuities at 0 really matter.