Why Reddit is broken, and how to fix it

Inflammatory titles aside, I think reddit is a fantastic site. I read it constantly.

The interface is beautifully clean, just simple text arranged neatly on a white-linen background. On top of that, they were founded by hackers, and funded by another hacker (and my personal hero). These factors alone give them a lot of potential.

But lately, something important has changed, and reddit is adapting slowly. Not only that, some of its adaptations are quite literally broken. As the reddit audience has grown, the type of content submitted to the site has changed. It turns out that the “wisdom of the crowd” idea on which social bookmarking sites are predicated can get tricky as your crowd gets more diverse.

Early on, reddit articles were exactly my style: the cream of the crop, content-wise, and usually technically oriented. The titles summarized the content well, and the signal-to-noise ratio was impressive. I suspect these strengths were related to reddit’s status as a well-trafficked site written in Lisp, something that catches the attention of hacker types. My kind of people.

Reddit’s success has brought with it a more mainstream audience. With that audience, the focus of the site has changed. These days it’s hard to tell what that focus is. Maybe it’s politics: at the time of writing, eleven of the twenty-five front page links are politically oriented. Maybe it’s funny pictures (six links). Maybe it’s a fansite for a certain unpronounceable webcomic.

Anyhow, it doesn’t particularly matter what reddit’s focus is. What matters is that it’s changing. Depending on who you are and what you like, this may be a good thing. You might love the new reddit. But plenty of people do not. This is a very common phenomenon:

  1. A site is created.
  2. First generation users fall in love.
  3. Word spreads, and the site is rapidly overwhelmed with new users.
  4. Early users grow to hate the new users who have no idea how things are done around here and now everything is RUINED!

It’s like a stampede of younger siblings.

So now we’ve got this huge swarm of users, and everyone wants things their way. Oh well, some people are going to lose interest and leave, but that’s inevitable because you can’t please everyone. The links are chosen by the majority and if you’re not the majority anymore, tough.

It doesn’t have to be like this. There is, in fact, a way to please everyone: show every user only the links he wants to see.

Now, this is a fairly complex problem, and reddit’s first solution is neat, plausible, and wrong: subreddits. Subreddits are wrong because people have diverse interests. The programming subreddit has some gems on unit testing and Ruby, but I’m not particularly interested in “Sexprs in Leopard,” which sounds like it should be on the nsfw subreddit anyway. Also, there are a few political topics I’d like to see, and maybe even some stuff from entertainment or sports.

Now stand back, because I’m going to use an analogy here: links on reddit are like different kinds of food. Everyone has different preferences when it comes to food, and delicious to me might be gag-reflex for you. How do we show people only food they like? Reddit’s current approach is like dividing options into three boxes: breakfast, lunch, and dinner. Clearly this doesn’t work. There’s not enough granularity: just because I don’t like cereal doesn’t mean I don’t like omelettes. Okay, let’s split breakfast into two more boxes: cold and hot. Still not good enough. Omelettes are tasty, but a fondness for grits is indefensible. The problem here is that classifying at this level will always be too general to give good recommendations. We’re approaching this problem with a top-down mindset, and we’re not getting good results. So what should we do instead? How about bottom-up?

What if we analyzed ingredients instead? I don’t like this dip, or this chicken dish. Cold appetizer, hot entree, no commonality with the top-down approach. But if we check the ingredients, they both have cilantro. Hmmm. Might be good to recommend fewer dishes with cilantro, and let’s keep a watch for more dishes I dislike. If further dishes have that ingredient, it’s awfully likely I don’t care for that particular spice. However, someone else might love cilantro, so let’s watch to see if that’s listed in lots of the dishes they do like, and be more likely to recommend pro-cilantro recipes for them.
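
The cilantro reasoning can be written down almost directly. What follows is a minimal sketch in Python, assuming each dish is just a list of ingredient names. The `suspect_ingredients` helper and its default threshold of two disliked dishes are hypothetical choices made for illustration, not anything reddit or a real recommender uses.

```python
from collections import Counter

def suspect_ingredients(liked, disliked, min_dislikes=2):
    """Return ingredients that appear in at least `min_dislikes` disliked
    dishes and in none of the liked dishes (hypothetical helper)."""
    # Count each ingredient once per dish, so a recipe listing cilantro
    # twice doesn't count as two strikes.
    bad_counts = Counter(i for dish in disliked for i in set(dish))
    good = {i for dish in liked for i in dish}
    return {i for i, n in bad_counts.items() if n >= min_dislikes and i not in good}

liked = [["eggs", "cheese", "peppers"], ["chicken", "garlic", "lemon"]]
disliked = [["avocado", "cilantro", "lime"], ["chicken", "cilantro", "rice"]]
print(suspect_ingredients(liked, disliked))  # {'cilantro'}
```

Demanding that an ingredient be absent from every liked dish is crude: one liked dish with cilantro clears it entirely. A probabilistic score, which is what Bayesian filtering provides, weighs that kind of mixed evidence more gracefully.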

This approach is likely looking very familiar to many of you. It’s the basis of something called bayesian filtering. Bayesian filtering is the closest we’ve come to solving the problem of identifying spam. It works by learning, from the mail you have marked, which words make a message likely to be spam for you in particular. “Viagra” might be an immediate red flag for you, but for me it’s a necess... never mind.
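
For readers who haven’t seen one, here is the core of a word-based filter in Python. It is textbook multinomial naive Bayes with add-one smoothing, not necessarily the exact scoring that any particular spam filter uses. The `NaiveBayesFilter` class, the "spam"/"ham" labels, and the tiny training set are all made up for illustration.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Toy multinomial naive Bayes classifier with two labels: "spam" and "ham"."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one message the user has marked as spam or ham."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def log_score(self, text, label):
        """log P(label) + sum of log P(word | label), with add-one smoothing.

        Assumes at least one message of each label has been trained.
        """
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            count = self.word_counts[label][word]
            # The +1 keeps an unseen word from zeroing out the whole product.
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(text, label))

f = NaiveBayesFilter()
f.train("cheap viagra buy now", "spam")
f.train("buy cheap pills now", "spam")
f.train("meeting notes for the lisp project", "ham")
f.train("lunch tomorrow with the project team", "ham")
print(f.classify("cheap pills"))               # spam
print(f.classify("project meeting tomorrow"))  # ham
```

The word lists are per user because each person trains their own filter. Two people who mark different mail as spam end up with different counts, and so they get different verdicts on the same message.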

So back to reddit. Let’s ditch the subreddits. Instead, why not implement bayesian filtering to recommend links to the users? Build the word lists from the content of the sites linked to, and maybe the submitted titles too. When I click a link, add its key words to the list of things I like. When I downvote, file them right there under cilantro. And why stop there? Why not also notice when things I like are similar to things another user likes? Then recommend stuff HE likes to me! It’s like wisdom of the crowd TIMES bayesian filtering. We can call it Web 3.0!
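
The “notice when things I like are similar to things another user likes” half of the proposal is usually called user-based collaborative filtering. Below is a minimal sketch, assuming each user’s history is a dict that maps a link ID to +1 (click or upvote) or -1 (downvote). The user names, link IDs, and the choice of cosine similarity are illustrative assumptions, not anything reddit exposes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vote dicts mapping link ID -> +1 or -1."""
    dot = sum(a[link] * b[link] for link in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user, votes, top_n=3):
    """Rank links `user` hasn't voted on by similarity-weighted votes of others."""
    mine = votes[user]
    scores = {}
    for other, theirs in votes.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # skip users whose taste is opposite to, or unrelated to, ours
        for link, vote in theirs.items():
            if link not in mine:
                scores[link] = scores.get(link, 0.0) + sim * vote
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [link for link in ranked if scores[link] > 0][:top_n]

votes = {
    "alice": {"unit-testing": 1, "ruby-tips": 1, "ron-paul": -1},
    "bob": {"unit-testing": 1, "ruby-tips": 1, "lisp-macros": 1, "lolcats": -1},
    "carol": {"ron-paul": 1, "lolcats": 1, "unit-testing": -1},
}
print(recommend("alice", votes))  # ['lisp-macros']
```

Here "carol" votes opposite to "alice" on the links they share, so her votes are ignored. Alice sees only what "bob", whose votes agree with hers, liked and she hasn’t seen yet. Cosine similarity is one reasonable measure among several.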

Okay, this idea is not so new, and not so clever. In fact, I’m surprised this hasn’t already been implemented. After all, reddit-funder Paul Graham was the guy who thought up using bayesian filtering for spam. Reddit even already has a “Recommended” link which claims it learns based on your voting, but this is a horrible, horrible lie. I have voted down dozens of Ron Paul articles (not from dislike, he just doesn’t particularly interest me) and the recommendations page I just refreshed has six articles with “ron paul” in their titles.

So I appeal to you, reddit-guys, let’s flip the whole approach and come at this bottom-up. Don’t segregate your audience into huge buckets, create a customized reddit for each and every one of us. Bring those of us with common interests together. Show us things we didn’t even know we liked yet! While you do that, I’m gonna go shut my office door and check out “sexprs in leopard.”

40 Comments

  1. You might like jaanix; it uses a much more powerful recommendation system to show only the things you want. On top of that, it is not a black box: it lets you tune your preferences.

    Wednesday, December 12, 2007 at 2:09 pm | Permalink
  2. Stu wrote:

    Great points here, man. I wouldn’t count on reddit implementing this anytime soon, though. Maybe you should start a side project? :p

    Wednesday, December 12, 2007 at 2:19 pm | Permalink
  3. Big Mon wrote:

    Can you make your font size a bit smaller? It’s just barely readable now… so a bit smaller and I wouldn’t be able to read it at all!

    Wednesday, December 12, 2007 at 2:25 pm | Permalink
  4. While something like this would be nice, I think it will be hard to ever please everyone, 100% of the time.

    Even a site like Digg, where you don’t have constant complaining about what’s on the front page (but rather, how hard it is to actually get anything frontpage’d), might only have about a 20-30% “hit ratio” of interesting stories. Perhaps it’s higher for some people, but I rarely click through on more than that, whether it be reddit, Digg, del.icio.us/popular, etc.

    Re: recommendation engines… take the Netflix Prize as an example. Netflix already had a pretty good system: good, but not necessarily revolutionary (pretty similar to Amazon’s, etc.).

    Now they offer $1 million to anyone in the world who can improve it by _just 10%_ — and still the closest anyone’s come is 7-8%. So this is a very thorny problem that AI engines will likely never get much better at than they are now.

    Wednesday, December 12, 2007 at 2:34 pm | Permalink
  5. DuggersCup wrote:

    Interesting article: it seems to work well for StumbleUpon! As a British user I find that I’m bombarded with U.S. politics-related articles that I don’t really know much about, and whilst I understand that these interest the majority of the users (who are American), they have no bearing on MY life or interests.
    With that in mind, if the front page of reddit could be changed to show the highest-rated articles in my preferred subjects, it would surely be a step in the right direction.

    Wednesday, December 12, 2007 at 3:17 pm | Permalink
  6. fcp wrote:

    Just because you’re not seeing ideal results doesn’t mean reddit isn’t doing exactly what you recommend. I agree that reddit now leaves much to be desired, but I think it’s due more to growing pains than any error in their general methodology.

    For many months, before the site exploded in popularity, the recommended page worked wonderfully for me. Unfortunately, people are strange and complex. As you say, the crowd reddit is working with now is far more diverse than what they started with, and I’d assume it has more complex and strangely intertwined interests and reactions to content.

    Aside from the inherent difficulties of both bayesian filtering and social recommendation, even more problems creep up when you try to mix the two with a single simple feedback interface.

    Basically the only thing you’re saying is that reddit should do what they are doing only do it better. I think everyone would agree that doing stuff better is generally a good idea.

    Now I’m going to go downvote this and hope the reddit folks keep working on and improving their algorithms so it works.

    Wednesday, December 12, 2007 at 3:31 pm | Permalink
  7. Sidney Hale wrote:

    How did I know you were going to bring up Ron Paul when complaining about reddit links? You should install Greasemonkey and get the reddit keyword filtering script.

    Wednesday, December 12, 2007 at 3:38 pm | Permalink
  8. dan Kelley wrote:

    It’s all very simple. Under preferences, provide a button to eliminate Ron Paul articles. Apart from Ron Paulisms, the reddit scheme works reasonably well. The system seems to be able to handle LOL cats and the other eccentric things. What it *cannot* handle is what seems to be a vast, coordinated effort to promote Ron Paul. The effort seems to involve the submission of items about the man, along with a voting army that mods them up, and that votes down any comment that criticizes the articles.

    Luckily, when the election is over, the Paulisms will go away. We’ll be back to LOLcats and other things that the Reddit system seems to be able to handle.

    None of this is to say that your idea is anything other than sensible. Perhaps it was not implemented because it’s a lot of work.

    Wednesday, December 12, 2007 at 4:31 pm | Permalink
  9. Michael wrote:

    Thank God somebody finally said it (and so coherently).

    Wednesday, December 12, 2007 at 4:47 pm | Permalink
  10. mats wrote:

    Couldn’t agree more. Although I don’t think your idea and the subreddits are mutually exclusive.

    Wednesday, December 12, 2007 at 4:49 pm | Permalink
  11. jay wrote:

    I’ve done something like this on one of my sites (the one linked in my name). We have a prominent tab on the home page called “my interests” and it shows people only the posts for the tags they are interested in. It’s the least used major feature of the site. It’s a bit of an apples to oranges comparison with reddit since our site is a blogging/community site, but I’ve found many instances where what makes sense to me isn’t the way most people want to use the internet. This may be one of those cases.

    Btw, there is a site called thoof.com that tries to do what you want. I tried it for a while, but I still find reddit more interesting.

    Wednesday, December 12, 2007 at 4:55 pm | Permalink
  12. This is less an idea for reddit and more an idea for how ALL digg/reddit/social-voting sites should be implemented.

    So how about this…if Reddit doesn’t listen you should take your idea to Paul Graham (and others like him). What you’re describing IS how it will be done in the future, so you might as well be the one to help it come to pass.

    Wednesday, December 12, 2007 at 5:01 pm | Permalink
  13. David wrote:

    You’ve got a broken link in there. <a href=”http://ruby-lang.org”Ruby

    Wednesday, December 12, 2007 at 5:11 pm | Permalink
  14. jaggedaloc wrote:

    Do not insult grits, sir.

    Wednesday, December 12, 2007 at 6:38 pm | Permalink
  15. aGorilla wrote:

    http://programming.reddit.com/info/62u18/comments/c02nn8n

    Wednesday, December 12, 2007 at 6:42 pm | Permalink
  16. Jared Finder wrote:

    This was something I mentioned offhand as a problem right when the site was announced on comp.lang.lisp. Lemme find the link…

    I’d love to see a site that would focus on what *I* like and not what other people like. “Wisdom of the crowds” only works if the individuals in the crowd are wise, and Reddit crossed that boundary a while ago.

    Wednesday, December 12, 2007 at 7:03 pm | Permalink
  17. Jared Finder wrote:

    Link:
    http://groups.google.com/group/comp.lang.lisp/msg/fce629040b79bc9d

    Wednesday, December 12, 2007 at 7:03 pm | Permalink
  18. rektide wrote:

    not that jeff atwood isn’t great and everything, but i have his rss feed already. i don’t need every single fucking thing he posts put on programming.reddit.

    i feel bad voting down the articles because they’re basically good, they’re just totally not the context i want for the sub-reddit

    @Big Mon: if you are blind you should learn to adjust the text size on your @#$@#@ own. Don’t enforce your crappy eyesight on all the other users.

    Wednesday, December 12, 2007 at 7:39 pm | Permalink
  19. John wrote:

    Since it’s user-voted, the users should also “tag” it. The tags would also reflect what the user enjoys. Of course, this might end up like Slashdot’s system, but if it were implemented well, it could work.

    Wednesday, December 12, 2007 at 7:51 pm | Permalink
  20. Robert Lee wrote:

    I’d prefer votes be weighted based on the seniority of the voter as well. Those of us that have been there since the beginning should have the most (or all) of the influence over the front page (in my opinion).

    Wednesday, December 12, 2007 at 7:52 pm | Permalink
  21. DAN RATHER wrote:

    You go to reddit to broaden your horizons… What you are describing is RSS feeds or tags, not a link-sharing and discussion site like reddit. And if you only want to see Ruby news for programming, well, I am sorry.

    Wednesday, December 12, 2007 at 9:21 pm | Permalink
  22. tikiloungelizard wrote:

    I love Reddit, and I don’t think it’s broken. At all.

    Wednesday, December 12, 2007 at 9:34 pm | Permalink
  23. awh wrote:

    Beautifully said. I’m a fan of reddit myself. Fantastic read! =)

    Wednesday, December 12, 2007 at 9:40 pm | Permalink
  24. Bobby wrote:

    Great article!

    Wednesday, December 12, 2007 at 10:00 pm | Permalink
  25. Reddit’s splintering of submissions via its categories is an attempt to splinter its user base and restore the focus within each category. This approach is doomed to failure, because it runs against the very principle of user-generated content: the categories are not determined by the users or by the content, but rather by patriarchal decisions made by the reddit team.

    A better approach, simpler than the suggested Bayesian one, is to generate customized per-user rankings via peers. Each user is matched with a peer group of other users whose previous preferences have been similar (via distance under the obvious metric). Only rankings from this peer group matter in determining the submissions that the user sees. This need not be server-intensive; the peer group IDs can be stored client-side (in a cookie), and updated once a week.

    Ben clearly has a peer group that equally dislikes posts with pictures and comics, so once that’s been established he’ll never see such posts again so long as his small peer group continues to feel this way. The problem is, these preferences are complex, changing, and very difficult to predict beforehand.

    Thursday, December 13, 2007 at 12:13 am | Permalink
  26. darkmark7 wrote:

    Imaginary rules are imaginary.

    Thursday, December 13, 2007 at 12:16 am | Permalink
  27. Steve wrote:

    You may wanna check out Notizi: http://notizi.com – It’s pretty new, but it offers something like the “social” ranking you speak of. The idea is you build up a network of friends, and it shows you what’s most popular amongst your friends.

    Thursday, December 13, 2007 at 12:23 am | Permalink
  28. IvyMike wrote:

    > show every user only the links he wants to see.

    See, you’re missing some of the charms of a site like reddit (or slashdot, or even a newspaper website): these sites serve as a shared cultural touchstone in a world that’s increasingly becoming more specialized. I like it when I bump into a friend and say, “Hey did you see that article on Reddit about X”, and know he probably has. In the “everyone gets their own custom slice of the web” world, you miss out on the zeitgeist.

    Not only that, but I like reading stories that may not be in my particular list of supposed likes and dislikes–I enjoy the spontaneity of reading a story that may be outside of my normal sphere.

    Thursday, December 13, 2007 at 12:26 am | Permalink
  29. Mark Bradley wrote:

    Hi,

    well said, some intelligence on the internet would be greatly appreciated.

    You might find http://www.tiinker.com of interest, as it is an intelligent news reader.

    Thursday, December 13, 2007 at 12:34 am | Permalink
  30. Joel wrote:

    I am already building what you’re dreaming of : it’s called kolmoGNUS, it’s open source and it’s written in python. Granted, a web interface would be great, but the internals are all there and they work great. I’m looking for contributors!

    Thursday, December 13, 2007 at 1:12 am | Permalink
  31. pp wrote:

    But hey. You like lisp people, paul graham, and you don’t like sexprs? :)

    Thursday, December 13, 2007 at 1:18 am | Permalink
  32. quirkyalone wrote:

    The question is whether Bayesian filtering will work in this context. According to my limited knowledge, BF is good at filtering out what you do not want (i.e. spam), but not good at recommending what you want.

    Some time ago, I played with an open source RSS reader (sorry, can’t remember the name) which used BF to pick out “interesting” RSS posts based on my votes, which is similar to the principle you advocate. But it just didn’t work too well…

    Thursday, December 13, 2007 at 3:25 am | Permalink
  33. Dov wrote:

    I don’t care what the aims and objectives of this site are. There is no excuse for permitting repeated and vile anti-Semitic attacks. At some point the people who run broad-spectrum internet sites have to be held to the same standards of accountability as others in the mass media.

    http://www.root-1.co.il/reddit.htm

    Thursday, December 13, 2007 at 4:50 am | Permalink
  34. Jack Alexander wrote:

    I installed the ‘greasemonkey’ script to get rid of ron paul and his spambots….reddit might need more work, but you have to admit that it is better, by far, than digg and propeller.

    Thursday, December 13, 2007 at 6:17 am | Permalink
  35. When I saw Paul Graham’s name, I thought it looked very familiar.
    I’m reading Hackers and Painters right now, and I love it.
    I didn’t know he had anything to do with Reddit!

    Thursday, December 13, 2007 at 7:58 am | Permalink
  36. matt knox wrote:

    For bayesian inference to work, you want there to be a large number of features per judgement, so that you can multiply together the probabilities of buckethood vs. nonbuckethood. So in the spam filtering case, every word is a feature. Reddit does not see the sites to which it links, so all it has as a featureset is the article title and the URL. Crawling the linked sites, though possible, would mean that they would have to store a ton of data, maybe as much as 100x the amount they presently store per article.

    But let’s assume they get around that. They have a huge featureset for every article. Then they have a giant computational challenge: they have to run every article against the ‘goodness-filter’ trained by each user who views a page. Doing that at page-view time would be a disaster, so they’d have to run it beforehand and store it, which means they’d need MxN goodness values for M users and N articles.

    They know about bayesian inference, but it’s not such a good fit here.

    Thursday, December 13, 2007 at 9:24 am | Permalink
  37. Aidan Finn wrote:

    “After all, reddit-funder Paul Graham was the guy who thought up using bayesian filtering for spam.”

    http://citeseer.ist.psu.edu/sahami98bayesian.html

    Thursday, December 13, 2007 at 3:31 pm | Permalink
  38. Tad Chef wrote:

    Btw, it’s not social bookmarking but social news. del.icio.us is for bookmarking. Aside from that, reddit is the most hostile “social” site I’ve ever encountered.

    Friday, December 14, 2007 at 7:27 am | Permalink
  39. > Then recommend stuff HE likes to me! It’s like wisdom of the crowd TIMES bayesian filtering. We can call it Web 3.0!

    Or, you can call it Web 1.0. Amazon worked like this from the beginning, right?

    Sunday, December 16, 2007 at 3:05 pm | Permalink
  40. I’ve just launched a site that has public user tagging that’s flavored (good/bad/neutral) so you can customize the site according to your preferences with each tag you add. When many users tag something the same way, it becomes a permanent tag of that submission. My website link will send you there. I’m looking forward to comments if anyone is interested.

    Saturday, January 26, 2008 at 12:13 am | Permalink

2 Trackbacks/Pingbacks

  1. purrl.net |** URLs that purr **| on Wednesday, December 12, 2007 at 5:04 pm

    The web’s most interesting stories on Thu 13th Dec 2007

    These are the web’s most talked about URLs on Thu 13th Dec 2007. The current winner is

  2. Some New Ideas - Pligg Forum on Wednesday, December 12, 2007 at 9:28 pm

    [...] New Ideas I was reading some stuff today and stumbled upon this: Reddit is broken : Codeulate. It talks about how to give more options to the user, as to his preference in what news he’d like [...]
