Inflammatory titles aside, I think reddit is a fantastic site. I read it constantly.
The interface is beautifully clean, just simple text arranged neatly on a white-linen background. On top of that, they were founded by hackers, and funded by another hacker (and my personal hero). These factors alone give them a lot of potential.
But lately, something important has changed, and reddit is adapting slowly. Not only that, some of its adaptations are quite literally broken. As the reddit audience has grown, the type of content submitted to the site has changed. It turns out that the “wisdom of the crowd” idea on which social bookmarking sites are predicated can get tricky as your crowd gets more diverse.
Early on, reddit articles were exactly my style: the cream of the crop, content-wise, and usually technically oriented. The titles summarized the content well, and the signal-to-noise ratio was impressive. I suspect these strengths were related to reddit’s status as a well-trafficked site written in Lisp, something that catches the attention of hacker types. My kind of people.
Reddit’s success has brought with it a more mainstream audience. With that audience, the focus of the site has changed. These days it’s hard to tell what that focus is. Maybe it’s politics: at the time of writing, eleven of the twenty-five front page links are politically oriented. Maybe it’s funny pictures (six links). Maybe it’s a fansite for a certain unpronounceable webcomic.
Anyhow, it doesn’t particularly matter what reddit’s focus is. What matters is that it’s changing. Depending on who you are and what you like, this may be a good thing. You might love the new reddit. But plenty of people do not. This is a very common phenomenon:
- A site is created.
- First generation users fall in love.
- Word spreads, and the site is rapidly overwhelmed with new users.
- Early users grow to hate the new users who have no idea how things are done around here and now everything is RUINED!
It’s like a stampede of younger siblings.
So now we’ve got this huge swarm of users, and everyone wants things their way. Oh well, some people are going to lose interest and leave, but that’s inevitable because you can’t please everyone. The links are chosen by the majority and if you’re not the majority anymore, tough.
It doesn’t have to be like this. There is, in fact, a way to please everyone: show every user only the links he wants to see.
Now, this is a fairly complex problem, and reddit’s first solution is neat, plausible, and wrong: subreddits. Subreddits are wrong because people have diverse interests. The programming subreddit has some gems on unit testing and Ruby, but I’m not particularly interested in “Sexprs in Leopard,” which sounds like it should be on the nsfw subreddit anyway. Also, there are a few political topics I’d like to see, and maybe even some stuff from entertainment or sports.
Now stand back, because I’m going to use an analogy here: links on reddit are like different kinds of food. Everyone has different preferences when it comes to food, and delicious to me might be gag-reflex for you. How do we show people only food they like? Reddit’s current approach is like dividing options into three boxes: breakfast, lunch, and dinner. Clearly this doesn’t work. There’s not enough granularity–just because I don’t like cereal doesn’t mean I don’t like omelettes. Okay, let’s split breakfast into two more boxes: cold and hot. Still not good enough. Omelettes are tasty but a fondness for grits is indefensible. The problem here is that classifying at this level will always be too general to give good recommendations. We’re approaching this problem with a top-down mindset, and we’re not getting good results. So what should we do instead? How about bottom-up?
What if we analyzed ingredients instead? I don’t like this dip, or this chicken dish. Cold appetizer, hot entree, no commonality with the top-down approach. But if we check the ingredients, they both have cilantro. Hmmm. Might be good to recommend fewer dishes with cilantro, and let’s keep a watch for more dishes I dislike. If further dishes have that ingredient, it’s awfully likely I don’t care for that particular spice. However, someone else might love cilantro, so let’s watch to see if that’s listed in lots of the dishes they do like, and be more likely to recommend pro-cilantro recipes for them.
This approach is likely looking very familiar to many of you. It’s the basis of something called bayesian filtering. Bayesian filtering is the closest we’ve come to solving the problem of identifying spam. It works because it creates lists of words that make something likely to be spam for you. “Viagra” might be an immediate red flag for you, but for me it’s a necess–never mind.
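For the curious, here is a minimal sketch of that word-list idea in Python. It is a toy under loud assumptions, not how any real spam filter or reddit itself works: the `WordFilter` class, its method names, the "like"/"dislike" labels, and the add-one smoothing are all made up for illustration, and tokenizing is just splitting on whitespace.

```python
import math
from collections import Counter


class WordFilter:
    """Toy per-user naive Bayes filter (hypothetical, for illustration only)."""

    def __init__(self):
        # One word-count list per label, plus how many items trained each label.
        self.words = {"like": Counter(), "dislike": Counter()}
        self.docs = {"like": 0, "dislike": 0}

    def train(self, text, label):
        """Record the words of one item the user liked or disliked."""
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        # Smoothed prior: how often this user applies the label at all.
        total_docs = self.docs["like"] + self.docs["dislike"]
        score = math.log((self.docs[label] + 1) / (total_docs + 2))
        # Add-one smoothed likelihood of each word under this label.
        total_words = sum(self.words[label].values())
        for tok in tokens:
            count = self.words[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def prob_like(self, text):
        """Estimated probability that the user likes an item with this text."""
        tokens = text.lower().split()
        vocab = set(self.words["like"]) | set(self.words["dislike"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen words
        like = self._log_score(tokens, "like", vocab_size)
        dislike = self._log_score(tokens, "dislike", vocab_size)
        # Turn the two log scores into a probability for "like".
        return 1.0 / (1.0 + math.exp(dislike - like))


me = WordFilter()
me.train("omelette cheese eggs", "like")
me.train("salsa cilantro lime", "dislike")
me.train("chicken cilantro rice", "dislike")
print(me.prob_like("cilantro dip") < 0.5)  # cilantro drags the score down
```

Each user gets their own `WordFilter`, which is the whole point: the same word can count for one person and against another.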
So back to reddit. Let’s ditch the subreddits. Instead, why not implement bayesian filtering to recommend links to the users? Build the word lists from the content of the sites linked to, and maybe the submitted titles too. When I click a link, add its key words to the list of things I like. When I downvote, file them right there under cilantro. And why stop there? Why not also notice when things I like are similar to things another user likes? Then recommend stuff HE likes to me! It’s like wisdom of the crowd TIMES bayesian filtering. We can call it Web 3.0!
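The "recommend stuff HE likes to me" half can be sketched roughly like this. Everything here is an assumption for illustration: the `votes` layout (user name mapped to link mapped to +1 or -1), the link names, and the choice of cosine similarity as the measure of "similar taste". It is one plausible way to do it, not a description of what reddit runs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users' vote dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def recommend(votes, user, top_n=3):
    """Suggest links the user has not voted on, weighted by similar users."""
    mine = votes[user]
    scores = {}
    for other, theirs in votes.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users whose taste is unrelated or opposite
        for link, vote in theirs.items():
            if link not in mine:
                scores[link] = scores.get(link, 0.0) + sim * vote
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [link for link, score in ranked[:top_n] if score > 0]


# Hypothetical vote data: +1 is an upvote, -1 is a downvote.
votes = {
    "me": {"lisp-macros": 1, "ron-paul-1": -1},
    "hacker": {"lisp-macros": 1, "ron-paul-1": -1, "unit-testing": 1, "lolcat": -1},
    "fan": {"ron-paul-1": 1, "ron-paul-2": 1, "lisp-macros": -1},
}
print(recommend(votes, "me"))
```

In this toy data the user who votes like me pulls in the unit testing link, while the user who votes the opposite way contributes nothing. Combining this score with the per-user word filter is left open; a weighted sum would be the obvious first try.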
Okay, this idea is not so new, and not so clever. In fact, I’m surprised this hasn’t already been implemented. After all, reddit-funder Paul Graham was the guy who thought up using bayesian filtering for spam. Reddit even already has a “Recommended” link which claims it learns based on your voting, but this is a horrible, horrible lie. I have voted down dozens of Ron Paul articles (not from dislike, he just doesn’t particularly interest me) and the recommendations page I just refreshed has six articles with “ron paul” in their titles.
So I appeal to you, reddit-guys, let’s flip the whole approach and come at this bottom-up. Don’t segregate your audience into huge buckets, create a customized reddit for each and every one of us. Bring those of us with common interests together. Show us things we didn’t even know we liked yet! While you do that, I’m gonna go shut my office door and check out “sexprs in leopard.”
40 Comments
You might like jaanix, which uses a much more powerful recommendation system to show only the things you want. On top of that, it is not a black box: it lets you tune your preferences.
Great points here man, I wouldn’t count on reddit implementing this anytime soon though. Maybe you should start a side project ? :p
Can you make your font size a bit smaller. It’s just barely readable now… so a bit smaller and I wouldn’t be able to read it at all!
While something like this would be nice, I think it will be hard to ever please everyone, 100% of the time.
Even a site like Digg, where you don’t have constant complaining about what’s on the front page (but rather, how hard it is to actually get anything frontpage’d), might only have about a 20-30% “hit ratio” of interesting stories. Perhaps it’s higher for some people, but I rarely click through on more than that, whether it be reddit, Digg, del.icio.us/popular, etc.
Re: recommendation engines… take the Netflix Prize as an example. Netflix already had a pretty good system: good, but not necessarily revolutionary (pretty similar to Amazon’s, etc.).
Now they offer $1 million to anyone in the world who can improve it by _just 10%_ — and still the closest anyone’s come is 7-8%. So this is a very thorny problem that AI engines will likely never get much better at than they are now.
Interesting article: it seems to work well for StumbleUpon! As a British user I find that I’m bombarded with U.S. politics-related articles that I don’t really know much about, and whilst I understand that these interest the majority of the users (who are American), they have nothing to do with MY life or interests.
With that in mind, if the front page of reddit could be changed to show the highest-rated articles in my preferred subjects, that would surely be a step in the right direction.
Just because you’re not seeing ideal results doesn’t mean reddit isn’t doing exactly what you recommend. I agree that reddit now leaves much to be desired, but I think it’s due more to growing pains than any error in their general methodology.
For many months, before the site exploded in popularity, the recommended page worked wonderfully for me. Unfortunately people are strange and complex. As you say, the crowd reddit is working with now is far more diverse than the one they started with, and I’d assume it has more complex and strangely intertwined interests and reactions to content.
Aside from the inherent difficulties of both bayesian filtering and social recommendation, even more problems creep up when you try to mix the two with a single simple feedback interface.
Basically the only thing you’re saying is that reddit should do what they are doing, only better. I think everyone would agree that doing stuff better is generally a good idea.
Now I’m going to go downvote this and hope the reddit folks keep working on and improving their algorithms so it works.
How did I know you were going to bring up the Ron Paul when complaining about reddit links? You should install GreaseMonkey and get the reddit keyword filtering script.
It’s all very simple. Under preferences, provide a button to eliminate Ron Paul articles. Apart from Ron Paulisms, the reddit scheme works reasonably well. The system seems to be able to handle LOL cats and the other eccentric things. What it *cannot* handle is what seems to be a vast, coordinated effort to promote Ron Paul. The effort seems to involve the submission of items about the man, along with a voting army that mods them up, and that votes down any comment that criticizes the articles.
Luckily, when the election is over, the Paulisms will go away. We’ll be back to LOLcats and other things that the Reddit system seems to be able to handle.
None of this is to say that your idea is anything other than sensible. Perhaps it was not implemented because it’s a lot of work.
Thank God somebody finally said it (and so coherently).
Couldn’t agree more. Although I don’t think your idea and the subreddits are mutually exclusive.
I’ve done something like this on one of my sites (the one linked in my name). We have a prominent tab on the home page called “my interests” and it shows people only the posts for the tags they are interested in. It’s the least used major feature of the site. It’s a bit of an apples to oranges comparison with reddit since our site is a blogging/community site, but I’ve found many instances where what makes sense to me isn’t the way most people want to use the internet. This may be one of those cases.
Btw, there is a site called thoof.com that tries to do what you want. I tried it for a while, but I still find reddit more interesting.
This is less like an idea for reddit and more like an idea for how ALL digg/reddit/social-voting sites should be implemented.
So how about this…if Reddit doesn’t listen you should take your idea to Paul Graham (and others like him). What you’re describing IS how it will be done in the future, so you might as well be the one to help it come to pass.
You’ve got a broken link in there. <a href=”http://ruby-lang.org”Ruby
Do not insult grits, sir.
http://programming.reddit.com/info/62u18/comments/c02nn8n
This was something I mentioned offhand as a problem right when the site was announced on comp.lang.lisp. Lemme find the link…
I’d love to see a site that would focus on what *I* like and not what other people like. “Wisdom of the crowds” only works if the individuals in the crowd are wise, and Reddit crossed that boundary a while ago.
Link:
http://groups.google.com/group/comp.lang.lisp/msg/fce629040b79bc9d
not that jeff atwood isnt great and everything, but i have his rss feed already. i dont need every single fucking thing he posts put on programming.reddit.
i feel bad voting down the articles because they’re basically good, they’re just totally not the context i want for the sub-reddit
@Big Mon: if you are blind you should learn to adjust the text size on your @#$@#@ own. Dont enforce your crappy eyesight on all the other users.
Since it’s user voted, the users should also “tag” it. The tags also make up what the user enjoys. Of course, this might end up like Slashdot’s system, but if it was implemented well – it could work.
I’d prefer votes be weighted based on the seniority of the voter as well. Those of us that have been there since the beginning should have the most (or all) of the influence over the front page (in my opinion).
You go to reddit to broaden your horizons… What you are describing is an RSS feed or tags, not a link sharing and discussing site like reddit. And if you only want to see Ruby news for programming, well, I am sorry.
I love Reddit, and I don’t think it’s broken. At all.
Beautifully said. I’m a fan of reddit myself. Fantastic read! =)
Great article!
Reddit’s splintering of submissions via its categories is an attempt to splinter its user base and restore the focus within each category. This approach is doomed to failure, because it runs against the very principle of user-generated content: the categories are not determined by the users or by the content, but rather by patriarchal decisions made by the reddit team.
A better approach, simpler than the suggested Bayesian one, is to generate customized per-user rankings via peers. Each user is matched with a peer group of other users whose previous preferences have been similar (via distance under the obvious metric). Only rankings from this peer group matter in determining the submissions that the user sees. This need not be server-intensive; the peer group IDs can be stored client-side (in a cookie), and updated once a week.
Ben clearly has a peer group that equally don’t like posts with pictures and comics, so once that’s been established he’ll never see such posts again so long as his small peer group continues to feel this way. The problem is, these preferences are complex, changing, and very difficult to predict beforehand.
Imaginary rules are imaginary.
You may wanna check out Notizi: http://notizi.com – It’s pretty new, but it offers something like the “social” ranking you speak of. The idea is you build up a network of friends, and it shows you what’s most popular amongst your friends.
> show every user only the links he wants to see.
See, you’re missing some of the charms of a site like reddit (or slashdot, or even a newspaper website): these sites serve as a shared cultural touchstone in a world that’s increasingly becoming more specialized. I like it when I bump into a friend and say, “Hey did you see that article on Reddit about X”, and know he probably has. In the “everyone gets their own custom slice of the web” world, you miss out on the zeitgeist.
Not only that, but I like reading stories that may not be in my particular list of supposed likes and dislikes–I enjoy the spontaneity of reading a story that may be outside of my normal sphere.
Hi,
well said, some intelligence on the internet would be greatly appreciated.
You might find http://www.tiinker.com of interest, as it is an intelligent news reader.
I am already building what you’re dreaming of : it’s called kolmoGNUS, it’s open source and it’s written in python. Granted, a web interface would be great, but the internals are all there and they work great. I’m looking for contributors!
But hey. You like lisp people, paul graham, and you don’t like sexprs? :)
The question is whether Bayesian filtering will work in this context. According to my limited knowledge, BF is good at filtering out what you do not want (=spam), but not good at recommending what you want.
Some time ago, I played with an open source RSS reader (sorry, can’t remember the name) which used BF to filter out “interesting” RSS posts based on my votes, which is similar to the principle you advocate. But it just didn’t work too well…
I don’t care what the aims and objectives of this site are. There is no excuse for permitting repeated and vile anti-Semitic attacks. At some point the people who run broad-spectrum internet sites have to be held to the same standards of accountability as others in the mass media.
http:/www.root-1.co.il/reddit.htm
I installed the ‘greasemonkey’ script to get rid of ron paul and his spambots….reddit might need more work, but you have to admit that it is better, by far, than digg and propeller.
When I saw Paul Graham’s name, I thought it looked very familiar.
I’m reading Hackers and Painters right now, and I love it.
I didn’t know he had anything to do with Reddit!
For bayesian inference to work, you want there to be a large number of features per judgement, so that you can multiply together the probabilities of buckethood vs. nonbuckethood. So in the spam filtering case, every word is a feature. Reddit does not see the sites to which it links, so all it has as a featureset is the article title and the URL. Crawling the linked sites, though possible, would mean that they would have to store a ton of data, maybe as much as 100x the amount they presently store per article.
But let’s assume they get around that. They have a huge featureset for every article. Then they have a giant computational challenge: they have to run every article against the ‘goodness-filter’ trained by each user who views a page. Doing that at page-view time would be a disaster, so they’d have to run it beforehand and store it, which means they’d need MxN goodness values for M users and N articles.
They know about bayesian inference, but it’s not such a good fit here.
“After all, reddit-funder Paul Graham was the guy who thought up using bayesian filtering for spam.”
http://citeseer.ist.psu.edu/sahami98bayesian.html
Btw, it’s not social bookmarking but social news. del.icio.us is for bookmarking. Aside from that, reddit is the most hostile “social” site I have ever encountered.
Then recommend stuff HE likes to me! It’s like wisdom of the crowd TIMES bayesian filtering. We can call it Web 3.0!
Or, you can call it Web 1.0. Amazon worked like this from the beginning, right?
I’ve just launched a site that has public user tagging that’s flavored (good/bad/neutral) so you can customize the site according to your preferences with each tag you add. When many users tag something the same way, it becomes a permanent tag of that submission. My website link will send you there. I’m looking forward to comments if anyone is interested.