A couple of nights ago I posted some Twitter charts, the most interesting of which is reposted to the left.
What you are looking at is a dendrogram of the people I follow on Twitter, sorted by their use of English. Dendrograms are great for showing off the results of a clustering algorithm, which is exactly what I wrote in Python, from examples found in Toby Segaran’s exquisite O’Reilly book, Programming Collective Intelligence.
Essentially, I ran the script for this example across the RSS feeds of the 88 or so people I follow on Twitter to analyze each user’s use of English words, pick up any repetition in subject matter, and visually cluster each person with others who are talking about the same sorts of things.
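For the curious, here is a rough sketch of the general idea, not the actual script I ran. It assumes each person's tweets have already been pulled down into one string of text, counts the words, measures how similar two people are with a Pearson correlation distance, and keeps merging the two closest clusters until one tree is left. The function names and the four sample users are made up for illustration, and it only prints the tree as nested tuples rather than drawing it.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase alphabetic words in a block of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def pearson_distance(v1, v2):
    """Return 1 - Pearson correlation, so similar vectors score near 0."""
    n = len(v1)
    mean1, mean2 = sum(v1) / n, sum(v2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(v1, v2))
    var1 = sum((a - mean1) ** 2 for a in v1)
    var2 = sum((b - mean2) ** 2 for b in v2)
    denom = math.sqrt(var1 * var2)
    if denom == 0:
        return 1.0  # no variation in one vector: treat as uncorrelated
    return 1.0 - cov / denom


def cluster(texts):
    """Merge the closest pair of clusters until one nested tuple remains."""
    counts = {user: word_counts(text) for user, text in texts.items()}
    vocab = sorted(set().union(*counts.values()))
    # Each cluster is (tree, word vector); a leaf's tree is just the user name.
    clusters = [(user, [counts[user][w] for w in vocab]) for user in sorted(counts)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pearson_distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (tree_i, vec_i), (tree_j, vec_j) = clusters[i], clusters[j]
        # The merged cluster's vector is the average of its two children.
        merged = ((tree_i, tree_j), [(a + b) / 2 for a, b in zip(vec_i, vec_j)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical users and text, standing in for real feed contents.
texts = {
    "alice": "rails ruby rails iphone code code",
    "bob": "ruby rails code iphone rails",
    "carol": "coffee beach walks coffee moonlit",
    "dave": "beach coffee walks walks",
}

tree = cluster(texts)
print(tree)
```

On real feeds you would probably also want to throw out words that nearly everyone or nearly no one uses, since they tell you little about what any two people have in common.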
The result? Stunning.
After tweeting the link to this image, several of my followers began checking out people they were clustered near and discovering more interesting people for themselves. In short: it worked. Surprisingly well.
With few exceptions, the clusters broke down by shared interests so cleanly that my jaw dropped (even if I was a little overenthused).
Just for some context, @darcyyy, @hilarywalker, and @monikamagdalena, all clustered together, are my best friend from high school’s girlfriend, my best friend from college’s girlfriend, and my girlfriend. @darcyyy and @monikamagdalena know each other, @hilarywalker and @monikamagdalena know each other, but @hilarywalker and @darcyyy have never met.
The perfect algorithm would therefore have put @monikamagdalena in the middle. But would it? Both @darcyyy and @hilarywalker have recently joined Twitter, both to “see why their boyfriends have so much fun on the internet.”
Impressive.
I myself (@bryanwoods) am in a distinct cluster with @robertjwhitney (longtime friend and coworker here at the Colab) and @elliottt, a mutual friend of ours from college. No, we’re not talking about college, but the algorithm knew we had a lot in common anyway. The bond runs deeper than our education and manifests in language.
In the bottom right quadrant you’ll find the “Land of the Bloggers,” with the NYC tech and shakeshack-meetup group slightly above them.
I could go on for every single person, but needless to say I was impressed.
And while I want to give all the credit to the Python script, that would be silly, since there’s a reason I ran it over the people I’m following on Twitter instead of an arbitrary list of blogs or even my Twitter followers.
Twitter is the only web service on the internet right now that forces a regular user to really open up and begin discussing their true thoughts, opinions, beliefs, goals, values, etc. It’s why it’s beautiful, and it’s why it’s the first step toward The Great Entertainment which I will be addressing in future posts.
Think about it. I might list myself on Match.com as a lover of moonlit beach walks or on Facebook as a thought-obsessed hater of the genocide in Darfur, but it will only take a few days (at maximum) on Twitter to reveal whatever my true passions are (in my case, staying inside, drinking coffee, iPhone/Rails development, and–in the biggest “wow” for me–daily musings on “what kind of day” today is).
Not completely different from how I list myself on Facebook, but therein lies the important difference between How I Want Others To Perceive Me and Who I Actually Am.
So before I thought too much of it, I added my Twitter followers to the mix, and it totally skewed the results. I thought of a million reasons why the algorithm wasn’t good enough until I realized why software is still in its infancy:
The best Python script in the world is still computer code.
It can be beautiful, elegant, and efficient, but it cannot replicate human behavior.
It’s important in this case, because at least 50% of the people who follow me but whom I don’t follow back are only following me for ulterior motives. They’ve got a blog to peddle, a marketing career to manage, a “social networking persona” to lend credibility to, etc.
The other 50% of my followers who I don’t follow back are people I naturally find to be boring, or at least not interesting enough to let them flood me with thoughts throughout the day.
Sorry, but it’s the case. And that’s important, too, as this algorithm could only work across a subset of people who have something important in common.
And the important factor here, of course, is that my following list is “A List of 88 or So People Bryan Woods Happens To Find Interesting Enough To Read Mass Amounts Of Micro Messages From On A Daily Basis.”
Quite a niche market indeed, but it’s precisely why the results were so beautifully accurate.
Which is exactly why, while I was hoping to build a Twitter friend recommender, it dawned on me that making one would be totally impossible.
How can I expect the unexpected? How can I script the unscripted? How do I know who’s interesting and who’s boring?
Surely there are as many Rails-addicted freaks who I find boring as ones I find interesting (if not more), so how do I filter them out?
I’m not sure I can, which is why I’m happy with how it turned out.
Let me know if this chart has helped you find anyone interesting.


