Having fun with twitter data

Last week I attended a study tour organized by swissnex San Francisco. The topic: social media. We visited the big players (Twitter, Facebook, LinkedIn, YouTube) and universities like Stanford and Berkeley to learn about their social media strategies. The lessons learned will be posted later. However, because I am pretty jet-lagged, I decided to hack a bit on Twitter data related to the study tour instead.

Hack #1: follower network
First, I wanted to visualize the follower network of participants tweeting during the study tour. I used the networkx and tweepy libraries for Python to grab and visualize the data. This hack was pretty straightforward. I started with an ego-centered network by fetching all participants following me on Twitter. Next I grabbed their followers. The data were passed to networkx to represent them as a graph, and finally I visualized the network using a spring layout. Clear clusters are visible. The reasons for this are how the network was constructed and the fact that participants of the tour were not highly linked to each other before the tour started. The labels of the nodes correspond to the names registered on Twitter. To enlarge the picture simply click on it. Enjoy!
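A minimal sketch of the graph-building step, assuming the follower lists have already been fetched. The `followers` dictionary, its account names, and the `build_follower_graph` helper are made up for illustration; in the real hack the lists came from Twitter via tweepy.

```python
import networkx as nx

# Hypothetical sample data: account -> accounts that follow it.
# In the real hack these lists would come from the Twitter API.
followers = {
    "me": ["alice", "bob", "carol"],
    "alice": ["me", "dave", "erin"],
    "bob": ["me", "frank"],
    "carol": ["me", "alice", "grace"],
}


def build_follower_graph(followers):
    """Return a directed graph with one edge follower -> followed account."""
    graph = nx.DiGraph()
    for account, account_followers in followers.items():
        for follower in account_followers:
            graph.add_edge(follower, account)
    return graph


graph = build_follower_graph(followers)

# Spring (force-directed) layout: linked nodes attract, all nodes repel.
# A fixed seed keeps the layout reproducible between runs.
positions = nx.spring_layout(graph, seed=42)

# Drawing needs matplotlib; uncomment to render the picture.
# import matplotlib.pyplot as plt
# nx.draw_networkx(graph, positions, node_size=300, font_size=8)
# plt.show()
```

Because each account's followers are attached only to that account, sparsely connected participants end up as separate hubs in the layout, which is one way the clusters mentioned above can arise.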

Hack #2: explore the engagement of the participants
The second hack was about how people engage in sharing via Twitter. Most people tweeted with the #springstudytour hashtag. To visualize the engagement I fetched all tweets with that hashtag. This hack was done with the twitteR library for R and was inspired by Tony Hirst. OK, here is how it goes. After fetching all tweets with the #springstudytour hashtag, I plotted them according to the time people started to engage. The vertical axis tells you who started tweeting with the #springstudytour hashtag and when. The horizontal axis tells you how strong a person’s engagement in tweeting about the topic was during the study tour. For example, manualnappo is very engaged :-). However, the picture tells you more than that. It shows that after a while people not attending the tour started to tweet with the #springstudytour hashtag as well (for example, ‘schlittler’ did not attend the tour). Therefore, the participants’ tweets gained some traction.

Participants’ willingness to share content on Twitter with the #springstudytour hashtag. Red dots indicate retweets, whereas blue dots are original tweets. Click on the picture to enlarge.
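The original plot was made in R with twitteR. As a rough Python sketch of the data preparation behind such a chart: order users by the time of their first tweet, then turn every tweet into a colored dot. The sample tweets, the dates, the `engagement_points` helper, and the rule that a retweet starts with "RT " are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical sample tweets: (screen_name, timestamp, text).
tweets = [
    ("anna", datetime(2012, 1, 9, 9, 0), "Kicking off #springstudytour"),
    ("ben", datetime(2012, 1, 9, 9, 30), "RT @anna: Kicking off #springstudytour"),
    ("anna", datetime(2012, 1, 9, 11, 0), "Next stop #springstudytour"),
    ("cleo", datetime(2012, 1, 10, 8, 15), "Following #springstudytour from home"),
    ("ben", datetime(2012, 1, 10, 10, 0), "Campus visit #springstudytour"),
]


def engagement_points(tweets):
    """Return (users ordered by first tweet, list of plot points).

    Each point is (timestamp, row, color). The row is the user's position
    in the ordering; the color is "red" for retweets and "blue" otherwise.
    """
    first_seen = {}
    for user, when, _ in tweets:
        if user not in first_seen or when < first_seen[user]:
            first_seen[user] = when
    order = sorted(first_seen, key=first_seen.get)
    row_of = {user: row for row, user in enumerate(order)}
    points = [
        (when, row_of[user], "red" if text.startswith("RT ") else "blue")
        for user, when, text in tweets
    ]
    return order, points


order, points = engagement_points(tweets)
```

Passing the timestamps, rows, and colors of `points` to a scatter plot gives one line of dots per user, with late joiners at one end of the user axis.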

Hack #3: tweet wordcloud – uncensored
This hack is about the terms tweeted during the study tour. Again I used the twitteR library for R to fetch the tweets with the #springstudytour hashtag. For the visualization I used the wordcloud library for R. After fetching the tweets I applied the usual natural language processing steps: tokenizing, removing stop words, and stemming. Then I constructed a tweet-term matrix. I ended up with about 600 tweets and a few hundred terms. The colors in the wordcloud are chosen randomly.
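The preprocessing steps can be sketched in plain Python as follows. This is only an illustration, not the R code used for the post: the stop-word list is tiny, the `stem` function is a deliberately crude suffix stripper standing in for a real stemmer, and the sample texts are invented.

```python
import re
from collections import Counter

# Small illustrative stop-word list (a real one would be much longer).
STOP_WORDS = {"a", "an", "and", "at", "in", "is", "of", "the", "to", "was", "with"}


def stem(token):
    """Crude stand-in for a stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


def tweet_term_matrix(texts):
    """Return (vocabulary, matrix): one row per tweet, one column per term."""
    counts = [Counter(tokenize(text)) for text in texts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


# Hypothetical sample tweets.
texts = [
    "Visiting Facebook was awesome",
    "Awesome talks at Twitter",
    "Twitter and Facebook visits",
]
vocabulary, matrix = tweet_term_matrix(texts)
```

Here "Visiting" and "visits" both collapse to the term "visit", so they share a column. Summing each column gives the term frequencies a word cloud would be sized by.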

Hack #4: clustering most prominent topics
This hack is somewhat more technical. The basic idea was to use an unsupervised clustering method to extract the most prominent topics from the fetched tweets. For this purpose I applied a technique called singular value decomposition (SVD) to the tweet-term matrix from hack #3. Let’s skip the funny mathematical details and focus on the picture only. The most prominent topics are: Facebook, Twitter, social media, and the term “awesome”. Each dot in the picture represents a tweet. Nearby and equally colored tweets belong to the same topic. We can observe a clear separation of four different signals (topics) in the data. It seems that people were mostly impressed by Facebook, Twitter, social media in general, and yes: it was awesome!

Topic Clustering

Topic clusters detected by singular value decomposition. FB = Facebook-related tweets, TW = Twitter-related tweets, SoM = general social-media-related tweets, awe = tweets containing the term “awesome”.
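For readers who want to try the idea, here is a small NumPy sketch of SVD on a tweet-term matrix. The post does not say how tweets were assigned to topics, so the rule used here (each tweet goes to the component on which it has the largest absolute coordinate) is an assumption, as are the toy matrix, the term names, and the `svd_topics` helper.

```python
import numpy as np

# Hypothetical tweet-term matrix: rows are tweets, columns are terms.
terms = ["facebook", "like", "twitter", "tweet"]
matrix = np.array(
    [
        [3, 1, 0, 0],  # two tweets using Facebook-related terms
        [1, 3, 0, 0],
        [0, 0, 2, 1],  # two tweets using Twitter-related terms
        [0, 0, 1, 2],
    ],
    dtype=float,
)


def svd_topics(matrix, n_topics):
    """Project tweets onto the top singular directions and label them.

    Returns (coords, labels, singular_values). coords has one row per
    tweet and one column per topic; labels holds, for each tweet, the
    index of the topic with the largest absolute coordinate.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    coords = u[:, :n_topics] * s[:n_topics]
    labels = np.abs(coords).argmax(axis=1)
    return coords, labels, s


coords, labels, singular_values = svd_topics(matrix, n_topics=2)
```

Plotting the two columns of `coords` against each other and coloring the dots by `labels` gives a picture of the same kind as the one described above: tweets on the same topic sit close together and share a color.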



About Blattner

Head Laboratory for Web Science.

3 Responses to Having fun with twitter data

  1. Marco Bettoni says:

    Very interesting … and beautiful pictures! Where did you get the Twitter data from?

    • Blattner says:

      Hi Marco
      I used Twitter’s API. There are now wrappers for several languages (Python, R, Java, and others). For this post I used R’s twitteR and Python’s tweepy. With these libraries it is quite straightforward to access tweets from Twitter. Depending on what you do, you have to use an authentication mechanism to fetch data. In Python this looks like:

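      # -*- coding: utf-8 -*-
      import matplotlib.pyplot as plt  # for drawing the graph later
      import networkx as nx            # for building the graph later
      import tweepy

      # Credentials issued for your registered Twitter application
      consumer_key = 'your_consumer_key'
      consumer_secret = 'your_consumer_secret'

      access_key = 'your_access_key'
      access_secret = 'your_secret_key'

      # Authenticate via OAuth and create the API client
      auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
      auth.set_access_token(access_key, access_secret)
      api = tweepy.API(auth)

      # Accounts the authenticated user follows
      # (tweepy 4 and later name this method get_friends)
      friends = api.friends()
      # ...do whatever you want to do...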

  2. I love the engagement graph: it shows that for the most part people were tweeting and not necessarily retweeting others. Fine for a study tour but not that great in general. One should engage with others, if only by retweeting them.

    Putting that aside, I also love the Twitter info, and it makes me ask what that graph looked like a week before the study tour began.
