Last week I attended a study tour organized by swissnex San Francisco. The topic: social media. We visited the big players (Twitter, Facebook, LinkedIn, YouTube) and universities like Stanford and Berkeley to learn about their social media strategies. The lessons learned will be posted later. However, because I am pretty jet-lagged, I decided to hack a bit on Twitter data related to the study tour instead.
Hack #1: follower network
First, I wanted to visualize the follower network of participants tweeting during the study tour. I used the networkx and tweepy libraries for Python to grab and visualize the data. This hack was pretty straightforward. I started with an ego-centered network by fetching all participants following me on Twitter. Next I grabbed their followers. The data were passed to networkx to represent them as a graph, and finally I visualized the network using a spring layout. There are clear clusters visible. The reasons for this are how the network was constructed and the fact that participants of the tour were not highly linked to each other before the tour started. The labels of the nodes correspond to the names registered on Twitter. To enlarge the picture simply click on it. Enjoy!
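For the curious, here is a minimal sketch of the graph-building and layout step. It is not the original script: the `followers` dictionary and its account names are made up so the example runs offline, whereas in the hack those lists came from the Twitter API via tweepy. The helper name `build_follower_graph` is also just an illustration.

```python
import networkx as nx

# Hypothetical data: account -> list of its followers.
# In the hack these lists were fetched with tweepy; here they are
# hard-coded so the sketch runs without credentials.
followers = {
    "me": ["alice", "bob", "carol"],
    "alice": ["dave", "erin"],
    "bob": ["frank", "erin"],
    "carol": ["grace"],
}


def build_follower_graph(followers):
    """Return a directed graph with one edge follower -> followed account."""
    graph = nx.DiGraph()
    for account, account_followers in followers.items():
        for follower in account_followers:
            graph.add_edge(follower, account)
    return graph


graph = build_follower_graph(followers)

# Spring (force-directed) layout: maps every node to 2D coordinates.
# The seed only makes the picture reproducible.
positions = nx.spring_layout(graph, seed=42)

# To draw the network with node labels (requires matplotlib):
# nx.draw_networkx(graph, positions, with_labels=True)
```

Because the graph is grown outward from one account, each participant's followers hang off that participant like a star, which is one reason the clusters look so clean.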
Hack #2: explore the engagement of the participants
The second hack was about how people engaged in sharing via Twitter. Most people tweeted with the #springstudytour hashtag. To visualize the engagement I fetched all tweets with that hashtag. This hack was done with the twitteR library for R and was inspired by Toni Hirst. OK, here is how it goes. After fetching all tweets with the #springstudytour hashtag, I plotted them according to the time people started to engage. The vertical axis tells you who started tweeting with the #springstudytour hashtag, and when. The horizontal axis tells you how strongly a person engaged in tweeting about the topic during the study tour. For example, manualnappo is very engaged :-). However, the picture tells you more than that. It shows that after a while people not attending the tour started to tweet with the #springstudytour hashtag as well (for example, ‘schlittler’ did not attend the tour). So the participants’ tweets gained some traction.
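The data preparation behind such a plot can be sketched roughly as follows. The hack itself was done in R with twitteR; this is a Python stand-in using only the standard library, and the user names, timestamps and the function name `engagement_timeline` are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample of (screen_name, created_at) pairs, standing in
# for the tweets fetched with the hashtag.
tweets = [
    ("user_b", datetime(2000, 1, 1, 10, 0)),
    ("user_a", datetime(2000, 1, 1, 9, 0)),
    ("user_a", datetime(2000, 1, 1, 11, 30)),
    ("user_c", datetime(2000, 1, 2, 8, 0)),
    ("user_a", datetime(2000, 1, 2, 9, 15)),
    ("user_b", datetime(2000, 1, 2, 14, 0)),
]


def engagement_timeline(tweets):
    """Group tweet times per user and order users by their first tweet."""
    times_by_user = defaultdict(list)
    for user, created_at in tweets:
        times_by_user[user].append(created_at)
    for times in times_by_user.values():
        times.sort()
    # Users who picked up the hashtag earliest come first.
    return sorted(times_by_user.items(), key=lambda item: item[1][0])


timeline = engagement_timeline(tweets)
for row, (user, times) in enumerate(timeline):
    # One row per user; each timestamp would become one dot in the plot.
    print(row, user, len(times), times[0].isoformat())
```

Plotting one row per user and one dot per timestamp gives a picture where late joiners show up as rows that only start further along the time axis.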
Hack #3: tweet wordcloud – uncensored
This hack is about the terms tweeted during the study tour. Again I used the twitteR library for R to fetch the tweets with the #springstudytour hashtag. For the visualization I used the wordcloud library for R. After fetching the tweets I did the usual natural language processing steps: tokenizing, removing stop words, and stemming. Then I constructed a tweet-term matrix. I ended up with about 600 tweets and a few hundred terms. The colors in the wordcloud are chosen randomly.
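To make the preprocessing steps concrete, here is a small Python sketch (the hack itself used R). Everything in it is an assumption for illustration: the three sample tweets, the tiny stop word list, and especially `crude_stem`, which only strips a few suffixes and is not a real stemmer such as Porter's.

```python
import re
from collections import Counter

# A tiny, made-up stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "at", "to", "and", "of", "in", "we", "about"}


def crude_stem(token):
    """Strip a few common English suffixes (a stand-in, not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(word) for word in words if word not in STOP_WORDS]


def tweet_term_matrix(tweets):
    """Return (vocabulary, matrix): one row per tweet, one column per term."""
    counts = [Counter(tokenize(tweet)) for tweet in tweets]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


# Hypothetical tweets, not taken from the real data set.
sample_tweets = [
    "Visiting Twitter today",
    "Tweets about social media",
    "Social media is awesome",
]
vocabulary, matrix = tweet_term_matrix(sample_tweets)
```

Summing each column of the matrix gives the term frequencies that a wordcloud scales its words by.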
Hack #4: clustering most prominent topics
This hack is somewhat more technical. The basic idea was to use an unsupervised clustering method to extract the most prominent topics from the fetched tweets. For this purpose I applied a technique called singular value decomposition (SVD) to the tweet-term matrix from hack #3. Let’s skip the funny mathematical details and focus on the picture only. The most prominent topics are: facebook, twitter, social media, and the term “awesome”. Each dot in the picture represents a tweet. Nearby and equally colored tweets belong to the same topic. We can observe a clear separation of four different signals (topics) in the data. It seems that people were mostly impressed by facebook, twitter, social media in general, and yes: it was awesome!
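For those who do want a peek at the details, here is a toy Python sketch of the idea using NumPy. It is not the code behind the picture: the small count matrix is invented, and the rule used to assign a tweet to a topic (largest absolute coordinate) is my assumption for illustration, since any clustering of the projected points would do.

```python
import numpy as np

# Hypothetical tweet-term count matrix: rows are tweets, columns are terms.
terms = ["facebook", "twitter", "awesome"]
matrix = np.array(
    [
        [2, 0, 0],
        [1, 0, 0],
        [0, 3, 0],
        [0, 1, 0],
        [0, 0, 2],
    ],
    dtype=float,
)


def project_tweets(matrix, k):
    """Project each tweet onto the top-k singular directions of the matrix."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Rows of coords are tweets in the reduced space; rows of vt[:k]
    # describe each direction as a weighting over terms.
    return u[:, :k] * s[:k], vt[:k]


coords, directions = project_tweets(matrix, k=3)

# Assumed assignment rule: a tweet belongs to the direction on which it
# has the largest absolute coordinate.
labels = np.argmax(np.abs(coords), axis=1)

# The term with the largest weight names each direction.
top_terms = [terms[i] for i in np.argmax(np.abs(directions), axis=1)]
```

Plotting the first two columns of `coords` and coloring the dots by `labels` gives a picture of the same kind as the one above, with the directions ordered from strongest to weakest signal.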