Twitter retweet analysis
Together with Professor Lars Kai Hansen I am currently looking into retweeting on Twitter. The 2010 article "Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network" by Suh, Hong, Pirolli and Chi examined how hashtags, "@" mentions, the number of followers, the number of followees, the age of the account, the number of tweets and the number of favorited tweets affect whether a tweet is retweeted.
The article also points to Dan Zarrella's earlier writings: he has a blog as well as the slides "The Science of ReTweets". On page 11 of the slides Zarrella reports that well over 50% of retweets contain URLs. Suh & Co. quote his figure as 56.69%, to be exact.
This fraction does not fit with Suh & Co.'s own finding: only 28.4% of the retweets in their data contain URLs.
To investigate this discrepancy I looked at the tweets I had downloaded. They were downloaded with the streaming method provided by Twitter, which I heard of through Bjarne Ørum Wahlgreen. For storage I am currently using the MongoDB NoSQL database (I used SQLite before). This means that downloading and storing can be done in a single Unix command line (with a tip from Eliot):
curl http://stream.twitter.com/1/statuses/sample.json -u USER:PASSWORD | mongoimport -d twitter -c tweets
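As far as I know, the sample stream also carries blank keep-alive lines and messages that are not tweets (for instance deletion notices), and these should not be counted as tweets. The following is a minimal Python sketch of such a filter over line-delimited JSON. It assumes that actual tweets are exactly the objects with a top-level "text" field; that assumption and the function name iter_statuses are hypothetical, not part of the setup above.

```python
import io
import json
from typing import Iterable, Iterator


def iter_statuses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed tweets from line-delimited JSON, skipping everything else.

    Assumption: real tweets have a top-level "text" key, while other
    messages (e.g. deletion notices) do not.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated line, e.g. when the connection drops
        if isinstance(obj, dict) and "text" in obj:
            yield obj


# Small hand-made sample in the same line-delimited format.
sample = io.StringIO(
    '{"id": 1, "text": "hello"}\n'
    "\n"
    '{"delete": {"status": {"id": 2}}}\n'
    '{"id": 3, "text": "RT @bob: hi"}\n'
)
print([t["id"] for t in iter_statuses(sample)])  # [1, 3]
```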
I have only slightly more than 330,000 tweets in my database at the moment, but my results agree better with Suh & Co. than with Zarrella. The result depends on how a retweet is matched: with my broadest matching, 25.2% of retweets contain URLs.
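To illustrate why the result depends on the matching, here is a small Python sketch with three hypothetical rules, from narrow to broad. These are not the exact rules behind the numbers in this post. The "native" rule assumes that tweets made with Twitter's retweet button carry a "retweeted_status" field. The "prefix" rule additionally accepts texts starting with "RT @user". The "broad" rule accepts "RT @user" or "via @user" anywhere in the text. URL detection by a regular expression on the text is likewise an assumption.

```python
import re

# Hypothetical matching rules; not the exact rules used in this post.
RT_PREFIX = re.compile(r"^RT @\w+")
RT_ANYWHERE = re.compile(r"\b(RT|via) @\w+", re.IGNORECASE)
URL = re.compile(r"https?://\S+")


def is_retweet(tweet: dict, rule: str = "broad") -> bool:
    """Classify a tweet as a retweet under the named matching rule."""
    text = tweet.get("text", "")
    native = "retweeted_status" in tweet  # assumed marker of button retweets
    if rule == "native":
        return native
    if rule == "prefix":
        return native or bool(RT_PREFIX.match(text))
    if rule == "broad":
        return native or bool(RT_ANYWHERE.search(text))
    raise ValueError(f"unknown rule: {rule}")


def has_url(tweet: dict) -> bool:
    """True if the tweet text contains something that looks like a URL."""
    return bool(URL.search(tweet.get("text", "")))


tweets = [
    {"text": "RT @alice: great read http://example.com/a"},
    {"text": "Nice post via @bob"},
    {"text": "Just text"},
    {"text": "hi", "retweeted_status": {"text": "hi"}},
]
counts = {r: sum(is_retweet(t, r) for t in tweets) for r in ("native", "prefix", "broad")}
print(counts)  # {'native': 1, 'prefix': 2, 'broad': 3}
```

The same tweet collection thus yields different retweet counts, and hence different URL fractions among retweets, depending on which rule is chosen.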
Furthermore, I find that 19.1% of all tweets contain URLs, which agrees with both Zarrella and Suh & Co., who both report around 20%. Depending on the matching, retweets make up 9-16% of all tweets.
The detailed results are here:
Total: 330000 (100.0%)
With URLs: 62901 (19.1% of total)
Retweets: 52633 (15.9% of total)
Retweets with URLs: 13239 (4.0% of total, 25.2% of retweets, 21.0% of tweets with URLs)
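The percentages above follow directly from the four counts. The small Python sketch below recomputes them. It also derives one extra number from the same counts: the URL share among tweets that are not retweets (under the same broad matching). The helper name pct is just for illustration.

```python
def pct(part: int, whole: int) -> str:
    """Return part/whole as a percentage string with one decimal."""
    return f"{100 * part / whole:.1f}%"


# Counts from the results above.
total = 330000
with_urls = 62901
retweets = 52633
retweets_with_urls = 13239

print("Tweets with URLs, of total:  ", pct(with_urls, total))              # 19.1%
print("Retweets, of total:          ", pct(retweets, total))               # 15.9%
print("Retweets with URLs, of total:", pct(retweets_with_urls, total))     # 4.0%
print("  ... of retweets:           ", pct(retweets_with_urls, retweets))  # 25.2%
print("  ... of tweets with URLs:   ", pct(retweets_with_urls, with_urls)) # 21.0%

# Derived from the same counts: URL share among tweets that are not retweets.
non_rt_with_urls = with_urls - retweets_with_urls  # 49662
non_rt = total - retweets                          # 277367
print("URLs among non-retweets:     ", pct(non_rt_with_urls, non_rt))     # 17.9%
```

In this sample URLs are thus somewhat more common among retweets (25.2%) than among other tweets (17.9%), but both figures are far from the 56.69% attributed to Zarrella.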
Suh & Co. found that hashtags were associated with increased retweeting. In a blog post one of the authors writes "Want to be Retweeted? Add Hashtags to Your Tweets!". I doubt that the causal relationship is that simple. I think it is more likely that a common cause (e.g., that the tweet is informative and well-written) makes the tweet both get hashtags and be retweeted.
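A toy simulation shows how a common cause alone can produce the observed association. In the hypothetical Python model below, a hidden "quality" variable raises both the chance of using a hashtag and the chance of being retweeted, while hashtags have no effect on retweeting at all. All probabilities are made up for illustration, not estimated from data.

```python
import random


def simulate(n: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Return (P(retweet | hashtag), P(retweet | no hashtag)) in a toy model.

    A hidden 'quality' variable drives both hashtags and retweets; hashtags
    have no causal effect on retweeting. All probabilities are made up.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # has_hashtag -> [retweeted, total]
    for _ in range(n):
        quality = rng.random() < 0.3                         # hidden common cause
        hashtag = rng.random() < (0.5 if quality else 0.1)   # depends only on quality
        retweet = rng.random() < (0.3 if quality else 0.05)  # depends only on quality
        counts[hashtag][0] += retweet
        counts[hashtag][1] += 1
    p_with = counts[True][0] / counts[True][1]
    p_without = counts[False][0] / counts[False][1]
    return p_with, p_without


p_with, p_without = simulate()
print(f"P(retweet | hashtag) = {p_with:.3f}, P(retweet | no hashtag) = {p_without:.3f}")
```

With these made-up numbers the analytic values are roughly 0.22 versus 0.10, so tweets with hashtags appear about twice as likely to be retweeted even though adding a hashtag would change nothing in this model.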