A couple of months ago I decided to expand my use of Twitter and began following more and more people in the education community. This was awesome, until I realized I could not keep up with the barrage of data coming at me. In November I started brainstorming ways to keep up. I noticed that none of my Twitter clients were grabbing all the tweets posted since the last time I checked. The reason is that the API only gives you the last 200 tweets, and for me that was about 40 minutes' worth.
My first course of action was to write some software that would grab the tweets from my [@mr_rcollins] timeline, parse the info, and store it in a MySQL database. Besides pulling out the fields I was interested in from each tweet, I also stored the complete tweet. This became impractical, since in a month the complete tweets alone occupied 4.2GB! I stopped storing the complete tweets, which left me with a 20MB database after 5 weeks of collecting. That was a lot more manageable.
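For anyone curious what "store only the fields I care about" might look like, here is a minimal sketch. It is not my actual collector: it uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere, and the table layout, the function names `create_table` and `store_tweet`, and the tweet field names (`id`, `text`, `created_at`, `user.screen_name`) are all assumptions made for illustration.

```python
import sqlite3


def create_table(conn):
    """Create a small table holding only the fields worth keeping."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id          INTEGER PRIMARY KEY,
               screen_name TEXT NOT NULL,
               created_at  TEXT NOT NULL,
               text        TEXT NOT NULL
           )"""
    )


def store_tweet(conn, tweet):
    """Store selected fields of one tweet; the full payload is discarded.

    `tweet` is a dict parsed from the timeline response (field names are
    assumed, see above). Returns True if a new row was inserted and False
    if a tweet with the same id was already stored.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO tweets (id, screen_name, created_at, text) "
        "VALUES (?, ?, ?, ?)",
        (
            tweet["id"],
            tweet["user"]["screen_name"],
            tweet["created_at"],
            tweet["text"],
        ),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keying the table on the tweet id means that polling the timeline more often than it turns over does not create duplicate rows, which matters when each request returns up to 200 tweets that may overlap with the previous batch.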
The next step was to start parsing each tweet's text for URLs, resolve any shortened URLs, and dump them into another table for me to peruse. While I was getting that software working, I came across ReadTwit.com. This is a great service that will take your timeline, parse out the URLs, resolve shortened links, and give you an RSS feed that you can subscribe to in your favorite RSS reader (I use Google Reader). Now I just go through Reader like normal, and am able to tag/star important sites that are posted to my Twitter timeline.
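The URL step can be sketched roughly like this. Again, this is an illustration under assumptions rather than the code I wrote or what ReadTwit does: the regular expression is deliberately simple, the names `extract_urls` and `resolve_url` are hypothetical, and resolving is done by just following HTTP redirects with Python's standard `urllib`.

```python
import re
import urllib.request

# Deliberately simple pattern: "http://" or "https://" followed by
# anything that is not whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the URLs found in a tweet's text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        # Trailing punctuation usually belongs to the sentence, not the link.
        urls.append(match.rstrip(".,;:!?)"))
    return urls


def resolve_url(url, timeout=10):
    """Follow redirects and return the final URL.

    If the request fails for any reason (network error, HTTP error),
    the original URL is returned unchanged.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.url
    except OSError:
        return url
```

A typical use would be `[resolve_url(u) for u in extract_urls(tweet_text)]`, with the results written to a second table keyed on the tweet id.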