With the economy in the crapper, there are going to be a lot of online services folding or cutting corners to survive. This is a good time to make sure you are keeping local copies of any work that is important to you.
Fortunately for me, services shuttering or bugging out haven’t cost me much work so far, but they have come close enough a few times to make me pay attention. Two recent examples:
- I wrote four short stories of varying quality on Ficlets and was able to retrieve them before the service was killed, so it wasn’t a big deal for me. But for other users who had written hundreds or thousands of stories, backing up in the small window of time provided by AOL was not realistic unless they had already been saving local copies (I should note that the service technically won’t go offline until Jan. 15).
- When we moved the Georgia Podcast Network to a new hosting service, the host had a meltdown in its datacenter a few days later, causing several days of customer data to disappear. Poof, gone. Despite this happening before we’d had the chance to set up off-site back-ups, we were able to restore almost everything. Other customers of this host weren’t so lucky.
Aside from my personal close calls, several other recent events have caused me to think more about all the data I have floating around out there:
- Yahoo’s decline over the past year. Yahoo owns Flickr. I have more than 3,000 photos hosted on Flickr. While I have local copies of those photos, I have not written captions for them or organized them as well locally.
- Twitter’s lack of a business model. I spend a lot of time sending 140-character messages to that service. I have posted close to 4,000 messages. And it could easily just vanish one day.
- Podango, a prominent podcast host, shutting down with only a few weeks’ notice. Users were told by email, and according to one user, no notice was ever posted on the home page.
- JournalSpace being wiped out by an IT guy with an axe to grind. No, I hadn’t heard of JournalSpace either, but it had been around for six years with 14,000 monthly visitors, according to Slashdot, and now it’s gone. The only recourse users have is to pull their posts out of Google’s cache.
- And perhaps the saddest such story I’ve read yet, AOL Hometown shutting down and only giving four weeks of notice to its users. Click over to that link and read some of the stories from users whose data disappeared. Snark all you’d like about it being AOL users, but it’s still sad. AOL Hometown, Ficlets and AOL Pictures were among the services run by AOL with thousands of users which were shut down recently as cost-cutting measures, wiping a lot of content out of existence.
I back up my computer and web sites to an extent that will probably sound paranoid to you:
- Every hour, scripts on my laptop download a copy of the databases for my own and my clients’ web sites and run a non-destructive sync of the files.
- Every week, I set aside an extra copy of those back-ups in case anything weird is going on with them.
- I run Time Machine on my laptop, which periodically backs up my entire laptop drive to an external disk.
- I use JungleDisk to back up everything (including the web site back-ups) to Amazon S3.
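For anyone wondering what a “non-destructive sync” and a weekly set-aside copy might look like, here is a minimal sketch in Python. It is not the script described above. It assumes both the source and the back-up are ordinary local folders (for example, a remote site mounted or already downloaded), and the function names and the `2009-W02` folder naming are made up for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def sync_without_deleting(source: Path, dest: Path) -> list:
    """Copy new or changed files from source into dest; never delete from dest."""
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        target = dest / src_file.relative_to(source)
        if target.exists():
            src_stat, dst_stat = src_file.stat(), target.stat()
            # Skip files that look unchanged: same size and not newer than the copy.
            if (src_stat.st_size == dst_stat.st_size
                    and src_stat.st_mtime <= dst_stat.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, target)  # copy2 also copies the modification time
        copied.append(target)
    return copied


def set_aside_weekly_copy(backup_dir: Path, archive_root: Path, today: date) -> Path:
    """Keep one extra full copy of the back-up per ISO week (hypothetical naming)."""
    year, week, _ = today.isocalendar()
    snapshot = archive_root / f"{year}-W{week:02d}"
    if not snapshot.exists():
        shutil.copytree(backup_dir, snapshot)
    return snapshot
```

Because nothing is ever deleted from the destination, a file that vanishes from the live site (by accident or otherwise) stays in the back-up. The size-and-timestamp check is a shortcut; comparing checksums would be slower but stricter.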
And yet, there is still a lot of information out there that I’m trusting to other people and not backing up regularly. I’ve been looking into ways to back up as much of what I have stored on web services as I can, and will share some of what I’ve found here. Please leave comments if you have experience with these tools or if you know of any good services that I’m missing.
Twitter
The only service I’ve seen that will back up your Twitter messages is Tweetake (which I can’t help but want to pronounce like bukkake). I tried it out and it seems to work pretty well. You enter your user name and password, click submit, and the service lets you download a CSV file with every Twitter message you’ve ever written. The only drawback I see is that there’s no way to automate the process, so you have to visit the site manually every time you want to do a back-up.
Flickr
Two options I’ve seen for backing up Flickr that look promising are FlickrEdit and Migratr.
FlickrEdit is a cross-platform Java application which allows you to selectively back up albums or photos, which is nice for the sake of this post since I can just download one album as a test. It organizes albums into folders on your hard drive and names the files after their titles on Flickr. On the website, the developer claims the title, tags and description are stored in the IPTC header, but I had trouble reading this information in Photoshop. It could just be that I’m missing something, or it could be that it isn’t there.
Migratr is Windows-only software which claims to work with not only Flickr, but also a wide selection of other web-based services. Its main purpose is to port photos from one service to another rather than to perform a local back-up. I attempted to run it in Vista under Parallels, but it crashed on me after authenticating with Flickr.
Maybe you’ll have better luck with those two programs than I did. Or, maybe your best bet would be to set up Gallery on a web server and to use the Gallery2Flickr module to sync images between the two, then back up the web server (tedious? yes, but how important are your photos to you?). I used Gallery2Flickr to port images from Gallery to Flickr, and it worked great for that. Hopefully it would work as well in reverse.
Blogger, Gmail, Google Calendar, Google Docs and other Google applications
It’s a little scary to think how much trust I’ve put into Google with my data. Clicking ‘All mail’ in Gmail tells me there are more than 25,000 email messages there. I also have about 50 documents in Google Docs, and, umm, a lot of events in Google Calendar.
Fortunately, Google allows open access to almost anything stored in its systems.
You can access Gmail using standard IMAP or POP protocols and download all your messages using a local mail client such as Mozilla Thunderbird. Google Docs will let you export files into MS Word, OpenOffice and several other formats. Google Calendar uses the standard iCal format that pretty much any calendar software can read.
I’d write tutorials on how to do all this, but Lifehacker already has it covered here.
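If you would rather script the Gmail part than use a mail client, here is a rough sketch using Python’s standard `imaplib`. Treat the details as assumptions: `imap.gmail.com` and the `[Gmail]/All Mail` folder name are what Gmail commonly exposes over IMAP, but the folder name can differ by account language, and `message_path` and `backup_mailbox` are names invented for this example. It saves each message as a raw `.eml` file named after its IMAP UID and skips messages it already has.

```python
import imaplib
from pathlib import Path


def message_path(backup_dir: Path, uid: str) -> Path:
    """Local file used to store the message with the given IMAP UID."""
    return backup_dir / f"{uid}.eml"


def backup_mailbox(user: str, password: str, backup_dir: Path,
                   host: str = "imap.gmail.com",          # assumed host name
                   mailbox: str = '"[Gmail]/All Mail"') -> int:
    """Download every message not already saved locally; return how many were saved."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    saved = 0
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        # readonly=True so the back-up never marks mail as read.
        conn.select(mailbox, readonly=True)
        status, data = conn.uid("search", None, "ALL")
        for uid in data[0].split():
            path = message_path(backup_dir, uid.decode())
            if path.exists():
                continue  # already backed up on an earlier run
            status, parts = conn.uid("fetch", uid, "(RFC822)")
            path.write_bytes(parts[0][1])  # raw message bytes
            saved += 1
    finally:
        conn.logout()
    return saved
```

One caveat: UIDs are only stable while the server keeps the same UIDVALIDITY for the folder, so a more careful version would record that value and start a fresh folder if it ever changes.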
YouTube
YouTube is also owned by Google, but I’m giving it its own category here because it wasn’t included in the Lifehacker tutorial. Fortunately, as with other Google applications, there is an open API to access almost everything on the site.
There are lots of programs available to download YouTube videos locally. One that I use sometimes is Tooble, which is available for Windows and Mac OS and will download videos from YouTube and automatically convert them to a format you can watch on your computer or iPod.
That doesn’t get you comments, ratings or statistics though. I think it would be pretty trivial to roll your own script to retrieve some or all of this information. There’s a reference here. You could use that to figure out how to do a full back-up once, and then from there just write a script to automatically back up the file at a URL that looks something like:
http://gdata.youtube.com/feeds/api/videos?author=rustyGAPN&orderby=published
rustyGAPN is my username; you’d want to replace it with yours. Beyond that, I’d just recommend keeping the videos you uploaded on your hard drive rather than downloading them back from YouTube, since the YouTube version will have been compressed several times over by the time it gets back to you.
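A minimal sketch of such a script, in Python, might look like the following. The feed address is the one shown above. The function names and the timestamped file naming are assumptions made for illustration, and the sketch does not check whether the feed still responds or what it returns.

```python
import urllib.request
from datetime import datetime
from pathlib import Path
from urllib.parse import urlencode

FEED_BASE = "http://gdata.youtube.com/feeds/api/videos"  # address from the post


def feed_url(author: str) -> str:
    """Build the feed URL for one user's uploads, newest first."""
    return FEED_BASE + "?" + urlencode({"author": author, "orderby": "published"})


def save_snapshot(url: str, backup_dir: Path, now: datetime) -> Path:
    """Download whatever is at url and store it under a timestamped file name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / now.strftime("feed-%Y%m%d-%H%M%S.xml")
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target


if __name__ == "__main__":
    # Replace rustyGAPN with your own username; run from cron or a scheduler.
    save_snapshot(feed_url("rustyGAPN"), Path("youtube-backup"), datetime.now())
```

Keeping every snapshot under its own timestamp, rather than overwriting one file, means a feed that suddenly comes back empty doesn’t wipe out the last good copy.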
Blip.tv
Blip.tv also has a developer API, which means theoretically anybody could make an application or web service to back up videos from the site. I haven’t seen too many of these though, just this piece of shareware (download at your own risk, as I haven’t tried it).
You could use a method similar to the one I suggested for YouTube: periodically download an XML file to back up titles and descriptions. The Blip API only allows searching by keyword rather than by username, so YMMV with this approach. A URL might look like this:
http://www.blip.tv/search/view/?search=rustytanton&skin=api
…where you should substitute your username for the search parameter.
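Since a periodic download will usually fetch the same XML over and over, one option is to keep a new copy only when the content has changed. The sketch below does that by naming each file after a hash of its contents. The search URL is the one shown above; `search_url`, `fetch` and `save_if_new` are hypothetical names, and nothing here depends on what the XML actually contains.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional
from urllib.parse import urlencode


def search_url(username: str) -> str:
    """Build the Blip.tv search URL from the post for the given search term."""
    query = urlencode({"search": username, "skin": "api"})
    return "http://www.blip.tv/search/view/?" + query


def fetch(url: str) -> bytes:
    """Download the raw bytes at url."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_if_new(content: bytes, backup_dir: Path) -> Optional[Path]:
    """Store content unless an identical copy is already in backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(content).hexdigest()[:16]
    target = backup_dir / f"blip-{digest}.xml"
    if target.exists():
        return None  # same bytes as an earlier download; nothing to do
    target.write_bytes(content)
    return target


if __name__ == "__main__":
    save_if_new(fetch(search_url("rustytanton")), Path("blip-backup"))
```

If the response includes anything that changes on every request, such as a timestamp, every download will look new and this approach saves nothing over plain timestamped copies.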
Facebook and MySpace
Until very recently, both of these sites were awful about letting you get data out of them. However, it does appear both sites are moving slowly toward data portability with two recent initiatives: Facebook Connect and MySpace Data Availability.
Both APIs appear to offer substantial access to information stored on the site. However, the terms of use on the respective sites could keep back-up tools from appearing anytime soon, as both have a policy prohibiting storing most data for more than 24 hours (see Facebook terms and MySpace terms). If Facebook booted Robert Scoble off for trying to pull his own data out through the API and storing it, they’ll boot you off too.
In the meantime, I’d recommend just visiting pages in MySpace and Facebook that are important to you in Firefox and taking these steps:
- Click the ‘File’ menu
- Click ‘Save Page As’
- In the box that pops up, make sure the ‘Save As:’ option is set to ‘Web Page, complete’. This will tell Firefox to also download all the images associated with the page to your computer.
- Save the file
Kludgy? Hell yes. Impractical for more than a few pages? Double hell yes. If you host photos or videos on either of these sites that are important to you, I’d highly recommend keeping a separate library with the same titles, tags and descriptions somewhere else.
Eventually, I expect these services to open up enough to let people back up their data, but there’s no telling when that will be. The technology is there already; they just need to adjust their terms of service. Please comment if I am wrong about any of this. And sign up at DataPortability.org.
LinkedIn
Backing up your profile and contacts in LinkedIn is pretty easy, but you have to do it manually.
To back up your profile, log in and click the ‘View My Profile’ link in the left sidebar. On your profile, you should see an Adobe Acrobat icon. Click it, and you will download a PDF containing all the info you filled out in your profile and the recommendations people have written about you.
To back up your contacts, log in and click the ‘Contacts’ link in the left sidebar. Click the ‘Export Contacts’ link, which is close to the bottom of the screen. Select the format you’d like to save the file as, fill in the captcha text, and click ‘Export.’
I don’t know of a way to back up LinkedIn messages. This doesn’t matter to me because there’s nothing important in there, but if you know of a way please share it in the comments.
Yelp
I’ve gotten in the habit of writing reviews on Yelp for restaurants I eat at, but I haven’t been keeping local copies of these reviews.
Yelp has a developer API, but like Facebook and MySpace, the terms of service prohibit storing data. If your reviews are important to you, you will want to save copies locally as you write them.
Georgia Podcast Network
This isn’t a how-to, but a note about future plans.
Sometime this year I hope to build an API into the Georgia Podcast Network that will let people retrieve all their data in XML and JSON. There’s already an undocumented mini-API built in to retrieve new episodes in JSON which is used for the widgets, but there’s still not a method for retrieving everything.
We have no desire to lock down people’s data. We only want people hosting podcasts on the site who want to be there, and podcasters retain all copyrights to their content. I have other features on my wish list, but it is important to me that people using the site feel like they own their content.
Currently, I’m working on upgrading the site from Drupal 5 to Drupal 6, so there’s a freeze on new features until that is finished. Once that is done, I plan to make the API my top development priority.

Re: LinkedIn — if you really want to save your messages on that service, you can set your profile to send you messages via e-mail when someone e-mails you there. I think typically, they quote the entire message in the e-mail.
OMG tags!!
Good post. And some of the comments on Jason Scott’s post really pissed me off. I think his post was excellent and spot-on, but some of those people REALLY have their heads up their asses. Talk about high horses.
For more coverage of Ficlets’s demise, check out my article at http://www.teleread.org/blog/2009/01/04/requiem-for-ficletscom/
Also, I have written a tutorial to advise those people who have written dozens or hundreds of ficlets as to how they can back them all up much more quickly and easily than pasting them one by one into a word processor:
http://terrania.us/saveyourbabies.html
Sorry to comment twice – I thought what I posted on Facebook about this story might be relevant to the discussion. Here it is in a slightly edited form:
In regards to backing things up – I do think the lines about what is valuable enough to warrant a back-up get kind of fuzzy. Is every single text message I send really worth holding onto? Every IM? Every @ message on Twitter? How are these things substantially different than an off-hand remark in conversation? If we save EVERYTHING, especially now that we are generating so much content, we run into the problem of not being able to pick out the items that really “mean something.” I think this post is valuable, though, because it makes people start thinking about these issues – and forces them to determine how much of their content they really want to trust with a 3rd party.
Thanks for the comment(s) Joseph. I don’t know that there’s a universal answer to what’s worth holding onto, which is why I qualified the title of this post with “anything that’s important to you.”
There’s a lot of stuff I’ve held onto that I thought at the time I would never want to read again, but on seeing it again have found pretty useful. Example: all the cover letters I sent out when I was looking for a job a few years ago. Aside from being hilarious, they show me which ones people responded to and which ones they didn’t.
Also, an individual Twitter message is pretty worthless to me (as it is read live), but the aggregate of them is informative about my thought processes and routines at the time. Same goes for email in a lot of cases.
All this is YMMV, but I think it’s good to have an anthropological record even if you have no plans of revisiting it and sifting through it. Disk space is cheap; losing something valuable because you didn’t think it was important at the time could be expensive.
Also, we don’t always know until after the fact which items “mean something.” Things that seem mundane at the time may later have huge personal meaning. I’m not a pack-rat with anything else but when it comes to preserving personal records, I am, for that reason.
Hey, I spent yesterday backing up my LiveJournal and helping others do that. My LJ dates back to 2000. I hate the thought of that service going down.
How are you backing up Flickr? Are you keeping everything on a separate hard drive?
Jen,
I’m not currently. I’m contemplating setting up another Gallery install since the Gallery2Flickr module worked so well for porting stuff over to Flickr. I’m hoping it will work as well in reverse, but haven’t had time to set it up and try it yet.
Very relevant to your post – Lycos has recently alerted users it is shutting down its mail service next month. Imagine if you were someone who relied on that e-mail address for several years and had a lot of contacts who were going to lose touch with you once you lost the address. On top of that, I am not sure whether Lycos mail allowed users to download their mail via POP3.
Full story is here
I remember when lots of people used Tripod.
Yeah – actually that’s another big thing happening… There are still a lot of older “fan sites” and information sites hosted on Tripod. Guess those are all going to be gone within a couple of months. Something to consider when getting your hosting from some free service. Next on the list, Geocities?
Well, Geocities is owned by Yahoo, so it wouldn’t surprise me at all to see it shut down sometime in the next year or two.