With the economy in the crapper, there are going to be a lot of online services folding or cutting corners to survive. This is a good time to make sure you are keeping local copies of any work that is important to you.
Fortunately for me, services shuttering or bugging out haven’t cost me much work so far, but they have come close enough a few times to make me pay attention. Two recent examples:
- I wrote four short stories of varying quality on Ficlets and was able to retrieve them before the service was killed, so it wasn’t a big deal for me. But there were other users who had written hundreds or thousands of stories for whom backing up in the small window of time provided by AOL was not realistic if they hadn’t already been saving local copies (I should note that it technically won’t be unavailable until Jan. 15).
- When we moved the Georgia Podcast Network to a new hosting service, the host had a meltdown in its datacenter a few days later, causing several days of customer data to disappear. Poof, gone. Despite this happening before we’d had the chance to set up off-site back-ups, we were able to restore almost everything. Other customers of this host weren’t so lucky.
Aside from my personal close calls, several other recent events have caused me to think more about all the data I have floating around out there:
- Yahoo’s decline over the past year. Yahoo owns Flickr. I have more than 3,000 photos hosted on Flickr. While I have local copies of those photos, I have not written captions for them or organized them as well locally.
- Twitter’s lack of a business model. I spend a lot of time sending 140-character messages to that service. I have posted close to 4,000 messages. And it could easily just vanish one day.
- Podango, a prominent podcast host, shutting down with only a few weeks’ notice given to users by email and, according to one user, no visible notice ever posted on the home page.
- JournalSpace being wiped out by an IT guy with an axe to grind. No, I hadn’t heard of JournalSpace either, but it had been around for six years with 14,000 monthly visitors, according to Slashdot, and now it’s gone. The only recourse users have is to pull their posts out of Google’s cache.
- And perhaps the saddest such story I’ve read yet, AOL Hometown shutting down and only giving four weeks of notice to its users. Click over to that link and read some of the stories from users whose data disappeared. Snark all you’d like about it being AOL users, but it’s still sad. AOL Hometown, Ficlets and AOL Pictures were among the services run by AOL with thousands of users which were shut down recently as cost-cutting measures, wiping a lot of content out of existence.
I back up my computer and web sites to an extent that will probably sound paranoid to you:
- I run scripts on my laptop that download a copy of the databases on my own and my clients’ web sites and run a non-destructive sync of the files every hour.
- Every week, I set aside an extra copy of those back-ups in case anything weird is going on with them.
- I run Time Machine on my laptop, which periodically backs up my entire laptop drive to an external disk.
- I use JungleDisk to back up everything (including the web site back-ups) to Amazon S3.
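For anyone who wants to try something similar, here is a minimal Python sketch of the "non-destructive sync" and "set aside a weekly copy" ideas from the list above. It is not the actual script I run; the function names and the week-numbered folder layout are made up for illustration. "Non-destructive" here means files are only ever added or updated in the back-up folder, never deleted, even if they vanish from the source.

```python
import shutil
from datetime import date
from pathlib import Path


def sync_non_destructive(src, dst):
    """Copy new or changed files from src into dst.

    Nothing in dst is ever deleted, so a file removed (or wiped out)
    at the source still survives in the back-up.
    Returns the relative paths that were copied.
    """
    src, dst = Path(src), Path(dst)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        if target.exists():
            s, t = path.stat(), target.stat()
            # Same size and not newer than the back-up: assume unchanged.
            if s.st_size == t.st_size and s.st_mtime <= t.st_mtime:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the timestamp
        copied.append(rel.as_posix())
    return copied


def set_aside_weekly_copy(backup_dir, archive_root, today=None):
    """Keep one extra, untouched copy of the back-up per ISO week.

    The folder name (e.g. 2009-W02) is a hypothetical naming scheme.
    """
    today = today or date.today()
    year, week, _ = today.isocalendar()
    target = Path(archive_root) / f"{year}-W{week:02d}"
    if not target.exists():
        shutil.copytree(backup_dir, target)
    return target
```

You would run something like `sync_non_destructive("/path/to/site/files", "/path/to/backups/site")` from an hourly cron job, and the weekly function from a second job. The size-and-timestamp check is a shortcut; comparing checksums would be slower but more thorough.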
And yet, there is still a lot of information out there that I’m trusting to other people and not backing up regularly. I’ve been looking into ways to back up as much of what I have stored on web services as I can, and will share some of what I’ve found here. Please leave comments if you have experience with these tools or if you know of any good services that I’m missing.
Twitter
The only service I’ve seen that will back up your Twitter messages is Tweetake (which I can’t help but want to pronounce like bukkake). I tried it out and it seems to work pretty well. You enter your user name and password, click submit, and the service lets you download a CSV file with every Twitter message you’ve ever written. All that’s wrong with it as I see it is there’s no way to automate the process, so you have to manually visit the site every time you want to do a back-up.
Flickr
FlickrEdit is a cross-platform Java application which allows you to selectively back up albums or photos, which is nice for the sake of this post since I can just download one album as a test. It organizes albums into folders on your hard drive and names the files after their titles on Flickr. On the website, the developer claims the title, tags and description are stored in the IPTC header; however, I had trouble reading this information in Photoshop. It could just be that I’m missing something, or it could be that it isn’t there.
Migratr is Windows-only software which is claimed to work with not only Flickr, but also a wide selection of web-based services. Its main purpose is to port photos from one service to another rather than to perform a local back-up. I attempted to run it in Vista under Parallels, but it crashed on me after authenticating with Flickr.
Maybe you’ll have better luck with those two programs than I did. Or, maybe your best bet would be to set up Gallery on a web server and to use the Gallery2Flickr module to sync images between the two, then back up the web server (tedious? yes, but how important are your photos to you?). I used Gallery2Flickr to port images from Gallery to Flickr, and it worked great for that. Hopefully it would work as well in reverse.
Blogger, Gmail, Google Calendar, Google Docs and other Google applications
It’s a little scary to think how much trust I’ve put into Google with my data. Clicking ‘All mail’ in Gmail tells me there are more than 25,000 email messages there. I also have about 50 documents in Google Docs, and, umm, a lot of events in Google Calendar.
Fortunately, Google allows open access to almost anything stored in its systems.
You can access Gmail using standard IMAP or POP protocols and download all your messages using a local mail client such as Mozilla Thunderbird. Google Docs will let you export files into MS Word, OpenOffice and several other formats. Google Calendar uses the standard iCal format that pretty much any calendar software can read.
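If you would rather script the Gmail part than set up a mail client, Python's standard `imaplib` module can do the download. The following is only a rough sketch, not a tested back-up tool: the function names are my own, the `"[Gmail]/All Mail"` folder name is what English-language accounts typically show (an assumption for yours), and it assumes IMAP is enabled in your Gmail settings and that your account accepts a password login over IMAP.

```python
import imaplib
from pathlib import Path


def backup_mailbox(conn, mailbox, dest_dir):
    """Save every message in `mailbox` as <UID>.eml under dest_dir.

    `conn` is an already logged-in imaplib connection. Messages saved
    on an earlier run are skipped, so the script can be re-run cheaply.
    Returns the number of newly saved messages.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # readonly=True so the back-up never marks mail as read.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not open mailbox {mailbox!r}")

    status, data = conn.uid("search", None, "ALL")
    if status != "OK":
        raise RuntimeError("UID search failed")

    saved = 0
    for uid in data[0].split():
        target = dest / f"{uid.decode()}.eml"
        if target.exists():
            continue
        status, parts = conn.uid("fetch", uid, "(RFC822)")
        if status != "OK":
            continue
        for part in parts:
            # The raw message is the second item of the tuple part.
            if isinstance(part, tuple):
                target.write_bytes(part[1])
                saved += 1
                break
    return saved


def backup_gmail(user, password, dest_dir):
    """Hypothetical wrapper: log in to Gmail and back up All Mail."""
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    try:
        conn.login(user, password)
        return backup_mailbox(conn, '"[Gmail]/All Mail"', dest_dir)
    finally:
        conn.logout()
```

Each message lands in its own `.eml` file, which Thunderbird and most other mail clients can open. Naming files by UID keeps re-runs fast, with the caveat that a server can in principle reset its UIDs, so treat this as a starting point.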
I’d write tutorials on how to do all this, but Lifehacker already has it covered here.
YouTube
YouTube is also owned by Google, but I’m giving it its own category here because it wasn’t included in the Lifehacker tutorial. Fortunately, like other Google applications, there is an open API to access almost everything on the site.
There are lots of programs available to download YouTube videos locally. One that I use sometimes is Tooble, which is available for Windows and Mac OS and will download videos from YouTube and automatically convert them to a format you can watch on your computer or iPod.
That doesn’t get you comments, ratings or statistics though. I think it would be pretty trivial to roll your own script to retrieve some or all of this information. There’s a reference here. You could use that to figure out how to do a full back-up once, and then from there just write a script to automatically back up a file at a URL that looked sort of like:
rustyGAPN is my username; you’d want to switch that with yours. From there, I’d just recommend keeping the videos you uploaded on your hard drive rather than downloading them back from YouTube, since the YouTube version will be compressed several times over by the time it gets back to you.
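The "script to automatically back up a file at a URL" can be very small. Here is a sketch in Python that saves whatever a URL returns into a timestamped file, so each run keeps a separate copy. The function name, the file naming and the `FEED_URL` placeholder are all my own inventions for illustration; you would paste in the real feed URL for your account.

```python
import urllib.request
from datetime import datetime
from pathlib import Path


def backup_url(url, dest_dir, prefix="feed", now=None):
    """Download `url` and save it as <prefix>-<timestamp>.xml.

    A new file is written on every run, so older copies are kept
    rather than overwritten. Returns the path of the saved file.
    """
    now = now or datetime.now()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"{prefix}-{now:%Y%m%d-%H%M%S}.xml"
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target


# Placeholder only: substitute the real feed URL for your username.
FEED_URL = "https://example.com/path/to/your/feed"
# backup_url(FEED_URL, "backups/youtube", prefix="youtube")
```

Run it from cron (or Task Scheduler on Windows) as often as you like. The same function would work for any other service that exposes your data as a file at a stable URL.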
Blip.tv
Blip.tv also has a developer API, which means theoretically anybody could make an application or web service to back up videos from the site. I haven’t seen too many of these though, just this piece of shareware (download at your own risk, as I haven’t tried it).
You could use a similar method to what I suggested for YouTube of periodically downloading an XML file to back up titles and descriptions. The Blip API only allows for searching by keyword rather than username, so YMMV with this approach. A URL might look like this:
…where you should substitute your username for the search parameter.
Facebook and MySpace
Until very recently, both of these sites were awful about letting you get data out of them. However, it does appear both sites are moving slowly toward data portability with two recent initiatives: Facebook Connect and MySpace Data Availability.
In the meantime, I’d recommend just visiting pages in MySpace and Facebook that are important to you in Firefox and taking these steps:
- Click the ‘File’ menu
- Click ‘Save Page As’
- In the box that pops up, make sure the ‘Save As:’ option is set to ‘Web Page, complete’. This will tell Firefox to also download all the images associated with the page to your computer.
- Save the file
Kludgy? Hell yes. Impractical for more than a few pages? Double hell yes. If you host photos or videos on either of these sites that are important to you, I’d highly recommend keeping a separate library with the same titles, tags and descriptions somewhere else.
Eventually, I expect these services to open up enough to let people back up their data, but there’s no telling when that will be. The technology is there already; they just need to adjust their terms of service. Please comment if I am wrong about any of this. And sign up at DataPortability.org.
LinkedIn
Backing up your profile and contacts in LinkedIn is pretty easy, but you have to do it manually.
To back up your profile, log in and click the ‘View My Profile’ link in the left sidebar. On your profile, you should see an Adobe Acrobat icon. Click it, and you will download a PDF containing all the info you filled out in your profile and the recommendations people have written about you.
To back up your contacts, log in and click the ‘Contacts’ link in the left sidebar. Click the ‘Export Contacts’ link, which is close to the bottom of the screen. Select the format you’d like to save the file as, fill in the captcha text, and click ‘Export.’
I don’t know of a way to back up LinkedIn messages. This doesn’t matter to me because there’s nothing important in there, but if you know of a way please share it in the comments.
Yelp
I’ve gotten in the habit of writing reviews on Yelp for restaurants I eat at, but I haven’t been keeping local copies of these reviews.
Yelp has a developer API, but like Facebook and MySpace, the terms of service prohibit storing data. If your reviews are important to you, you will want to save copies locally as you write them.
Georgia Podcast Network
This isn’t a how-to, but about future plans.
Sometime this year I hope to build an API into the Georgia Podcast Network that will let people retrieve all their data in XML and JSON. There’s already an undocumented mini-API built in to retrieve new episodes in JSON which is used for the widgets, but there’s still not a method for retrieving everything.
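To give a feel for what "all their data in XML and JSON" could mean, here is a purely illustrative Python sketch that takes a list of episodes and writes it out in both formats. None of this is the site's actual code (the site runs on Drupal), and the field names `title`, `show` and `url` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def episodes_to_json(episodes):
    """Serialize a list of episode dicts as a JSON document."""
    return json.dumps({"episodes": episodes}, indent=2)


def episodes_to_xml(episodes):
    """Serialize the same list as XML, one <episode> element each.

    Assumes every dict key is a valid XML tag name.
    """
    root = ET.Element("episodes")
    for episode in episodes:
        node = ET.SubElement(root, "episode")
        for key, value in episode.items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, for illustration only.
sample = [
    {"title": "Episode 1", "show": "Example Show",
     "url": "http://example.com/ep1.mp3"},
]
```

Calling `episodes_to_json(sample)` or `episodes_to_xml(sample)` returns a string that a podcaster could save locally, which is the whole point: the same data, in a format other tools can read.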
We have no desire to lock down people’s data. We only want people hosting podcasts on the site who want to be there, and podcasters retain all copyrights to their content. I have other features I’d like to add on my wish list, but it is important to me that people using the site feel like they own their content.
Currently, I’m working on upgrading the site from Drupal 5 to Drupal 6, so there’s a freeze on new features until that is finished. Once that is done, I plan to make the API my top development priority.