Wednesday, 8 December 2010

The costs of storage in the cloud

As part of beginning to think about the business model issues around FleSSR (one of our deliverables for next year) I've been estimating the costs of storing various amounts of data in the cloud using Amazon S3, Dropbox and Rackspace.

I've done calculations for all three services for between 50GB and 500PB of storage for one year.

To make the costs realistic I've had to make some working assumptions about how much data might be shipped across the network, as well as about the costs of storage itself. On that basis, I've arrived at a total cost by starting with the cost of storage for one year, adding the cost of uploading all the data (spread evenly over the year) and then adding a little more network traffic to simulate local re-use of the data (1% per month). In the case of Amazon, where IO operations are part of the charging model, I've also had to make an assumption about the average file size, which I've set to 1MB.

So, in the case of, say, 1TB of data, I've costed the storage of 1TB in the cloud, plus the cost of uploading 93GB per month (1/12th of the total data, about 83GB, plus 1%, or 10GB), plus the cost of downloading 10GB (1% of the data) per month, plus the cost of 93,000 upload operations (PUT requests) and 10,000 download operations (GET requests) per month.
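The traffic assumptions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the spreadsheet itself: the function and parameter names are my own, the units are decimal (1TB = 1000GB, 1GB = 1000MB), and the `annual_cost` helper assumes a single flat price per unit, whereas real providers use tiered pricing. No actual prices are included; any figures passed in are the caller's.

```python
from dataclasses import dataclass

GB_PER_TB = 1000  # decimal units, consistent with 1TB -> 93GB per month
MB_PER_GB = 1000


@dataclass
class MonthlyUsage:
    stored_gb: float     # data held in the cloud
    upload_gb: float     # data sent to the cloud each month
    download_gb: float   # data fetched back each month
    put_requests: int    # upload operations per month
    get_requests: int    # download operations per month


def monthly_usage(total_gb, reuse_fraction=0.01, avg_file_mb=1.0):
    """Derive monthly traffic from the total amount stored.

    Upload is 1/12th of the data plus the re-use fraction; download is
    the re-use fraction alone. Request counts assume every file has the
    average size (hypothetical simplification).
    """
    upload_gb = total_gb / 12 + total_gb * reuse_fraction
    download_gb = total_gb * reuse_fraction
    files_per_gb = MB_PER_GB / avg_file_mb
    return MonthlyUsage(
        stored_gb=total_gb,
        upload_gb=upload_gb,
        download_gb=download_gb,
        put_requests=round(upload_gb * files_per_gb),
        get_requests=round(download_gb * files_per_gb),
    )


def annual_cost(usage, storage_per_gb_month, upload_per_gb, download_per_gb,
                per_put=0.0, per_get=0.0):
    """Total yearly cost under flat (untiered) unit prices.

    All prices are supplied by the caller; this sketch assumes none.
    """
    monthly = (
        usage.stored_gb * storage_per_gb_month
        + usage.upload_gb * upload_per_gb
        + usage.download_gb * download_per_gb
        + usage.put_requests * per_put
        + usage.get_requests * per_get
    )
    return 12 * monthly


if __name__ == "__main__":
    usage = monthly_usage(1 * GB_PER_TB)
    print(usage)
```

For 1TB this gives an upload of about 93.3GB and a download of 10GB per month; the figures quoted above are the same numbers rounded down to 93GB and 93,000 PUT requests.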

Clearly the 1% figure is a complete finger-in-the-air job - I have no idea if it is reasonable or not - but I've intentionally set it quite conservatively. It is also worth noting that any scenario involving more than 500TB of data probably has to consider how the data gets to the cloud in the first place (by some means other than the network), so my calculations probably go a bit wrong in those cases.

My costings come out higher than similar estimates that I've seen from other sources... I think because people tend not to include the costs of transferring data across the network when they do this kind of thing. Network costs are actually quite significant in terms of the overall price.

Prices are based on the available information on the web (Amazon S3, Dropbox, Rackspace). Note that Rackspace's pricing includes support and a Content Delivery Network (CDN) and so isn't directly comparable with Amazon S3. Note also that the largest offer from Dropbox is for 100GB of storage, so that service isn't relevant for most of my data points.

A Google spreadsheet of my workings is available. Please let me know if you think I've made any mistakes.

The point here is not to make comparisons between these three services - please don't use my numbers to do that. Indeed, making such a comparison based only on cost would be rather foolish because there are significant differences between the services in other ways. Rather, it is just to get a feel for how the different charging models work and, more importantly, to get a feel for what we are up against as we think about transforming FleSSR into a production service.

So, what can we conclude? Looking at the cost per TB per year, the Dropbox and Rackspace prices are pretty much flat (i.e. the same irrespective of how much data is being stored) at around £1530/TB/year and £1220/TB/year respectively (though, as noted above, the Dropbox prices are only applicable for 50GB and 100GB). Amazon's pricing is cheaper, particularly for large amounts of data (the price starts dipping below £1000/TB/year for anything over 100TB), but it never reaches the kind of baseline figure I've seen others quote for Amazon storage alone (i.e. without network costs) of around £450/TB/year. (My lowest estimate is around £510/TB/year for 500PB of data but, as mentioned above, this estimate is probably unrealistic for other reasons.)

Superficially, these prices seem quite high - they are certainly higher than I was expecting. The interesting question is whether they can be matched or beaten by academic providers (such as Eduserv) and/or in-house institutional provision, and if so by how much.

I'll return to that question in a later post.


  1. The costings aren't surprising, and you're absolutely right to highlight the difference in costs that arise from putting stuff in and getting it out, as opposed to just storing it. Amazon have built on the marketing model of many other storage service vendors here, including those that offer physical storage of paper records (or digital media) where there's a low per-month charge for shelf space used, and per-event charges for pickup & delivery that are much higher. I guess it works because not enough people do the sort of modelling that you've done, even though it's not hugely difficult.

    For those trying to make comparisons with local provision, it's also worth observing that although some models for real costs of local storage include ingest costs, few include the cost of infrastructure necessary to get the data back. But then again, we need that network infrastructure whether our data is local or in the cloud, so perhaps that isn't important.

  2. I used Dropbox before and I hadn’t realized that our company was paying this much for their services. I know it’s kinda high, but I was actually satisfied with the help it gave us. All of our important data was shared and stored there and we had no problems with that. :D

    Manda Maldanado

  3. There are times when a user thinks more about the cost than about the quality of the cloud management service. What matters most is how well the data management system can keep your files secure and confidential. That is also when we need to invest in these systems, so that we can be assured that our documents are in safe hands. :)

  4. Shelling out for security is definitely worth every penny, especially when you’re dealing with trusted IT administrators. However, be careful about who you trust. There are a lot of IT services available; you just have to choose one that will perform at its best, at a price that won't hurt your budget. :)