Flexible Services for the Support of Research

Thursday, 9 December 2010

Date for your diary - UCISA-IG/NG cloud event, 16th Feb 2011

David Wallom (OeRC) and Matt Johnson (Eduserv) will be speaking about the FleSSR project at the UCISA-IG/NG Seminar on Cloud Computing at Holywell Park, Loughborough University on 16 February 2011.

Wednesday, 8 December 2010

The costs of storage in the cloud

As part of beginning to think about the business model issues around FleSSR (one of our deliverables for next year) I've been estimating the costs of storing various amounts of data in the cloud using Amazon S3, Dropbox and Rackspace.

I've done calculations for all 3 services for between 50GB and 500PB of storage for one year.

To make the costs realistic I've had to make some working assumptions about how much data might be shipped across the network, as well as about the costs of storage. On that basis, I've arrived at a total cost by starting with the costs of storage for 1 year, adding the costs of uploading all the data (spread evenly over the year) and then adding a little bit more network traffic to simulate local re-use of the data (1% per month). In the case of Amazon, where IO operations are part of the charging model, I've also had to make an assumption about the average file size, which I've set to 1MB.

So, in the case of, say, 1TB of data, I've costed the storage of 1TB in the cloud, plus the costs of uploading 93GB per month (1/12th of the total data, roughly 83GB, plus 1%, 10GB), plus the costs of downloading 10GB (1% of the data) per month, plus the costs of 93,000 upload operations (PUT requests) and 10,000 download operations (GET requests) per month.
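The arithmetic behind the 1TB example can be sketched in a few lines of Python. This is only an illustration of the working assumptions described above, not the spreadsheet itself: the function names and the keys of the `prices` dictionary are made up for the example, the prices are flat placeholders rather than any provider's real (often tiered) rates, and it assumes decimal units (1TB = 1000GB, 1GB = 1000MB) and storage charged on the full amount for all 12 months.

```python
from dataclasses import dataclass

MB_PER_GB = 1000  # assumption: decimal units, matching 1% of 1TB = 10GB


@dataclass
class MonthlyUsage:
    upload_gb: float
    download_gb: float
    put_requests: float
    get_requests: float


def monthly_usage(total_gb, reuse_fraction=0.01, avg_file_mb=1.0):
    """Monthly traffic implied by the working assumptions in the post."""
    # Re-use of the data: 1% of the total is downloaded each month.
    download_gb = total_gb * reuse_fraction
    # Initial upload spread evenly over 12 months, plus the same 1%.
    upload_gb = total_gb / 12 + total_gb * reuse_fraction
    return MonthlyUsage(
        upload_gb=upload_gb,
        download_gb=download_gb,
        # One request per file, with files of avg_file_mb each.
        put_requests=upload_gb * MB_PER_GB / avg_file_mb,
        get_requests=download_gb * MB_PER_GB / avg_file_mb,
    )


def annual_cost(total_gb, prices, reuse_fraction=0.01, avg_file_mb=1.0):
    """Total cost for one year, given flat (hypothetical) unit prices."""
    usage = monthly_usage(total_gb, reuse_fraction, avg_file_mb)
    monthly = (
        total_gb * prices["storage_per_gb_month"]
        + usage.upload_gb * prices["upload_per_gb"]
        + usage.download_gb * prices["download_per_gb"]
        + usage.put_requests * prices["per_put_request"]
        + usage.get_requests * prices["per_get_request"]
    )
    return 12 * monthly


if __name__ == "__main__":
    # Placeholder prices only, NOT taken from any provider's price list.
    placeholder_prices = {
        "storage_per_gb_month": 0.10,
        "upload_per_gb": 0.05,
        "download_per_gb": 0.10,
        "per_put_request": 0.00001,
        "per_get_request": 0.000001,
    }
    usage = monthly_usage(1000)  # the 1TB example
    print(round(usage.upload_gb), round(usage.download_gb))
    print(annual_cost(1000, placeholder_prices))
```

For 1000GB, `monthly_usage` gives about 93GB uploaded and 10GB downloaded per month, in line with the figures quoted above. Swapping in real price lists would mean replacing the flat rates with each provider's tiers.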

Clearly the 1% figure is a complete finger-in-the-air job - I have no idea if it is reasonable or not - but I've intentionally set it quite conservatively. It is also worth noting that any scenario involving more than 500TB data storage probably has to consider how the data is uploaded to the cloud in the first place (other than by using the network), so my calculations probably go a bit wrong in those cases.

My costings come out higher than similar estimates that I've seen from other sources... I think because people tend not to include the costs of transferring data across the network when they do this kind of thing. Network costs are actually quite significant in terms of the overall price.

Prices are based on the available information on the web (Amazon S3, Dropbox, Rackspace). Note that Rackspace's pricing includes support and a Content Delivery Network (CDN) and so isn't directly comparable with Amazon S3. Note also that the largest offer from Dropbox is for 100GB of storage, so that service isn't relevant for most of my data points.

A Google spreadsheet of my workings is available. Please let me know if you think I've made any mistakes.

The point here is not to make comparisons between these three services - please don't use my numbers to do that. Indeed, making such a comparison based only on cost would be rather foolish because there are significant differences between the services in other ways. Rather, it is just to get a feel for how the different charging models work and, more importantly, to get a feel for what we are up against as we think about transforming FleSSR into a production service.

So, what can we conclude? Looking at the cost per TB per year, the Dropbox and Rackspace prices are pretty much flat (i.e. the same irrespective of how much data is being stored) at around £1530/TB/year and £1220/TB/year respectively (though, as noted above, the Dropbox prices are only applicable for 50GB and 100GB). Amazon's pricing is cheaper, particularly so for large amounts of data (anything over 100TB data where the price starts dipping below £1000/TB/year) but never reaches the kind of baseline figures I've seen others quote for Amazon storage alone (i.e. without network costs) of around £450/TB/year. (My lowest estimate is around £510/TB/year for 500PB data but, as mentioned above, this estimate is probably unrealistic for other reasons.)

Superficially, these prices seem quite high - they are certainly higher than I was expecting. What is interesting is whether they can be matched or beaten by academic providers (such as Eduserv) and/or in-house institutional provision, and if so by how much?

I'll return to that question in a later post.

Wednesday, 24 November 2010

OpenStack Design Summit Website Launched

Although the FleSSR project has no current plans to use it, the emergence of OpenStack is a very significant development in the open source cloud space and therefore something worth keeping an active eye on. Videos and other material from the recent OpenStack Design Summit are now available through the summit website.

OpenStack is a collection of open source technology products delivering a scalable, secure, standards-based cloud computing software solution. OpenStack is currently developing two interrelated technologies: OpenStack Compute and OpenStack Object Storage. OpenStack Compute is the internal fabric of the cloud creating and managing large groups of virtual private servers and OpenStack Object Storage is software for creating redundant, scalable object storage using clusters of commodity servers to store terabytes or even petabytes of data.

Friday, 5 November 2010

Community clouds

A presentation about community clouds, using FleSSR as a case study, given by Matt Johnson and Ed Zedlewski of Eduserv to the UCISA CISG 2010 conference in Brighton during November 2010:

Community Clouds


Monday, 1 November 2010

Beginning to build our public infrastructure

Matt, Matteo and I spent a day at Eduserv's Swindon Data Centre last Wednesday, getting the first phase of FleSSR's public cloud up and running.

Matt (on the left in the photo) had previously put the hardware in place, leaving Matteo to lead on installing Eucalyptus on the Cluster Controller (the only public-facing part of the infrastructure and the box which controls everything else in our cloud) and the first two Node Controllers (the boxes on which any virtual machine instances requested by end-users will run).

My job was to make the tea, take a couple of photos and write this blog post!

Because this cloud sits in the Eduserv data centre, which also hosts a variety of other services (some of which are sensitive) we had to partition the cloud machines onto a completely new network, firewalled off from everything else. This led to a couple of brief hiccups in the installation process because of the lack of both an existing gateway machine and a DHCP server on that network.

It turns out that the Cluster Controller acts as a DHCP server and gateway for all the virtual machine instances created by the Node Controllers but not for the Node Controllers themselves. This took us slightly by surprise. The Node Controllers need access to the Internet in order to download Ubuntu patches, hence the need for a gateway... no gateway, no patches :-(. You can't use the Cluster Controller itself as the gateway because the Eucalyptus software takes over control of the routing and NAT tables and allocates everything dynamically. However, rather than assigning one of our limited number of available machines to run as a gateway, we got round the problem by running a Tinyproxy server on the Cluster Controller and routing HTTP requests from the Node Controllers out through that. We also circumvented the need for a DHCP server by manually assigning IP addresses to each of the Node Controllers.

Apart from that, the rest of the installation went very smoothly and by the end of the day we had a Cluster Controller and two Node Controllers up and running. A brief test of the Web interface indicated that we could instantiate virtual machines on the Node Controllers without any problems.

We still have to install Eucalyptus on the remaining three Node Controllers and put in place our FAS 3140 SAN cluster, kindly loaned to us by NetApp via Q Associates, of which we'll use about 10TB for FleSSR. The loan of this kit is very much appreciated.

This work should be completed this week. Then we can move ahead with properly testing our cloud.

Wednesday, 27 October 2010

Introduction to FleSSR

David Wallom, talking about FleSSR at a JISC Programme meeting.

Thursday, 16 September 2010

FleSSR at the EGI Technical Forum

FleSSR is at the EGI Technical Forum in Amsterdam, presenting a poster about the project.