In a previous post, The costs of storage in the cloud, I attempted to assess the differing costs of storing various amounts of data in the cloud using Amazon S3, Dropbox and Rackspace. I ended by asking:
What is interesting is whether [those prices] can be matched or beaten by academic providers (such as Eduserv) and/or in-house institutional provision, and if so by how much?
So... let's do a quick exercise!
Suppose you wanted to build your own Amazon S3 service. How much would it cost? Could you beat Amazon's S3 prices? And if not, what economies of scale would be necessary before you could?
To answer those questions, let's forget about actual money for a moment. Rather, let's draw up a list of the things that would have to be paid for. We can fill in the numbers later.
So firstly, there's the storage hardware - the raw disks. Of course, it's not quite as simple as that. Decisions need to be made about the kind of storage you want: a fibre-channel based SAN vs. cheaper SATA disks organised into some kind of Network-attached Storage (NAS), for example. Reliability vs. cost will be the issue here. Given that we're trying to build an alternative to Amazon S3, let's use SATA disks.
Then there's the chosen architecture. How resilient do you want your storage to be? Amazon offer 99.999999999% durability (which, I think, means an expected loss of about 1 object in every 100,000,000,000 stored per year) with no single points of failure (and a design to sustain the concurrent loss of data in two facilities)... both of which are pretty hard to compete with, so let's relax that requirement a little. Let's say that we want all data to be replicated across 2 separate NAS clusters (meaning that both clusters have to fail before data is lost), running RAID 6 within each cluster (providing fault tolerance against two simultaneous drive failures).
With that kind of configuration, providing 1PB of usable disk would require something like 2.7PB of raw disk. (Actually, according to Backblaze's How to build cheap cloud storage, a RAID 6 design based on 15 disks, of which 2 are parity disks, should deliver "87 percent of the raw hard drive totals" as usable space, so two replicas of 1PB would need roughly 2.3PB of raw disk and we should be able to do a bit better than that.)
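As a rough check on the raw-disk arithmetic above, here is a minimal Python sketch. The function name and its parameters are my own invention for illustration, and it deliberately ignores hot spares, filesystem overhead and free-space headroom, all of which would push the real figure up.

```python
def raw_capacity_needed(usable_pb, replicas=2, disks_per_array=15, parity_disks=2):
    """Raw disk (in PB) needed to deliver `usable_pb` of usable storage.

    Assumes every replica is a full copy and every array is RAID 6 style,
    with `parity_disks` out of `disks_per_array` given over to parity.
    Hot spares and filesystem overhead are NOT modelled (an assumption).
    """
    if not 0 <= parity_disks < disks_per_array:
        raise ValueError("parity_disks must be smaller than disks_per_array")
    # Fraction of each array that holds data rather than parity (13/15 here).
    raid_efficiency = (disks_per_array - parity_disks) / disks_per_array
    return usable_pb * replicas / raid_efficiency


# 1PB usable, 2 clusters, 15-disk RAID 6 arrays with 2 parity disks each.
print(round(raw_capacity_needed(1.0), 2))  # prints 2.31
```

Changing `replicas` or the array geometry shows how quickly the raw requirement moves: a single copy on the same arrays needs about half as much disk.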
Then there's the network switching and cabling necessary to join everything together - with, again, decisions to be made about bandwidth and so on. There's also the connection to the outside world to worry about - routers, switching and a firewall for example.
Finally, there's the cost of the physical data centre space to consider, at least insofar as it represents an opportunity cost against doing something else.
That covers the initial investment (which is already non-trivial for any kind of substantial cloud infrastructure), for which costs can probably be depreciated over, say, 5 years.
There are also the recurrent costs...
Energy, both to power the disk arrays and other servers and to keep everything cool. And staffing: operator cover (perhaps 0.5FTE per 10PB?), some developer effort, at least initially (both to keep things running smoothly and to assist in integrating the cloud storage with other systems - let's say 1FTE), a service/project manager (again, let's say 1FTE) and some procurement/financial effort. Again, these are non-trivial sums of money.
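The staffing guesses above can be turned into a headcount with a few lines of Python. This is only a sketch: the function and parameter names are hypothetical, the defaults are the guesses from the paragraph above, and procurement/financial effort defaults to zero simply because no figure was put on it.

```python
def staff_fte(capacity_pb, operator_fte_per_10pb=0.5, developer_fte=1.0,
              manager_fte=1.0, procurement_fte=0.0):
    """Total full-time equivalents (FTE) for a service of `capacity_pb`.

    Operator cover scales linearly with capacity; the other roles are
    treated as fixed, which is an assumption rather than a measured fact.
    """
    operator_fte = operator_fte_per_10pb * capacity_pb / 10
    return operator_fte + developer_fte + manager_fte + procurement_fte


print(staff_fte(10))  # prints 2.5
```

Because the developer and manager are fixed costs, the FTE per petabyte falls sharply as capacity grows, which is one place where economies of scale would show up.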
So that's my shopping list. What have I forgotten? What have I over-specified?
At this stage, I'm not going to fill in the actual amounts of money - not least because the costs of storage are dropping all the time and the actual price paid will be subject to negotiation. But here is the shopping list:
- Disks
- Network infrastructure (switching, etc.)
- Router/firewall
- Physical space costs
- Energy
- Operator cover
- Development effort
- Project/service management
- Procurement/financial effort
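For anyone who wants to plug their own numbers into that list, here is one way it might be turned into a cost per GB per month for comparison with published cloud prices. It is a sketch under stated assumptions: the class, field and function names are mine, capital items are depreciated in a straight line over 5 years, physical space is treated as a one-off cost, all staffing is folded into a single annual figure, and `utilisation` (the fraction of usable capacity actually storing data) is an extra knob I have added. No real prices are included.

```python
from dataclasses import dataclass

GB_PER_PB = 1_000_000  # decimal units, as storage vendors quote them


@dataclass
class ShoppingList:
    # One-off capital items, in any single currency.
    disks: float
    network: float
    router_firewall: float
    physical_space: float
    # Recurrent items, per year, in the same currency.
    energy_per_year: float
    staff_per_year: float  # operators, developers, management, procurement
    depreciation_years: int = 5

    def annual_cost(self) -> float:
        """Straight-line depreciation of capital plus recurrent costs."""
        capital = (self.disks + self.network
                   + self.router_firewall + self.physical_space)
        return (capital / self.depreciation_years
                + self.energy_per_year + self.staff_per_year)


def cost_per_gb_month(items: ShoppingList, usable_pb: float,
                      utilisation: float = 1.0) -> float:
    """Cost per stored GB per month for `usable_pb` of usable capacity."""
    if usable_pb <= 0 or not 0 < utilisation <= 1:
        raise ValueError("usable_pb must be > 0 and utilisation in (0, 1]")
    stored_gb = usable_pb * GB_PER_PB * utilisation
    return items.annual_cost() / 12 / stored_gb


# Placeholder figures only, chosen to make the arithmetic easy to follow.
example = ShoppingList(disks=500, network=100, router_firewall=50,
                       physical_space=350, energy_per_year=100,
                       staff_per_year=200)
print(example.annual_cost())  # prints 500.0
```

Halving `utilisation` doubles the cost per stored GB, so how full the service is matters as much as how cheaply it was built.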
I'd love to see the figures on this myself and I would have thought physical space costs, air con and the costs of the employees needed to monitor it and swap out dead disks will run it pretty close.
Also, are you assuming that the service is used fully, or are you keeping all that capacity online to store 20GB? If so, what is the value-for-money proposition?
If it is a lot cheaper and there is a large scale demand it begs the question of why it hasn't been done already? I guess the project is designed to answer that one.
If you felt able to share some numbers for the costs on your shopping list more publicly I'd love to see them.
A rule of thumb I have heard quoted by people who have recently built machine rooms:
the cost of building and maintaining your physical space is twice the cost of buying the network and server infrastructure that goes inside it.
You seem to have the right mix of elements in there, although how you turn them into a cost model is critical. I might also question some of the figures you have given, although not the basic assumptions.
Staff are clearly necessary, but I'm not convinced that a service like this needs a full-time manager AND operator support until it's operating on a very large scale. You do need staff, but getting the number right is critical both to the quality of service and your long-term costs. Staff costs tend to increase in a way that technology doesn't. (Energy costs are likely to grow faster than the staff ones, though.) I've heard figures from one of the large-scale cloud compute providers of 1 operator per 1,000 blades. (About 1.4 hours/year staff time per server.) That's clearly only achievable with well-planned automation of many things. It's not easy to translate to storage, where my guess would be that failures requiring manual intervention are more frequent.
I think you are over-specifying the storage, though, and there's scope here to provide something different to Amazon et al. Not all uses of cloud storage require 2-site replication or high degrees of redundancy or availability. If I am using the cloud to provide off-site resilience for local storage (for instance) I only want one copy, and I can tolerate failure. If you can configure your storage to provide that as an option to some, it could be attractive. We're thinking of DCC services on top of this cloud that would ideally need just that.
It's also dangerous to assume the network is free. JANET is free at the point of use and costs aren't volume related, but at minimum there's an opportunity cost for you, or another institution, in using your bandwidth in this way. Once your link is saturated you start weighing up which services really require it. And we can't assume JANET's model will continue to work the way it has in the past, although I very much hope that it does continue. One of the best decisions taken about national IT infrastructure in the UK ever.
Andy;
I love what you're doing here, but I think you're glossing over the cost of operations by just placing a line item for "Operator Cover".
In order to cover all the technologies you've listed above, you either have an expensive IT organization with separate skillsets (security engineer, network engineer, storage, etc.)... or you have one or two crackerjack engineers, who leave you back at a single point of failure.
These costs are generally not trivial.