Tag Archives: cloud footprint

Green data in the UK – nice job Rackspace!

We added Rackspace calculations to our footprint data this week, as we’re keen to start running our Hadoop clusters over more infrastructure (in the greenest possible way) and we’re sure you are too.

I was getting a bit despondent about our sustainable options for running big computation, so I was super excited to learn, after talking to Rackspace and to Greenpeace, that Rackspace have actually entered renewables purchase agreements for all the power in their London data centre, and that Greenpeace are happy to sign off on those figures (Greenpeace are especially tough on greenwashing, so if they say it’s legit, we’re pretty happy). New ratings are here. The big green blob over the UK makes me smile.

Greenmonk just published a blog post about the same thing, praising Rackspace for being leaders in this area. We totally agree.

So Rackspace London rocket up the leaderboard as a green option, and we can build moar big data in the UK without doing bad things to the planet! Happy campers all round.

Doing the data centre dance

We show live footprint estimates for right now on our dashboard at http://www.mastodonc.com/dashboard, but it’s also pretty interesting to visualise the different locations over time, to get a feel for the size of the fluctuations and to see how the horse race between time zones and temperatures plays out.

This chart shows some real ratings for May. It’s got some really interesting features.

Iceland (powered by geothermal energy) stays at a flat zero all the time.

Ireland has a pretty interesting profile – it’s cold enough to air-cool most of the time, but it has a pretty crappy emissions factor on its electricity, so its footprint is not good but is stable (as long as it doesn’t get warm, which starts to happen at the end of the chart).

Sao Paulo is hot, so its footprint has a very clear diurnal cycle which tracks temperature and hence cooling power draw, but it’s on a good power source so the amplitude is small.
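A rough way to see why these three shapes appear is to treat the footprint as server power, times a temperature-dependent cooling overhead, times the grid's emissions factor. The sketch below is only an illustration of that idea, not our actual ratings model: the linear cooling ramp, the free-cooling threshold, the grid factors and the temperatures are all made-up placeholders, and the site labels are generic rather than real locations.

```python
def cooling_overhead(temp_c: float, free_cooling_below: float = 18.0,
                     max_overhead: float = 1.0, saturates_at: float = 35.0) -> float:
    """Extra cooling power as a fraction of server power (hypothetical linear ramp)."""
    if temp_c <= free_cooling_below:
        return 0.0  # cold enough to air-cool, so no extra load
    ramp = (temp_c - free_cooling_below) / (saturates_at - free_cooling_below)
    return max_overhead * min(ramp, 1.0)


def footprint(temp_c: float, grid_kg_per_kwh: float, server_kw: float = 0.5) -> float:
    """Estimated kg CO2 per server hour under the assumptions above."""
    return server_kw * (1.0 + cooling_overhead(temp_c)) * grid_kg_per_kwh


# Placeholder grid factors (kg CO2/kWh) and temperatures over a day (deg C),
# chosen only to show the three shapes, not measured values.
sites = {
    "zero-carbon grid, cold": (0.0, [2, 5, 8, 4]),
    "dirty grid, cold": (0.5, [8, 10, 14, 11]),
    "clean grid, hot": (0.1, [20, 26, 32, 24]),
}

for name, (grid, temps) in sites.items():
    values = [footprint(t, grid) for t in temps]
    print(f"{name}: min={min(values):.3f} max={max(values):.3f}")
```

Under these assumptions the zero-carbon site stays at zero whatever the weather, the cold site on a dirty grid is flat but high, and the hot site on a clean grid swings with temperature while staying small in absolute terms, because the swing is scaled by the grid factor.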

Data center carbon footprints gradually go mainstream

Yet another GigaOm article with a linkbait headline but genuinely interesting content, which is well worth a read – “The controversial world of clean power and data centers”.

It’s great to see this issue becoming increasingly mainstream, and in particular businesses starting to be more and more clear about the fact that being efficient is distinct from being green – and that clean power is a much bigger part of the carbon footprint battle than efficiency. They’re also completely on the money in pointing out that data centers need to run on baseload power (ie high and consistent levels) – which you can get easily from coal or from hydro or geo energy but not so easily from wind or solar, meaning that wind/solar powered DCs are not likely to be so realistic and scalable, even though solar panels on the roof are often put in for good PR.

The article does miss one major trick, though – it says that tech companies have to put their data centers near their customers, which is not always true. The exciting part about cloud computing and IaaS is that, for a lot of applications – like data processing – the latency to pretty much any well-connected facility in the world is low enough, which means that the data centers can follow the available green power and can be put down right next to baseload-type green power sources.

Countries’ power grids differ wildly in their footprints

I showed in my post last week that the biggest determinant of good vs bad carbon footprint for a data centre is what power source it runs off.

If you want to pick a good data centre, therefore, you want to pick one that’s on a good power grid (most data centres run off their local public grid most of the time). Some high-level examples of power grid footprints are in the chart below, supplied by our friends at Amee. As you can see, the grids differ a *lot* in their footprints. The UK, where we’re located, is okayish but not great; in line with the US average, although within US states there’s a wide spread as well. Iceland, Switzerland, and Sweden are all very good, if you can manage to get servers there.

Of course, you can just run jobs through our services and we’ll take care of the detail of how to minimise the carbon footprint. But it’s still good to understand why we’re doing what we do.

80% of bad footprint is down to power source and climate

I’ve heard some very confident but completely conflicting claims lately about what is ‘the important thing’ in the carbon footprint of compute jobs. The dissonance was starting to get to me, so I’ve done some back-of-envelope calculations of different scenarios to figure out what the order of importance of the factors actually is. I soft-pedalled the impact of power source as much as possible, since a priori I thought that was the most important one and I didn’t want to be biased.

The results were really interesting, and not only because they show that I’m right :)

60% of a bad footprint is down to power source, and 20% is down to climate. The remainder is factors like physical design and occupancy; ironically, those smaller ones are the factors that people tend to highlight when talking about ‘green’ data centres. This waterfall chart breaks it down:

Here’s how I calculated the numbers shown in the chart – these are all approximations, but I’m confident they’re in the right ballpark. I would be particularly interested to hear any adjustments to the model that you’d make.

Start with a good footprint baselined at 1

PUE of different physical designs: good case 1.2 (this is very good: the superstar PUE from OpenCompute is claimed as 1.08), bad case 1.9 (this is very bad, since we’re assuming no climate or occupancy impact yet).

Physical design PUE impact 1.9/1.2 = 1.58 times impact 

Occupancy: very good case 100%, bad case 50% (assuming a fairly good worst case, as a public cloud operator with very low occupancy is probably going to go out of business). Impact 1.22 times, read off from the chart in the British Computer Society Data Centre Specialist Group simulator at http://dcsg.bcs.org/welcome-dcsg-simulator

Climate: cold and dry (good case) to hot and humid (bad case), 2.0 times impact – assume massive extra air conditioning load

Power source: assume bad case UK power grid, good case Sweden power grid. I deliberately underestimated this one (e.g. many US regions are much worse than the UK, and Iceland is much better than Sweden) to make sure I don’t overstate the case. 8.0 times impact according to Defra international energy figures.

Overall bad footprint 1 * 1.58 * 1.22 * 2.0 * 8.0 ≈ 31 times worse than the starting ‘good’ case.

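To convert multiplicative impacts into percentage shares, take logarithms: each factor’s share is the log of its multiplier divided by the log of the overall multiplier (for power source, the log of 8.0 divided by the log of the overall figure gives about 61%).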

End result:

61% is down to power source

20% climate

13% physical PUE factors

6% occupancy
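For anyone who wants to check or tweak the numbers, the arithmetic above can be reproduced in a few lines. This is just a sketch of the calculation as described, using the multipliers quoted in this post; the variable names are mine, and the PUE factor is kept as the unrounded ratio 1.9/1.2, so the overall multiplier comes out very slightly different from the hand-rounded product.

```python
import math

# Multiplicative impact of each factor (bad case relative to good case),
# taken from the figures quoted in the post.
factors = {
    "power source": 8.0,
    "climate": 2.0,
    "physical PUE": 1.9 / 1.2,
    "occupancy": 1.22,
}

# Overall multiplier: the product of the individual impacts.
overall = math.prod(factors.values())

# Taking logs turns the product into a sum, so each factor's share
# of the total is log(factor) / log(overall).
shares = {name: math.log(f) / math.log(overall) for name, f in factors.items()}

print(f"overall: {overall:.1f}x worse than the good case")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Swapping in your own good and bad cases (for example a dirtier grid than the UK) only means changing the values in `factors`; the shares always sum to 100% because the logs of the factors sum to the log of their product.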

That’s some pretty dramatic stuff – and it means, thankfully, that our model of estimating footprints based on external climate and power source data without always having access to physical PUE and occupancy data is heading the right way. Good news for us – we can indeed have a big impact with carbon ratings without the big providers playing ball!

Thoughts, comments, very welcome.

An open standard for cloud footprints?

Well, this has been an exciting week.

We had All The Meetups last week (thanks Carlos and Stewart for an excellent set of Big Data Week events). This week was going to be quieter, but turned into the week of All The Meetings; everyone we spoke to had someone else even more eminent that we needed to be introduced to.

Turns out that there’s tremendous appetite in parts of the industry for getting real, actionable data on cloud usage footprints into the hands of users. This has been a very happy surprise to us – we were especially excited that the founder of Joyent committed this week to getting data out into the world, in this very thoughtful panel discussion which Tom Raftery at GreenMonk participated in and pointed us to. We think that maybe (just maybe) there is scope for putting together an open standard for ratings, forming a coalition of hackers, carbon specialists, and cloud providers, and getting it out as widely as humanly possible. Watch this space – and get in touch if that coalition is something that you’d like to engage with.

And the final win of the week – we have got an office! The novelty of working from home palled extremely quickly, and so we have taken some space from the lovely folks at Club Workspace, which I can highly recommend so far.

Transcript of Saving the world with Hadoop – talk at London Hadoop User Group

This post is the transcription of a talk given in April 2012 by Francine Bennett of Mastodon C (www.mastodonc.com). View the Live Carbon Ratings of the data centres here.

[the legal bit] Our data and analysis are based on assumptions and represent opinions not facts. We aim to update our models as better data becomes available.

Hi. My name’s Francine Bennett, and I’m here to talk to you about a less technical topic than usual, but one that’s still very important: how to save the world with Hadoop.

I’m the cofounder of Mastodon C, a new clean-tech company which builds green cloud computing tools, and other than that am mainly a data nerd. I used to work with Big Data at Google, using their internal MapReduce stuff, and I was very sad not to get to play with those tools any more when I left, which is why I think Hadoop is awesome. Taking that awesomeness as read, I’m going to explain some things which you can do to save the world while using Hadoop.

So, why am I talking about saving the world? It starts with this chart:

This shows carbon emissions on the Y axis and time on the X axis. Most critically it shows how those emissions have exploded over the most recent decades.

Many extremely credible scientists and agencies have pointed out, with increasing urgency, that with current behaviour we’re getting locked into a carbon emissions path which is likely to lead to catastrophic global events. The latest figures on this come from the International Energy Agency’s models, which say that we need major change by 2017 in order to keep global warming down to a we-might-be-OK level.

“The world is locking itself into an unsustainable energy future which would have far-reaching consequences” 

International Energy Agency, November 2011

So what has any of this got to do with Hadoop?

That’s where this graph comes in.

Source: The Economist

As awesome as on-demand cloud services are in terms of functionality, there’s a problem with them. The data centres consume a lot of power, and some of the biggest facilities are running on some of the worst power sources possible. In particular, AWS’ biggest facility, in Virginia, which is estimated to hold 70% of their global servers [source], is located on a coal-heavy power grid [source]. By the way, I’m focussing on AWS for this talk, but the points I’m making apply to all providers.

The chart above shows the explosion of use of Amazon Web Services; the number of virtual machines on AWS is roughly tripling each year. If you are using Hadoop at scale, you may well be using Elastic MapReduce on AWS; you’re almost certainly using some kind of public cloud facility, as it’s the most sensible way to access large clusters of machines for a short time, and do the really super high powered Big Data stuff that you need to do.

Copyright Bruno and Ligia Rodrigues

Carbon footprint is not something which most public cloud providers are keen to discuss. Dirty energy often doesn’t cost much, and power is one of the biggest costs of a data centre, so it can be good for the P&L to run on dirty energy. Instead, many providers would much prefer to keep ‘green’ discussion on the topic of efficiency and PUE (power usage effectiveness), the ratio of a data centre’s total power usage to the power which goes directly into the servers, which aligns pretty closely with their costs. The thing is, being power-efficient is not the same thing as being green. If you run an uber-efficient data centre on coal, it’s still an environmental calamity. If you run it on something like geothermal power, the power consumption and the cost of operation can be similar, but the environmental profile totally different.

Copyright ThinkGeoEnergy

So, the carbon footprint of a server is determined by a number of independent multipliers: the physical design of the data centre; the environment around it (in a hot and humid environment, a compute job can require up to twice as much energy, because the server needs to be cooled); and the other factor that I’m highlighting here, which providers prefer not to expose: the power source.

There are three reasons why this is particularly relevant to Hadoop users:

  • Hadoop work is intermittent anyway, so nothing breaks if you point it to a different place – jar files are standard and portable
  • Hadoop jobs are typically not as latency-sensitive as something like web-hosting, which means that you have more choice of where to run them – you don’t need to use the closest facility
  • A big map/reduce job uses a lot of machines – typical cluster size of around 30, and some people are going up to cluster sizes of hundreds. Next I’ll get onto what that number of machines represents in terms of energy.

As an individual in the UK, your personal annual carbon footprint is about 11 tonnes. A large server draws up to half a kilowatt of direct power and up to an additional half kilowatt in cooling, and the emissions related to this power usage depend on the power source. [server and cooling power estimates vary. Example source, Example source]

If we take AWS Virginia in US East, the most common location, as our baseline, we estimate using these figures that your personal emissions equate to those of 3 servers running full time in Virginia.

Staying within AWS, let’s take a look at Oregon in US West, which is priced identically to US East. Using data on the energy grid there, we estimate that Oregon’s emissions are more than 3x lower, so that same personal footprint equates to an estimated 11 servers in the Oregon zone.
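To make the server comparison concrete, here is a small sketch that works backwards from the figures in the talk to the grid emissions factors they imply. It assumes every server draws the full upper-bound 1 kW (half a kilowatt direct plus half a kilowatt of cooling) all year round; that is a simplifying assumption of this example, not necessarily the input used for the real ratings, so treat the resulting factors as illustrative only.

```python
HOURS_PER_YEAR = 24 * 365

personal_tonnes = 11.0   # UK personal annual footprint quoted in the talk
server_kw = 0.5 + 0.5    # upper-bound direct draw plus cooling, from the talk
servers_virginia = 3     # talk's estimate for US East (Virginia)
servers_oregon = 11      # talk's estimate for US West (Oregon)


def implied_grid_factor(servers: int) -> float:
    """kg CO2 per kWh implied if `servers` machines running all year
    emit as much as one personal annual footprint."""
    kwh_per_server = server_kw * HOURS_PER_YEAR
    return personal_tonnes * 1000 / (servers * kwh_per_server)


virginia = implied_grid_factor(servers_virginia)
oregon = implied_grid_factor(servers_oregon)

print(f"Virginia: {virginia:.2f} kg CO2/kWh (implied)")
print(f"Oregon:   {oregon:.2f} kg CO2/kWh (implied)")
print(f"ratio:    {virginia / oregon:.1f}x")
```

The ratio between the two implied factors is simply 11/3, which is where the "more than 3x lower" comparison comes from; the absolute values shift if you assume a lower average power draw per server.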

So, as an individual orchestrating large Hadoop clusters, you can see that you have a way bigger impact by making informed choices about where to send compute work, than by making any amount of change to your personal life.

Looking further afield, there are also totally renewables-powered providers such as Greenqloud available, which in a sense would make this chart give you infinitely many servers in ratio to your personal footprint. However, right now they don’t have a tool like Elastic MapReduce available, which makes anything non-AWS unattractive to many Hadoop users, and is why I’m focussing on Amazon in particular.

So, here’s a full ratings picture of Amazon locations, ranked by estimated emissions per server hour. Amazon are extremely quiet about anything to do with their power draw or carbon footprint, so we’ve calculated this using data on those power grids [source] and from public data on server power usage. As you can see, Virginia, the most popular location, is right up at the top end for modelled emissions. As I pointed out earlier, Oregon’s down nearer the bottom. You’ll also notice that we have both a ‘hot’ and a ‘cold’ rating for each place. These represent the modelled emissions for the maximum local temperature – summer daytime, pretty much – and the minimum local temperature – midnight in winter. As you can see from this, there’s also carbon value in ‘following the moon’. We also just launched a live ratings model using current temperatures for all of these on our website, so please make use of that.

The thing that’s really exciting about this situation, is that people have written speculatively and done academic work on ‘follow the moon’, green compute grids and so on for a number of years, but this is the first time it’s been realistic to make it happen for the vast majority of users. Because of the global ubiquity of cloud resources, it’s become almost trivial to shift compute work around, particularly if you already have your data backed up to a number of zones – which is in any case good practice for business continuity.

Not only is it easy, it’s also inexpensive to make different choices – this chart shows the cost vs estimated emissions, again just covering AWS. As you can see, Oregon and Virginia are equally priced, with Oregon having much lower estimated emissions, so if you’re locked into AWS with immobile data, I’d urge you to at least choose Oregon as your default. If your processing is more mobile, please ping our API for live ratings on where to send jobs, and please let us know if there are other providers you’d like us to add to the list. And if you can contribute additional data to this effort, please let us know – it’s very hard for us to get direct access to accurate information from inside the providers’ walls, for the reasons I mentioned before, so we would like to use shared expertise to improve our precision.
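The choice described here, picking the lowest-emission location you can afford, can be sketched as a simple filter-then-minimise step. The example below is only an illustration: it does not call any real ratings API, the `Region` class and `greenest_within_budget` function are hypothetical names, the Oregon figure is just the 3-versus-11 servers ratio from earlier in the talk, and the third region is invented to show the price constraint at work.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    relative_price: float      # 1.0 = US East price
    relative_emissions: float  # 1.0 = US East estimated emissions per server hour


def greenest_within_budget(regions, max_relative_price):
    """Return the lowest-emission region whose price is within the budget."""
    affordable = [r for r in regions if r.relative_price <= max_relative_price]
    if not affordable:
        raise ValueError("no region within budget")
    return min(affordable, key=lambda r: r.relative_emissions)


regions = [
    Region("us-east (Virginia)", 1.0, 1.0),
    Region("us-west (Oregon)", 1.0, 3 / 11),          # from the 3-vs-11 servers estimate
    Region("hypothetical-pricier-region", 1.2, 0.1),  # invented for illustration
]

print(greenest_within_budget(regions, max_relative_price=1.0).name)
```

With a budget equal to the US East price this picks Oregon; loosening the budget lets the cleaner but pricier placeholder region win instead, which is the trade-off the cost vs emissions chart is meant to expose.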

Before I finish, I’m just going to zoom back out again and give you a sense of the overall potential scale of this. IT was responsible for 2% of the world’s carbon footprint in 2007, and that is projected to double to 4% by 2020 [source]. At the same time, it’s projected by some that 50% of the world’s data will be stored and processed in Hadoop by 2017. Let’s say we could knock ¼ off those 4% IT emissions – not such a crazy assumption, given that just within a single provider’s locations we calculate a greater than 10x range in emissions, and as we’ve discussed Hadoop is highly portable. Then that would mean a 1% reduction in global emissions. That’s a pretty huge deal.

So, here’s what I’d like you to take away from today:

  • Hadoop and cloud computing are awesome, but some default settings have an ugly carbon footprint
  • Carbon footprint and PUE are not at all the same thing. Efficient != green. If a provider tells you about their excellent efficiency and low PUE, please ask them difficult questions about their power sources.
  • And, the main point: you, personally, have the ability to massively reduce the carbon footprint of your dev work by making informed choices of location and provider. So do it.

And we’re off!

Today is the first day of full time work on Mastodon C for the whole team (all two of us).

Our first big goal is to get a fine-looking demo and the free cloud ratings service up. It’s Big Data Week next week where we’ll be doing plenty of talks about green cloud computing, and hopefully getting people to our website to learn more. Code code code!