
Help! I can’t find any female speakers for my tech conference!

I occasionally get asked where one could go to find women speakers for tech events and conferences. Here is a short list off the top of my head:

Rails Bridge (for Ruby)
pyladies (for Python)
flossie (for Free/Libre Open Source Software)
JDuchess (for Java)
LambdaLadies (for functional programming)
MzTek (technology and art)

Also on Twitter:

@anettebgo (javascript) UK
@annashipman (software craftsmanship, Java, Ruby, JavaScript, Perl, development methodology and clean code) UK
@fhr (my boss on R, big data and DataKind) UK
@jennifersmithco (clojure, neo4j & graphs) UK
@jenit (open data) UK
@lunivore (bdd) UK
@mia_out (Digital Humanities, web dev) UK
@misslake (Art and programming and senior TED fellow) UK
@trisha_gee (Java, Java Performance, MongoDB) UK
@yoditstanton (big data, clojure, hadoop) UK

@bodil (fp, clojure, clojurescript, javascript, web dev) Norway

@antiheroine (web dev) US
@ashedryden (ruby & general diversity) US
@carinmeier (clojure and fp) US
@kaythaney (Mozilla Open Science, Big Data and DataKind) US (formerly UK)

And I’m sure there are quite a few more, but that is a start. Please add more in the comments.

I’m sure @ashedryden has a much more complete list on her blog as she is writing a book on this.

Retrofit is go!

We’ve got a nice project which went live today – we’ve been providing the final performance analysis for the £17m government-funded Retrofit for the Future programme, which has been running since 2009. The programme retrofitted 120 homes in the UK with multiple energy-saving and energy-efficiency technologies, and collected data through over 1300 sensors and meters taking readings every 5 minutes. The aim of the programme was to inform the planning and development of retrofit and new-build homes to help the UK meet its energy goals. They launched the final results at Ecobuild today, and they’re also up on this website.

As you might imagine, that many meter readings accumulate into a dataset which is pretty big, and it’s been a lot of fun trying to wrangle it into shape. The sensors have come in all shapes and sizes, and all levels of reliability, so (as always) a major part of the work has been in cleaning and making sense of the data, before actually being able to do a lot with it. The good news is that the results are great – an average reduction in emissions of over 60%, with some homes reducing by more than 80% (fivefold!). There’s also a load more we could do with the same dataset – we haven’t even begun to dig into the behavioural and other factors that are revealed by this very fine-grained information.
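As a rough back-of-the-envelope check on those figures, here is a minimal Python sketch. It assumes exactly 1300 sensors, each reporting once every 5 minutes with no gaps (the programme had "over 1300", and the real sensors were far less tidy), so treat the counts as an order-of-magnitude estimate rather than the actual dataset size. The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def readings_per_day(sensors: int, interval_minutes: int) -> int:
    """Readings per day if every sensor reports once per interval."""
    return sensors * (MINUTES_PER_DAY // interval_minutes)


def fold_reduction(percent_reduction: int) -> float:
    """Turn a percentage reduction into an 'N-fold' factor.

    An 80% cut leaves 20% of the original emissions, i.e. one fifth.
    """
    return 100 / (100 - percent_reduction)


per_day = readings_per_day(sensors=1300, interval_minutes=5)
print(per_day)             # 374400
print(per_day * 365)       # 136656000
print(fold_reduction(80))  # 5.0
print(fold_reduction(60))  # 2.5
```

So even under these simplified assumptions, a single year comes to well over a hundred million readings.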

So for now, take a look at the report, and you can register to gain your own access to the raw data at https://est.amee.com/. We’ll be releasing the analysis code shortly, as well. 


What happens when a project gets into the press

Prescribing Analytics got an awful lot of positive press coverage over the last few weeks.

It’s very exciting to see something we’ve made covered so extensively – it was on the front page of the Independent, on the PM programme on Radio 4 (47 mins in), in the Economist and the FT, and even on the Revolution Analytics R blog, as well as various medical and pharma blogs.

It’s been interesting to see the different ways this piece of work has been interpreted by different journalists. In the text on the site, and when being interviewed, we were very careful to emphasise the context of the data and that NHS prescribing advisers already do a lot of work on this issue. Most of the responses we had seemed to understand this very well, and it’s been great to see how many NHS employees have been in touch both to say that they enjoy it and to suggest future developments that would be helpful. We did get a few sensational headlines, but most of the coverage was brilliantly balanced, with Radio 4 and the Economist being especially impressive in the care they took to ensure they portrayed the issue fairly.

Anyway, the amount of value that this small project has been able to create makes it pretty clear that there’s a lot more to be done with public health data. It also makes me realise that people loooove maps as a data visualisation. Lesson for every project there: fit in a map if you can. It’s fun and interesting to play with, and people pay much more attention to the data as a result.

It lives!

We’ve finally published the prescribing analytics portal here.

It covers prescribing variation of statins in the UK last year, making it really easy to explore variation which could be costing extra money. I definitely recommend taking a look – it’s fascinating to explore.

It got some great coverage too – the Financial Times, the Economist, and Huffington Post all had approving mentions.

Last week, the NHS Information Centre released another quarter of a billion rows of data, so there’s a lot more to be done here, and this kind of big data is exactly our bag. Watch this space!

Big data, simple analytics, big impact.

Don’t tell anyone yet, but we’ve got a big project going live soon which aims to help the NHS save an awful lot of money. It turns out that by altering GPs’ prescribing behaviour for a few drugs, swapping proprietary forms for generic ones where appropriate, it’s possible to save hundreds of millions of pounds a year. Working with the doctors at Open Healthcare UK, we’ve clarified how prescriptions can and should safely be changed, done the detailed financial analysis, and created maps and rankings of exactly which GPs are spending what.

The full details will all be published in due course, but actually doing the project has reminded me of a few basic principles that come up again and again:

1. Good domain knowledge usually beats super-smart algorithms. Working with qualified doctors and using their understanding of the ins and outs of prescribing behaviour was massively more important to this project than the mathematical and analytical aspects.

2. The effort of getting data doesn’t necessarily correlate with its importance. This whole project was based on open data from the NHS Information Centre, free for anyone to download – acquisition was easy; actually doing something with it was the harder part.

Big Data, told through the medium of The Muppets

From the Bethnal Green Ventures Demo Day – my 5 minute version of what we do, told mostly through the medium of Muppets.

Not the most technically deep talk I’ve ever done, but I’m actually pretty pleased with this as a very quick teaser on what we do and what it’s for.
Now to work on the song and dance number.

When, will I, will I be famous?

(if you now have an earworm, congratulations, we are of the same generation)

In October you’ll have a few chances to see the very rare and shy Team Mastodon in public. 

On October 2nd, Fran will be talking at O’Reilly Strata London on How to Make Big Data Massively Greener (contact us for discounted tickets for the whole event).

On October 4th, we’ll be showing off all the awesome stuff Mastodon C do at the Bethnal Green Ventures Demo Day at Google Campus (free tickets, contact us for an invitation).

On October 16th, Bruce will be talking at Big Data Con London (unfortunate name there) on Big Data in the Cloud with Clojure.

See you there!

Hive pain reduction tricks

Hive is a SQL-like interface onto MapReduce. It feels nice and familiar to analysts who are used to thinking in a SQL paradigm, but it has some nasty gotchas that can make jobs verrrrrry slow or make them fail altogether. Either way, you waste a lot of time, blood pressure, and machine hours.
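To make the "SQL on MapReduce" idea a bit more concrete, here is a toy sketch in plain Python of how a query such as `SELECT page, COUNT(*) FROM visits GROUP BY page` breaks down into map, shuffle and reduce steps. The table, column names and rows are invented for illustration, and this is only a conceptual model, not how Hive actually plans or executes queries.

```python
from collections import defaultdict

# Hypothetical rows of a "visits" table: (user, page)
rows = [("ann", "home"), ("bob", "home"), ("ann", "about"), ("cat", "home")]


def map_phase(rows):
    """Emit a (key, 1) pair per row, keyed on the GROUP BY column."""
    for _user, page in rows:
        yield page, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate each key's values; here the aggregate is COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)  # {'home': 3, 'about': 1}
```

Even in this toy version, every emitted pair has to be routed to the right group before anything can be counted, which gives a feel for why the shape of a query can matter so much on a big table.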

I went to a great talk recently by Philip Tromans at the London Hive meetup which covered some very useful Hive optimisation tips. His full deck is here, but I’ve shamelessly copied a couple of the most useful points below:

Continue reading