Reflections on an internship

Our latest superstar intern, Max, has just left us to go back to his last year of university at Queen Mary University of London.

He’s been working with us on server automation and on separating out our code base into better modules – heavy work, and with a steep learning curve, but important work for the company. Happily, he’s done a brilliant job. We asked him to write up some thoughts on how he found it, and what he did and didn’t like – and we thought you’d like to read them too.

From the start, I felt very included. Even as an intern I was never left out of anything, and that was my first big impression of the team at Mastodon C. I remember in my first week, when everyone was moving to the meeting room for democake, saying that I had nothing to show. But Anna explained that didn’t matter, and I could just talk about what I had learned.

During my internship, I was given the opportunity to work on many different and interesting tasks, from initially learning the basics of Clojure and getting the chance to do some ClojureScript too, all the way to creating virtual machines to allow certain tasks to be run locally for testing. I learned a new way of thinking with Clojure, which was different to all of my previous OO experience, and using virtual machines locally to run servers was completely new to me too. In the end I managed to conquer setting up a virtual FTP server, which took me far too long!

A lot of my work over the summer involved writing tests for existing code. I had to check whether given inputs would match a schema. However, the team wanted to test these in massive numbers, not just with a couple of hard-coded tests. So I got to learn about generative testing using Clojure’s test.check library. Through this I got the chance to write my own small Clojure library. The team wanted to generate test data from existing schemas built with the Prismatic Schema library. After a lot of researching I couldn’t find anything that would do this well enough, so I had a go at creating my own. It is completely open source and can be found (and added to!) on GitHub. Doing this showed me how friendly and helpful the Clojure community is: when I had problems I could ask questions on specific Google Groups or the more general London Clojurians, as well as various IRC channels.
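
As a rough illustration of the generative testing idea described above, here is a minimal sketch in Python using only the standard library. It is not Max's library, and it uses neither test.check nor Prismatic Schema: the schema format (a dict of field name to type) and every function name below are assumptions made up for this example.

```python
import random
import string

# Hypothetical schema for illustration: field name -> expected Python type.
USER_SCHEMA = {"name": str, "age": int, "active": bool}


def generate_value(expected_type, rng):
    """Produce one random value of the requested type."""
    if expected_type is bool:
        return rng.random() < 0.5
    if expected_type is int:
        return rng.randint(-1000, 1000)
    if expected_type is str:
        length = rng.randint(0, 12)
        return "".join(rng.choice(string.ascii_letters) for _ in range(length))
    raise ValueError(f"no generator for {expected_type!r}")


def generate_record(schema, rng):
    """Build one random record with a value for every field in the schema."""
    return {field: generate_value(t, rng) for field, t in schema.items()}


def matches_schema(schema, record):
    """True when the record has exactly the schema's fields, each with the right type."""
    if set(record) != set(schema):
        return False
    # type(...) is compared directly so that True is not accepted as an int.
    return all(type(record[field]) is t for field, t in schema.items())


def check_many(schema, runs=1000, seed=0):
    """Generate many random records and confirm each one matches the schema."""
    rng = random.Random(seed)
    return all(
        matches_schema(schema, generate_record(schema, rng)) for _ in range(runs)
    )
```

Calling `check_many(USER_SCHEMA)` runs a thousand generated records through the validator instead of a handful of hand-written ones. In real use the generated data would be fed to the code under test rather than straight back into the validator.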

Now that it has finished, I definitely miss everyone at Mastodon C and I would happily work there again if I get the chance to. Thank you so much for a fun, interesting and rewarding time!

Thank you Fran for being a great boss, who I could talk to about any questions I had. Thank you Bruce, for making me into a person who paredit is for. Thank you Neale for helping me with all of my Git mishaps and showing off ridiculous Emacs commands. Thank you Anna for coercing me into going to Clojure Dojos.

Thanks very much Max for spending time with us – it’s been a great experience, and we hope to get you back someday.

Open Health Data platform is launched

We had the big launch this week for the Open Health Data Platform, at Shoreditch Village Hall. It seemed like the audience agreed with us that there was plenty of good stuff to be done with open health data.

We’ve built the platform on behalf of Connected Digital Economy Catapult, along with Error Creative Studio who did the work to make it beautiful. The Platform is really a collection of examples, how-tos, and case studies for using open health data to build useful analytics and applications: everything there, including both code and design, is open source and open access, and intended for people to reuse and remix for their own purposes.

The Catapult are now really keen to take this forward with more examples, and to support health tech entrepreneurs as well as bigger organisations and the public and voluntary sectors to do good things with this data, so if you’re interested in the area you should definitely stay in touch.


Network analysis to find ‘innovators’

We’re midway through some work with Nesta to systematically find ‘innovators’ in the UK technology scene. It’s been a really interesting project so far, and we’re looking forward to launching the work more widely in 2014.

The system we’re building collects and joins together data from multiple sources about software developers and what they work on, the idea being to spot innovative people and innovative companies, and to understand the tech innovation landscape better than we can using official information alone. “Innovative” is a pretty subjective term, but we’ve been exploring ways of identifying innovative individuals and companies by analysing their apparent importance and influence in their professional networks.


Our sources are pretty varied, and aren’t usually used in official tracking. Namely, we’re finding data from:

  • GitHub, a popular code sharing service which many programmers use to store, track, and share their projects, both public and private
  • StackOverflow, a question and answer site for technical issues
  • Twitter, where a lot of social chat goes on, and
  • Open Corporates, the open database of the corporate world, which pulls together the official public data available on companies

We’re still working out what the final interface will look like – and would be really interested to hear any thoughts on what would be most useful – but we expect it to be a web-based way to identify and explore innovators and their relationships by region, specialty, and maybe other factors as well. We hope that this will give a way to escape the ‘filter bubble’ of known innovators and start to spot those people and companies who are slightly under the official radar at present.

Here are five of the interesting things we’ve found so far:

1.    As in lots of other social networks, the number of followers of UK GitHub users follows the Pareto (80/20) principle, where 80% of followers are watching just 20% of the total users. This is handy for us, since it means there are a few central users and innovators who are genuinely influential.

2.    Big network analysis is way more computationally intensive than you might guess. Because everybody can be linked to everybody else, there are lots of potential relationships to analyse: with only 1,000 users there are already around 1,000,000 (1,000 × 1,000) possible links, so the sums get very big very fast. In fact, there are about 15,000 UK ‘innovators’, plus their friends, who we want to look at. We’ve been working with modern open source tools including Neo4j and Gephi, which are pretty good, but we’re still stretching the limits of what’s practical.

3.    Living in our own filter bubble at a technology company in Shoreditch, it’s easy to imagine that all the innovators work at small technology companies in Shoreditch. In fact, that’s off the mark: we find lots of people working for big corporations, for organisations like the BBC, and for universities.

4.    On the other hand, our filter bubble isn’t too bad: in our random sample of innovators, we also found a few people who we knew and who Nesta knew personally, so we’re pretty sure that we are hitting the right networks of people.

5.    Innovative people seem to have side projects; we’d initially assumed that we could link people with companies just by looking at their websites. The fact is, a lot of the innovators we identified have their own personal websites, as well as corporate identities – they’re connected to multiple projects and not just their main employer.
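
To make the arithmetic in point 2 concrete, here is a small Python sketch of how the number of possible relationships grows with the number of users. The function names are made up for this illustration, and which count applies depends on an assumption about the network: whether a link has a direction (as with one person following another) or is mutual.

```python
def all_ordered_pairs(n):
    """Every (a, b) combination, including a user paired with themselves."""
    return n * n


def directed_links(n):
    """Possible one-way links between two different users (a follows b)."""
    return n * (n - 1)


def undirected_links(n):
    """Possible mutual links, where (a, b) and (b, a) count once."""
    return n * (n - 1) // 2


for users in (1_000, 15_000):
    print(
        f"{users:>6,} users: "
        f"{all_ordered_pairs(users):,} ordered pairs, "
        f"{directed_links(users):,} directed links, "
        f"{undirected_links(users):,} undirected links"
    )
```

For 1,000 users this gives 1,000,000 ordered pairs, 999,000 directed links and 499,500 undirected links. For 15,000 users the same counts are 225,000,000, 224,985,000 and 112,492,500, and that is before adding anyone's friends.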

(cross posted from

What and why is #democake?

We use an agile approach in all the work we do here at Mastodon C.

Part of our agile technology development is breaking work up into short, set-length iterations, with clear deadlines at the end of each, which helps us to clarify what we’re doing right now, to plan effectively, and also to have a rhythm of regular checkpoints where we can look up from our keyboards, review progress, and respond to anything that’s changed.

Internally, we run 1-week iterations, which end on a Tuesday afternoon. Then, every Wednesday, we take 20 minutes to prioritise what we want to do over the next week. This gives us a chance to make sure we’re still working on the right things and not forgetting to do important stuff that might be buried by the urgent, but less important things that appear daily.

At the end of the iteration we all show off what we’ve achieved, relax, and bask in a little glory before heading back, refreshed and ready to tear into the next iteration. To help us with that celebration, every Tuesday afternoon we also have demo cake.

The heavy responsibility of demo cake acquisition this week fell to Merici. She did well.


The end of summer. The end of internship. The end of awesome.

Our superstar intern @AnnaPawlicka finished her time with us last Friday *sniffles*. Here’s her look back on the experience, cross-posted from her blog. If we can keep hiring people like her and making them happy, we’ll know we’re doing something right.

This has been a whirlwind of a summer, and I can’t believe it’s come to an end. Working at Mastodon C has been exactly what I wanted it to be: an awesome experience. I can honestly say I enjoyed every single day. Whether I was working my way out of JavaScript’s scoping labyrinths or learning the power of Cascalog, there was a sense of pride for me knowing that I had directly participated as a part of the team. With only very, very rare coffee requests ;-)

With the recent rise of big data and open data I feel very lucky to have been in the middle of it. During my first week of the internship I was introduced to the CDEC Open Health Data Platform idea, given a bunch of files, some examples of maps and Cascalog queries, and off I went.

It took me a couple of hours to work through Paul Lam’s “Cascalog for the Impatient” – enough to start writing my own queries. Cascalog, and Clojure for that matter, are amazing. You get to do stuff fast, clean and in a functional way. I used it to scrub the data and calculate diabetes prevalence stats. If a recent graduate whose only experience is in Java is able to do that in a few days, then everyone who is dismissing the power of Clojure should go and ponder their technology choices ;-) Feel free to have a look at the GitHub repo; all the code is open source.

I had been a frequent user of online maps, but I had never created any maps myself. This was about to change. I was going to take the data processed with Cascalog and visualise it on interactive maps and charts. Two brilliant JavaScript libraries made this process much easier: leaflet and dimple. And although my JavaScript was rusty, it all soon started to come together. The (still ongoing) results can be seen on the alpha site. The feedback it received on the day of the launch was the best reward I could imagine.

Since then I have created more maps, more charts and more queries. And during that time I have always been myself: I came to code, and Mastodon C allowed me to do just that. It has been like working in this unreal world where I wake up in the morning wanting to go to work, and where every day helps me to live my life in a more rewarding way. Being able to use my skills to help someone and see their positive reaction – priceless.

Thank you to all Mastodonians for guiding the way for me throughout the entire summer.

Fran (@fhr), for being a friendly, approachable boss, for introducing me to Women in Data, for inspiring me to overcome my fears of public speaking and balancing out the sci-fi references.

Bruce (@otfrom), for always having a positive attitude and constantly helping to point me in the right direction, for introducing me to the world of Clojure and for the long list of all sci-fi films/books I’m yet to watch/read.

Neale (@sw1nn), for the Iron Maiden talks, for always being willing to help out, for inspiring me through your passion for technology and keeping lunch breaks interesting thanks to your amazing skill of remembering facts.

Steve (@sf105) and Merici (@merici), it’s been a pleasure to work with you both, even if only briefly. Thank you for the advice on my professional career and for the froyo!

It has been a wonderful experience to be part of a company that does meaningful things. If I were asked again what I wanted to be when I grow up, my answer would be: a Giant Robot Mastodonian: professional, expert and geeky. You guys rock!

Open Health Data – alpha launch happy dance

[cross posted from

We just launched the alpha of the CDEC Open Health Data Platform. It’s been an exciting process developing the platform so far, and we’re really pleased with the way the alpha site looks and the reactions it’s had. The platform in general is a really fun project: it’s motivated by the fact that there’s an enormous amount of interesting and valuable raw health data available, from the NHS and elsewhere, but that it’s relatively difficult for business and government to make use of that data without a good reusable technical base to help them build different views and services around it.

It’s easy to see how that data holds a lot of economic and social value for so many parties. For example, public, private and third sector service providers want to understand existing need and provision in different areas, to know where to send resources; journalists and politicians want to understand what health issues are topical in their area; and of course medical suppliers want to understand prevalence of conditions, supply of resources, trends and patterns. The aim behind the platform is to help all these parties to get hold of that possible value, by making the raw data easier to use and reuse.

The idea behind having a very early alpha, and of developing the site very visibly in public, is to get a minimal version out for discussion, which will make sure it is developing towards what people need. The project itself is quite an unusual one, since CDEC aren’t the ‘true’ end users: the users are really the people who we hope will take and repurpose the platform code, and the people who’ll use the interactive front end either of our version or of the repurposed versions of the platform. This gives us a bit of a dilemma: it would be impossible, and pretty stupid, to try and build something which serves every potential need we can think of (there are a lot of different potential ways of looking at this data), so we have to choose some example cases to build functionality around.

For the alpha launch, we looked at data relating to diabetes – prescriptions, prevalence, and other information. Diabetes is a good ‘example’ condition to hang development around, as it’s common, chronic, affected by multiple environmental, medical, and social factors, and expensive to manage – so there are strong economic incentives for lots of different players to make sense of diabetes data for lots of different purposes: charities like Diabetes UK, health providers, and private sector providers who serve diabetes patients.

We and CDEC are very happy with the way this first alpha version has come out, and have come up with a lot more questions based on the initial site. We’re also now looking to get feedback on what direction this should build in, and what functionality is important to users (both those who want to repurpose the code for themselves, and those who want to use the existing front end). If that’s you, drop us a line, or even better, raise an issue or fork the code on GitHub: we positively want you to take it, use it, and make money or do other good stuff with this material.