<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Mastodon C</title>
	<atom:link href="http://blog.mastodonc.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mastodonc.com</link>
	<description>Big data done better.</description>
	<lastBuildDate>Fri, 17 May 2013 16:26:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.mastodonc.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Mastodon C</title>
		<link>http://blog.mastodonc.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.mastodonc.com/osd.xml" title="Mastodon C" />
	<atom:link rel='hub' href='http://blog.mastodonc.com/?pushpress=hub'/>
		<item>
		<title>Help! I can&#8217;t find any female speakers for my tech conference!</title>
		<link>http://blog.mastodonc.com/2013/05/17/help-i-cant-find-any-female-speakers-for-my-tech-conference/</link>
		<comments>http://blog.mastodonc.com/2013/05/17/help-i-cant-find-any-female-speakers-for-my-tech-conference/#comments</comments>
		<pubDate>Fri, 17 May 2013 15:19:03 +0000</pubDate>
		<dc:creator>otfrom</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[conferences]]></category>
		<category><![CDATA[speakers]]></category>
		<category><![CDATA[user groups]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=227</guid>
		<description><![CDATA[I occasionally get asked where one could go to find women speakers for tech events and conferences. Here is a short list from the top of my head: Women in Data run by @yoditstanton The Women&#8217;s Room Rails Bridge (for Ruby) pyladies (for Python) flossie (for Free/Libre Open Source Software) JDuchess (for Java) LambdaLadies (for functional programming) MzTek (technology and [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=227&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<div>I occasionally get asked where one could go to find women speakers for tech events and conferences. Here is a short list from the top of my head:</div>
<p></p>
<div></div>
<div><a href="http://www.meetup.com/Women-in-Data/">Women in Data</a> run by <a href="http://twitter.com/yoditstanton">@yoditstanton</a></div>
<div><a href="http://thewomensroom.org.uk/">The Women&#8217;s Room</a></div>
<div><a href="http://workshops.railsbridge.org/">Rails Bridge</a> (for Ruby)</div>
<div><a href="http://www.pyladies.com/">pyladies</a> (for Python)</div>
<div><a href="http://www.flossie.org/">flossie</a> (for Free/Libre Open Source Software)</div>
<div><a href="http://jduchess.org/">JDuchess</a> (for Java)</div>
<div><a href="http://www.lambdaladies.com/">LambdaLadies</a> (for functional programming)</div>
<div><a href="http://www.mztek.org/">MzTek</a> (technology and art)</div>
<div><a href="http://www.meetup.com/Ladies-Who-Code/">Ladies who Code</a></div>
<div></div>
<div></div>
<p></p>
<div>Also on Twitter:</div>
<p></p>
<div></div>
<div></div>
<div>
<div>@<a href="http://twitter.com/anettebgo">anettebgo</a> (javascript) UK</div>
<div>@<a href="http://twitter.com/annashipman">annashipman</a> (software craftsmanship, Java, Ruby, JavaScript, Perl, development methodology and clean code) UK</div>
<div>@<a href="http://twitter.com/fhr">fhr</a> (my boss on R, big data and DataKind) UK</div>
<div>
<div>@<a href="http://twitter.com/jennifersmithco">jennifersmithco</a> (clojure, neo4j &amp; graphs) UK</div>
<div>@<a href="http://twitter.com/jenit">jenit</a> (open data) UK</div>
</div>
<div>@<a href="http://twitter.com/lunivore">lunivore</a> (bdd) UK</div>
<div>@<a href="http://twitter.com/mia_out">mia_out</a> (Digital Humanities, web dev) UK</div>
<div>@<a href="http://twitter.com/misslake">misslake</a> (Art and programming and senior TED fellow) UK</div>
<div>@<a href="http://twitter.com/trisha_gee">trisha_gee</a> (Java, Java Performance, MongoDB) UK</div>
<div>@<a href="http://twitter.com/yoditstanton">yoditstanton</a> (big data, clojure, hadoop) UK</div>
<div></div>
</div>
<p></p>
<div>@<a href="http://twitter.com/bodil">bodil</a> (fp, clojure, clojurescript, javascript, web dev) Norway</div>
<div></div>
<p></p>
<div>@<a href="http://twitter.com/antiheroine">antiheroine</a> (web dev) US</div>
<div>@<a href="http://twitter.com/ashedryden">ashedryden</a> (ruby &amp; general diversity) US</div>
<div>@<a href="http://twitter.com/carinmeier">carinmeier</a> (clojure and fp) US</div>
<div>@<a href="http://twitter.com/kaythaney">kaythaney</a> (Mozilla Open Science, Big Data and DataKind) US (formerly UK)</div>
<div></div>
<div></div>
<p></p>
<div>And I&#8217;m sure there are quite a few more, but that is a start. Please add more in the comments.</div>
<div></div>
<div></div>
<p></p>
<div>I&#8217;m sure @<a href="http://twitter.com/ashedryden">ashedryden</a> has a much more complete list on her blog as she is writing a book on this.</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2013/05/17/help-i-cant-find-any-female-speakers-for-my-tech-conference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/324c1cedf36b59d44a13e76868af4b8d?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">otfrom</media:title>
		</media:content>
	</item>
		<item>
		<title>Retrofit is go!</title>
		<link>http://blog.mastodonc.com/2013/03/05/retrofit-is-go/</link>
		<comments>http://blog.mastodonc.com/2013/03/05/retrofit-is-go/#comments</comments>
		<pubDate>Tue, 05 Mar 2013 16:43:50 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=219</guid>
		<description><![CDATA[We&#8217;ve got a nice project which went live today &#8211; we&#8217;ve been providing the final performance analysis for the £17m government-funded Retrofit for the Future programme, which has been running since 2009. The programme retrofitted 120 homes in the UK with multiple energy-saving and energy-efficiency technologies, and collected data through over 1300 sensors and meters [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=219&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>We&#8217;ve got a nice project which went live today &#8211; we&#8217;ve been providing the final performance analysis for the £17m government-funded Retrofit for the Future programme, which has been running since 2009. The programme retrofitted 120 homes in the UK with multiple energy-saving and energy-efficiency technologies, and collected data through over 1300 sensors and meters taking readings every 5 minutes. The aim of the programme was to inform the planning and development of retrofit and new-build homes to help the UK meet its energy goals. They launched the final results at Ecobuild today, and they&#8217;re also up on <a href="http://www.retrofitanalysis.org/">this website</a>.</p>
<p>As you might imagine, that many meter readings accumulate into a dataset which is pretty big, and it&#8217;s been a lot of fun trying to wrangle it into shape. The sensors have come in all shapes and sizes, and all levels of reliability, so (as always) a major part of the work has been in cleaning and making sense of the data, before actually being able to do a lot with it. The good news is that the results are great &#8211; an average reduction in emissions of over 60%, with some homes reducing by more than 80% (fivefold!). There&#8217;s also a load more we could do with the same dataset &#8211; we haven&#8217;t even begun to dig into the behavioural and other factors that are revealed by this very fine-grained information.</p>
<p>So for now, take a look at <a href="http://www.retrofitanalysis.org/">the report</a>, and you can register to gain your own access to the raw data at https://est.amee.com/. We&#8217;ll be releasing the analysis code shortly, as well.</p>
<p><a href="http://mastodonc.files.wordpress.com/2013/03/retrofit-image.jpg"><img class="aligncenter size-full wp-image-220" alt="retrofit-image" src="http://mastodonc.files.wordpress.com/2013/03/retrofit-image.jpg?w=625"   /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2013/03/05/retrofit-is-go/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>

		<media:content url="http://mastodonc.files.wordpress.com/2013/03/retrofit-image.jpg" medium="image">
			<media:title type="html">retrofit-image</media:title>
		</media:content>
	</item>
		<item>
		<title>Infamy, infamy</title>
		<link>http://blog.mastodonc.com/2013/02/06/infamy-infamy/</link>
		<comments>http://blog.mastodonc.com/2013/02/06/infamy-infamy/#comments</comments>
		<pubDate>Wed, 06 Feb 2013 12:16:39 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=215</guid>
		<description><![CDATA[More recent and upcoming talks from Team Mastodon: A couple of recent ones that are up online: Using data for EVIL! at the Strata Data Warfare conference Big Data comes to the NHS at ODI Fridays lunchtime lecture More public talks coming soon: Open Data for CSR, Intelligent Futures, 12th Feb R and Shiny at [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=215&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>More recent and upcoming talks from Team Mastodon:</p>
<p><strong>A couple of recent ones that are up online:</strong></p>
<p>Using data for EVIL! at the <a href="http://oreillynet.com/pub/e/2557">Strata Data Warfare conference</a></p>
<p>Big Data comes to the NHS at <a href="http://www.scribd.com/doc/122981341/Friday-Lunchtime-Lectures-at-The-ODI-Big-Data-Comes-to-the-NHS">ODI Fridays lunchtime lecture</a></p>
<p><strong>More public talks coming soon:</strong></p>
<p><a href="http://ifutures.eventbrite.co.uk/">Open Data for CSR, Intelligent Futures, 12th Feb</a></p>
<p><a href="http://datasciencelondon.org/">R and Shiny at Data Science London, 21st Feb</a></p>
<p><a href="http://www.londonr.org/">R and NHS data, LondonR, 19th March</a></p>
<p><a href="http://www.devoxx.com/display/DV12/Home">Devoxx, 26th-27th March</a></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2013/02/06/infamy-infamy/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>What happens when a project gets into the press</title>
		<link>http://blog.mastodonc.com/2013/01/04/what-happens-when-a-project-gets-into-the-press/</link>
		<comments>http://blog.mastodonc.com/2013/01/04/what-happens-when-a-project-gets-into-the-press/#comments</comments>
		<pubDate>Fri, 04 Jan 2013 15:52:45 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=210</guid>
		<description><![CDATA[Prescribing Analytics got an awful lot of positive press coverage over the last few weeks. It&#8217;s very exciting to see something we&#8217;ve made be covered so extensively &#8211; it was on the front page of the Independent, on the PM programme on Radio 4 (47 mins in), the Economist, the FT, and even in the Revolution [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=210&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Prescribing Analytics got an awful lot of positive press coverage over the last few weeks.</p>
<p>It&#8217;s very exciting to see something we&#8217;ve made covered so extensively &#8211; it was on the front page of <a href="http://www.independent.co.uk/news/uk/home-news/prescribe-cheaper-drugs-gps-told-8433635.html">the Independent</a>, on the <a href="http://t.co/zAcygMMD">PM programme</a> on Radio 4 (47 mins in), the <a href="http://www.economist.com/news/britain/21567980-how-scrutiny-freely-available-data-might-save-nhs-money-beggar-thy-neighbour">Economist</a>, the <a href="http://blogs.ft.com/ftdata/2012/12/06/un-needed-branded-drugs-cost-nhs-millions/">FT</a>, and even in the <a href="http://blog.revolutionanalytics.com/2012/12/nhs-prescription-analytics.html">Revolution Analytics R blog</a> as well as various medical and pharma blogs.</p>
<p>It&#8217;s been interesting to see the different ways this piece of work has been interpreted by different journalists. In the text on the site, and when being interviewed, we were very careful to emphasise the context of the data and that NHS prescribing advisers already do a lot of work on this issue. Most of the responses we had seemed to understand this very well, and it&#8217;s been great to see how many NHS employees have been in touch both to say that they enjoy it and to suggest future developments that would be helpful. We did get a few sensational headlines, but most of the coverage was brilliantly balanced, with Radio 4 and the Economist being especially impressive in the care they took to ensure they portrayed the issue fairly.</p>
<p>Anyway, the amount of value that this small project has been able to create makes it pretty clear that there&#8217;s a lot more to be done with public health data. It also makes me realise that people loooove maps as a data visualisation. There&#8217;s a lesson for every project there: fit in a map if you can &#8211; it&#8217;s fun and interesting to play with, and people pay much more attention to the data as a result.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2013/01/04/what-happens-when-a-project-gets-into-the-press/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>It lives!</title>
		<link>http://blog.mastodonc.com/2012/12/12/it-lives/</link>
		<comments>http://blog.mastodonc.com/2012/12/12/it-lives/#comments</comments>
		<pubDate>Wed, 12 Dec 2012 15:56:38 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=206</guid>
		<description><![CDATA[We&#8217;ve finally published the prescribing analytics portal here. It covers prescribing variation of statins in the UK last year, making it really easy to explore variation which could be costing extra money. I definitely recommend taking a look &#8211; it&#8217;s fascinating to explore. It got some great coverage too &#8211; the Financial Times, the Economist, [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=206&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>We&#8217;ve finally published the prescribing analytics portal <a href="http://www.prescribinganalytics.com/">here</a>.</p>
<p>It covers prescribing variation of statins in the UK last year, making it really easy to explore variation which could be costing extra money. I definitely recommend taking a look &#8211; it&#8217;s fascinating to explore.</p>
<p>It got some great coverage too &#8211; the Financial Times, the Economist, and Huffington Post all had approving mentions.</p>
<p>Last week, the NHS Information Centre released another <a href="http://www.ic.nhs.uk/prescribing">quarter of a billion rows of data</a>, so there&#8217;s a lot more to be done here, and this kind of big data is exactly our bag. Watch this space!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2012/12/12/it-lives/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>Big data, simple analytics, big impact.</title>
		<link>http://blog.mastodonc.com/2012/10/29/big-data-simple-analytics-big-impact/</link>
		<comments>http://blog.mastodonc.com/2012/10/29/big-data-simple-analytics-big-impact/#comments</comments>
		<pubDate>Mon, 29 Oct 2012 12:55:33 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[nhs]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=197</guid>
		<description><![CDATA[Don&#8217;t tell anyone yet, but we&#8217;ve got a big project going live soon which aims to help the NHS save an awful lot of money. It turns out that by altering GPs&#8217; prescribing behaviour for a few drugs, swapping generic for proprietary forms where appropriate, it&#8217;s possible to save hundreds of millions of pounds a [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=197&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Don&#8217;t tell anyone yet, but we&#8217;ve got a big project going live soon which aims to help the NHS save an awful lot of money. It turns out that by altering GPs&#8217; prescribing behaviour for a few drugs, swapping generic for proprietary forms where appropriate, it&#8217;s possible to save hundreds of millions of pounds a year &#8211; working with the doctors at <a href="http://www.openhealthcare.org.uk/">Open Healthcare UK</a>, we&#8217;ve clarified how prescriptions can and should safely be changed, done the detailed financial analysis, and created maps and rankings of exactly which GPs are spending what.</p>
<p>The full details will all be published in due course, but actually doing the project has reminded me of a few basic principles that come up again and again:</p>
<p>1. <strong>Good domain knowledge usually beats super-smart algorithms</strong>. Working with qualified doctors and using their understanding of the ins and outs of prescribing behaviour was massively more important to this project than the mathematical and analytical aspects.</p>
<p>2. <strong>The effort of getting data doesn&#8217;t necessarily correlate with its importance</strong>. This whole project was based on open data from the NHS Information Centre, free for anyone to download &#8211; acquisition was easy; actually doing something with it was the harder part.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2012/10/29/big-data-simple-analytics-big-impact/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>Big Data, told through the medium of The Muppets</title>
		<link>http://blog.mastodonc.com/2012/10/22/big-data-told-through-the-medium-of-the-muppets/</link>
		<comments>http://blog.mastodonc.com/2012/10/22/big-data-told-through-the-medium-of-the-muppets/#comments</comments>
		<pubDate>Mon, 22 Oct 2012 12:21:24 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[muppets]]></category>
		<category><![CDATA[startup life]]></category>

		<guid isPermaLink="false">http://blog.mastodonc.com/?p=193</guid>
		<description><![CDATA[From the Bethnal Green Ventures Demo Day &#8211; my 5 minute version of what we do, told mostly through the medium of Muppets. Not the most technically deep talk I&#8217;ve ever done, but I&#8217;m actually pretty pleased with this as a very quick teaser on what we do and what it&#8217;s for. Now to work on the [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=193&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>From the Bethnal Green Ventures Demo Day &#8211; <a href="http://vimeo.com/51602539">my 5 minute version of what we do</a>, told mostly through the medium of Muppets.</p>
<div>Not the most <em>technically</em> deep talk I&#8217;ve ever done, but I&#8217;m actually pretty pleased with this as a very quick teaser on what we do and what it&#8217;s for.</div>
<div></div>
<div>Now to work on the song and dance number.</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2012/10/22/big-data-told-through-the-medium-of-the-muppets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>When, will I, will I be famous?</title>
		<link>http://blog.mastodonc.com/2012/09/20/when-will-i-will-i-be-famous/</link>
		<comments>http://blog.mastodonc.com/2012/09/20/when-will-i-will-i-be-famous/#comments</comments>
		<pubDate>Thu, 20 Sep 2012 09:59:35 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[fame]]></category>
		<category><![CDATA[talks]]></category>

		<guid isPermaLink="false">http://mastodonc.wordpress.com/2012/09/20/when-will-i-will-i-be-famous/</guid>
		<description><![CDATA[(if you now have an earworm, congratulations, we are of the same generation) In October you&#8217;ll have a few chances to see the very rare and shy Team Mastodon in public.  On October 2nd, Fran will be talking at O&#8217;Reilly Strata London on How to Make Big Data Massively Greener (contact us for discounted tickets for [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=192&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>(if you now have an earworm, congratulations, we are of the same generation)</p>
<p>In October you&#8217;ll have a few chances to see the very rare and shy Team Mastodon in public.</p>
<p>On October 2nd, Fran will be talking at O&#8217;Reilly Strata London on <a href="http://strataconf.com/strataeu/public/schedule/detail/25849">How to Make Big Data Massively Greener</a> (contact us for discounted tickets for the whole event).</p>
<p>On October 4th, we&#8217;ll be showing off all the awesome stuff Mastodon C do at the Bethnal Green Ventures Demo Day at Google Campus (free tickets, contact us for an invitation).</p>
<p>On October 16th, Bruce will be talking at Big Data Con London (unfortunate name there) on <a href="http://bigdatacon.co.uk/2012/">Big Data in the Cloud with Clojure</a>.</p>
<p>See you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2012/09/20/when-will-i-will-i-be-famous/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>Hadoop vs Data Warehouses</title>
		<link>http://blog.mastodonc.com/2012/08/20/hadoop-vs-data-warehouses/</link>
		<comments>http://blog.mastodonc.com/2012/08/20/hadoop-vs-data-warehouses/#comments</comments>
		<pubDate>Mon, 20 Aug 2012 10:20:05 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://mastodonc.wordpress.com/?p=187</guid>
		<description><![CDATA[A friend asked me this week what the difference is between using Hadoop and its related ecosystem for data storage and analysis, and using a traditional Data Warehouse. You might want to skip this post if you&#8217;re already way ahead on this topic, but for everyone else, I thought I&#8217;d try and clarify&#8230; A Data [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.mastodonc.com&#038;blog=32231674&#038;post=187&#038;subd=mastodonc&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A friend asked me this week what the difference is between using Hadoop and its related ecosystem for data storage and analysis, and using a traditional Data Warehouse.</p>
<p>You might want to skip this post if you&#8217;re already way ahead on this topic, but for everyone else, I thought I&#8217;d try and clarify&#8230;</p>
<p><span id="more-187"></span></p>
<p>A <strong>Data Warehouse</strong> is a structured relational database, where you aim to collect together all the interesting data from multiple systems. When putting data into a warehouse, you need to clean and structure it at the point of insertion. This cleaning and structuring process is usually called ETL &#8211; Extract, Transform, and Load. The data warehouse approach is helpful because then your data looks clean and simple, but it is also very limiting: you&#8217;re now unable to change the questions you want to ask later since the data&#8217;s already been pre-processed, or to correct for errors in the ETL process. It can also get very expensive, as enterprise data warehouses are typically built on specialised infrastructure which becomes very pricey for large datasets. There&#8217;s a nice diagram of a full data warehouse structure on <a href="http://en.wikipedia.org/wiki/Data_warehouse">the Wikipedia article here</a>.</p>
<p>The <strong>Hadoop</strong> ecosystem (could also be called the &#8216;Big Data&#8217; approach) starts from the same aim of wanting to collect together as much interesting data as possible from different systems, but approaches it in a radically better way. With this approach, you dump all data of interest into a big data store (usually HDFS, the Hadoop Distributed File System). This is often in cloud storage, which suits the task because it&#8217;s cheap and flexible, and because it puts the data close to cheap cloud computing power. You can still then do ETL and create a data warehouse using tools like Hive if you want, but more importantly you also still have all of the raw data available, so you can define new questions and do complex analyses over all of the raw historical data if you wish. The Hadoop toolset allows great flexibility and power of analysis, since it does big computation by splitting a task over large numbers of cheap commodity machines, letting you perform much more powerful, speculative, and rapid analyses than is possible in a traditional warehouse.</p>
<p>One of the most confusing things in this field is naming &#8211; one man&#8217;s data warehouse is another man&#8217;s data store &#8211; so I&#8217;m sure I&#8217;ll get some corrections from people who use different terminology from that above. However, I hope this does untangle things at least a little bit.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2012/08/20/hadoop-vs-data-warehouses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>
	</item>
		<item>
		<title>Hive pain reduction tricks</title>
		<link>http://blog.mastodonc.com/2012/08/05/hive-pain-reduction-tricks/</link>
		<comments>http://blog.mastodonc.com/2012/08/05/hive-pain-reduction-tricks/#comments</comments>
		<pubDate>Sun, 05 Aug 2012 14:53:00 +0000</pubDate>
		<dc:creator>Fran</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[data science]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>

		<guid isPermaLink="false">http://mastodonc.wordpress.com/?p=175</guid>
		<description><![CDATA[Hive is a SQL-like interface onto MapReduce. It feels nice and familiar to analysts who are used to thinking in a SQL paradigm, but it has some nasty gotchas that can make jobs verrrrrry slow or make them fail altogether. Either way, you waste a lot of time, blood pressure, and machine hours. I [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Hive is a SQL-like interface onto MapReduce. It feels nice and familiar to analysts who are used to thinking in a SQL paradigm, but it has some nasty gotchas that can make jobs verrrrrry slow or make them fail altogether. Either way, you waste a lot of time, blood pressure, and machine hours.</p>
<p>I went to a great talk recently by Philip Tromans at the London Hive meetup, which covered some very useful Hive optimisation tips. His full deck is <a href="https://speakerdeck.com/u/philiptromans/p/hive-optimisation-tips-tricks">here</a>, but I&#8217;ve shamelessly copied a few of the most useful points below:</p>
<p><span id="more-175"></span></p>
<ol>
<li><strong>Avoid NULLs in join fields.</strong> The field that you&#8217;re joining on gets hashed, and the hash mod (some value) determines which reducer each row goes to. NULL values all get sent to the same reducer, which is usually pointless, since a NULL key never matches anything in an equality join, and is time-consuming if there are a lot of NULL values. Push a non-null condition into the join to make things go much faster: for example, instead of &#8220;A join B on (A.id=B.id)&#8221;, do &#8220;A join B on (A.id=B.id and B.id IS NOT NULL)&#8221;.</li>
<li><strong>Use map-side joins and order join tables well. </strong>There is a setting, &#8216;off&#8217; by default, which lets Hive select the optimal join. If you precede your query with &#8216;SET hive.auto.convert.join=TRUE;&#8217; then it will do a map-side join where appropriate, which can make things much faster. Also, joins happen one at a time, left to right, and joining to a big table is expensive, so use sub-joins to minimise the size of each join.</li>
<li><strong>There are a bunch of other useful settings</strong>, listed with their defaults in conf/hive-default.xml.template. Taking a look at these, and setting the ones you like to TRUE, will save an awful lot of unnecessary woe.</li>
</ol>
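<p>The NULL problem in the first tip can be sketched with a toy model in Python. This is an illustration only, not Hive&#8217;s actual partitioner: the reducer count, the rule that NULL (None here) always lands on reducer 0, and the sample ids are all assumptions made for the example.</p>

```python
from collections import Counter

NUM_REDUCERS = 4  # hypothetical number of reducers

def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer from the join key's hash.

    Assumption for this sketch: every NULL key (None) goes to reducer 0.
    """
    if key is None:
        return 0
    return hash(key) % num_reducers

def reducer_load(keys):
    """Count how many rows each reducer would receive."""
    return Counter(reducer_for(key) for key in keys)

# Eight rows with real ids plus eight rows whose join field is NULL.
join_keys = list(range(1, 9)) + [None] * 8

skewed = reducer_load(join_keys)
filtered = reducer_load([k for k in join_keys if k is not None])

print("with NULLs:   ", dict(skewed))
print("NULLs removed:", dict(filtered))
```

<p>In this toy run one reducer ends up with most of the rows until the NULL keys are filtered out, which is the skew that the IS NOT NULL condition in the join avoids.</p>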
<p>I&#8217;d recommend reading his whole deck to get some more insight.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mastodonc.com/2012/08/05/hive-pain-reduction-tricks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/4b4be630052d6124e8b884e60e8737b7?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">magicdashes</media:title>
		</media:content>

	</item>
	</channel>
</rss>
