“Data-Driven Healthcare”: Pt II

Data-Driven Healthcare: Introduction

I wrote up some initial thoughts on Healthcare ‘Pt I’ a few weeks ago. This ‘Pt II’ is a follow-up, looking in more detail at some of the approaches and algorithms being applied, to examine what we might mean by ‘Data-Driven Healthcare’.

It’s worth reiterating some of the fundamentals here:

  1. Healthcare is a massive field, and the data related to it is increasing by the second, so this must necessarily be an extremely selective overview
  2. Healthcare on the inside (B2B) – the systems, workflows and business processes involved – is not yet ‘Data Driven’, and neither is the outside (B2C) in terms of an overall ‘customer’ or patient experience
  3. A lot of the focus for data-driven healthcare is on optimising outcomes and reducing risk by identifying, diagnosing or intervening earlier and more accurately. This also acts to reduce costs – financial and organisational, for healthcare providers and insurers, as well as personal costs to patients themselves
  4. Commercially, Data-Driven Healthcare is seen as a ‘promised land’ for those interested in disruption, and a Breughel-esque landscape of terror for those who are not. For the USA alone, trillions of $$$ are at stake. See McKinsey on market size here and on disruption here, and PWC on data-driven decision making in health here. In the USA the specific changes in the structure of the healthcare insurance market represented by the ACA have added a further ‘twist’. Here is a sample exhibit from one of the PWC papers on data and analytics in executive decision making:

PWCHealthcareDataDriven2014 v3

One thing that most observers agree on is that patience, tenacity and targeting are going to be required. There is unlikely to be a ‘killer app’ or a ‘single behemoth’ that floats to the top. Here’s a representative quote from VentureBeat in May 2015:

“There may be no killer apps in digital health. There may be no Uber for health care. The good investments in the space are startups that are willing to go neck-deep in the complexities of the business, the regulations, the clinical aspects, the old school workflows, and other minutia of the health care business. They fill a niche, erase a friction point, and then hang in for the long haul. Jack Young, a partner at dRx Capital, nails it with this quote: “It’s not about disruption, it’s about implementation.””

So, breaking this down into a narrative structure is also a ‘work in progress’. Some of the key themes include:

Instrumentation – or the ‘generation’ of data. This is split between ‘business’ or B2B users – clinical or hospital-based data instrumentation – and ‘consumer’ users – ranging from web apps, sensors and wearables to user-generated content on the web.

Relationships – this is the ‘big Graph’ that healthcare represents – relationships, over time, between any of the actors or participants in the process. This also includes some notes on organisational ‘culture’, including policy and legislation landscapes for the USA and UK. For the USA, see the FDA website on Science and Research guidelines here and on compliance.

Applications – not strictly in a software sense, but more towards clear ‘requirement- or needs-driven’ cases that are leading the way. This ranges from Genomic sampling to Diagnosis.

Players – again, it’s often easiest to try and make sense of what is going on by looking at who is doing what, from ‘giants’ such as Google, Microsoft or IBM to the younger, more narrowly-targeted start-ups around the globe, including some of the most influential individuals involved, for the ‘human’ side.

Instrumentation from the Inside

Much of the focus currently is on future data from patients themselves that is yet to be instrumented, primarily through ‘wearables’, either as devices or applications, in some form or another. Before we get there, though, there is the question of the data that is already ‘instrumented’ in many senses and is waiting for analysis. The trouble is, it’s hidden away in silos, in systems, in locker-rooms and across multiple locations. IBM Watson Health has a few years’ head start here:


Wearables: Instrumentation from the Outside

Wearables are now mainstream. These vary from the ‘multi-functional’ or decorative that provide a home for selected health-related applications or data such as the Apple Watch, to ‘dedicated’ health specific devices…


The data streams that will be instrumented from this and similar platforms make me think also of the Hugh Laurie ‘House’ character’s predisposition to mistrust anything a patient says – the ‘Patients are liars’ approach to medical history and diagnosis.

Already we’ve seen partnerships evolve between parties, both large and small, aligning on who can provide access to the data stream (the wearable / application providers) and the data consumers. Note also that in the USA the FTC has already laid down the law in 2015 on apps claiming to diagnose melanoma based on smartphone pictures.

User Generated Content

One distinct area for Public Health uses ‘trends’ inferred from unstructured ‘user-generated content’ – search activity or social media – for ‘early warning’ detection and/or impact or prevalence estimation for epidemics and communicable or infectious diseases.

The search giants – Google and Microsoft – have led the way here for obvious reasons. Examples include Google Flu Trends in production since 2008 (see the ‘comic book’ on how this then became Google Correlate and then Google Trends here), recently expanded also to include Dengue Trends, and then also Microsoft Research studies on Ebola and on the effectiveness of a health intervention campaign on Influenza (the ‘flu’) in the UK here.
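At its core, a trends-style model starts by correlating query-volume time series with officially reported case counts to find ‘signal’ queries. Here is a minimal sketch with made-up weekly numbers – the real GFT pipeline screened millions of candidate queries and fitted a proper regression model on top:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical weekly data: normalised search volume for one flu-related
# query, and officially reported case counts for the same weeks.
search_volume = [12, 15, 21, 40, 55, 48, 30, 18]
reported_cases = [100, 130, 180, 370, 500, 450, 280, 160]

# A query whose volume tracks reported cases this closely would be a
# candidate 'signal' term for a trends-style early-warning model.
r = pearson(search_volume, reported_cases)
print(round(r, 3))
```

The 2013 over-reporting episode is a reminder that a high historical correlation is not a guarantee: query behaviour drifts (media coverage, autosuggest changes), which is exactly why the later GFT revisions blended in the official CDC series.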


The Google Flu Trends work has been ongoing for a number of years and, like many ‘early entrants’, suffered some unwanted and perhaps unfair adverse publicity due to significant over-reporting of ILI (influenza-like illness) in 2013. Using the ‘hype cycle’ as a guide, GFT had slipped firmly down into the trough of disillusionment. ‘Real’ organisations have since responded constructively to help improve the approach, including incorporating more time-series or ‘real-time’ data from public reporting systems, and GFT underwent a major ‘upgrade‘ in 2014 to include the ‘real’ data published by the CDC for the USA.

Microsoft have also branched out into another form of ‘early warning’ with their ‘Project Premonition’, combining a mosquito-trapping, drone-based ‘AWACS’ system for capturing and genetically sampling mosquito populations, identifying pathogenic or malaria-carrying mosquitoes, and modelling the data in the cloud here.

(An interesting ‘on the ground’ comparison is Metabiota, offering ‘data-driven’ epidemiological and disease detection services from the bottom up. The Data Collective VC website has a set of other, interesting ‘Health’ or ‘BioEngineering’ startups as well).

Drug and Genome Research

Where to start? I’m not going to rehash or repeat what would be better studied elsewhere – the USA National Human Genome Research Institute or the UK Sanger Institute at Cambridge.

One ‘novel’ approach to the instrumentation of large data sets for study has been the Icelandic Genome Project, now deCODE, acquired by Amgen in 2013 and officially concentrating on the identification of key genetic risk factors for disease.

Large-scale drug discovery spanning pharmaceuticals and bio-engineering is a universe in itself. The time-to-market and R&D costs associated with Drug Discovery are so vast that the companies involved are by definition huge, but they in turn support a large ecosystem of smaller operations.

The ‘Medical Graph’ and Entity Relationships

The ‘Medical Graph’ is broad, deep and complex. There are many different graphs, and graphs-within-graphs….

  1. Inter-relatedness of medical entities such as treatments, devices, drugs and procedures. See the paper “Building the graph of medicine from millions of clinical narratives” from Nature in 2014, analysing co-occurrence in 20 million EHR records spanning 19 years from a California hospital. Here is a nice graphic on the workflow involved: Nature2014MedicalGraph
  2. ‘Big Data’ and ‘Graph Analytics’ on a Medical dataset running on dedicated appliances. For one example, see this presentation from 2012 on converting Mayo Clinic data from SQL to SPARQL/RDF for querying
  3. Three years is a long time. The software and math, as well as the ‘High Performance Computing’ required to scale for this, have advanced massively. There is the hugely impressive work of Jeremy Kepner of MIT on D4M, for example here on Bio-Sequencing cross-correlation, as well as the platform providers from Cray to MemSQL…
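The co-occurrence idea behind the ‘graph of medicine’ paper can be sketched in a few lines: treat each record as a bag of medical concepts, and weight an edge by how many records mention both endpoints. The records and concepts below are entirely hypothetical, and the real pipeline adds de-identification, negation handling and statistical significance testing on top:

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus: each 'record' is the set of medical concepts
# (drugs, conditions, procedures) mentioned in one patient narrative.
records = [
    {"metformin", "type 2 diabetes", "hypertension"},
    {"type 2 diabetes", "metformin", "retinopathy"},
    {"hypertension", "lisinopril"},
    {"type 2 diabetes", "hypertension", "lisinopril"},
]

# Edge weight = number of records in which a pair of concepts co-occurs.
edges = Counter()
for concepts in records:
    for a, b in combinations(sorted(concepts), 2):
        edges[(a, b)] += 1

for (a, b), w in edges.most_common(3):
    print(f"{a} -- {b}: {w}")
```

Scaled to 20 million records the counting itself is embarrassingly parallel, which is why this kind of graph-building maps so naturally onto the ‘Big Data’ tooling discussed below.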

Players, People, Partnerships

This is very much a work in progress. I’m going to do this as a simple list for now, and attempt to put together a version of a market overview picture at the end.

Major systems vendors’ healthcare divisions and large Pharma / Drug Cos:

GE Healthcare

Siemens Healthcare

Philips Healthcare

Novartis for modelling and simulation in Drug Development

Roche research

Technology, Applications:

IBM Watson Health. See also relationship to Apple ‘application’ data streams reported in April 2015. IBM is a massive operation, and contains possibly ‘conflicting’ solutions – see this post from 2015 here that is basically selling hardware again.

SAS as an enterprise technology vendor with a ‘health’ vertical. See articles on sponsored AllAnalytics.com and examples on corporate site here.

Crowdmed launched in 2013 as a platform for ‘crowd-sourcing’ disease diagnosis.

Counsyl and 23andme for self-administered DNA sampling and screening, see also the FDA concern expressed reported in Forbes 2013 here and here.

Google are by their very nature a major player, operating across a wide range of research and application areas or domains – providing algorithm design and research (see ML for Drug Discovery in 2015), as well as the infrastructure to support an ‘open’ basis for research by others – see their Cloud Platform for Genomics Research in 2014.

Anything to do with Google X and ‘futurist’ approaches is going to make headlines. See the partnership between Novartis and Google X for Glucose-measuring contact lenses publicised in 2014, and the Google wristband as a data-instrumenting wearable for health data. Google Ventures is also active in the field – see their involvement in Calico Labs research into ageing.

Apple’s ResearchKit for medical and health data applications

Brendan Frey and his team at the University of Toronto (Hinton’s legacy again) for building an algorithm to analyse DNA inputs and gene splicing outputs.

Vijay Pande and the Pande Labs at Stanford for Markov State Modelling for biophysics and biophysical chemistry.

Sage BioNetworks and Dr Stephen Friend for research in linkages between genetics and drug discovery.

Medidata as a ‘clinical cloud’ for data-driven research trials.

Microsoft Research and Health, including Dr Eric Horvitz, quoted here in Wired in 2014: “Electronic health records [are] like large quarries where there’s lots of gold, and we’re just beginning to mine them”. Here is a TV appearance on Data and Medical Research from 2013.

It’s always nice to appear to be ‘cutting edge’. The Microsoft Research Faculty Summit 2015, held yesterday and today (9th July 2015), includes a number of sessions that, if not directly mentioning healthcare, would be fascinating in terms of their general applicability – ‘Integrative AI’, ‘Semantics and Knowledge Bases’, ‘Programming Models for Probabilistic Reasoning’ and more. It would be lovely to be there; I’ll catch the stream later.

NCI and ABCC (Advanced Biomedical Computing Centre) under Jack Collins in Maryland.

The HPC world of pMATLAB and D4M of Jeremy Kepner and John Gilbert.

LifeImage for medical image sharing.

Phew. I’m going to conclude now with a brief note on some of the partnerships or alliances, whether strategic or acquisition-related, already emerging.

The partnerships or alliances that stick out for me right now are those that help span the B2B and B2C divide – where a business- or enterprise-focused outfit with existing traction and experience in the healthcare market partners with a consumer-focused outfit which has reach, technology and infrastructure… kind of like a ‘Business’ server – ‘Consumer’ client architecture.

Due to reporting visibility and transparency these are kind of ‘obvious’ examples:

  • IBM Watson Health – Apple for app data sharing and processing. There seems to be a synergy here, but I’m not yet clear on exactly how this will be delivered.
  • Novartis – Google X for the Contact Lens Glucose / Diabetes monitoring. This is a more ‘research’ focussed partnership – where the drug and manufacturing skills of one are complemented by the data and consumer knowledge of the other.
  • Broad Institute – Google Genomics, more hybrid in that it’s a marriage of Google computational power and analytics infrastructure with deeper genomic analysis tools from Massachusetts.

Underneath the radar are more industry-specific partnerships between organisations with an interest in ‘benevolent disruption’ in terms of improving efficiencies and outcomes – insurers and healthcare providers – and application or solution providers that can help them do this, bit by bit if necessary.


Machine Intelligence at Speed: Some Technical or Platform Notes


This post looks at some of the underlying technologies, tools, platforms and architectures that are now enabling ‘Machine Intelligence at Speed’. Speed as a concept is closely related to both Scale and Scalability. For my convenience, and to try and organise things, by this I mean applications that:

  1. Are built on or involve ‘Big Data’ architecture, tools and technologies
  2. Utilise a stream or event processing design pattern for real-time ‘complex’ event processing
  3. Involve an ‘In-Memory Computing’ component to be quick and also to help scale predictably at speed
  4. Also support or embed ‘Machine Learning’ or ‘Machine Intelligence’ to help detect or infer patterns in ‘real time’

People in the Bay Area reading the above might well shout ‘AMPLab! Spark!’, which is pretty much where I’ll finish!

Hype Cycle and the ‘New Big Thing(s)’

Here is the familiar Gartner Tech Hype Cycle curve for 2014. In it you can see ‘Big Data’, ‘Complex Event Processing’ and ‘In-Memory DBMS’ chugging their sad way down the ‘Trough of Disillusionment’, whilst ‘NLP’ is still merrily peaking. ‘Deep Learning’ in terms of Deep Neural Nets doesn’t seem to my eye to have made it in time for last year.


It’s a minor and perhaps unjustifiable quibble with Gartner, who have to cover an awful lot of ground in one place, but the semantic equivalence of many of the ‘tech’ terms here is questionable, and the shape and inflexion points of the curves in the cycle, as well as the time to reach the plateau, may differ.

What this importantly demonstrates is that the cyclicity it represents is well founded in new company and new technology ‘journeys’, and often in how these companies are funded, traded and acquired by VCs and by each other. What I’m also interested in here is how a number of these ‘separate’ technology entities or areas combine and are relevant to Machine Intelligence or Learning at Speed.

Big Data Architectural Models

(Important proviso – I am not another ‘self-professed next ****ing Google architect’, or even a ‘real’ technologist. See the ‘in/famous’ YouTube skit ‘MongoDB is webscale‘ from Garret Smith in 2010, approx 3 mins in, for a warning on this. I almost fell off my chair laughing, etc. I work in a company where we do a lot of SQL and not much else. I also don’t code. But I’m entitled to my opinion, and I’ll try to back it up!)

Proviso aside, I quite enjoy ‘architecture’, as an observer mainly, trying to see how and why different design approaches evolve, which ones work better than others, and how everything in the pot works with everything else.

Here are two brief examples – MapR’s ‘Zeta‘ architecture and Nathan Marz’s ‘Lambda‘ architecture. I’ll start with Marz, as his is deceptively ‘simple’ in its approach, with 3 layers – speed, batch and serving. Marz worked on the initial BackType / Twitter engine and ‘wrote the book’ for Manning, so I’m inclined to treat him as an ‘expert’.


Marz’s book goes into much more detail obviously, but the simplicity of the diagram above pervades his approach. MapR’s ‘Zeta’ architecture applied to Google is here: MapRZetaGoogleExample
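To make the three layers concrete, here is a toy sketch of the Lambda pattern, under the assumption that the ‘view’ being served is just an event count per key – my reading of Marz, not his code:

```python
# A toy sketch of Marz's three Lambda layers, assuming the served 'view'
# is a simple count of events per key.
from collections import defaultdict

master_dataset = []                # batch layer: immutable, append-only log
realtime_view = defaultdict(int)   # speed layer: incremental deltas
batch_view = {}                    # serving layer input: full recompute

def ingest(event_key):
    """New events go to both the batch log and the speed layer."""
    master_dataset.append(event_key)
    realtime_view[event_key] += 1

def run_batch():
    """Periodic full recompute from the master dataset; the speed
    layer is then reset, since the batch view now covers its events."""
    global batch_view
    batch_view = {}
    for key in master_dataset:
        batch_view[key] = batch_view.get(key, 0) + 1
    realtime_view.clear()

def query(key):
    """Serving layer: merge the (stale) batch view with recent deltas."""
    return batch_view.get(key, 0) + realtime_view[key]

for e in ["a", "b", "a"]:
    ingest(e)
run_batch()          # batch view now covers the first three events
ingest("a")          # arrives after the batch run: speed layer only
print(query("a"))    # 2 from batch + 1 realtime = 3
```

The point of the pattern is that the batch layer can be slow and simple (recompute everything), while the speed layer only ever has to cover the small window since the last batch run.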

I know next to nothing about what Google actually does on the inside, but I’ll trust that Jim Scott from MapR does, or he wouldn’t put this out in public, would he?

What this is telling me is that the ‘redesign’ of Enterprise Architecture by the web giants and what is now the ‘Big Data’ ecosystem is here to stay, and is being ‘democratised’ via the IaaS / PaaS providers, including Google themselves, via Cloud access available to anyone, at a price per instance or unit per second, hour, day or month.

There are then the ‘new’ companies like MapR that will deliver this new architecture to Enterprises which may not want to go to the Cloud for legal or strategic reasons. Set against this are the ‘traditional’ enterprise technology vendors – Oracle, IBM, SAS – which I’ll return to elsewhere, for reasons of brevity as well as the limits of my knowledge.

Big Data has evolved rapidly from something that 5 years ago was the exclusive preserve of the Web Giants to a set of tools that any company or enterprise can utilise now. Rather than a BYO, ‘Big Data’ tool-kits and solutions are available on a service or rental model from a variety of vendors in the Infrastructure- or Platform-as-a-Service space, from ‘specialists’ such as Hortonworks, MapR or Cloudera to the ‘generic’ IaaS cloud platforms such as AWS, Azure or Google.

As well as this democratisation, one of the chief changes in character has been from ‘batch’ to ‘non-batch’ in terms of architecture, latency and the applications this can then solve or support. ‘Big Data’ must also be ‘Fast Data’ now, which leads straight into Stream or Event processing frameworks.

Stream Processing

Other developments focus on making this faster, primarily on Spark and related stream or event processing. Even as a non-developer, I particularly like the Manning.com book series, for instance Nathan Marz’s ‘Big Data‘, Andrew Psaltis’s ‘Streaming Data‘ and Marko Bonaci’s ‘Spark in Action‘, and I also appreciated talking with Rene Houkstra at Tibco regarding their own StreamBase CEP product.

In technical terms this is well illustrated in the evolution from a batch data store and analytics process based on Hadoop HDFS / MapReduce / Hive towards stream or event processing based on more ‘molecular’ and ‘real-time’ architectures using frameworks and tools such as Spark / Storm / Kafka / MemSQL / Redis and so on. The Web PaaS giants have developed their own ‘flavours’ as part of their own bigger Cloud services, based on internal tools or products, for example Amazon Kinesis and Google Cloud Dataflow.

As with many ‘big things’ there is an important evolution to bear in mind, and the question of how different vendors and tools fit into it. For example, at Sports Alliance we’ve just partnered with Tibco for their ‘entry’ SOA / ESB product BusinessWorks; I’ve discussed the Event Processing product with Tibco, but only for later reference or future layering on top. That product has an evolution inside Tibco of over a decade – ‘Event’ or ‘Stream’ processing was not invented in 2010 by Yahoo! or Google, and the enterprise software giants have been working in this area for a decade or more, driven primarily by industrial operations and financial services. Tibco use a set of terms including ‘Complex Event Processing’ and ‘Business Optimization’, which work on the basis of an underlying event stream sourced from disparate SOA systems via the ESB, and an In-Memory ‘Rules Engine’, where the state-machine or ‘what-if’ rules for pattern recognition may be Analyst-defined (an important exception to the ‘Machine Learning’ paradigm below) and applied within the ‘Event Cloud’ via a correlation or relationship engine.

The example below is for an ‘Airline Disruption Management’ system, applying Analyst-defined rules over a 20,000-events-per-second ‘cloud’ populated by the underlying SOA systems. Whether it’s a human-identified pattern or not, I’m still reassured that the Enterprise Software market can do this sort of thing in real time, in the ‘real world’.
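A tiny sketch of what an analyst-defined CEP rule amounts to, loosely inspired by the airline example – the event fields, window size and threshold are all hypothetical:

```python
# Minimal analyst-defined CEP rule: flag a flight when three or more
# 'delay' events for it arrive within a 60-second sliding window.
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 3

windows = defaultdict(deque)   # flight -> timestamps of recent delay events
alerts = []

def on_event(event):
    """Analyst-defined rule applied to each event as it arrives."""
    if event["type"] != "delay":
        return
    q = windows[event["flight"]]
    q.append(event["ts"])
    # Evict timestamps that have fallen out of the sliding window.
    while q and event["ts"] - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= THRESHOLD:
        alerts.append((event["flight"], event["ts"]))

stream = [
    {"type": "delay", "flight": "BA123", "ts": 0},
    {"type": "boarding", "flight": "BA123", "ts": 10},
    {"type": "delay", "flight": "BA123", "ts": 20},
    {"type": "delay", "flight": "LH456", "ts": 25},
    {"type": "delay", "flight": "BA123", "ts": 40},  # 3rd within 60s
]
for e in stream:
    on_event(e)
print(alerts)  # [('BA123', 40)]
```

A production rules engine holds thousands of such patterns in memory and correlates across event types; the essential shape, though – stateful windows plus human-written conditions – is the same.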


The enterprise market for this is summarised as ‘perishable insights’ and is well evaluated by Mike Gualtieri at Forrester – see his “The Forrester Wave™: Big Data Streaming Analytics Platforms, Q3 2014“. Apart from the Enterprise software vendors such as IBM, I’ll link very briefly to DataTorrent as an example of a hybrid batch / tuple model, with Google’s MillWheel also apparently something similar(?).

In-Memory Computing

Supporting this scale at speed also means In-Memory Computing. I don’t personally know a lot about this, so this is the briefest of brief mentions. See for example the list of contributors at the In-Memory Computing Summit in SF in June this year here. Reading through the ‘case studies’ of the vendors is enough to show the ‘real world’ applications that work in this way. It also touches on some of the wider debates, such as ‘scale-up’ v ‘scale-out’, and what larger hardware or infrastructure companies such as Intel and Pivotal are doing.

Machine Learning at Speed: Berkeley BDAS and Spark!

So we’re back to where we started. One of the main issues with ‘Machine Learning’ at either scale or speed, in many guises, is the scalability of algorithms and the non-linearity of performance, particularly over clustered or distributed systems. I’ve worked alongside statisticians working in R on a laptop, and we’ve had to follow rules to sample, limit, condense and compress in order not to overload or time out.

In the Enterprise world one answer to this has been to ‘reverse engineer’ and productise accordingly, with the investment required to keep this proprietary and closely aligned with complementary products in your portfolio. I’m thinking mainly of Tibco and their Spotfire / TERR products, which I understand to be ‘Enterprise-speed’ R.

Another approach is to compare the evolution of competing solutions within the Apache ecosystem. Mahout was initially known to be ‘slow’ to scale – see for instance an earlier post in 2012 by Ted Dunning on the potential for scaling a knn clustering algorithm inside the MapR Mahout implementation. Scrolling forward a few years, this now looks to be competitive territory between separate branded vendors ‘pushing’ their version of speed at scale. I couldn’t help noticing this as a Spark MLlib v Mahout bout in a talk from Xiangru Meng of Databricks (Spark as a Service), showing not only the improvements in their MLlib 1.3 over 1.2 (yellow line v red line) but ‘poor old Mahout’ top left in blue making a bad job of scaling at all, for a ‘benchmark’ of an ALS algorithm on Amazon Reviews:
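ALS itself is easy to state, which is part of why it benchmarks so well: fix one set of factors, solve least squares for the other, and alternate. Here is a rank-1, single-machine sketch on a toy ratings matrix – MLlib’s version is rank-k, regularised and distributed, but the alternation is the same idea:

```python
# Rank-1 Alternating Least Squares on a toy ratings matrix: each user i
# gets a scalar factor u[i], each item j a factor v[j], and we alternate
# closed-form least-squares updates so u[i]*v[j] approximates R[i][j].
R = [
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
]
n_users, n_items = len(R), len(R[0])
u = [1.0] * n_users
v = [1.0] * n_items

for _ in range(20):
    # Fix v; for each user, minimise sum_j (R[i][j] - u[i]*v[j])^2.
    for i in range(n_users):
        u[i] = sum(R[i][j] * v[j] for j in range(n_items)) / sum(x * x for x in v)
    # Fix u; solve for each item factor symmetrically.
    for j in range(n_items):
        v[j] = sum(R[i][j] * u[i] for i in range(n_users)) / sum(x * x for x in u)

pred = [[u[i] * v[j] for j in range(n_items)] for i in range(n_users)]
rmse = (sum((R[i][j] - pred[i][j]) ** 2
            for i in range(n_users)
            for j in range(n_items)) / (n_users * n_items)) ** 0.5
print(round(rmse, 2))
```

Each update touches every rating, which is why naive implementations scale badly and why the distributed, cached-in-memory versions (Spark) win the benchmarks.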


So one valid answer to ‘So how do I actually do Machine Intelligence at Speed?’ seems to be ‘Spark!’, and Databricks has cornered the SaaS market for this.

The Databricks performance metrics quoted are impressive, even to a novice such as myself. The ecosystem in evolution, from technologies and APIs to partners and solutions providers, looks great from a distance. There are APIs, pipeline and workflow tools, and a whole set more.

Databricks is a child of AMPLab in Berkeley. The Berkeley Data Analytics Stack (BDAS) provides us with another (third) version of ‘architecture’ for both Big Data and Machine Learning at Speed.


BDAS already has a set of ‘In-house Apps’ or projects working, which is a good sign, or at least a direction towards ‘application’. One example is the Cancer Genomics Application ADAM, providing an API and CLI for the manipulation of genomic data, running underneath on Parquet and Spark.

Velox, one of the most recent initiatives, is for model management and serving within the stack. It proposes to deliver ‘real-time’ or low-latency model interaction with the data stream it is ingesting – a form of ‘self-learning’ in the shape of iterative model lifecycle management and adaptive feedback. Until recently, only the large-scale ‘Web giants’ had developed their own approaches to manage this area.

AMPLabs Velox Example 1

This is particularly exciting, as it provides a framework for testing, validation and ongoing lifecycle adjustments that should allow Machine Intelligence model implementation and deployment to adapt to changing behaviours ‘online’ and not become obsolete over time, or at least not as quickly, before they require another round of ‘offline’ training and redeployment.
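My (possibly simplified) reading of the Velox split, as a sketch: item features are trained offline in batch and stay fixed between retrains, while each user’s weight vector is nudged ‘online’ from fresh feedback, so predictions adapt in between. The names, numbers and learning rate below are illustrative only:

```python
# Sketch of an offline/online split for model serving: item features are
# fixed between offline retrains, while one user's weight vector is
# updated 'online' from fresh feedback via gradient steps.
item_features = {          # produced by the offline (batch) training job
    "song_a": [1.0, 0.0],
    "song_b": [0.8, 0.6],
    "song_c": [0.0, 1.0],
}
user_w = [0.5, 0.5]        # this user's online-updated model
LR = 0.1                   # illustrative learning rate

def predict(song):
    return sum(w * f for w, f in zip(user_w, item_features[song]))

def feedback(song, rating):
    """One step of online gradient descent on squared error."""
    err = rating - predict(song)
    for k, f in enumerate(item_features[song]):
        user_w[k] += LR * err * f

before = predict("song_a")
for _ in range(10):            # the user keeps rating song_a highly
    feedback("song_a", 5.0)
after = predict("song_a")
print(before < after)  # the online model has adapted: True
```

The attraction is exactly what the paragraph above describes: the served model drifts towards current behaviour between the heavyweight ‘offline’ retraining rounds, instead of going stale.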

AMPLabs Velox Example

The examples given (for instance above, for a music recommender system) are relatively constrained, but they show the power of this not only to make model lifecycle management more efficient, but also to help drive the creation of applications that rely on multiple or chained models – and thus a higher degree of complexity in model lifecycle management – and on models involving radically different data types or behavioural focus, which I’m going to look at later. And all at speed!

Visualising Machine Learning Algorithms: Beauty and Belief

This post looks at algorithm visualisation in two guises: first in terms of ‘how’ an algorithm does its work – looking ‘inside’ the box, if you like – and second in terms of what comes out the other end as an output or outcome. I’ve written this spurred on initially by a recent high-media-profile classification or labelling error (see below), and partly to get something out on visualisation, which is important to me.

Visualisation is a nice ‘hot’ topic in many domains or senses, and deservedly so. The ubiquity of the web, and the technologies associated with it, has brought what was previously a more ‘arcane’ discipline mixing statistics with design into the wider world. The other key concept in visualisation is that of narrative, and the concept of time or time series behind it. As a historian by training, rather than a statistician or designer, I personally of course like this bit too. The way in which data ‘stories’ or narratives can be constructed and displayed is a fascinating and organic process. The world of TED is relevant to any domain or discipline where data and communication or insight around it is involved.

The two words that I want to concentrate on here are beauty, and belief. They are closely related.

Visualisation is an important communication tool, and can often be ‘beautiful’ in its own right. This applies to something that helps us understand how an algorithm is doing its job, step by step or stage by stage, and also to what comes out the other side. Beauty (or elegance) and function are often aligned closely, so in this case what looks good also works well.

Visualisation is also an important component of ‘testing’ or validating a process and an output – either in terms of the development process (what is working, in what way, when and how), or in getting a client or partner who is meant to use the output in an application to buy in to or accept what is going on behind the scenes. So we have to Believe in it too.


I like a pretty picture. Who doesn’t? And ones that move or you can interact with are even better. I’ve read some, but by no means all, of the ‘textbooks’ on data visualisation, from ‘classics’ like Tufte to the work of Stephen Few. I’ve worked in my own way with visualisation applications (for me, mainly Tableau in recent years) and, in conjunction with colleagues, in D3 and other web technologies. Most of this has been to do with Marketing and ‘Enterprise’ data in the guise of my role at Sports Alliance. This is not the place to showcase or parade my own work, thank god. I’m going to concentrate firmly on paradigms or examples from others. This section will be quite short.

It’s easy to say, but I do love the D3 work of Mike Bostock. The examples he generates are invariably elegant, sparse and functional all at the same time. D3 works on the web, and therefore potentially for anyone at any time, and he releases the code for anyone else to use. They also really work for me in terms of the varying ‘levels’ of understanding they allow for audiences with different levels of mathematical or programming knowledge. The example below is for a sampling approach using Poisson Discs: BostockPoissonDiscII
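Bostock’s piece animates Bridson’s grid-based algorithm; the property it visualises – no two samples closer than a radius r – can be sketched with naive ‘dart-throwing’ rejection:

```python
import random

def poisson_disc_darts(width, height, r, attempts=2000, seed=42):
    """Naive 'dart-throwing' Poisson-disc sampling: accept a random
    point only if it is at least r away from every accepted point.
    (Bostock animates Bridson's much faster grid-based algorithm.)"""
    rng = random.Random(seed)
    points = []
    for _ in range(attempts):
        p = (rng.uniform(0, width), rng.uniform(0, height))
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= r * r
               for q in points):
            points.append(p)
    return points

pts = poisson_disc_darts(100, 100, 10)
# Every pair of accepted samples respects the minimum distance r.
min_gap = min(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
              for i, a in enumerate(pts) for b in pts[i + 1:])
print(min_gap >= 10)  # True
```

The result is the characteristic ‘evenly spread but not gridded’ scatter that makes Poisson-disc sampling so useful for graphics and sampling work.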

This next is for a shuffle. What I like here is that the visual metaphors are clear and coherent – discs are, well, discs (or annuli), and cards and shuffling (sorting) go together – and also that the visualisation is ‘sparse’: meaning is clearly indicated with a ‘light touch’, using colour (sparingly), shade, shape and motion in terms of a time series or iteration steps.


The next example is another D3 piece, by a team exploring the relationships between journal articles and citations across 25 years and 3 journals or periodicals. It’s sorted by a ‘citation’ metric, and shows clearly which articles have the most ‘influence’ in the domain.


The body of work across 3 decades represented by the scientific visualisations in the IEEE VIS events and related venues – InfoVis, VAST and SciVis – which the exhibit above draws on, is breathtaking. I’ve chosen two examples, ‘stolen’ below, that have a strong relation to ‘Machine Learning’ or algorithm-output exploration, which serves to segue or link to the next section on ‘belief’. Viz2015Example


Both these are examples of how a visualisation of the output of an algorithm or approach can also help understand or test what the algorithm, and any associated parameters or configuration, is actually doing, and therefore whether we ‘believe’ in it or not.


In our work at Sports Alliance, we’ve struggled at times to get clients to ‘buy in’ to a classifier in action, due partly to the limitations of the software we’re using for it, and partly to us not going the extra mile to ensure complete ‘transparency’ in what an algorithm has done to get the ‘output’. We’ve used decision trees mostly, partly because they work in our domain, and partly because of the relative communicative ease of a ‘tree’ for demonstrating and evaluating the process, regardless of whatever math or algorithm is actually behind it. What has worked best for us is tying the output of the model – a ‘score’ for an individual item (in our case a supporter churn/acquisition metric) – back to their individual ‘real world’ profile and the values of the features that the model utilises and has deemed ‘meaningful’.
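In the same spirit, here is a toy version of the ‘communicative’ tree: a single split chosen by Gini impurity on hypothetical supporter data, with the chosen feature and threshold readable straight off. This is the explanatory principle only, nothing like a production model:

```python
# A toy churn 'tree': one split chosen by Gini impurity, so the decision
# rule can be read back against each supporter's profile. The features
# and data are entirely hypothetical.
def gini(rows):
    if not rows:
        return 0.0
    p = sum(r["churned"] for r in rows) / len(rows)
    return 2 * p * (1 - p)

supporters = [
    {"games": 1,  "years": 1, "churned": 1},
    {"games": 2,  "years": 3, "churned": 1},
    {"games": 12, "years": 2, "churned": 0},
    {"games": 15, "years": 6, "churned": 0},
    {"games": 3,  "years": 1, "churned": 1},
    {"games": 9,  "years": 4, "churned": 0},
]

# Try every (feature, threshold) candidate; keep the lowest weighted
# impurity of the two resulting branches.
best = None
for feat in ("games", "years"):
    for row in supporters:
        t = row[feat]
        left = [r for r in supporters if r[feat] < t]
        right = [r for r in supporters if r[feat] >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(supporters)
        if best is None or score < best[0]:
            best = (score, feat, t)

_, feature, threshold = best
print(f"split on {feature} < {threshold}")  # split on games < 9
```

The point for client conversations is that the output is a sentence (‘supporters attending fewer than 9 games churn’), which can be checked against any individual profile.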

I’ve not used it in production, but I particularly like the BigML UI for decision tree evaluation and inspection. Here is an example from their public gallery for Stroke Prediction, based on data from Michigan State University:


Trees and branching are an ‘easy’ way or metaphor for understanding classification or sorting. Additional information on relative feature or variable ‘importance’ – correlation to the target – helps show which inputs are actually driving the splits.


The emergence of ‘Deep Neural Nets’ of varying flavours has involved a lot of these themes or issues, particularly in the area of image classification. What is the ‘Net’ actually doing inside in order to arrive at the label category? How is one version of a ‘Net’ different to another, and is this better or worse?

I like this version presented by Matthew Zeiler of Clarifai in February this year. I don’t pretend to follow exactly what this means in terms of the NN architecture, but the idea of digging into the layers of a NN and ‘seeing’ what the Net is seeing at each stage makes some sense to me.


The talk then goes on to show how they used the ‘visualisation’ to modify the architecture of the net to improve both performance and speed.

Another approach that seems to me to help demystify or ‘open the box’ is the ‘generative’ approach. At my level of understanding, this involves reversing the process: something along the lines of giving a trained Net a label and asking it to generate inputs (e.g. pictures) at different layers in the Net that are linked to that label.
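
At toy scale, the idea can be sketched like this (my own hypothetical illustration, nothing to do with the DRAW or Inceptionism architectures themselves): freeze a trained model’s weights and run gradient ascent on the *input*, asking what input most excites the score for a chosen label:

```python
# Toy sketch of the 'generative' idea: fix a trained model's weights and
# ascend the gradient with respect to the INPUT, not the weights.
# The 'trained' weights here are invented for illustration.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Pretend these weights were learned for the label 'cat'
weights = [1.5, -2.0, 0.5]

def score(x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

# Start from a neutral input and climb towards the label
x = [0.0, 0.0, 0.0]
lr = 0.5
for _ in range(100):
    s = score(x)
    grad = [w * s * (1 - s) for w in weights]  # d(score)/dx for a sigmoid
    x = [xi + lr * g for xi, g in zip(x, grad)]
```

After the loop, `x` has drifted towards the pattern the ‘net’ associates with the label, which is, in miniature, what the generated pictures are showing for each layer.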

See the Google DeepMind DRAW paper from Feb 2015 here and a Google Research piece from June 2015 entitled ‘Inceptionism: Going Deeper into Neural Nets’ here. Both show different aspects of the generative approach. I particularly like the DRAW reference to the ‘spatial attention mechanism that mimics the foveation of the human eye’. I’m not technically qualified to understand what this means in terms of architecture, but I think I follow what the DeepMind researchers are trying to do in using ‘human’ psychological or biological approaches as paradigms to help their work progress:


Here is an example of reversing the process to generate images in the second Google Research paper.


This also raises the question of error. Errors are implicit in any classifier or ‘predictive’ process, and statisticians and engineers have worked on this area for many years. Now is the time to mention the ‘recent’ high-profile labelling error from Google+. Dogs as Horses is mild, but Black people as ‘Gorillas‘? I’m most definitely not laughing at Google+ for this or about this. It’s serious. It’s a clear example of how limited our ability is to foresee errors, and the contexts in which those errors will be seen and understood.

I haven’t myself worked on multi-class problems. In my inelegant way, I would imagine that there is a ‘final’ ‘if … where …’ SQL clause that can be implemented to pick up pre-defined scenarios: for example, where the classification possibilities include both ‘human’ or ‘named friend’ and ‘gorilla’, return ‘null’.
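
In code rather than SQL, such a hand-written guard might look like this sketch (labels, scores and the margin are all invented for illustration):

```python
# Sketch of a 'final clause' guard over a multi-class classifier's output:
# pre-defined label pairs that must never be confused cause the system to
# abstain (return None) rather than risk a harmful mislabel.

FORBIDDEN_CONFUSIONS = {frozenset(["person", "gorilla"])}

def guarded_label(scores, margin=0.2):
    """scores: dict of label -> probability. Return top label, or None to abstain."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top, second = ranked[0], ranked[1]
    too_close = top[1] - second[1] < margin
    if too_close and frozenset([top[0], second[0]]) in FORBIDDEN_CONFUSIONS:
        return None  # abstain rather than risk the wrong one
    return top[0]
```

With scores of 0.55 ‘person’ against 0.40 ‘gorilla’, the guard abstains and returns None; a clear 0.9 ‘dog’ passes straight through.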

The latitude for error in a domain or application of course varies massively. Data Scientists, and their previous incarnations as Statisticians or Quants, have known this for a long time. Metrics for ‘precision’ and ‘recall’, risk tolerance, and what a false positive or false negative actually means will all vary by application.
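
The two standard metrics answer different questions, and which one you optimise is a business or clinical decision rather than a mathematical one. A tiny sketch with invented counts:

```python
# Precision vs. recall from the same confusion counts. Which matters more
# depends on the application: a spam filter wants precision, a cancer
# screen wants recall. Counts below are invented for illustration.

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)  # of the items we flagged, how many were right
    recall = tp / (tp + fn)     # of the real positives, how many we caught
    return precision, recall

# 80 true positives, 20 false positives, 40 false negatives
p, r = precision_recall(80, 20, 40)
```

Here precision is 0.8 but recall only two-thirds: the same classifier looks good or poor depending entirely on which error you can tolerate.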

Testing, validating and debugging, and attitude to risk or error are critical.

A few years ago I worked on a test implementation of Apache Mahout for Product Recommendation in our business. I found the work done by Sean Owen (now at Cloudera, where Myrrix became Oryx) and by Ted Dunning and Ellen Friedman, both now at MapR, particularly useful.

Dunning’s tongue-in-cheek approach amused me as much as his obvious command of the subject matter impressed and inspired me. The ‘Dog and Pony’ show and the ‘Pink Waffles’ are great ‘anecdotal’ or ‘metaphorical’ ways to explain important messages – about testing, training and version control, as much as the inner workings of anomalous co-occurrence and matrix factorisation.


And this on procedure, training and plain good sense in algorithm development and version control.

DunningFriedmanRecommenderTraining

In our case we didn’t get to production on this. In professional sport retail, with the data we had available, there wasn’t very much variation in basket item choices, as so much of the trade is focussed on a single product – the ‘shirt’ – equivalent to the ‘everybody gets a pony’ in Dunning’s example above.
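
The core of the co-occurrence approach can be sketched in a few lines (baskets invented; the ubiquitous ‘shirt’ shows the ‘pony’ problem in action):

```python
# Simple item co-occurrence counting, the heart of the recommender
# approach described by Dunning and Friedman. Baskets are invented.
from collections import Counter
from itertools import combinations

baskets = [
    {"shirt", "scarf"},
    {"shirt", "mug"},
    {"shirt", "scarf", "programme"},
    {"shirt"},
    {"scarf", "programme"},
]

# Count how often each pair of items appears together
cooc = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        cooc[(a, b)] += 1
```

Because the ‘shirt’ co-occurs with everything, raw counts say very little; this is exactly why an ‘anomalous’ co-occurrence test (e.g. a log-likelihood ratio) is needed before the numbers become interesting.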

Machine Learning, Professional Sports and Customer Marketing in the UK and Europe

Professional Sports Customer Marketing is driven primarily by two key product lines or revenue areas – subscriptions or memberships, and seat or ticket products. The two are ‘combined’ in the classic or evergreen ‘Season Ticket’ packaged product that is essentially the first tier in a membership programme, and on which other ‘loyalty’ programmes or schemes can function.

This post looks at the application of ‘Machine Learning’ in the form of both supervised and unsupervised methods to Customer Marketing in Professional Sport.

I’ll start with an example of an unsupervised approach, using a ‘standard’ k-means algorithm to identify clusters of Professional Sports Club Customers based on features or attributes that describe, as broadly as possible, Customer profile and behaviour over time. These features were built or sourced from an underlying ‘data model’ that broadly covers the following areas:

  1. Sales transactions – baskets or product items purchased by a customer over time, broken down by product area in to Season Tickets, Match Tickets, Retail (Merchandise), Memberships and Content subscriptions
  2. Socio-Demographic – relating to individual identity, gender, geography, and also to relationships with other supporters in the data
  3. Marketing Behaviour – engagement and response to outbound and inbound marketing content over time

We wanted to create a ‘UK Behavioural Model’ that would be representative for UK Sports Clubs, so we created a sample in proportion to overall Club or client size from a set of 25 Clubs in the UK from Football, Rugby and Cricket. The sample consisted of 300k from an overall base of approximately 10 million supporters. The input or feature selection was normalised for all Clubs. We experimented with different iterations based on cluster numbers and sizing.
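
For the curious, the mechanics of a single k-means pass are simple enough to sketch. This toy version uses two invented features and a naive deterministic initialisation (real implementations use k-means++ or random restarts, and our real feature vectors have many more normalised dimensions):

```python
# Toy k-means sketch: assign each point to its nearest centroid, move
# each centroid to the mean of its points, repeat. Data is invented.

def kmeans(points, k, iters=10):
    # Naive deterministic init: spread starting centroids across the data
    step = (len(points) - 1) // (k - 1)
    centroids = [points[i * step] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster empties out
                centroids[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return centroids, clusters

# Invented features: (normalised spend, normalised attendance)
points = [(1, 2), (2, 1), (1, 1), (9, 10), (10, 9), (10, 10)]
centroids, clusters = kmeans(points, k=2)
```

With two obviously separated groups the algorithm settles quickly; the hard work in practice is the feature engineering and normalisation upstream, and choosing and interpreting k downstream.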

The exhibit below shows a version with 15 different clusters, numbered and coloured across the first row from 0-14. The row headers are the different features or feature groups, and the cluster colours are persisted throughout the rows for each feature. The horizontal ‘size’ of the bar for each cluster in each row is the average per customer for that feature, relative to the first (size) row, and is intended to provide a visual guide to differences between clusters.


Revenue £££s in the bottom 3 rows is generated from a handful of clusters only:

  • Purple 8 and Mauve 9 dominate Ticketing revenue
  • Red Cluster 6 and Mauve 9 dominate Memberships revenue
  • Grey Cluster 14 contributes to Merchandise revenue

Pretty much all of the ‘NonUK’ supporters have been allocated to Light Blue Cluster 1.

Interestingly, Gender (M/F) or Age (Kids, Adults, Seniors) don’t seem to discriminate much between Clusters. See the exhibit below that plots ‘Maleness’ on the Y axis and ‘Age’ on the X axis.


Cluster 4 is ‘Old Men’, Cluster 14 is ‘Ladies of a Certain Age’ but the majority of Clusters (circle diameter proportional to size or number of Customers) aren’t really discriminated by these dimensions or features. We concluded that behaviour of kids ‘followed’ or emulated adults in terms of key features for attendance and membership.

The next section looks at a supervised approach, using a decision-tree classification algorithm to identify ‘propensity’ for members or subscribers to renew or churn, and then the converse for non-members or subscribers to ‘convert’ to become a member, based on similarity to previous retention or acquisition events.

Our work in this area began tentatively in 2010, using an outside consultant (Hello Knut!) from a large software vendor working on a single-project membership churn issue at a large North London football Club. In the course of the past 5 years, we’ve taken the approach ‘in house’, ‘democratised’ it in a certain way, and applied the techniques over and over to different clubs (Hello Emanuela!). We’ve tried to make this as efficient as possible by engineering a common feature set across all clubs and seasons based on a common data model.

For the retention model, we’ve continued to build and train a model for each club AND for each season, as we saw greater predictive accuracy over time, based also on including features that encapsulated each customer’s ‘history’ up until that season as fully as possible.

For the acquisition model, we have modified the approach slightly, using a single input of all acquisition events regardless of season, but still only one club at a time. This was based on the belief, or the observation, that people became members for roughly the same reasons regardless of season, whilst people ‘churned’ from member to non-member based more on season-to-season performance and issues.

Decision trees are often cited as being on the more ‘open’ or ‘transparent’ end of the scale of classification techniques or approaches. However, we’ve succeeded in operationalising the retention model in to Club Sales and Marketing systems and programmes only by using the features or variables that ‘float to the top’ of the decision tree to construct a ‘risk factor’ matrix based on observed ‘real’ behaviour and change in these over time for the customer.
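
As a hypothetical sketch of the ‘risk factor’ idea (invented factor names and weights, not any club’s real matrix): the features that float to the top of the tree become weighted flags, and observed changes in them roll up into a score and band that sales and marketing staff can act on directly:

```python
# Sketch of turning top tree features into a 'risk factor' matrix.
# Factor names and weights are invented for illustration; weights are
# loosely ordered by how high the feature 'floated' in the tree.

RISK_FACTORS = {
    "attendance_dropped": 3,
    "no_household_renewals": 2,
    "complaint_logged": 1,
}

def risk_score(observed):
    """observed: set of risk-factor names seen for this supporter."""
    return sum(w for f, w in RISK_FACTORS.items() if f in observed)

def risk_band(score):
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"

band = risk_band(risk_score({"attendance_dropped", "complaint_logged"}))
```

The point of the translation is communicability: a marketer can read ‘attendance dropped plus a complaint means high churn risk’ without ever seeing the underlying tree.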

Here’s an example of feature or variable correlation for the STH Acquisition model:


What was particularly interesting here was the importance of the ‘Half Season Ticket’ holding in the previous season and then the ‘groupings’ represented by the other Half Season Ticket holders who lived with the same supporter. This points very clearly to the inter-relationships that we ‘know’ are important between individual supporters, and leads us towards a more Graph-based analytical approach to identify and analyse relationships at play at specific points in the customer life-cycle, life-stage and buying relationship with the Club.

Our industry or sector is still dominated by the ‘Season Ticket’, a ‘hero product’ that continues, like fine wine or an ageing Hollywood A-lister, to defy the years and live on to snaffle the majority of our clients’ time, attention and share of revenue. The more that we can do to understand the ‘patterns’ behind this, the better.

Stadium Migrations – a ‘Once in a Lifetime’ Journey

Professional sports is, as I’ve written before and will probably write again, a funny old business. The customer is not a customer but a supporter. The Club is a place where the sporting spectacle unfolds, live and un-intermediated, never to be repeated but available, subject to sporting competition format and culture, once a week or so on average if you missed the last one. The bond or affiliation between the supporter and the Club runs deep and is, generally, exclusive and for life, forming an important component of social and individual identity. Supporters are in general a loyal and tribal bunch. They wear the shirt, they sing the song, they share in the journey, and many choose to get married, divorced and have their ashes scattered in the ‘temple’ and centre of activity – the Stadium.

I’m not, as you may have already perceived, a particularly strongly affiliated sports fan or supporter in any private way. Which is lucky, as the company I work in has over 50 individual or separate clients and I feel morally and socially unequipped to deal with the levels, tiers and currents of emotional attachment, reattachment and guilt that such manifest polygamy might entail.

I’ve been privileged, however, to participate in my own way in a number of ‘Stadium Migrations’ in my time at Sports Alliance. ‘Stadium Migration’ refers to the process undergone when a Club chooses, whether by accident or design, to move home from one location to another. This need not be a significant geographical translation – many Stadiums are re-developed ‘in situ’ or adjacent to the original. But whatever the distance, the process is, for most supporters and the Club staff involved, a ‘Once in a Lifetime’ experience.

Clubs choose to ‘migrate’ for a variety of reasons. The old place is looking tacky, falling down, not big enough, or not the right balance in terms of the levels of product and service they can offer; or it may have been only a ‘temporary’ home due to external circumstances, and a short-term sharing arrangement has come to an end. This side of the Atlantic, the ‘Club’ at least will be an original organisation with probably over 100 years of history, culture and attachment. They don’t ‘invent’ Clubs in Europe, at least as a rule, as they do in a franchise system. So for the supporters involved it is a ‘migration’ from one place to another: the focus is on moving existing supporters across seamlessly, not on attracting new ones. Here at Sports Alliance we do data and marketing, so we’re involved in the ‘Customer Experience’ and very definitely nothing to do with concrete, steel, or grass.

My ‘first’ migration was in many senses the biggest, for Arsenal in the move from Highbury 1 or 2 miles down the road to what is now The Emirates in North London. Here is an example from the 2005 Brochure for allocation of seats to existing supporters.


Since then I’ve worked on or am currently involved in projects for RCD Espanyol, Athletic Bilbao and Barcelona in Spain, Swansea City in Wales, and Saracens Rugby, Tottenham and West Ham in London. Each of these is, of course, different to each other and unique in some way or another. Here I want to talk about the shared or in common elements.


What’s involved? First, we look at Ranking, Relationships and Relocation…

Virtually all migrations we’ve worked on involve creating a ‘Queue’ based on a Ranking system. The Queue needs ordering or ranking according to some agreed criteria – how long you’ve been a member for, how much money you paid for your existing or old seat, whether or not you have an interest or shareholding where applicable, or a more genteel approach so that elders, or women and children, go first.
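
The ranking itself usually boils down to a composite sort key. A minimal sketch with invented criteria and supporters:

```python
# Sketch of a migration 'Queue': order supporters by a tuple of agreed
# criteria. Criteria, names and numbers are invented for illustration.

supporters = [
    {"name": "A", "years_member": 12, "seat_spend": 800,  "shareholder": False},
    {"name": "B", "years_member": 12, "seat_spend": 950,  "shareholder": False},
    {"name": "C", "years_member": 30, "seat_spend": 400,  "shareholder": True},
    {"name": "D", "years_member": 5,  "seat_spend": 2000, "shareholder": False},
]

def queue_position(s):
    # Shareholders first, then longest tenure, then highest spend.
    # Negations because Python sorts ascending.
    return (not s["shareholder"], -s["years_member"], -s["seat_spend"])

queue = sorted(supporters, key=queue_position)
order = [s["name"] for s in queue]
```

The politics is in agreeing the tuple and its order; once agreed, the queue is mechanical and, importantly, explainable to any supporter who asks why they are where they are.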

Relocation relates to the physical location of a seat occupied as part of a season ticket or membership product. This often involves animated discussion and analysis of ‘equivalency’ between two often very different stadium layouts. These can differ massively according to the number of seats and, more importantly, the specific layouts of tiers, blocks and rows, and where exits and other stadium features interrupt, or not, in both ‘old’ and ‘new’ locations.

Here is an example of a stadium layout that shows ‘missing’ seats from an old to a new stadium – seats that ‘disappear’ when a notion of equivalence has been applied. Blue shows seats that do not have directly equivalent seats, here mainly due to the different layout of the aisles and ‘vomitorios’ or exits.
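
A toy sketch of how ‘missing’ seats fall out of an equivalence rule (miniature invented layouts; real ones also contend with tiers, aisles and exits):

```python
# Sketch of seat 'equivalence' between two stadium layouts: an old seat
# maps to the same block/row/seat number if it exists in the new build,
# otherwise it is flagged as 'missing' (the blue seats in the exhibit).

# block -> row -> number of seats in that row (invented miniatures)
old_layout = {"A": {1: 10, 2: 10}, "B": {1: 10}}
new_layout = {"A": {1: 8, 2: 10}, "B": {1: 10}}  # an aisle removed 2 seats

def missing_seats(old, new):
    missing = []
    for block, rows in old.items():
        for row, n_seats in rows.items():
            n_new = new.get(block, {}).get(row, 0)
            for seat in range(1, n_seats + 1):
                if seat > n_new:
                    missing.append((block, row, seat))
    return missing

gone = missing_seats(old_layout, new_layout)
```

Every seat on the ‘gone’ list is a supporter conversation waiting to happen, which is why the equivalence rule gets argued over so animatedly.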


The criteria decided for individual entitlement or ranking must also work for ‘groups’ of supporters who choose to apply or sit together, either so that new people are encouraged to apply and join in, or so that existing supporters do not feel that their entitlement is being diluted. Groups may be inferred – based on existing sales or behavioural data for seating or family. Here is an example of a stadium layout representation looking at Age Band. The colour scheme or palette has been chosen to represent ‘age bands’, where blue is ‘youth’, green is ‘adult’, red ‘senior’, and purple the transition between youth and adult. For each age band, a darker shade or hue is intended to show increase in age.


The next example shows an additional dimension of ‘grouping’ based on inferred relationships between adjacent seated customers. The groups are based on grades of proximity or closeness of relationship – from degrees or probability of family relationships to neighbourhood location.


Lurking behind any ‘public’ criteria of ranking will be a Club’s opportunity or desire to ‘up sell’ customers to premium products, if they so desire.

How long does it take? From a few months to a few years… The longest completed from the list above was for Arsenal, taking place over a two-year period in the run up to the Emirates opening. The shortest has probably been Athletic Bilbao, which took a turn for the worse, or better, when the Club shortened the timeline to the coming September on realising that the physical build was advancing sufficiently. Here is a view of the shiny new San Mamés stadium from across the river.

San Mamés 031 p

Once the ‘rules’ for customer or supporter engagement have been agreed, it’s then time to turn this in to a sales and marketing plan. Because this involves seat selection in what is often a not-yet-existent venue, it almost always involves a face-to-face visit to a specially constructed or administered sales centre, allowing the supporter or group of supporters to confirm their choice of seat, and for reservations and deposits to be taken at the same time. It’s also important as an ‘experience’ that for many is cherished and downright emotional. At Bilbao, for example, the ‘eldest’ socio (member or season ticket holder) was ceremoniously invited to begin the process. Cue flag waving and tears, and for good reason!

Our role here is also to prepare and furnish a sales and appointment management system that interfaces with existing Club systems for ticket or seat booking and reservation and any marketing or internet applications for communication or grouping assignments. We usually do this in a customised version of an ‘off the shelf’ CRM system such as Microsoft Dynamics, using where possible existing functionality for contacts, groups, service resources and service appointments, and the marketing list and outbound communications that result from this.

We’re currently in the process of planning for 3 new proposed migrations in the UK and Europe, so more will follow.

Marketing at Speed in Professional Sport

We’ve recently, as in the last six months recently, completed our first ever pair of ‘Enterprise Software’ partnerships. Like London buses, you wait ten years for one then suddenly two come along at once .. etc. etc.

One partnership is with Adobe Campaign, a Marketing Automation solution, and another with Tibco for their BusinessWorks Enterprise Application Service and Messaging product. Both are envisaged over the same sort of term – three years or more – and both are at the heart of a new effort to provide ‘Enterprise Class’ sales and marketing services to our clients. Together these represent a combination of solutions to handle both data integration and end-to-end processes for ‘Marketing at Speed’ in Professional Sport.

Tibco helps connect the dots and provide ‘real time’ data.

Adobe Campaign provides an automation layer for marketing communications that are personalised, timely (thanks, Tibco) and relevant.

Use cases focus on two areas – ‘Bricks’ – for in stadium / on location, and then ‘Clicks’ for ‘the internet’.

For ‘bricks’, we need to know when a supporter is ‘present’, and be able to identify who they are at the same time. In the ‘real world’ this relies on Access control for perimeter access notification and then ‘loyalty’ schemes for EPOS / Till systems linked to a membership card.

For ‘clicks’ we’d like to know when a customer signs in or downloads an app and links this to their membership. In principle an app can also access GPS for physical location, or go further in-store for NFC or iBeacon level of location information.

From then on, Tibco helps applications exchange information in real time, and Adobe furnishes these applications with the right marketing content, relevant to the individual supporter and the ‘context’ – including the club’s offer or promotion catalogue.
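
A highly simplified sketch of the lookup at the centre of this flow (names, tags and offers are all invented; the real messaging and campaign logic live in Tibco and Adobe Campaign respectively):

```python
# Sketch of the 'bricks' flow: an access-control scan event arrives, the
# supporter is identified by card, and a relevant offer is selected from
# the club's catalogue. All identifiers and rules are invented.

offers = [
    {"id": "hospitality_upgrade",   "requires": {"season_ticket"}},
    {"id": "loyalty_double_points", "requires": {"loyalty_member"}},
    {"id": "retail_discount",       "requires": set()},  # fallback for anyone
]

profiles = {
    "card-123": {"season_ticket", "loyalty_member"},
    "card-456": set(),
}

def on_entry_scan(card_id):
    """Pick the first offer whose requirements the supporter's tags meet."""
    tags = profiles.get(card_id, set())
    for offer in offers:
        if offer["requires"] <= tags:  # subset test
            return offer["id"]
    return None
```

In production the profile lookup is the hard part – it’s where the ‘data model’ below earns its keep – while the offer-matching rules belong to the marketing team.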

Behind both is the Sports Alliance ‘data model’ that powers both the identification of the supporter and their history with the Club.

It’s been ten years in the waiting, so it’s worth explaining the background to this. We’re a funny kind of company working in a funny kind of market.

First, the market. It’s fragmented, disjointed and, at times, possibly schizophrenic… and, underlying all this, limited in terms of the resource and expertise required to tame software and data for ‘enterprise class’ marketing.

To the average observer or ‘supporter’ – the generic term for customer in the industry – a Professional Sports Club looks shiny and big from the outside, with extensive media exposure, a smattering of celebrity and a generous and tangible physical asset with a rectangle of grass sitting in the middle. Commercially, however, they tend to be both extremely fragmented and yet also extremely limited in capabilities in many areas.

The direction and flow of money through the business is public knowledge, the majority inflow coming from higher-level media or sponsorship deals intermediated by leagues or other bodies exclusively designed to market and sell these rights, and flowing down and out directly almost straight through to the players, and their agents, who provide the product.  The infrastructure and organisation remaining inevitably has to focus on what it takes to underpin the ‘main event’ – the matchday – and the other ongoing customer-focussed businesses such as retail, hospitality, or community struggle for attention and resources from what is left over.

The marketing side of the business is primarily all about retention, structured around subscription or membership products (season tickets), and focussed on monetising the big concrete and steel asset with grass in the middle 2 or 3 times a month. In these terms, even the ‘biggest’ brands are local or regional at best in terms of the customer base who will come to the venue and experience the product and pay for it on a regular basis.

Secondly, then, back to us. Sports Alliance has been defined by the clear tension between these realities or limitations on one side and the expectations and ambitions on the other, spread and shared across many individual clients. As a guide to our client network, we have over 50 individual clients. In terms of size, the best proxy is unique customers or supporters under management. Our largest clients will have more than a million, the smallest down to a few tens of thousands. Pooled together we have nearly 15mm customers under management from the UK and mainland Europe. And yet both these ends of the scale, largest and smallest, are in the same business, with the same supporter needs or expectations, and the same ‘customer experience’ to try and deliver.

Sports Alliance as a company has been formed and evolved around these twin realities – part technology, part services, providing an extremely broad set of functions and applications from ‘back end’ systems integration and marketing data warehousing, to ‘front end’ multiple line of business sales and marketing applications or solutions, and then a further layer of services on top for clients that also want them. Within all of this is a focus on a data model that makes sense and can be replicated and evolved within and between clients.

For the ‘back end’ technologies, until now we’ve been firmly in the ‘build your own’ camp, based primarily on Microsoft technologies – SQL Server in the middle, with some C#/.NET and web tech from ASP to MVC and WCF SOAP/XML. ‘Front-end’ is more hybrid, split between ‘lower level’ partnering and some more of the ‘build your own’. Where the market has evolved in terms of SaaS providers, we’ve partnered or integrated with the obvious leaders: for CRM this means Dynamics or Salesforce, for ESP a broader set of suppliers, and for BI Tableau. In any of these, we find ourselves primarily concentrating on the data schema and configuration required for replicating a data model, and then letting the standard application functionality work around this. In some areas, for example Marketing Campaign Management and Loyalty, we’ve carried on and built complete ‘Front-end’ solutions, mainly down to the fact that 3rd party products in these areas were either too expensive or not fully fit for purpose.

We’ve watched as in recent years occasionally one or two of the larger clubs in our universe have taken a direct sales route to ‘enterprise’ software and solutions, often as part of a sponsorship package, and we believe it would be fair to say that the success rate or return on investment where this has happened would be marginal or arguable at best. The skills and resources required not only to implement but then maintain and evolve these solutions are often underestimated.

Anyway, enough about us or ‘me’. It takes two to tango, or to do a deal, and ‘Enterprise Software’ vendors had, in our niche, remained resolutely ‘Enterprise’ in their approach. As noted above, where there was the possibility of a ‘big catch’ with a traditional, direct capital expenditure model, the software sales team would carry on in this vein if there was any possibility of a good old-fashioned commission on sale. Build it, market it, get one or two ‘reference’ big fish, and surely they will come.

Sports Alliance, even pooling together networked resources, has been financially unable or unwilling to go for anything resembling a traditional ‘Enterprise’ solution licence independently. Pooled together, our 15mm customers look even more suspiciously like an ‘Enterprise’, and any traditional licence would in itself be many multiples of our total company turnover.

So, what’s changed? Essentially, it looks like we’ve met in the middle.

First, the Enterprise software approach has, at least in our case, shifted towards a more accessible quarterly subscription model, billed in arrears, and layered to allow for a ‘slow start’ with clear tiers going up as business or usage grows. That was probably the most significant and necessary component. I spent a number of months with another Enterprise Application vendor who offered a subscription model, but billed, it turned out, only yearly in advance. It’s hard to fund that in our business. Why has this changed? Overall, I believe it is a realisation that the ‘Enterprise’ market model simply doesn’t work out in the world of SMEs/SMBs, and some money for software that has very little other cost of sale is simply better than none.

Secondly, Sports Alliance has to recognise that we can’t continue to spread ourselves so thinly and still significantly improve what we do for our clients. We’d continue, as one client said to me recently, to be ‘stuck in third gear’ (a metaphor that will rapidly become meaningless with self-driving cars. I presume he meant a 5-speed car gearbox as well, and not a 10-speed rear cassette on a bicycle).

The use-cases we’re looking at focus on ensuring that the ‘full’ customer profile and history this represents is visible in two massively important areas – when the customer is on site, and at the other end on the internet. Remember, we’re fortunate to work in an industry where the nature of the customer affiliation is such that the more we can show we know about them, the better. Other brands might look spooky or raise concerns of privacy when showing knowledge over a customer relationship that can span decades of sales history, and involve close personal and family relationships across generations.

For on-site, this revolves around the ‘real-time’ physical presence of the customer at the stadium or venue, and ensuring that the customer experience is as delightful as it could possibly be. ‘Real-time’ is difficult to achieve in an industry that is heavily siloed or disparate in systems and applications, and where integration tends to be batch based. We need to identify the location of the customer onsite either by access or entry to the perimeter or at a point of sale via a loyalty scheme or programme identifier.

For the internet, there is also the additional hurdle of identification in an application in a market where many Clubs do not control their own rights or properties and where Ecommerce partners will have separate, standalone applications for sales. Single Sign On, and the opportunity to treat a customer consistently across different applications or touch points, is still very much a work in progress.

So, once a customer is identified and we know where they are – in stadium/shop or in an application – it’s over to the marketing department and the campaign or offer catalogue:

  1. Personalised in-store loyalty promotion
  2. In application games ‘points boost’ for behaviour
  3. Prompt to redeem loyalty points in newly opened hospitality area exclusively for Season Ticket Holders
  4. Special event local to the supporter on the following weekend, for Soccer Schools participants from the previous year who have had a birthday party at the Club
  5. And so on…

Applying AI – A More Detailed Look at Healthcare Diagnostics

This is intended to be a slightly more detailed look at a single vertical or domain – Healthcare, and within this the single area of diagnosis support, and how ‘new AI’ in different guises is being applied, and by whom, and in what way, and to what end.

It’s fair to say that the potential for a ‘marriage’ or liaison between Healthcare and AI is no secret. Healthcare is large, complex, and deeply encased in centuries of knowledge and empirical reasoning. The network of relationships between physicians or doctors, institutions, patients, treatments and outcomes is a barely understood global resource of great potential value. The whole is too large for any single human to encompass. Inefficiencies or discrepancies in diagnosis and outcomes are inevitable. The potential for new technologies to help disrupt and reshape the healthcare market and the diagnosis process is clearly understood. VCs are also clearly interested in the outcome, whether commercially or philanthropically; a good example is Vinod Khosla and the ventures his firm represents, from Lumiata (see below) to Ginger.io and CrowdMed.

Healthcare is of course a universe in itself. Diagnosis, Prescription, Monitoring, Intervention – each area or subsection has its own challenges, contexts and actors involved. The potential for ‘universal’ and non-invasive monitoring or sampling tools and applications is of course enormous in itself. The forecast explosion of consumer data-creating devices and applications is going to create a ‘stream processing’ challenge orders of magnitude beyond what exists currently. As stated, I’m going to try to concentrate here on Diagnosis, and on software rather than hardware, in the form of ‘expert systems’ to support or guide human decision making.

One place to start is with the ‘who’ rather than the ‘what’ or the ‘how’.

The IBM Watson ‘cognitive computing’ project has a valid claim to be an early starter, and also to be at the forefront of many people’s minds, with the heritage of the ‘Deep Blue’ project, the 2011 ‘Jeopardy‘ demonstration and the subsequent publicity generated. Back in the real world, Watson is now applied as a solution – a ‘Discovery Advisor‘ – in different domains, including healthcare for clinical trial selection and pharmaceutical drug development. It’s an approach that is both ambitious and intensive, involving many years of intense R&D and the associated costs, plus the partnerships with leading Physicians and Institutions, including Cancer and Genomic research, for ‘training’ over as many years on top of, or outside of, this. Outside of healthcare, the ‘Question and Answer’ approach is merged with other IBM product lines for Business Analytics and Knowledge Discovery. The recent acquisition of AlchemyAPI – a younger, nimbler technology, ‘outside focussed’ by its very nature – should integrate well with the Bluemix platform. The example below is from IBM development evangelist Andrew Trice, with a voice UI for a Healthcare QA application:

Whilst I admire the ambition (whether commercially driven or not) and the underlying scale of Watson, I do question whether the ‘blinkenlight‘ aura generated by the humming blue appliance, linked to a ‘solutions and partner ecosystem’ notorious for tripling (or more) any proposed budget, will lead to a true democratisation. I feel the same unnatural commingling of awe and fear in response to the Cray appliance use cases. Amazing, awesome, and yet also extremely expensive. I guess the alternative – commodity hardware run at scale using a suitably clever network engineering process to distribute computation and process results – doesn’t come cheap either.

Also, I understand and concur with the need for ‘real stories’ that publicise and demonstrate an application in a way that the ‘average Joe’ can understand. (My personal favourite is the Google DeepMind Atari simulation – more elsewhere on this.) Some attempts, however well intentioned, simply don’t work, or at least not in my opinion. The Watson-as-Chef and Food truck for ‘try me’ events makes me think ‘wow, desperate‘ rather than ‘wow, cool’.

The fast-paced improvement and application of ‘Deep Learning’ neural networks in image classification have opened up a new opportunity in medical image analysis. Some ‘general purpose Deep Learning as a Service or Appliance’ companies such as ErsatzLabs offer their tools as a service, and include healthcare diagnosis use cases in their portfolios.
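To make the underlying technique concrete, here is a minimal, framework-free sketch of the core operation in a convolutional network – sliding a small learned filter over an image to produce a feature map, then applying a non-linearity. The 4×4 ‘scan’ and the edge-detecting filter values are invented purely for illustration; real medical-imaging models learn thousands of such filters from labelled scans.

```python
# Toy sketch of one convolutional layer: convolve, then ReLU.
# Image and kernel values are made up for illustration only.

def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu(feature_map):
    """Element-wise non-linearity: negative responses are zeroed."""
    return [[max(0, v) for v in row] for row in feature_map]

# A 4x4 'scan' with a bright vertical edge down the middle...
scan = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# ...and a 2x2 vertical-edge detector.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = relu(conv2d(scan, edge_kernel))  # strong response along the edge
```

A trained network stacks many such layers, learning the kernel values rather than hand-coding them, which is what lets these models pick out diagnostic features in scans without explicit feature engineering.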

Enlitic.com proposes a more ‘holistic’ approach, combining Deep Learning for imaging with – intriguingly – NLP and semantic approaches for healthcare diagnostics.

Lumiata‘s approach appears more graph-driven: ingesting text and structured data from multiple sources – insurance claims, health records, and medical literature – to create an analytics framework for assessing or predicting patient ‘risk’, exposed as a service for other healthcare apps.

It’s also worth mentioning Google, a potential giant in any domain it cares to enter, who have already made a move into health, leveraging their dominance in search and status as ‘first port of call on the internet’ to provide curated health content, including suitably gnostic pronouncements on search algorithm ‘tweaking’ to support this curated health service.

In terms of diagnosis and treatment, the ‘data types’ currently being referenced are essentially images, text or documents (including test results), and relationships. The technical approaches applied map closely to these: Deep Learning nets for classification of imaging; NLP / XML for semantics, ontology and meaning in unstructured documents and text; and graph analytics at scale for the complexity of the ‘web’ of patient-doctor-diagnosis-disease-treatment-outcomes.
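The graph side of that mapping can be sketched very simply: model entities as nodes, relationships as directed edges, and traverse them to surface indirect links. Every node and edge below is invented for illustration; a real clinical graph would hold millions of typed, weighted relationships.

```python
# Toy patient-doctor-diagnosis-treatment-outcome graph.
# All entities are hypothetical examples.
from collections import deque

edges = {
    "patient:ann":              ["doctor:smith", "diagnosis:type2-diabetes"],
    "doctor:smith":             ["diagnosis:type2-diabetes"],
    "diagnosis:type2-diabetes": ["treatment:metformin", "treatment:diet-plan"],
    "treatment:metformin":      ["outcome:hba1c-improved"],
    "treatment:diet-plan":      [],
}

def reachable(graph, start):
    """Breadth-first traversal: everything linked, directly or indirectly."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

linked = reachable(edges, "patient:ann")
# The patient is linked, two hops away, to a treatment outcome.
```

The point of graph analytics at scale is exactly this kind of multi-hop inference – connecting a patient to outcomes via diagnoses and treatments that no single record states directly.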

Two of the example companies discussed here – IBM Watson and Cray – have a heritage in high-end (read: expensive) appliance or supercomputer architectures for running memory- and processing-intensive real-time analytics at scale, and in the expensive hordes of suited consultants needed to implement, deploy and manage these solutions over time. The other, newer, smaller ventures show mixed approaches and – although it’s early days for any publicly available data – I would assume a more ‘flexible’ commercial basis.

So what’s the big story? Stepping back and looking down, healthcare and the data it consists of seems to me a big ‘brain’ of information, constructed from different formats and substances but linked together in complex relationships and patterns, hidden or obfuscated by barriers of format, location and access. This is traditionally referred to as the ‘real world’, whether it’s Healthcare or the Enterprise.

The goals or objectives can be simply phrased – improving and optimising patient outcomes, and placing the patient at the centre.

The adversarial paradigm of ‘bad AI’ – rapidly-evolving software systems ‘competing’ against physicians in a winner-takes-all contest for the right to diagnose and treat patients – is of course naive. And yet the healthcare industry is clearly labelled and targeted for ‘disruption’ in the coming decades, in terms of who does what and who is responsible for what. Whatever this ends up looking like, we can be sure it will be radically different from the way it is now.

It’s a big, big challenge. No one venture – even at the scale of Google or IBM – is going to do this by itself. It’s going to rely also on a host of smaller ventures, but ones with inversely large ambitions.

Programmatic – Online, TV and Beyond?

I once worked as a media planner in a small digital agency in Central London (cue fairy-tale intro…). I’ve recently been speaking to ex-colleagues who still work in the advertising and media world to catch up on all the interesting things that have changed. Chief among these is programmatic or ‘real time’ advertising – and, on the flip side, the remaining intransigence or resistance of the TV market to being infiltrated by this invidious new practice.

I know first hand the massive inefficiencies and vested interests at play in the ‘traditional’ media planning and buying process, where lunches, personal relationships, buying commission and the surreal practicalities of panel-based viewing as an attribution mechanism would only occasionally be interrupted by Excel, or by the econometrics department wheeled in with a chart. (This is intentionally facetious and exaggerated, and yet….) At the same time I’ve heard first hand how suitably robust and acceptable indirect attribution can be – and is being – produced between budget and spots (advertising on TV) and response (primarily digital or online, in terms of ecommerce or customer acquisition).
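One common form of that indirect attribution can be sketched in a few lines: compare online response in the minutes after a spot airs against a same-length baseline immediately before it. The visit counts, spot timing and window length below are all invented for illustration; real econometric models control for seasonality, overlapping spots and decay.

```python
# Toy spot-to-response uplift: post-spot window vs pre-spot baseline.
# All numbers are hypothetical.

def spot_uplift(responses, spot_minute, window=5):
    """responses: visits per minute. Uplift = total response in the
    window after the spot minus the same-length baseline before it."""
    before = sum(responses[spot_minute - window:spot_minute])
    after = sum(responses[spot_minute:spot_minute + window])
    return after - before

# Site visits per minute; a TV spot airs at minute 10 and visits spike.
visits = [4, 5, 4, 4, 5, 4, 5, 4, 4, 5, 20, 18, 12, 9, 7, 5, 4]
uplift = spot_uplift(visits, spot_minute=10)  # extra visits credited to the spot
```

Crude as it is, this baseline-versus-window comparison is the intuition behind the more robust spot attribution work mentioned above.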

What isn’t there yet is any ability to link the individual – the ‘user’ or ‘customer’ – across channels or media in a suitably seamless manner. This much anyway is familiar to me in my Sports Alliance role.

I’m writing this explicitly not as any kind of market expert, but because of the example this represents for the application of machine or software-driven decision making in a large and yet still immature market. Particularly one dominated by the online web giants – Facebook, Yahoo, Google – the resources they have invested in their own ‘Artificial Intelligence’, and how this is or will be integrated into their product or application stacks.

The RTB exchange system, and its continuing evolution into a fully-fledged trading floor with supporting mechanics for futures and options, is here to stay, thank god. I’ve seen some of the numbers from friends for the transactions or queries per second the market as it currently stands can process or accommodate, and it’s impressive – at least to me.

What I’m less sure about is how ‘intelligent’ the system is in its ability to predict the desired event (the click) or to allocate and optimise resources (advertising of different formats, and the budget behind it). I’ll come back to this later.

Deep Learning: Machine Learning becomes ‘Intelligent’, Artificially or Otherwise?

‘Deep Learning’ is the new big thing. ‘Old-fashioned’ academic AI from the second half of the C20th has passed through the last decades’ narrower focus on Data Mining, Knowledge Discovery and Machine Learning to blossom into what is now touted as a new, vibrant age.

There is too much here to comment on in anything like a sane manner, and this is intended to be a very brief summary of my own ‘non-expert’ position, so I’ll be very brief with a few examples:

  1. The internet and technology giants are splashing cash on ‘Artificial Intelligence’ or ‘Machine Intelligence’, see Yahoo, IBM, Microsoft, Google and Facebook. That much money can’t be wrong, can it?
  2. Where the giants have trod, the VC world has followed. See this single VC post for the ‘Machine Intelligence’ landscape here from the end of 2014. They all can’t be wrong, either, surely?
  3. Governments have ‘quietly’ been doing their own thing for security, military and logistical purposes, regardless of public involvement or awareness. See the FBI NGI here, or anything that DARPA funds.

What’s also interesting is the emergence of ‘old school’ academics among the individuals leading this ‘new’ (read ‘old’!) era. This is, I believe, down to the fact that the skills and knowledge required to be a ‘master’ in this area are intensely academic and rare in themselves, and that the ‘Deep Learning’ technologies currently being worked on represent a continuing evolution from the early neural nets through to the convolutional or ‘learning’ approaches that are the state of the art today. See first Hinton, LeCun and Ng, three eminent examples who have been ‘appropriated’ or ‘acquired’ or ‘assimilated’ by commercial operations. Ng’s website is the ‘outlier’; the others are gloriously and happily ‘old school’. Demis Hassabis‘s journey to Google is slightly different – not a mainstream academic at all, but a games developer – yet with his skills he could have been. Other leading research figures of the last decade or so include Bengio, Bottou and Ciresan. See the footnotes of published research, or the NIPS proceedings, for further names.

As a one-time academic of sorts, I find it fascinating to watch how this ‘cohort’ of researchers reacts to ‘suddenly’ being thrust into the spotlight of a broader and more commercial world.

Moving on from this aside, it is worth pointing out that it is a new, vibrant age. The reinforcement loops at play – particularly now with the internet / webscale technology giants and their real-world need for ‘intelligent applications’, at speed and at scale – have shifted the landscape, and a few paradigms with it.

One other clear outcome of recent years has been the inflation of ‘Data Scientist’ wages – perhaps long overdue – and the related ‘talent search’, or ‘fill a room with Machine Learning PhDs and wait for acquisition’, approach to company formation. Machine Learning or Data Science qualifications, and the courses offered to support them by existing academic institutions or online at Coursera, are I would imagine highly prized.

I’m going to write separately on particular areas or approaches of interest.

Graph Relationships – Social and Otherwise – in Professional Sport

Good ‘old-fashioned’ customer relationships are important in the sector that I work in – professional sport. Usually the focus is the relationship between the brand, venue or club and the customer or supporter (note the flexible terminology here). Retention is core in an industry where the affiliation is often ‘for life’, and churn is interpreted as gradations or levels of involvement rather than ‘defection’ to a rival brand, so loyalty – and the club’s recognition of it – is massively important in reinforcing the ‘tribal’ relationship the supporter feels towards the club in question.

Lately, we’ve been doing a lot more analysis of relationships between supporters themselves, more along the lines of good ‘new fashioned’ social or graph relations. This has been driven partly by the requirements of some of the projects we have been working on – particularly large-scale ‘once in a lifetime’ stadium migrations – and partly by the availability of, and access to, the technologies that enable it. Examples here include the underlying technology, such as SPARQL / RDF, as well as the ability to communicate or visualise this in tools such as D3 or Tableau.
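For readers unfamiliar with the RDF model that SPARQL queries run over, it boils down to data stored as (subject, predicate, object) triples, queried by pattern matching with variables. A toy illustration in plain Python – the supporter identifiers and predicates are invented, and real deployments would of course use a proper triple store rather than a list:

```python
# Toy RDF-style triple store with SPARQL-like pattern matching.
# All supporter data is hypothetical.

triples = [
    ("supporter:101", "sitsNextTo", "supporter:102"),
    ("supporter:101", "attends",    "match:derby"),
    ("supporter:102", "attends",    "match:derby"),
    ("supporter:103", "attends",    "match:cup-final"),
]

def match(pattern):
    """Match a triple pattern; None plays the role of a SPARQL variable."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Roughly: SELECT ?who WHERE { ?who attends match:derby }
derby_goers = [s for s, _, _ in match((None, "attends", "match:derby"))]
```

The appeal of the triple model for supporter data is that new relationship types (seating, attendance, purchases) slot in as new predicates without schema changes.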

What it doesn’t mean yet is ‘social media’ in terms of Facebook, Twitter or other mainstream consumer products. We’ve been hampered here by the continuing limitations on many clubs’ ability to link or identify a social media ID with a real-world ID – digital or otherwise. I’ll comment on this elsewhere in more detail.

In terms of graph analytics, we’ve been particularly interested in the patterns that represent groupings or shared activities – patterns that help reinforce the ‘real world’ relationships that are otherwise hidden from a club. Examples in the real world (bricks not clicks) would include who sits next to whom in a stadium seating plan, who arrives with whom and at what time in terms of stadium or venue access, and what we can infer about the nature of the real-world relationships these patterns represent.
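A minimal sketch of that kind of inference, under obvious simplifying assumptions: supporters in adjacent seats who also pass the turnstiles within a few minutes of each other probably attend together. The names, seat plan and arrival times below are entirely invented, and the ten-minute threshold is an illustrative parameter rather than anything we have validated.

```python
# Toy group inference from seating adjacency plus arrival times.
# All supporter data and the time threshold are hypothetical.

def likely_groups(supporters, max_gap_minutes=10):
    """supporters: list of (name, row, seat, arrival_minute).
    Returns pairs in the same row, in adjacent seats, arriving together."""
    pairs = []
    for i, (n1, r1, s1, t1) in enumerate(supporters):
        for n2, r2, s2, t2 in supporters[i + 1:]:
            if r1 == r2 and abs(s1 - s2) == 1 and abs(t1 - t2) <= max_gap_minutes:
                pairs.append((n1, n2))
    return pairs

season = [
    ("alice", "K", 14, 35),
    ("bob",   "K", 15, 33),   # next to alice, arrived 2 minutes apart
    ("carol", "K", 16, 70),   # next to bob, but arrived 37 minutes later
    ("dave",  "M", 14, 34),   # different row entirely
]
groups = likely_groups(season)  # alice and bob look like a group
```

Aggregated over a season of matches, pairings like these become strong evidence of real-world relationships – exactly the kind of hidden structure the graph work above is meant to surface.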

I’ll be updating this with examples later.