“Data-Driven Healthcare”: Pt II

Data-Driven Healthcare: Introduction

I wrote up some initial thoughts on Healthcare in ‘Pt I’ a few weeks ago. This ‘Pt II’ is a follow-up, looking in more detail at some of the approaches and algorithms being applied, to examine what we might mean by ‘Data-Driven Healthcare’.

It’s worth restating some of the fundamentals here:

  1. Healthcare is a massive field, and the data related to it is increasing by the second. This must needs be an extremely selective overview.
  2. Healthcare on the inside (B2B), meaning the systems, workflows and associated business processes, is not yet ‘Data Driven’, and neither is the outside (B2C), the overall ‘customer’ or patient experience.
  3. A lot of the focus for data-driven healthcare is on optimising outcomes and reducing risk by identifying, diagnosing or intervening earlier and more accurately. This also helps to reduce costs, whether financial and organisational costs to healthcare providers or insurers, or personal costs to the patient themselves.
  4. Commercially, Data-Driven Healthcare is seen as a ‘promised land’ by those interested in disruption, and as a Breughel-esque landscape of terror by those who are not. For the USA alone, trillions of $$$ are at stake. See McKinsey on market size here and on disruption here, and PwC on data-driven decision making in health here. In the USA, the changes to the structure of the healthcare insurance market brought about by the ACA have added a further ‘twist’. Here is a sample exhibit from one of the PwC papers on data and analytics in executive decision making:

[Exhibit: PwC, data and analytics in healthcare executive decision making (2014)]

One thing that most observers agree on is that patience, tenacity and targeting are going to be required. There is unlikely to be a ‘killer app’ or a ‘single behemoth’ that floats to the top. Here’s a representative quote from VentureBeat in May 2015:

“There may be no killer apps in digital health. There may be no Uber for health care. The good investments in the space are startups that are willing to go neck-deep in the complexities of the business, the regulations, the clinical aspects, the old school workflows, and other minutia of the health care business. They fill a niche, erase a friction point, and then hang in for the long haul. Jack Young, a partner at dRx Capital, nails it with this quote: ‘It’s not about disruption, it’s about implementation.’”

So, breaking this down into a narrative structure is also a ‘work in progress’. Some of the key themes include:

Instrumentation – or the ‘generation’ of data. This is split between ‘business’ or B2B users (clinical or hospital-based data instrumentation) and ‘consumer’ users (ranging from web apps, sensors and wearables to user-generated content on the web).

Relationships – this is the ‘big Graph’ that healthcare represents: relationships, over time, between any of the actors or participants in the process. This also includes some notes on organisational ‘culture’, including the policy and legislation landscapes for the USA and UK. For the USA, see the FDA website on Science and Research guidelines here and on compliance.

Applications – not strictly in a software sense, but more towards clear ‘requirement- or needs-driven’ cases that are leading the way. This ranges from Genomic sampling to Diagnosis.

Players – again, it’s often easiest to make sense of what is going on by looking at who is doing what, from ‘giants’ such as Google, Microsoft or IBM to younger, more narrowly-targeted start-ups around the globe, including some of the most influential individuals involved, for the ‘human’ side.

Instrumentation from the Inside

Much of the current focus is on future data from patients themselves that is yet to be instrumented, primarily through ‘wearables’ in some form or another, either as devices or applications. Before we get there, though, there is the question of the data that is already ‘instrumented’ in many senses and is waiting for analysis. The trouble is, it’s hidden away in silos, in systems, in locker-rooms and across multiple locations. IBM Watson Health has had a foothold here for a few years.


Wearables: Instrumentation from the Outside

Wearables are now mainstream. These range from ‘multi-functional’ or decorative devices that provide a home for selected health-related applications or data, such as the Apple Watch, to ‘dedicated’ health-specific devices…


The data streams that will be instrumented from this and similar platforms make me think also of the Hugh Laurie ‘House’ character’s predisposition to mistrust anything a patient says – the ‘Patients are liars’ approach to medical history and diagnosis.

Already we’ve seen partnerships evolve between parties, both large and small, aligning those who can provide access to the data stream (the wearable and application providers) with the data consumers. Note also that in the USA the FTC laid down the law in 2015 on apps claiming to diagnose melanoma from smartphone pictures.

User Generated Content

One distinct area within Public Health uses ‘trends’ inferred from unstructured ‘user-generated content’, in the form of search activity or social media, for ‘early warning’ detection and/or estimation of the impact or prevalence of epidemics and communicable or infectious diseases.

The search giants, Google and Microsoft, have led the way here for obvious reasons. Examples include Google Flu Trends, in production since 2008 (see the ‘comic book’ on how this then became Google Correlate and then Google Trends here) and recently expanded to include Dengue Trends, as well as Microsoft Research studies on Ebola and on the effectiveness of a health intervention campaign on influenza (the ‘flu’) in the UK here.


The Google Flu Trends (GFT) work has been ongoing for a number of years and, like many ‘early entrants’, suffered some unwanted and perhaps unfair adverse publicity due to significant over-reporting of ILI (influenza-like illness) in 2013. Using the ‘hype cycle’ as a guide, it had firmly slipped down into the trough of disillusionment. Overall, ‘real’ organisations have responded favourably and tried to help improve the approach, including by incorporating more time-series or ‘real-time’ data from public reporting systems, and GFT underwent a major ‘upgrade’ in 2014 to include the ‘real’ data published by the CDC for the USA.
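To make the search-trends idea concrete, here is a minimal sketch of the kind of model the original Google Flu Trends work was described as using: a logit-linear regression of the ILI rate on the fraction of searches that are flu-related. It is an illustration under stated assumptions, not Google’s production system. The function names are hypothetical, and the weekly figures are synthetic, generated from a known model so the fit can be checked.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logit_linear(query_frac, ili_frac):
    """Ordinary least squares fit of logit(ili) = b0 + b1 * logit(query).

    query_frac: share of all searches that are flu-related, per week (hypothetical input)
    ili_frac:   share of doctor visits that are for ILI, per week (hypothetical input)
    """
    xs = [logit(q) for q in query_frac]
    ys = [logit(p) for p in ili_frac]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def nowcast(b0: float, b1: float, query_frac: float) -> float:
    """Estimate this week's ILI share from this week's query share alone."""
    return inv_logit(b0 + b1 * logit(query_frac))


if __name__ == "__main__":
    # Synthetic weeks: ILI shares generated exactly from b0 = -1.0, b1 = 0.8.
    weekly_queries = [0.001, 0.002, 0.004, 0.008, 0.005]
    weekly_ili = [inv_logit(-1.0 + 0.8 * logit(q)) for q in weekly_queries]
    b0, b1 = fit_logit_linear(weekly_queries, weekly_ili)
    print(round(b0, 3), round(b1, 3))  # -1.0 0.8
    print(nowcast(b0, b1, 0.006))
```

A model like this only ever sees search behaviour, so anything that inflates flu-related searching without more actual illness feeds straight into the estimate. Adding lagged official (e.g. CDC) figures as extra regressors is one natural way to anchor it, in the spirit of the 2014 upgrade; that extension is not shown here.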

Microsoft have also branched out into another form of ‘early warning’ with their ‘Project Premonition’, which combines a mosquito-trapping, drone-based ‘AWACS’ system for capturing and genetically sampling mosquito populations with modelling of the data in the cloud, to identify pathogenic or malarial mosquitoes here.

(An interesting ‘on the ground’ comparison is Metabiota, offering ‘data-driven’ epidemiological and disease detection services from the bottom up. The Data Collective VC website has a set of other interesting ‘Health’ or ‘BioEngineering’ startups as well.)

Drug and Genome Research

Where to start? I’m not going to rehash what would be better studied elsewhere, such as at the US National Human Genome Research Institute or the UK’s Sanger Institute at Cambridge.

One ‘novel’ approach to the instrumentation of large data sets for study has been the Icelandic Genome Project, now deCODE, acquired by Amgen in 2013 and officially concentrating on the identification of key genetic risk factors for disease.

Large-scale drug discovery, spanning pharmaceuticals and bio-engineering, is a universe in itself. The time-to-market and R&D costs associated with drug discovery are so vast that the companies involved are by definition huge, but in turn they support a large ecosystem of smaller operations.

The ‘Medical Graph’ and Entity Relationships

The ‘Medical Graph’ is broad, deep and complex. There are many different graphs, and graphs-within-graphs…

  1. Inter-relatedness of medical entities such as treatments, devices, drugs and procedures. See the paper “Building the graph of medicine from millions of clinical narratives” from Nature in 2014, which analyses co-occurrence in 20 million EHR records covering 19 years from a California hospital. Here is a nice graphic on the workflow involved: [Figure: medical graph workflow, Nature 2014]
  2. ‘Big Data’ and ‘Graph Analytics’ on a medical dataset running on dedicated appliances. For one example, see this presentation from 2012 on converting Mayo Clinic data from SQL to SPARQL/RDF for querying.
  3. Three years is a long time. The software and math, as well as the ‘High Performance Computing’ required to scale for this, have advanced massively. There is the hugely impressive work of Jeremy Kepner of MIT on D4M (for example, here on Bio-Sequencing cross-correlation), as well as the platform providers from Cray to MemSQL…
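Items 1 and 3 connect quite naturally. If A is a sparse note-by-concept incidence matrix (a 1 wherever a concept is mentioned in a note), then the off-diagonal entries of AᵀA count how many notes mention each pair of concepts, which is essentially the raw edge weight of a co-occurrence graph; D4M-style tools express this kind of computation as sparse associative-array arithmetic. The sketch below does the same thing in plain Python, with dictionaries standing in for sparse arrays. It is not the D4M API and not the Nature paper’s pipeline; the notes and concept names are made up, and ‘lift’ (observed over expected co-occurrence) is just one simple association score among several.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each "note" is the set of concepts found in one clinical note.
notes = {
    "note1": {"metformin", "type 2 diabetes", "hypertension"},
    "note2": {"metformin", "type 2 diabetes"},
    "note3": {"lisinopril", "hypertension"},
    "note4": {"metformin", "hypertension"},
}


def cooccurrence(notes):
    """Off-diagonal entries of A^T A for a binary note x concept incidence matrix A.

    Each note adds 1 to every unordered pair of concepts it contains, so the
    result counts the notes in which both concepts of a pair appear.
    """
    counts = defaultdict(int)
    for concepts in notes.values():
        for a, b in combinations(sorted(concepts), 2):
            counts[(a, b)] += 1
    return dict(counts)


def lift(notes, counts):
    """Observed pair frequency divided by the frequency expected under independence."""
    n = len(notes)
    freq = defaultdict(int)
    for concepts in notes.values():
        for c in concepts:
            freq[c] += 1
    return {
        (a, b): (k / n) / ((freq[a] / n) * (freq[b] / n))
        for (a, b), k in counts.items()
    }


if __name__ == "__main__":
    counts = cooccurrence(notes)
    scores = lift(notes, counts)
    # Each line is one weighted edge of the toy "medical graph".
    for pair in sorted(counts):
        print(pair, counts[pair], round(scores[pair], 2))
```

At the scale of millions of notes, the same counting would be done as a distributed or sparse-matrix job rather than a Python loop, which is where the HPC and platform work mentioned above comes in.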

Players, People, Partnerships

This is very much a work in progress. I’m going to do this as a simple list for now, and attempt to put together a version of a market overview picture at the end.

Healthcare divisions of major systems vendors, and large Pharma / Drug companies:

GE Healthcare

Siemens Healthcare

Philips Healthcare

Novartis for modelling and simulation in Drug Development

Roche research

Technology, Applications:

IBM Watson Health. See also the relationship with Apple ‘application’ data streams reported in April 2015. IBM is a massive operation and contains possibly ‘conflicting’ solutions; see this post from 2015 here, which is basically selling hardware again.

SAS, as an enterprise technology vendor with a ‘health’ vertical. See sponsored articles on AllAnalytics.com and examples on the corporate site here.

CrowdMed launched in 2013 as a platform for ‘crowd-sourcing’ disease diagnosis.

Counsyl and 23andMe for self-administered DNA sampling and screening; see also the FDA concerns reported in Forbes in 2013 here and here.

Google, by their very nature, are a major player, operating across a wide range of research and application domains. They provide additional algorithm design and research (see ML for Drug Discovery in 2015), and also the infrastructure to support an ‘open’ basis for research by others (see their Cloud Platform for Genomics Research in 2014).

Anything to do with Google X and ‘futurist’ approaches is going to make headlines. See the partnership between Novartis and Google X for glucose-measuring contact lenses, publicised in 2014, and the Google wristband as a data-instrumenting wearable for health data. Google Ventures is also active in the field; see their involvement in Calicolabs’ research into ageing.

Apple’s ResearchKit for medical and health data applications

Brendan Frey and his team at the University of Toronto (Hinton’s legacy again) for building an algorithm to analyse DNA inputs and gene splicing outputs.

Vijay Pande and the Pande Labs at Stanford for Markov State Modelling for biophysics and biophysical chemistry.

Sage BioNetworks and Dr Stephen Friend for research in linkages between genetics and drug discovery.

Medidata as a ‘clinical cloud’ for data-driven research trials.

Microsoft Research and Health, including Dr Eric Horvitz, quoted here in Wired in 2014: “Electronic health records [are] like large quarries where there’s lots of gold, and we’re just beginning to mine them”. Here is a TV appearance on Data and Medical Research from 2013.

It’s always nice to appear to be ‘cutting edge’. The Microsoft Research Faculty Summit 2015, held yesterday and today (9th July 2015), includes a number of sessions that, while not directly mentioning healthcare, would be fascinating in terms of their general applicability: ‘Integrative AI’, ‘Semantics and Knowledge Bases’, ‘Programming Models for Probabilistic Reasoning’ and more. It would be lovely to be there; I’ll catch the stream later.

NCI and ABCC (Advanced Biomedical Computing Center) under Jack Collins in Maryland.

The HPC world of pMATLAB and D4M of Jeremy Kepner and John Gilbert.

LifeImage for medical image sharing.
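Of the research threads listed above, the Markov State Modelling associated with Pande Labs is one of the few compact enough to sketch. The core idea is to discretise a simulated molecular trajectory into a small number of states, count transitions between states at a fixed lag time, row-normalise the counts into a transition matrix, and read long-run behaviour, such as the stationary distribution, off that matrix. This minimal Python sketch assumes the state assignment has already been done (the trajectory is a hypothetical list of state indices); real MSM software also handles clustering, lag-time selection and reversible estimation, none of which is shown.

```python
def transition_matrix(traj, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a discrete-state trajectory.

    traj: sequence of integer state labels 0..n_states-1 (assumed already clustered)
    lag:  number of steps between the paired observations
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(traj) - lag):
        counts[traj[t]][traj[t + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state never left within the data gets an all-zero row; a real
        # estimator would need to deal with that rather than ignore it.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def stationary_distribution(matrix, iterations=1000):
    """Approximate pi satisfying pi = pi T by repeated multiplication (power iteration)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


if __name__ == "__main__":
    # Hypothetical trajectory already assigned to three conformational states.
    trajectory = [0, 0, 1, 1, 1, 0, 0, 1, 2, 2, 1, 0]
    T = transition_matrix(trajectory, n_states=3)
    for row in T:
        print([round(p, 2) for p in row])
    print([round(p, 3) for p in stationary_distribution(T)])  # [0.364, 0.455, 0.182]
```

Power iteration is used here only because it is short and dependency-free; with a linear algebra library one would normally take the leading left eigenvector of T instead.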

Phew. I’m going to conclude now with a brief note on some of the partnerships or alliances, whether strategic or acquisition-related, already emerging.

The partnerships or alliances that stick out for me right now are those that help span the B2B and B2C divide, where a business- or enterprise-focused outfit with existing traction and experience in the healthcare market partners with a consumer-focused outfit that has reach, technology and infrastructure… rather like a ‘Business’ server / ‘Consumer’ client architecture.

Given their visibility and transparency in reporting, these are the fairly ‘obvious’ examples:

  • IBM Watson Health – Apple for app data sharing and processing. There seems to be a synergy here, but I’m not yet clear on exactly how this will be delivered.
  • Novartis – Google X for the Contact Lens Glucose / Diabetes monitoring. This is a more ‘research’ focussed partnership – where the drug and manufacturing skills of one are complemented by the data and consumer knowledge of the other.
  • Broad Institute – Google Genomics, more hybrid in that it’s a marriage of Google computational power and analytics infrastructure with deeper genomic analysis tools from Massachusetts.

Underneath the radar are more industry-specific partnerships between organisations with an interest in ‘benevolent disruption’ in terms of improving efficiencies and outcomes – insurers and healthcare providers – and application or solution providers that can help them do this, bit by bit if necessary.


Applying AI – A More Detailed Look at Healthcare Diagnostics

This is intended to be a slightly more detailed look at a single vertical or domain – Healthcare, and within this the single area of diagnosis support, and how ‘new AI’ in different guises is being applied, and by whom, and in what way, and to what end.

It’s fair to say that the potential for a ‘marriage’ or liaison between Healthcare and AI is no closeted secret. Healthcare is large, complex, and deeply encased in centuries of knowledge and empirical reasoning. The network relationships between physicians or doctors, institutions, patients, treatments and outcomes are a barely understood global resource of great potential value. The whole is too large for any single human to encompass. Inefficiencies or discrepancies in diagnosis and outcomes are inevitable. The potential for new technologies to help disrupt and reshape the healthcare market and the diagnosis process is clearly understood. VCs are also clearly interested in the outcome, whether commercially or philanthropically; a good example is Vinod Khosla and the ventures his firm represents, from Lumiata (see below) to Ginger.io and CrowdMed.

Healthcare is of course a universe in itself. Diagnosis, Prescription, Monitoring, Intervention – each area or subsection has its own challenges, contexts and actors involved. The potential for ‘universal’ and non-invasive monitoring or sampling tools and applications is of course enormous in itself. The forecast explosion of consumer data-creating devices and applications is going to create a ‘stream processing’ event orders of magnitude beyond what exists currently. As stated, I’m going to try to concentrate here on Diagnosis, and on software rather than hardware, in the form of ‘expert systems’ to support or guide human decision making.

One place to start is with the ‘who’ rather than the ‘what’ or the ‘how’.

The IBM Watson ‘cognitive computing’ project has a valid claim to be an early starter, and also to be at the forefront of many people’s minds, with the heritage of the ‘Deep Blue’ project through to the 2011 ‘Jeopardy’ demonstration and the publicity it generated. Back in the real world, Watson is now applied as a ‘Discovery Advisor’ solution in different domains, including healthcare for clinical trial selection and pharmaceutical drug development. It’s an approach that is both ambitious and intensive: many years of intense R&D and the associated costs, plus partnerships with leading physicians and institutions, including in cancer and genomic research, for ‘training’ over as many years again. Outside of healthcare, the ‘Question and Answer’ approach is merged with other IBM product lines for Business Analytics and Knowledge Discovery. The recent acquisition of AlchemyAPI, a younger, nimbler technology company and ‘outside-focussed’ by its very nature, should integrate well with the Bluemix platform. One example comes from IBM development evangelist Andrew Trice: a Healthcare QA application, now with a voice UI.

Whilst I admire the ambition (whether commercially driven or not) and the underlying scale of Watson, I do wonder whether the ‘blinkenlight’ aura generated by the humming blue appliance, linked to a ‘solutions and partner ecosystem’ notorious for tripling (or more) any proposed budget, will lead to true democratisation. I feel the same unnatural commingling of awe and fear in response to the Cray appliance use cases. Amazing, awesome, and yet also extremely expensive. I guess the alternative – commodity hardware run at scale, using a suitably clever network engineering process to distribute computation and process results – doesn’t come cheap either.

Also, I understand and concur with the need for ‘real stories’ that publicise and demonstrate an application in a way that the ‘average Joe’ can understand. (My personal favourite is the Google DeepMind Atari simulation; more elsewhere on this.) Some attempts, however well intentioned, simply don’t work, at least in my opinion. The Watson-as-Chef and food truck for ‘try me’ events make me think ‘wow, desperate’ rather than ‘wow, cool’.

The fast-paced improvement and application of ‘Deep Learning’ neural networks in image classification have opened up a new opportunity in medical image analysis. Some ‘general purpose Deep Learning as a Service or Appliance’ companies, such as ErsatzLabs, offer their tools as a service and include healthcare diagnosis use cases in their portfolio.

Enlitic.com proposes to offer a more ‘holistic’ approach combining different technologies: here, Deep Learning for imaging combined, intriguingly, with NLP and semantic approaches for healthcare diagnostics.

Lumiata’s approach appears more graph-driven: ingesting text and structured data from multiple ‘sources’, including insurance claims, health records and medical literature, creating an analytics framework for assessing or predicting patient ‘risk’, and exposing this as a service for other healthcare apps.

It’s also worth mentioning Google, a potential giant in any domain it chooses to enter, who have already made a move into health, leveraging their dominance in search and status as ‘first point of call on the internet’ to provide curated health content, including suitably gnostic pronouncements on search algorithm ‘tweaking’ to support this curated health service.

In terms of diagnosis and treatment, the ‘data types’ currently being referenced are essentially images, text or documents, including test results, and relationships. The technical approaches applied map closely to these – Deep Learning Nets for classification of imaging, NLP / XML for semantics, ontology and meaning in unstructured documents and text, and Graph Analytics at scale for the complexity of the ‘web’ of patient-doctor-diagnosis-disease-treatment-outcomes.
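For the NLP leg of that mapping, the simplest possible starting point is dictionary lookup of known terms plus a crude check for negation cues a few words earlier, loosely in the spirit of rule-based approaches such as NegEx. This toy Python sketch is only an illustration: the vocabulary, the concept IDs and the cue list are hypothetical and tiny, and real clinical NLP relies on large curated ontologies and far more careful handling of context.

```python
import re

# Hypothetical mini-vocabulary mapping surface terms to made-up concept IDs.
VOCAB = {
    "chest pain": "C_CHEST_PAIN",
    "shortness of breath": "C_DYSPNOEA",
    "fever": "C_FEVER",
    "diabetes": "C_DIABETES",
}
# A handful of negation cues; a real system would use a much larger, curated list.
NEGATION_CUES = ("no", "denies", "without", "negative for")
WINDOW = 5  # number of tokens before a term to search for a negation cue


def tag_concepts(text):
    """Return (concept_id, negated) pairs for every vocabulary term found in text."""
    lowered = text.lower()
    results = []
    for term, concept in VOCAB.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Only look back within the current sentence or clause.
            preceding = re.split(r"[.;]", lowered[: match.start()])[-1]
            window = " ".join(preceding.split()[-WINDOW:])
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", window)
                for cue in NEGATION_CUES
            )
            results.append((concept, negated))
    return results


if __name__ == "__main__":
    note = "Patient denies chest pain. Reports fever and shortness of breath."
    for concept, negated in tag_concepts(note):
        print(concept, "negated" if negated else "affirmed")
```

The weakness is easy to see: a cue anywhere in the preceding window negates every term after it, so ‘no fever but chest pain’ would wrongly negate chest pain. Scope rules of this kind are where real systems spend much of their effort.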

Two of the example companies discussed here, IBM Watson and Cray, have a heritage in high-end (read: expensive) appliance or supercomputer systems architecture for running highly memory- and processing-intensive real-time analytics at scale, along with expensive hordes of suited consultants to implement, deploy and manage these solutions over time. The other, newer, smaller ventures show mixed approaches and, although it’s early days for any publicly available data, I would assume they operate on a more ‘flexible’ commercial basis.

So what’s the big story? Stepping back and looking down, healthcare and the data it consists of seem to me a big ‘brain’ of information, constructed from different formats and substances but linked together in complex relationships and patterns, hidden or obfuscated by barriers of format, location or access. This is traditionally referred to as the ‘real world’, whether it’s Healthcare or the Enterprise.

The goals or objectives can be simply phrased – improving and optimising patient outcomes, and placing the patient at the centre.

The adversarial paradigm of ‘bad AI’, with rapidly-evolving software systems ‘competing’ against physicians in a winner-takes-all contest for the right to diagnose and treat patients, is of course naive. And yet the healthcare industry is clearly labelled and targeted for ‘disruption’ in the coming decades, in terms of who does what and who is responsible for what. Whatever this ends up looking like, we can be sure it will be radically different to the way it is now.

It’s a big, big challenge. No one venture, even at the scale of Google or IBM, is going to do this by itself. It’s also going to rely on a host of smaller ventures, but ones with inversely large ambitions.