“Data-Driven Healthcare”: Pt II

Data-Driven Healthcare: Introduction

I wrote up some initial thoughts on healthcare in ‘Pt I’ a few weeks ago. This ‘Pt II’ is a follow-up, looking in more detail at some of the approaches and algorithms being applied, to examine what we might mean by ‘Data-Driven Healthcare’.

It’s worth restating some of the fundamentals here:

  1. Healthcare is a massive field, and the data related to it is increasing by the second. This has to be an extremely selective overview.
  2. Healthcare on the inside (B2B), meaning its systems, workflows and associated business processes, is not yet ‘Data Driven’, and neither is the outside (B2C), the overall ‘customer’ or patient experience.
  3. A lot of the focus for data-driven healthcare is on optimising outcomes and reducing risk by identifying, diagnosing or intervening earlier and more accurately. This also helps to reduce costs: financial and organisational costs to healthcare providers or insurers, as well as personal costs to patients themselves.
  4. Commercially, Data-Driven Healthcare is seen as a ‘promised land’ by those interested in disruption, and as a Breughel-esque landscape of terror by those who are not. For the USA alone, trillions of dollars are at stake. See McKinsey on market size here and on disruption here, and PwC on data-driven decision making in health here. In the USA, the structural changes to the healthcare insurance market brought about by the Affordable Care Act (ACA) have added a further ‘twist’. Here is a sample exhibit from one of the PwC papers on data and analytics in executive decision making:

[Exhibit: PwC (2014), data and analytics in healthcare executive decision making]

One thing most observers agree on is that patience, tenacity and targeting are going to be required. There is unlikely to be a ‘killer app’ or a ‘single behemoth’ that floats to the top. Here’s a representative quote from VentureBeat in May 2015:

“There may be no killer apps in digital health. There may be no Uber for health care. The good investments in the space are startups that are willing to go neck-deep in the complexities of the business, the regulations, the clinical aspects, the old school workflows, and other minutia of the health care business. They fill a niche, erase a friction point, and then hang in for the long haul. Jack Young, a partner at dRx Capital, nails it with this quote: ‘It’s not about disruption, it’s about implementation.’”

So, breaking this down into a narrative structure is also a ‘work in progress’. Some of the key themes include:

Instrumentation – or the ‘generation’ of data. This is split between ‘business’ or B2B users (clinical or hospital-based data instrumentation) and ‘consumer’ users (ranging from web apps, sensors and wearables to user-generated content on the web).

Relationships – this is the ‘big Graph’ that healthcare represents: relationships, over time, between any of the actors or participants in the process. This also includes some notes on organisational ‘culture’, including the policy and legislation landscapes for the USA and UK. For the USA, see the FDA website on Science and Research guidelines here, and on compliance.

Applications – not strictly in a software sense, but more towards clear ‘requirement- or needs-driven’ cases that are leading the way. These range from genomic sampling to diagnosis.

Players – again, it’s often easiest to make sense of what is going on by looking at who is doing what, from ‘giants’ such as Google, Microsoft or IBM to younger, more narrowly targeted start-ups around the globe, as well as some of the most influential individuals involved, for the ‘human’ side.

Instrumentation from the Inside

Much of the current focus is on future data from patients themselves that has yet to be instrumented, primarily through ‘wearables’ in some form, either devices or applications. Before we get there, though, there is the question of the data that is already ‘instrumented’ in many senses and is waiting for analysis. The trouble is, it’s hidden away in silos, in systems, in locker-rooms and across multiple locations. IBM Watson Health have a head start of a few years here.


Wearables: Instrumentation from the Outside

Wearables are now mainstream. These vary from ‘multi-functional’ or decorative devices that provide a home for selected health-related applications or data, such as the Apple Watch, to ‘dedicated’ health-specific devices…


The data streams that will be instrumented from these and similar platforms also make me think of Hugh Laurie’s ‘House’ character and his predisposition to mistrust anything a patient says: the ‘patients are liars’ approach to medical history and diagnosis.

We’ve already seen partnerships evolve between parties, both large and small, that align those who can provide access to the data stream (the wearable and application providers) with the data consumers. Note also that in the USA the FTC already laid down the law in 2015 on apps claiming to diagnose melanoma from smartphone pictures.

User Generated Content

One distinct area of public health uses ‘trends’ inferred from unstructured ‘user-generated content’, in the form of search activity or social media, for ‘early warning’ detection and/or estimation of the impact or prevalence of epidemics and communicable or infectious diseases.

The search giants, Google and Microsoft, have led the way here for obvious reasons. Examples include Google Flu Trends, in production since 2008 (see the ‘comic book’ here on how this then became Google Correlate and then Google Trends) and since expanded to include Dengue Trends, as well as Microsoft Research studies on Ebola and on the effectiveness of a UK health intervention campaign on influenza (‘flu’) here.
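As a rough illustration of how search data becomes a flu estimate: the original Flu Trends paper (Ginsberg et al., Nature, 2009) fitted a single linear relationship between the log-odds of physician visits for influenza-like illness (ILI) and the log-odds of ILI-related search queries, logit(P) = β0 + β1·logit(Q). The sketch below is a minimal Python version of that idea, not Google’s code; the function names and any numbers you feed it are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_logit_model(query_fraction, ili_fraction):
    """Fit logit(P) = b0 + b1 * logit(Q) by ordinary least squares.

    query_fraction: weekly share of searches that are ILI-related (Q).
    ili_fraction: weekly share of physician visits that are ILI (P).
    Both are hypothetical inputs here; every value must lie in (0, 1).
    """
    if len(query_fraction) != len(ili_fraction) or len(query_fraction) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    xs = [logit(q) for q in query_fraction]
    ys = [logit(p) for p in ili_fraction]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def estimate_ili(b0: float, b1: float, query_fraction: float) -> float:
    """Estimate this week's ILI visit fraction from this week's query fraction."""
    return inv_logit(b0 + b1 * logit(query_fraction))
```

In use, you would fit on past weeks where official figures exist, then call `estimate_ili` on the current week’s query share, ahead of official reporting. Working in log-odds keeps every estimate inside (0, 1).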


The Google Flu Trends work has been ongoing for a number of years and, like many ‘early entrants’, suffered some unwanted and perhaps unfair adverse publicity due to significant over-reporting of ILI (influenza-like illness) in 2013. On the ‘hype cycle’, it had slipped firmly into the trough of disillusionment. Overall, ‘real’ organisations have responded favourably, helping to improve the approach, including by incorporating more time-series or ‘real-time’ data from public reporting systems, and GFT underwent a major ‘upgrade’ in 2014 to include the ‘real’ data published by the CDC for the USA.
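One common way to combine official surveillance data with a search signal is an autoregressive ‘nowcast’: predict this week’s official rate from last week’s published figure plus this week’s search-derived index. The sketch below is an assumption for illustration only, not the actual revised GFT model; `fit_nowcast` and `nowcast` are hypothetical names.

```python
def fit_nowcast(official_ili, search_signal):
    """Fit y[t] = a + b * y[t-1] + c * s[t] by ordinary least squares.

    official_ili: published weekly ILI rates (e.g. CDC figures), oldest first.
    search_signal: a search-derived index for the same weeks.
    Illustrative autoregressive-plus-search model, not GFT's own.
    """
    if len(official_ili) != len(search_signal) or len(official_ili) < 4:
        raise ValueError("need two equal-length series with at least 4 weeks")
    y = official_ili[1:]      # the week being predicted
    x1 = official_ili[:-1]    # last week's official figure
    x2 = search_signal[1:]    # same-week search signal
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centred sums of squares and cross-products for the 2x2 normal equations.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # det = s11 * s22 * (1 - r^2), so this rejects (near-)collinear regressors.
    if det <= 1e-12 * s11 * s22:
        raise ValueError("regressors are collinear; cannot separate their effects")
    b = (s1y * s22 - s2y * s12) / det
    c = (s2y * s11 - s1y * s12) / det
    a = my - b * m1 - c * m2
    return a, b, c


def nowcast(a, b, c, last_official, current_search):
    """Predict the current week's rate before the official figure is published."""
    return a + b * last_official + c * current_search
```

The design choice here is that the lagged official figure anchors the estimate, so the search term only has to explain the week-to-week change rather than the whole level.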

Microsoft have also branched out into another form of ‘early warning’ with ‘Project Premonition’, which combines mosquito traps and drones into an ‘AWACS’-style system for capturing and genetically sampling mosquito populations, identifying pathogen-carrying or malarial mosquitoes, and modelling the data in the cloud (see here).

(An interesting ‘on the ground’ comparison is Metabiota, which offers ‘data-driven’ epidemiological and disease detection services from the bottom up. The Data Collective VC website lists a number of other interesting ‘Health’ or ‘BioEngineering’ startups as well.)

Drug and Genome Research

Where to start? I’m not going to rehash what is better studied elsewhere, for example at the US National Human Genome Research Institute or the UK’s Sanger Institute near Cambridge.

One ‘novel’ approach to instrumenting large data sets for study has been the Icelandic Genome Project, now deCODE, acquired by Amgen in 2013 and officially concentrating on identifying key genetic risk factors for disease.

Large-scale drug discovery, spanning pharmaceuticals and bio-engineering, is a universe in itself. The time-to-market and R&D costs associated with drug discovery are so vast that the companies involved are by definition huge, but in turn they support a large ecosystem of smaller operations.

The ‘Medical Graph’ and Entity Relationships

The ‘Medical Graph’ is broad, deep and complex. There are many different graphs, and graphs within graphs…

  1. Inter-relatedness of medical entities such as treatments, devices, drugs and procedures. See the paper “Building the graph of medicine from millions of clinical narratives” from Nature in 2014, which analyses co-occurrence in 20 million EHR records spanning 19 years from a California hospital. Here is a nice graphic of the workflow involved: [Figure: Nature 2014 medical graph workflow]
  2. ‘Big Data’ and ‘Graph Analytics’ on a medical dataset running on dedicated appliances. For one example, see this presentation from 2012 on converting Mayo Clinic data from SQL to SPARQL/RDF for querying.
  3. Three years is a long time. The software and math, as well as the ‘High Performance Computing’ required to scale this, have advanced massively. There is the hugely impressive work of Jeremy Kepner of MIT on D4M, for example here on bio-sequence cross-correlation, as well as platform providers from Cray to MemSQL…
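To make the co-occurrence idea in the first item more concrete, here is a minimal Python sketch, not the paper’s actual pipeline: treat each record as a set of extracted concepts, count how often each pair appears together, and weight each edge by lift (observed co-occurrences divided by what independence would predict). The record format, the toy concept labels and the choice of lift as the edge weight are my assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(records):
    """Build weighted edges between concepts that appear in the same record.

    records: an iterable of collections of concept labels, one per note/record.
    Returns {(a, b): (count, lift)} with a < b alphabetically, where
    lift = observed co-occurrences / co-occurrences expected under independence.
    """
    n_records = 0
    concept_counts = Counter()
    pair_counts = Counter()
    for record in records:
        concepts = sorted(set(record))  # count each concept at most once per record
        n_records += 1
        concept_counts.update(concepts)
        pair_counts.update(combinations(concepts, 2))
    edges = {}
    for (a, b), count in pair_counts.items():
        expected = concept_counts[a] * concept_counts[b] / n_records
        edges[(a, b)] = (count, count / expected)
    return edges


# Hypothetical toy records standing in for concepts extracted from clinical notes.
EXAMPLE_RECORDS = [
    {"metformin", "type 2 diabetes"},
    {"metformin", "type 2 diabetes", "hypertension"},
    {"lisinopril", "hypertension"},
    {"type 2 diabetes"},
]
```

On `EXAMPLE_RECORDS`, metformin and type 2 diabetes co-occur twice, giving a count of 2 and a lift of 4/3. At scale, the same pair counts are the off-diagonal entries of a sparse record-by-concept incidence matrix multiplied by its own transpose, which is roughly the kind of associative-array correlation D4M is built to express.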

Players, People, Partnerships

This is very much a work in progress. I’m going to do this as a simple list for now, and attempt to put together a market overview picture at the end.

Healthcare divisions of major systems vendors, and large pharma / drug companies:

GE Healthcare

Siemens Healthcare

Philips Healthcare

Novartis for modelling and simulation in Drug Development

Roche research

Technology, Applications:

IBM Watson Health. See also its relationship to Apple ‘application’ data streams, reported in April 2015. IBM is a massive operation and contains possibly ‘conflicting’ solutions; see this 2015 post here, which is basically selling hardware again.

SAS as an enterprise technology vendor with a ‘health’ vertical. See sponsored articles on AllAnalytics.com and examples on the corporate site here.

Crowdmed launched in 2013 as a platform for ‘crowd-sourcing’ disease diagnosis.

Counsyl and 23andMe for self-administered DNA sampling and screening; see also the FDA concerns reported in Forbes in 2013 here and here.

Google, by their very nature, are a major player, operating across a wide range of research and application domains. They provide additional algorithm design and research (see ML for Drug Discovery in 2015) and also the infrastructure to support ‘open’ research by others (see their Cloud Platform for Genomics Research in 2014).

Anything to do with Google X and ‘futurist’ approaches is going to make headlines. See the partnership between Novartis and Google X for glucose-measuring contact lenses, publicised in 2014, and the Google wristband as a data-instrumenting wearable for health data. Google Ventures is also active in the field; see their involvement in Calico Labs’ research into ageing.

Apple’s ResearchKit for medical and health data applications

Brendan Frey and his team at the University of Toronto (Hinton’s legacy again) for building an algorithm to analyse DNA inputs and gene splicing outputs.

Vijay Pande and the Pande Lab at Stanford for Markov State Models in biophysics and biophysical chemistry.

Sage Bionetworks and Dr Stephen Friend for research into linkages between genetics and drug discovery.

Medidata as a ‘clinical cloud’ for data-driven research trials.

Microsoft Research and Health, including Dr Eric Horvitz, quoted here in Wired in 2014: “Electronic health records [are] like large quarries where there’s lots of gold, and we’re just beginning to mine them”. Here is a TV appearance on Data and Medical Research from 2013.

It’s always nice to appear ‘cutting edge’. The Microsoft Research Faculty Summit 2015, held yesterday and today (9th July 2015), includes a number of sessions that, while not directly mentioning healthcare, would be fascinating in terms of their general applicability: ‘Integrative AI’, ‘Semantics and Knowledge Bases’, ‘Programming Models for Probabilistic Reasoning’ and more. It would be lovely to be there; I’ll catch the stream later.

NCI and the ABCC (Advanced Biomedical Computing Center) under Jack Collins in Maryland.

The HPC world of pMATLAB and D4M of Jeremy Kepner and John Gilbert.

LifeImage for medical image sharing.

Phew. I’m going to conclude now with a brief note on some of the partnerships or alliances, whether strategic or acquisition-related, already emerging.

The partnerships or alliances that stick out for me right now are those that help span the B2B and B2C divide, where a business- or enterprise-focused outfit with existing traction and experience in the healthcare market partners with a consumer-focused outfit that has reach, technology and infrastructure… rather like a ‘business server, consumer client’ architecture.

Because they have been widely and openly reported, these are fairly ‘obvious’ examples:

  • IBM Watson Health – Apple for app data sharing and processing. There seems to be a synergy here, but I’m not yet clear on exactly how this will be delivered.
  • Novartis – Google X for the glucose-monitoring contact lens for diabetes. This is a more ‘research’-focussed partnership, where the drug and manufacturing skills of one are complemented by the data and consumer knowledge of the other.
  • Broad Institute – Google Genomics, more of a hybrid in that it’s a marriage of Google’s computational power and analytics infrastructure with deeper genomic analysis tools from Massachusetts.

Under the radar are more industry-specific partnerships between organisations with an interest in ‘benevolent disruption’, in the sense of improving efficiency and outcomes (insurers and healthcare providers), and the application or solution providers that can help them do this, bit by bit if necessary.

