Machine Learning, Professional Sports and Customer Marketing in the UK and Europe

Professional Sports Customer Marketing is driven primarily by two key product lines or revenue areas – subscriptions or memberships, and seat or ticket products. The two are ‘combined’ in the classic or evergreen ‘Season Ticket’ packaged product, which is essentially the first tier in a membership programme and the base on which other ‘loyalty’ programmes or schemes can function.

This post looks at the application of ‘Machine Learning’ in the form of both supervised and unsupervised methods to Customer Marketing in Professional Sport.

I’ll start with an example of an unsupervised approach, using a ‘standard’ k-means algorithm to identify clusters of Professional Sports Club Customers based on features or attributes that describe Customer profile or behaviour over time as broadly as possible. These features were built or sourced from an underlying ‘data model’ that looks broadly at the following areas:

  1. Sales transactions – baskets or product items purchased by a customer over time, broken down by product area into Season Tickets, Match Tickets, Retail (Merchandise), Memberships and Content subscriptions
  2. Socio-Demographic – relating to individual identity, gender, geography and also to relationships with other supporters in the data
  3. Marketing Behaviour – engagement and response to outbound and inbound marketing content over time

We wanted to create a ‘UK Behavioural Model’ that would be representative for UK Sports Clubs, so we created a sample in proportion to overall Club or client size from a set of 25 Clubs in the UK from Football, Rugby and Cricket. The sample consisted of 300k from an overall base of approximately 10 million supporters. The input or feature selection was normalised for all Clubs. We experimented with different iterations based on cluster numbers and sizing.
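As a rough illustration of the approach described above, the sketch below min-max normalises a handful of customer features and then runs a plain k-means (Lloyd’s algorithm) over them. The feature names, the toy data and the choice of k are assumptions made for illustration only; they are not the features or the cluster count used in the UK Behavioural Model.

```python
import random

def normalise(rows):
    """Min-max scale each feature column to the range 0..1."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids

# Hypothetical features per customer:
# [ticket spend, membership spend, retail spend, emails opened]
customers = [
    [900.0, 50.0, 20.0, 30],
    [850.0, 45.0, 10.0, 25],
    [0.0, 0.0, 120.0, 2],
    [0.0, 0.0, 150.0, 1],
    [40.0, 30.0, 0.0, 12],
    [60.0, 30.0, 5.0, 15],
]
labels, centroids = kmeans(normalise(customers), k=3)
```

Normalising first matters because k-means works on distances, so an unscaled spend column in pounds would otherwise swamp a count such as emails opened. In practice a library implementation (for example scikit-learn’s `KMeans`) would be the more usual choice; the loop is written out here only to show the assign-then-average steps.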

The exhibit below shows a version with 15 different clusters, numbered and coloured across the first row from 0 to 14. The row headers are the different features or feature groups, and the cluster colours are persisted throughout the rows for each feature. The width of the bar for each cluster in each row is the average per customer for that feature. Comparing a cluster’s bar width in a feature row with its width in the first row, which shows cluster size, is intended to provide a visual guide to differences between clusters.


Revenue £££s in the bottom 3 rows is generated from a handful of clusters only:

  • Purple Cluster 8 and Mauve Cluster 9 dominate Ticketing revenue
  • Red Cluster 6 and Mauve Cluster 9 dominate Memberships revenue
  • Grey Cluster 14 contributes to Merchandise revenue

Pretty much all of the ‘NonUK’ supporters have been allocated to Light Blue Cluster 1.

Interestingly, Gender (M/F) or Age (Kids, Adults, Seniors) don’t seem to discriminate much between Clusters. See the exhibit below that plots ‘Maleness’ on the Y axis and ‘Age’ on the X axis.


Cluster 4 is ‘Old Men’, Cluster 14 is ‘Ladies of a Certain Age’ but the majority of Clusters (circle diameter proportional to size or number of Customers) aren’t really discriminated by these dimensions or features. We concluded that behaviour of kids ‘followed’ or emulated adults in terms of key features for attendance and membership.

The next section looks at a supervised approach, using a decision-tree classification algorithm to identify ‘propensity’ for members or subscribers to renew or churn, and then the converse for non-members or subscribers to ‘convert’ to become a member, based on similarity to previous retention or acquisition events.

Our work in this area began tentatively in 2010 using an outside consultant (Hello Knut!) from a large software vendor, working on a single membership churn project at a large North London football Club. Over the past 5 years, we’ve taken the approach ‘in house’, ‘democratised’ it in a certain way and applied the techniques over and over to different clubs (Hello Emanuela!). We’ve tried to make this as efficient as possible by engineering a common feature set across all clubs and seasons based on a common data model.

For the retention model, we’ve continued to build and train a model for each club AND for each season, as we saw greater predictive accuracy over time, also based on including features that encapsulated ‘history’ for each customer up until that season as fully as possible.

For the acquisition model, we have modified the approach slightly using a single input of all acquisition events regardless of season, but still only one club at a time. This was based on the belief or the observation that people became members for roughly the same reasons regardless of season, whilst people ‘churned’ from being a member to a non-member based more on season-to-season performance and issues.

Decision trees are often cited as being on the more ‘open’ or ‘transparent’ end of the scale of classification techniques or approaches. However, we’ve succeeded in operationalising the retention model into Club Sales and Marketing systems and programmes only by using the features or variables that ‘float to the top’ of the decision tree to construct a ‘risk factor’ matrix based on observed ‘real’ behaviour and change in these over time for the customer.
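To make the idea of features that ‘float to the top’ a little more concrete, here is a minimal sketch that ranks binary features by the reduction in Gini impurity they give for a renew/churn label, which is the criterion a CART-style decision tree uses when choosing a split. The feature names and the toy records are hypothetical, and nothing here says which tree algorithm or split criterion was actually used in our models.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def gini_gain(records, feature, target):
    """Impurity reduction from splitting the records on one binary feature."""
    parent = [r[target] for r in records]
    left = [r[target] for r in records if r[feature]]
    right = [r[target] for r in records if not r[feature]]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
    return gini(parent) - weighted

def rank_features(records, features, target):
    """Return (feature, gain) pairs, best split first."""
    gains = {f: gini_gain(records, f, target) for f in features}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical member history: churned = 1 means the member did not renew.
members = [
    {"attended_under_half": 1, "lives_with_member": 0, "opened_emails": 0, "churned": 1},
    {"attended_under_half": 1, "lives_with_member": 0, "opened_emails": 1, "churned": 1},
    {"attended_under_half": 1, "lives_with_member": 1, "opened_emails": 1, "churned": 1},
    {"attended_under_half": 0, "lives_with_member": 1, "opened_emails": 1, "churned": 0},
    {"attended_under_half": 0, "lives_with_member": 1, "opened_emails": 0, "churned": 0},
    {"attended_under_half": 0, "lives_with_member": 0, "opened_emails": 1, "churned": 0},
]
ranking = rank_features(
    members,
    ["attended_under_half", "lives_with_member", "opened_emails"],
    "churned",
)
```

The features at the head of `ranking` are the candidates for a ‘risk factor’ matrix of the kind described above: a short list of observable behaviours that Sales and Marketing staff can track without needing to read the tree itself.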

Here’s an example of feature or variable correlation for the STH Acquisition model:


What was particularly interesting here was the importance of the ‘Half Season Ticket’ holding in the previous season and then the ‘groupings’ represented by the other Half Season Ticket holders who lived with the same supporter. This points very clearly to the inter-relationships that we ‘know’ are important between individual supporters, and leads us towards a more Graph-based analytical approach to identify and analyse relationships at play at specific points in the customer life-cycle, life-stage and buying relationship with the Club.

Our industry or sector is still dominated by the ‘Season Ticket’, a ‘hero product’ that continues, like fine wine or an ageing Hollywood A-lister, to defy the years and live on to snaffle a majority of our clients’ time and attention and share of revenue. The more that we can do to understand the ‘patterns’ behind this, the better.


Stadium Migrations – a ‘Once in a Lifetime’ Journey

Professional sports is, as I’ve written before and will probably write again, a funny old business. The customer is not a customer but a supporter. The Club is a place where the sporting spectacle unfolds, live and un-intermediated, never to be repeated but available, subject to sporting competition format and culture, once a week or so on average if you missed the last one. The bond or affiliation between the supporter and the Club runs deep and is, generally, exclusive and for life, forming an important component of social and individual identity. Supporters are in general a loyal and tribal bunch. They wear the shirt, they sing the song, they share in the journey, and many choose to get married, divorced and have their ashes scattered in the ‘temple’ and centre of activity – the Stadium.

I’m not, as you may have already perceived, a particularly strongly affiliated sports fan or supporter in any private way. Which is lucky, as the company I work in has over 50 individual or separate clients and I feel morally and socially unequipped to deal with the levels, tiers and currents of emotional attachment, reattachment and guilt that such manifest polygamy might entail.

I’ve been privileged, however, to participate in my own way in a number of ‘Stadium Migrations’ in my time at Sports Alliance. ‘Stadium Migration’ refers to the process undergone when a Club chooses, whether by accident or design, to move home from one location to another. This need not be a significant geographical translation – many Stadiums are re-developed ‘in situ’ or adjacent to the original. But whatever the distance, the process is, for most supporters and the Club staff involved, a ‘Once in a Lifetime’ experience.

Clubs choose to ‘migrate’ for a variety of reasons. The old place is looking tacky, falling down, not big enough or not the right balance in terms of levels of product and service they can offer, or it may have been only a ‘temporary’ home due to external circumstances and a short-term sharing arrangement has come to an end. This side of the Atlantic the ‘Club’ at least will be an original organisation with probably over 100 years of history, culture and attachment. They don’t ‘invent’ Clubs, at least as a rule, in Europe as they do in a franchise system. So for the supporters involved it is a ‘migration’ from one place to another, and the focus is on moving the existing supporters seamlessly and not on attracting any new ones. Here at Sports Alliance we do data and marketing, so we’re involved in the ‘Customer Experience’ and have very definitely nothing to do with concrete, steel, or grass.

My ‘first’ migration was in many senses the biggest, for Arsenal in the move from Highbury 1 or 2 miles down the road to what is now The Emirates in North London. Here is an example from the 2005 Brochure for allocation of seats to existing supporters.


Since then I’ve worked on or am currently involved in projects for RCD Espanyol, Athletic Bilbao and Barcelona in Spain, Swansea City in Wales, and Saracens Rugby, Tottenham and West Ham in London. Each of these is, of course, different from the others and unique in some way or another. Here I want to talk about the elements they have in common.


What’s involved? First, we look at Ranking, Relationships and Relocation…

Virtually all migrations we’ve worked on involve creating a ‘Queue’ based on a Ranking system. The Queue needs ordering or ranking according to some agreed criteria – how long you’ve been a member for, how much money you paid for your existing or old seat, whether or not you have an interest or shareholding where applicable, or a more genteel approach so that elders, or women and children, go first.
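A Queue of this kind can be sketched as a sort over a tuple of criteria. In the example below the field names, the sample supporters and, above all, the order of precedence (shareholding first, then length of membership, then price paid) are assumptions chosen for illustration; every Club agrees its own criteria and its own order.

```python
from dataclasses import dataclass

@dataclass
class Supporter:
    name: str
    years_as_member: int
    seat_price: float
    is_shareholder: bool

def queue_key(s):
    # sorted() is ascending, so negate values where "more" should come first.
    # False sorts before True, hence "not is_shareholder" puts shareholders first.
    return (not s.is_shareholder, -s.years_as_member, -s.seat_price, s.name)

def build_queue(supporters):
    """Return supporters in the order they would be invited to choose a seat."""
    return sorted(supporters, key=queue_key)

supporters = [
    Supporter("Ana", years_as_member=30, seat_price=600.0, is_shareholder=False),
    Supporter("Ben", years_as_member=12, seat_price=900.0, is_shareholder=True),
    Supporter("Cat", years_as_member=30, seat_price=750.0, is_shareholder=False),
    Supporter("Dov", years_as_member=5, seat_price=400.0, is_shareholder=False),
]
queue = build_queue(supporters)
queue_names = [s.name for s in queue]
```

Keeping the whole ranking in one key function makes it straightforward to show a Club the effect of re-ordering or swapping a criterion before anything is announced to supporters.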

Relocation is related to the physical location of a seat occupied as part of a season ticket or membership product. This often involves animated discussion and analysis of ‘equivalency’ between two very often very different stadium layouts. These can differ massively according to the number of seats, and more importantly the specific layouts according to stadium design for tiers, blocks and rows, and where exits and other stadium features interrupt or not in both ‘old’ and ‘new’ locations.

Here is an example of a stadium layout that shows ‘missing’ seats from an old to a new stadium – seats that ‘disappear’ when a notion of equivalence has been applied. Blue shows seats that do not have directly equivalent seats, here mainly due to the different layout of the aisles and ‘vomitorios’ or exits.


The criteria decided for individual entitlement or ranking must also work for ‘groups’ of supporters who choose to apply or sit together, either so that new people are encouraged to apply and join in or so that existing supporters do not feel that their entitlement is being diluted. Groups may also be inferred from existing sales or behavioural data for seating or family. Here is an example of a stadium layout representation looking at Age Band. The colour scheme or palette has been chosen to represent ‘age bands’, where blue is ‘youth’, green is ‘adult’, red is ‘senior’ and purple is the transition between youth and adult. For each age band, a darker shade or hue is intended to show increase in age.


The next example shows an additional dimension of ‘grouping’ based on inferred relationships between adjacent seated customers. The groups are based on grades of proximity or closeness of relationship – from degrees or probability of family relationships to neighbourhood location.
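One way to picture this kind of inference is as linking neighbouring seats whose holders share some attribute and then collecting the linked seats into groups. The sketch below does this with a small union-find; the record fields (`household`, `postcode`), the rule that any shared household or postcode counts as a relationship, and the sample data are all hypothetical simplifications, and it does not attempt the graded probabilities of relationship described above.

```python
def find(parent, x):
    """Return the representative of x's group, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def infer_groups(seats):
    """Link holders of adjacent seats in a row who share a household or postcode."""
    parent = {s["customer"]: s["customer"] for s in seats}
    ordered = sorted(seats, key=lambda s: (s["row"], s["seat"]))
    for a, b in zip(ordered, ordered[1:]):
        adjacent = a["row"] == b["row"] and b["seat"] - a["seat"] == 1
        related = a["household"] == b["household"] or a["postcode"] == b["postcode"]
        if adjacent and related:
            parent[find(parent, a["customer"])] = find(parent, b["customer"])
    groups = {}
    for s in seats:
        groups.setdefault(find(parent, s["customer"]), []).append(s["customer"])
    return sorted(sorted(g) for g in groups.values())

# Hypothetical season ticket seats in the old stadium.
seats = [
    {"customer": "c1", "row": "A", "seat": 1, "household": "H1", "postcode": "N5"},
    {"customer": "c2", "row": "A", "seat": 2, "household": "H1", "postcode": "N5"},
    {"customer": "c3", "row": "A", "seat": 3, "household": "H2", "postcode": "N5"},
    {"customer": "c4", "row": "A", "seat": 4, "household": "H3", "postcode": "E2"},
    {"customer": "c5", "row": "B", "seat": 1, "household": "H1", "postcode": "N5"},
]
groups = infer_groups(seats)
```

Here c1, c2 and c3 end up in one group through a chain of adjacent links, while c5 stays alone despite sharing a household with c1 because the two do not sit next to each other. A fuller version would keep the strength of each link rather than a simple yes or no.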


Lurking behind any ‘public’ criteria of ranking will be a Club’s opportunity or desire to ‘up sell’ customers to premium products, if they so desire.

How long does it take? From a few months to a few years… The longest of the completed migrations in the list above was for Arsenal, taking place over a two-year period in the run up to the Emirates opening. The shortest has probably been Athletic Bilbao, which took a turn for the worse or better when the Club shortened the timeline to the coming September when they realised that the physical build was advancing sufficiently. Here is a view of the shiny new San Mamés stadium from across the river.


Once the ‘rules’ for customer or supporter engagement have been agreed, it’s then time to turn this into a sales and marketing plan. Because this involves a seat selection in what is often a non-existent venue, this almost always involves a face-to-face visit in a specially constructed or administered sales centre, allowing for the supporter or group of supporters to confirm their choice of seat and for reservations and deposits to be taken at the same time. It’s also important as an ‘experience’ that for many is cherished and downright emotional. At Bilbao, for example, the ‘eldest’ socio (member or season ticket holder) was ceremoniously invited to begin the process. Cue flag waving and tears, and for good reason!

Our role here is also to prepare and furnish a sales and appointment management system that interfaces with existing Club systems for ticket or seat booking and reservation and any marketing or internet applications for communication or grouping assignments. We usually do this in a customised version of an ‘off the shelf’ CRM system such as Microsoft Dynamics, using where possible existing functionality for contacts, groups, service resources and service appointments, and the marketing list and outbound communications that result from this.

We’re currently in the process of planning for 3 new proposed migrations in the UK and Europe, so more will follow.

Marketing at Speed in Professional Sport

We’ve recently, as in the last six months recently, completed our first ever pair of ‘Enterprise Software’ partnerships. Like London buses, you wait ten years for one then suddenly two come along at once .. etc. etc.

One partnership is with Adobe Campaign, a Marketing Automation solution, and another with Tibco for their Business Works Enterprise Application Service and Messaging product. Both are envisaged over the same sort of term – three years or more, and both are at the heart of a new effort to provide ‘Enterprise Class’ sales and marketing services to our clients. Together these represent a combination of solutions to handle both data integration and end-to-end processes for ‘Marketing at Speed’ in Professional Sport.

Tibco helps connect the dots and provide ‘real time’ data.

Adobe Campaign provides an automation layer for marketing communications that are personalised, timely (thanks, Tibco) and relevant.

Use cases focus on two areas – ‘Bricks’ – for in stadium / on location, and then ‘Clicks’ for ‘the internet’.

For ‘bricks’, we need to know when a supporter is ‘present’, and be able to identify who they are at the same time. In the ‘real world’ this relies on Access control for perimeter access notification and then ‘loyalty’ schemes for EPOS / Till systems linked to a membership card.

For ‘clicks’ we’d like to know when a customer signs in or downloads an app and link this to their membership. In principle an app can also access GPS for physical location, or go further in-store for NFC or iBeacon level of location information.

From then on Tibco helps applications exchange information in real-time, and Adobe furnishes these applications with the right marketing content that is relevant to the individual supporter and the ‘context’ – including the club’s offer or promotion catalogue.

Behind both is the Sports Alliance ‘data model’ that powers both the identification of the supporter and their history with the Club.

It’s been ten years in the waiting so it’s worth explaining the background to this. We’re a funny kind of company working in a funny kind of market.

First, the market. It’s fragmented, disjointed and, some of the time, possibly schizophrenic… and underlying this limited in terms of resource and expertise required to tame software and data for ‘enterprise class’ marketing.

To the average observer or ‘supporter’ – the generic term for customer in the industry – a Professional Sports Club looks shiny and big from the outside, with extensive media exposure, a smattering of celebrity and a generous and tangible physical asset with a rectangle of grass sitting in the middle. Commercially, however, they tend to be both extremely fragmented and yet also extremely limited in capabilities in many areas.

The direction and flow of money through the business is public knowledge, the majority inflow coming from higher-level media or sponsorship deals intermediated by leagues or other bodies exclusively designed to market and sell these rights, and flowing down and out directly almost straight through to the players, and their agents, who provide the product. The infrastructure and organisation remaining inevitably has to focus on what it takes to underpin the ‘main event’ – the matchday – and the other ongoing customer-focussed businesses such as retail, hospitality, or community struggle for attention and resources from what is left over.

The marketing side of the business is primarily all about retention, structured around subscription or membership products (season tickets), and focussed on monetising the big concrete and steel asset with grass in the middle 2 or 3 times a month. In these terms, even the ‘biggest’ brands are local or regional at best in terms of the customer base who will come to the venue and experience the product and pay for it on a regular basis.

Secondly, then, back to us. Sports Alliance has been defined by the clear tension between these realities or limitations on one side and the expectations and ambitions on the other, spread and shared across many individual clients. As a guide to our client network, we have over 50 individual clients. In terms of size, the best proxy is unique customers or supporters under management. Our largest clients will have more than a million, the smallest down to a few tens of thousands. Pooled together we have nearly 15 million customers under management from the UK and mainland Europe. And yet both these ends of the scale, largest and smallest, are in the same business, with the same supporter needs or expectations, and the same ‘customer experience’ to try and deliver.

Sports Alliance as a company has been formed and evolved around these twin realities – part technology, part services, providing an extremely broad set of functions and applications from ‘back end’ systems integration and marketing data warehousing, to ‘front end’ multiple line of business sales and marketing applications or solutions, and then a further layer of services on top for clients that also want them. Within all of this is a focus on a data model that makes sense and can be replicated and evolved within and between clients.

For the ‘back end’ technologies, until now we’ve been firmly in the ‘build your own’ camp, based primarily on Microsoft technologies – SQL Server in the middle, with some C# /.NET and web tech from ASP to MVC and WCF SOAP/XML. ‘Front-end’ is more hybrid, split between ‘lower level’ partnering and some more of the ‘build your own’. Where the market has evolved in terms of SaaS providers we’ve partnered or integrated with the obvious leaders: for CRM this means Dynamics or Salesforce, for ESP a broader set of suppliers, and for BI Tableau. In any of these, we find ourselves primarily concentrating on the data schema and configuration required for replicating a data model and then letting the standard application functionality work around this. In some areas, for example Marketing Campaign Management and Loyalty, we’ve carried on and built complete ‘Front-end’ solutions, mainly down to the fact that 3rd party products in these areas were either too expensive or not fully fit for purpose.

We’ve watched as in recent years occasionally one or two of the larger clubs in our universe have taken a direct sales route to ‘enterprise’ software and solutions, often as part of a sponsorship package, and we believe it would be fair to say that the success rate or return on investment where this has happened would be marginal or arguable at best. The skills and resources required not only to implement but then maintain and evolve these solutions are often underestimated.

Anyway, enough about us or ‘me’. It takes two to tango, or to do a deal, and ‘Enterprise Software’ vendors had, in our niche, remained resolutely ‘Enterprise’ in approach to this. As noted above, where there was the possibility of a ‘big catch’ with a traditional, direct capital expenditure model, the software sales team would carry on in this vein if there was any possibility of a good old fashioned commission on sale. Build it, market it, get one or two ‘reference’ big fishes, and surely they will come.

Sports Alliance, even in pooling together networked resources, has financially been unable or unwilling to go for anything resembling a traditional ‘Enterprise’ solution licence independently. Pooled together as 15 million customers we look even more suspiciously like an ‘Enterprise’ and any traditional licence would in itself be many multiples of our total company turnover.

So, what’s changed? Essentially, it looks like we’ve met in the middle.

First, the Enterprise software approach has, at least in our case, shifted towards a more accessible quarterly subscription model, billed in arrears, and one layered to allow for a ‘slow start’ with clear tiers going up as business or usage grows. That was probably the most significant and necessary component. I spent a number of months with another Enterprise Application vendor who offer a subscription model but, it turned out, billed only yearly in advance. It’s hard to fund that in our business. Why has this changed? Overall, I believe this is a realisation that the ‘Enterprise’ market model simply doesn’t work out in the world of SME/SMBs, and that some money for software that has very little other cost of sale is simply better than none.

Secondly, Sports Alliance has to recognise that we can’t continue to spread ourselves so thinly and improve on what we do for our clients significantly. We’d continue, as one client said to me recently, to be ‘stuck in third gear’ (a metaphor that will rapidly become meaningless with self-driving cars. I presume he meant a 5 speed car gearbox as well, and not a 10 speed rear cassette on a bicycle).

The use-cases we’re looking at focus on ensuring that the ‘full’ customer profile and history this represents is visible in two massively important areas – when the customer is on site, and at the other end on the internet. Remember, we’re fortunate to work in an industry where the nature of the customer affiliation is such that the more we can show we know about them, the better. Other brands might look spooky or raise concerns of privacy when showing knowledge over a customer relationship that can span decades of sales history, and involve close personal and family relationships across generations.

For on-site, this revolves around the ‘real-time’ physical presence of the customer at the stadium or venue, and ensuring that the customer experience is as delightful as it could possibly be. ‘Real-time’ is difficult to achieve in an industry that is heavily siloed or disparate in systems and applications, and where integration tends to be batch based. We need to identify the location of the customer onsite either by access or entry to the perimeter or at a point of sale via a loyalty scheme or programme identifier.

For the internet, there is also the additional hurdle of identification in an application in a market where many Clubs do not control their own rights or properties and where Ecommerce partners will have separate, standalone applications for sales. Single Sign On, and the opportunity to treat a customer consistently across different applications or touch points, is still very much a work in progress.

So, once the customer is identified and we know where they are – in stadium/shop or in an application – it’s over to the marketing department and the campaign or offer catalogue:

  1. Personalised in-store loyalty promotion
  2. In application games ‘points boost’ for behaviour
  3. Prompt to redeem loyalty points in newly opened hospitality area exclusively for Season Ticket Holders
  4. Special event local to the supporter on the following weekend for Soccer Schools participants from the previous year who have had a birthday party at the Club
  5. And so on…
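The hand-off from ‘identified and located’ to ‘offer’ can be pictured as a lookup followed by a filter. The sketch below is only that: the event shape, the member IDs, the profile fields and the eligibility rules are all hypothetical, and it does not use or represent the Tibco or Adobe Campaign APIs.

```python
# Hypothetical supporter profiles, as they might come from the data model.
PROFILES = {
    "M1001": {"season_ticket_holder": True, "loyalty_points": 1200},
    "M2002": {"season_ticket_holder": False, "loyalty_points": 50},
}

# Hypothetical offer catalogue: each offer applies in one context
# and carries its own eligibility rule.
OFFERS = [
    {"name": "In-store loyalty promotion", "context": "shop",
     "eligible": lambda p: p["loyalty_points"] >= 100},
    {"name": "Redeem points in new hospitality area", "context": "stadium",
     "eligible": lambda p: p["season_ticket_holder"] and p["loyalty_points"] >= 500},
    {"name": "In-app points boost", "context": "app",
     "eligible": lambda p: True},
]

def offers_for_event(event, profiles=PROFILES, offers=OFFERS):
    """Return the names of offers that fit this supporter in this context."""
    profile = profiles.get(event["member_id"])
    if profile is None:
        return []  # unidentified supporter: nothing personalised to send
    return [o["name"] for o in offers
            if o["context"] == event["context"] and o["eligible"](profile)]

# A turnstile scan, an app sign-in and an unknown card, as example events.
at_gate = offers_for_event({"member_id": "M1001", "context": "stadium"})
in_app = offers_for_event({"member_id": "M2002", "context": "app"})
unknown = offers_for_event({"member_id": "M9999", "context": "shop"})
```

The point of the sketch is the dependency it exposes: without a reliable member identifier on the incoming event, whether from access control, the till or an app sign-in, there is nothing to personalise against.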

Applying AI – A More Detailed Look at Healthcare Diagnostics

This is intended to be a slightly more detailed look at a single vertical or domain – Healthcare, and within this the single area of diagnosis support, and how ‘new AI’ in different guises is being applied, and by whom, and in what way, and to what end.

It’s OK to say that the potential for a ‘marriage’ or liaison between Healthcare and AI is no closeted secret. Healthcare is large, complex, and deeply encased in centuries of knowledge and empirical reasoning. The network relationships between physicians or doctors, institutions, patients, treatments and outcomes are a barely understood global resource of great potential value. The whole is too large for any single human to encompass. Inefficiencies or discrepancies in diagnosis and outcomes are inevitable. The potential for new technologies to help to disrupt and reshape the healthcare market and the diagnosis process is clearly understood. VCs are also clearly interested in the outcome, whether commercially or philanthropically; a good example is Vinod Khosla and the ventures his firm represents, including Lumiata (see below) and CrowdMed.

Healthcare is of course a universe in itself. Diagnosis, Prescription, Monitoring, Intervention – each area or subsection has its own challenges, contexts and actors involved. The potential for ‘universal’ and non-invasive monitoring or sampling tools and applications is of course enormous in itself. The forecast explosion of consumer data-creating devices and applications is going to create a ‘stream processing’ event orders of magnitude beyond what exists currently. As stated, I’m going to try to concentrate here on Diagnosis, and on software rather than hardware, in the form of ‘expert systems’ to support or guide human decision making.

One place to start is with the ‘who’ rather than the ‘what’ or the ‘how’.

The IBM Watson ‘cognitive computing’ project has a valid claim to be an early starter, and also to be in the forefront of many people’s minds with the heritage of the ‘Deep Blue’ project and on to the 2011 ‘Jeopardy’ demonstration and subsequent publicity generated. Back in the real world, Watson is now applied as a ‘Discovery Advisor’ solution in different domains – including healthcare for clinical trial selection, and pharmaceutical drug development. It’s an approach that is both ambitious and intensive – involving many years of intense R&D and the costs associated, and the partnerships with leading Physicians and Institutions including Cancer and Genomic research for ‘training’ over as many years on top of or outside of this. Outside of healthcare, the ‘Question and Answer’ approach is merged with other IBM product lines for Business Analytics and Knowledge Discovery. The recent acquisition of AlchemyAPI, a younger, nimbler technology and ‘outside focussed’ by its very nature, should integrate well into the Bluemix platform. The example below is from IBM development evangelist Andrew Trice, with a voice UI now for a Healthcare QA application:

Whilst I admire the ambition (whether commercially driven or not) and the underlying scale of Watson, I do question whether the ‘blinkenlight’ aura generated by the humming blue appliance, linked then to a ‘solutions and partner ecosystem’ notorious for tripling (or more) any proposed budget, will lead to a true democratisation. I feel the same unnatural commingling of awe and fear in response to the Cray appliance use cases. Amazing, awesome, and yet also extremely expensive. I guess the alternative – commodity hardware run at scale using a suitably clever network engineering process to distribute computation and process results – doesn’t come cheap either.

Also, I understand and concur with the need for ‘real stories’ that publicise and demonstrate an application in a way that the ‘average Joe’ can understand. (My personal favourite is the Google DeepMind Atari simulation – more elsewhere on this.) Some attempts, however well intentioned, simply don’t work, at least in my opinion. The Watson-as-Chef and Food truck for ‘try me’ events make me think ‘wow, desperate’ rather than ‘wow, cool’.

The fast-paced improvement and application of ‘Deep Learning’ Neural Networks in image classification have opened up a new opportunity in Medical Image analysis. Some ‘general purpose Deep Learning as a Service or Appliance’ companies such as ErsatzLabs offer their tools as a service, and include Healthcare diagnosis use cases in their portfolio. Another venture proposes to offer a more ‘holistic’ approach combining different technologies and approaches – here Deep Learning for Imaging, intriguingly combined with NLP and semantic approaches for healthcare diagnostics.

Lumiata’s approach appears more graph-driven, ingesting text and structured data from multiple ‘sources’ from insurance claims, health records, and medical literature, creating an analytics framework for assessing or predicting patient ‘risk’, and exposing this as a service for other healthcare apps.

It’s also worth mentioning Google, a potential giant of any domain if desire exists, who have already made a move into health, leveraging their dominance in search and status as ‘first point of call on the internet’ to provide curated health content, including suitably gnostic pronouncements on search algorithm ‘tweaking’ to support this curated health service.

In terms of diagnosis and treatment, the ‘data types’ currently being referenced are essentially images, text or documents, including test results, and relationships. The technical approaches applied map closely to these – Deep Learning Nets for classification of imaging, NLP / XML for semantics, ontology and meaning in unstructured documents and text, and Graph Analytics at scale for the complexity of the ‘web’ of patient-doctor-diagnosis-disease-treatment-outcomes.

Two of the example companies discussed here – IBM Watson and Cray – have a heritage in the high-end (read expensive) appliance or super-computer systems architecture for running highly memory and processing intensive real-time analytics at scale, and the expensive hordes of suited consultants to implement, deploy and manage these solutions over time. The other, newer, smaller ventures show mixed approaches and, although it’s early days for any publicly available data, I would assume they operate on a more ‘flexible’ commercial basis.

So what’s the big story? Stepping back and looking down, healthcare and the data it consists of seems to me a big ‘brain’ of information constructed from different formats and substances, but linked together in complex relationships and patterns hidden or obfuscated by barriers of format and location or access. This is traditionally referred to as the ‘real world’, whether it’s Healthcare or the Enterprise.

The goals or objectives can be simply phrased – improving and optimising patient outcomes, and placing the patient at the centre.

The adversarial paradigm of ‘bad AI’, of rapidly-evolving software systems ‘competing’ against physicians in a winner-takes-all contest for the right to diagnose and treat patients, is of course naive. And yet the healthcare industry is clearly labelled and targeted for a ‘disruption’ in the coming decades in terms of who does, and is responsible for, what. Whatever this ends up looking like, we can be sure it will be radically different to the way it is now.

It’s a big, big challenge. No one venture – even at the scale of Google or IBM – is going to do this by itself. It’s going to rely also on a host of smaller ventures, but ones with inversely large ambitions.

Programmatic – Online, TV and Beyond?

I once worked as a media planner in a small digital agency in Central London (cue fairy tale intro…). I’ve recently been speaking to ex-colleagues who still work in the advertising and media world to catch up on all of the interesting things that have changed. Chief among these is programmatic or ‘real time’ advertising, and the flip side of the remaining intransigence or resistance of the TV market to allow itself to be infiltrated by this invidious new practice.

I know first hand the massive inefficiencies and vested interests at play in the ‘traditional’ media planning and buying process, where lunches, personal relationships, buying commission and the surreal practicalities of panel-based viewing as an attribution mechanism would only occasionally be interrupted by Excel or the econometrics department when wheeled in with a chart. (This is intentionally facetious and exaggerated, and yet….). At the same time I’ve heard first hand how suitably robust and acceptable indirect attribution can be, and is being, produced between budget and spots (advertising on TV) and response (primarily digital or online in terms of ecommerce or customer acquisition).

What isn’t there yet is any ability to link the individual – the ‘user’ or ‘customer’ – across channels or media in a suitably seamless manner. This much anyway is familiar to me in my Sports Alliance role.

I’m writing this explicitly not as any market expert, but in the example this represents for the application of machine or software-driven decision making in a large and yet still immature market area or space. Particularly one dominated by the online web giants – Facebook, Yahoo, Google – and the resources they have invested in for their own ‘Artificial Intelligence’ and how this is or will be integrated to their product or application stack.

The RTB exchange system, and its continuing evolution to become a fully-fledged trading floor with supporting mechanics for futures/options, is here to stay, thank god. I’ve seen some of the numbers from friends for the number of transactions or queries per second the market as it currently stands can process or accommodate and it’s impressive, or at least it is to me anyway.

What I’m less sure about is how ‘intelligent’ the system is in its ability to predict the desired event (click) or allocate or optimise resource (advertising, of different formats, and budget related). I’ll come back to this later.

Deep Learning: Machine Learning becomes ‘Intelligent’, Artificially or Otherwise?

‘Deep Learning’ is the new big thing. ‘Old Fashioned’ Academic AI from the second half of the C20th has passed through the last decades of a ‘narrower’ focus of Data Mining or Knowledge Discovery and Machine Learning to blossom in to what is now touted as a new, vibrant age.

There is too much here to comment on in anything like a sane manner, so this is intended as a very brief summary of my own ‘non-expert’ position, with a few examples:

  1. The internet and technology giants are splashing cash on ‘Artificial Intelligence’ or ‘Machine Intelligence’, see Yahoo, IBM, Microsoft, Google and Facebook. That much money can’t be wrong, can it?
  2. Where the giants have trod, the VC world has followed. See this single VC post for the ‘Machine Intelligence’ landscape here from the end of 2014. They can’t all be wrong, either, surely?
  3. Governments have ‘quietly’ been doing their own thing for security, military, and logistical purposes anyway, regardless of public involvement or awareness. See the FBI NGI here, or anything that DARPA funds.

What’s also interesting is the emergence of the ‘old school’ academics among the individuals who are leading this ‘new’ (read ‘old’!) era. This is, I believe, down to the fact that the skills and knowledge required to be a ‘master’ in this area are intensely academic and rare in themselves, and that the ‘Deep Learning’ technologies that are currently being worked on have a continuing evolution from the early Neural Nets through to the Convolutional or ‘learning’ approaches that represent the ‘state of the art’ today. See first Hinton, LeCun, Ng as three eminent examples who have been ‘appropriated’ or ‘acquired’ or ‘assimilated’ by commercial operations. Ng’s website is the ‘outlier’, the others are gloriously and happily ‘old school’. Demis Hassabis’s journey to Google is slightly different – not a mainstream academic at all but a games developer, although with his skills he could have been. Other leading research figures in the last decade or so include Bengio, Bottou, Ciresan. See any footnotes in published research or NIPS for further lists of individuals.

As a one-time academic of sorts, it is or will be fascinating to see how this ‘cohort’ of researchers react to ‘suddenly’ being thrust in to the spotlight of a broader and more commercial world.

Moving on from this aside, it is worth pointing out that it is a new, vibrant age. The reinforcement loops at play, particularly now with the internet / webscale technology giants and their real-world need for ‘intelligent applications’, at speed and at scale, have shifted the landscape and a few paradigms with it.

One other clear outcome at play in recent years has been the inflation of ‘Data Scientist’ wages, perhaps long overdue, and the related ‘talent search’ or ‘fill a room with Machine Learning PhDs and wait for acquisition’ approach to company formation. Machine Learning or Data Science qualifications, and the courses offered to support these by existing academic institutions or online at Coursera, are I would imagine highly prized.

I’m going to write separately on particular areas or approaches of interest.

Graph Relationships – Social and Otherwise – in Professional Sport

Good ‘old-fashioned’ Customer Relationships are important in the sector that I work in – Professional Sport. Usually this focusses on the relationship between the brand / venue or club and the customer or supporter (note the flexible terminology here). Retention is core in an industry where the affiliation is often ‘for life’, and the concept of churn is interpreted as gradations or levels of involvement and not as ‘defection’ to a rival brand, so loyalty and the recognition of this by the club in relation to their customers is massively important in reinforcing the ‘tribal’ relationship the supporter feels to the club in question.

Lately, we’ve been doing a lot more analysis of relationships between supporters themselves, more along the lines of good ‘new fashioned’ social or graph relations. This has been driven partly by some of the projects we have been working on and their requirements – particularly large scale ‘once in a lifetime’ stadium migrations – and partly by the availability and access to the technologies that enable this. Examples here include the underlying technology such as SPARQL / RDF, as well as the ability to communicate or visualise this in tools such as D3 or Tableau.

What it doesn’t mean yet is ‘social media’ in terms of Facebook, Twitter or other mainstream consumer products. We’ve been hampered here by the continuing limitations on many clubs’ ability to link or identify a social media ID with a real-world ID – digital or otherwise. I’ll comment on this elsewhere in more detail.

In terms of Graph Analytics, we’ve been particularly interested in the patterns that represent groupings or shared activities that help reinforce the ‘real world’ relationships that are otherwise hidden to a club. Examples in the real world (bricks not clicks) would include who sits next to whom in a stadium seating plan, who arrives with whom and at what time in terms of stadium or venue access, and what we can infer regarding the nature of the real-world relationships that these patterns represent.
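
As a rough illustration only, the two patterns mentioned above (neighbouring seats and shared arrival times) can be turned into a weighted graph of supporter pairs. This is a minimal sketch in Python and not a description of any production system: the record layouts, the sample data, the one-minute arrival window and the `min_weight` threshold are all assumptions made up for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical seat records: (supporter_id, match_id, block, row, seat_number)
seats = [
    ("A", 1, "N1", 5, 10), ("B", 1, "N1", 5, 11), ("C", 1, "N1", 5, 14),
    ("A", 2, "N1", 5, 10), ("B", 2, "N1", 5, 11),
]

# Hypothetical turnstile scans: (supporter_id, match_id, gate, entry_minute)
scans = [
    ("A", 1, "G3", 840), ("B", 1, "G3", 840), ("C", 1, "G3", 870),
    ("A", 2, "G3", 845), ("B", 2, "G3", 846),
]


def adjacent_seat_pairs(seats):
    """Count the matches at which two supporters sat in neighbouring seats."""
    by_row = defaultdict(list)
    for sid, match, block, row, seat in seats:
        by_row[(match, block, row)].append((seat, sid))
    pairs = Counter()
    for occupants in by_row.values():
        occupants.sort()
        # Compare each occupied seat with the next occupied seat in the row.
        for (seat_a, a), (seat_b, b) in zip(occupants, occupants[1:]):
            if seat_b - seat_a == 1 and a != b:
                pairs[tuple(sorted((a, b)))] += 1
    return pairs


def co_arrival_pairs(scans, window_minutes=1):
    """Count the matches at which two supporters entered the same gate together."""
    by_gate = defaultdict(list)
    for sid, match, gate, minute in scans:
        by_gate[(match, gate)].append((minute, sid))
    pairs = Counter()
    for entries in by_gate.values():
        for (t_a, a), (t_b, b) in combinations(sorted(entries), 2):
            if abs(t_b - t_a) <= window_minutes and a != b:
                pairs[tuple(sorted((a, b)))] += 1
    return pairs


def relationship_graph(seats, scans, min_weight=2):
    """Weighted edges between supporters; weak, one-off coincidences are dropped."""
    weights = adjacent_seat_pairs(seats) + co_arrival_pairs(scans)
    return {pair: w for pair, w in weights.items() if w >= min_weight}


graph = relationship_graph(seats, scans)
```

The edge weight here is simply a count of repeated co-occurrences, so a pair who sit together and arrive together match after match scores highly, while a single chance neighbour does not. What such an edge actually means (family, friends, colleagues) is still an inference that would need checking against other data.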

I’ll be updating this with examples later.

Fingerprints in the Archives

I spent a lot of time when researching a PhD on Elizabethan politics and culture, focusing on an individual diplomat and politician called Robert Beale, looking at handwritten letters in various libraries or archives, mostly in the UK, but also in Europe and North America. I completed this PhD in 2000, but never got it published, as I had already started work in an internet advertising agency, and I then got married, started to have children and so tucked it all away in an archive of my own.

I’ve gone back to this material only recently, partly spurred on by a private desire to connect my academic past to my professional present and future, and also connected to an interest in applying Machine Learning or Artificial Intelligence tools to different domains or areas.

The emergence of ‘Deep Learning’ (more here later) in recent years as a technique to aid classification in a broad variety of areas or applications was something I, along with many others now, was familiar with. I was aware enough of the techniques involved to surmise that an application that recognised the identity of a handwriting sample, as well as ‘reading it’, should be something that would be possible.

The presentation here, ‘Mark Taviner Handwriting Identification and Applied AI Talk June 2015’, is what I have written this month on this. The working title ‘Fingerprints in the Archives’ should be fairly obvious, and I reference the USA FBI NGI fingerprint system, as well as Facebook’s ‘DeepFace’ research in the talk.

Thanks to Professor Cathy Shrank for her support when I suggested this to her, and her video message on why this would be a valuable research tool for historical, cultural and literary research is hosted on Youtube here. I appreciate very much subsequent confirmation from Jeremy Howard at that my ‘guess’ was correct and reference to a dissertation by Luiz Gustavo Hafemann that showed ‘state of the art’ results against a Brazilian Author Handwriting reference database.

There remains much work to be done…


A time coming, but this is a record and notes on things I’m thinking about. It’s new because:

1. An uneasy relationship with the invidious first person pronoun singular ‘I’

2. I’ve been working in a company and industry that is relatively ‘internalised’ in our approach to marketing and confidentiality or transparency

For a bit of background, see the ‘About Me‘ page. The rest will be written as a set of blog entries.

I don’t do social media as a private individual. LinkedIn clearly has its uses in terms of professional contacts and search, but the idea of publicising my private life in the form of text or images on Facebook, Twitter or Tumblr fills me with great dread.