Machine Intelligence at Speed: Some Technical or Platform Notes


This post looks at some of the underlying technologies, tools, platforms and architectures that are now enabling ‘Machine Intelligence at Speed’. Speed as a concept is closely related to both Scale and Scalability. To organise things for my own convenience, by this I mean applications that:

  1. Are built on or involve ‘Big Data’ architecture, tools and technologies
  2. Utilise a stream or event processing design pattern for real-time ‘complex’ event processing
  3. Involve an ‘In-Memory Computing’ component to be quick and also to help scale predictably at speed
  4. Also support or embed ‘Machine Learning’ or ‘Machine Intelligence’ to help detect or infer patterns in ‘real time’

People in the Bay Area reading the above might well shout ‘AMPLab! Spark!’, which is pretty much where I’ll finish!

Hype Cycle and the ‘New Big Thing(s)’

Here is the familiar Gartner Tech Hype Cycle curve for 2014. In it you can see ‘Big Data’, ‘Complex Event Processing’ and ‘In-Memory DBMS’ chugging their sad way down the ‘Trough of Disillusionment’, whilst ‘NLP’ is still merrily peaking. ‘Deep Learning’ in terms of Deep Neural Nets doesn’t seem to my eye to have made it in time for last year.


It’s a minor and unjustifiable quibble at Gartner, who have to cover an awful lot of ground in one place, but the semantic equivalence of many of the ‘tech’ terms here is questionable, and the shape and inflexion points of the curves, as well as the time to reach the plateau, may differ in practice.

What this demonstrates is that the cyclicity it represents is well founded in the ‘journeys’ of new companies and new technologies, and often in how these companies are funded, traded and acquired by VCs and by each other. What I’m also interested in here is how a number of these ‘separate’ technology entities or areas combine and are relevant to Machine Intelligence or Learning at Speed.

Big Data Architectural Models

(Important proviso – I am not another ‘self-professed next ****ing Google architect’, or even a ‘real’ technologist. See the in/famous YouTube skit ‘MongoDB is webscale‘ from Garret Smith in 2010, approximately 3 minutes in, for a warning on this. I almost fell off my chair laughing. I work in a company where we do a lot of SQL and not much else. I also don’t code. But I’m entitled to my opinion, and I’ll try to back it up!)

Proviso aside, I quite enjoy ‘architecture’, as an observer mainly, trying to see how and why different design approaches evolve, which ones work better than others, and how everything in the pot works with everything else.

Here are two brief examples – MapR’s ‘Zeta‘ architecture and Nathan Marz’s ‘Lambda‘ architecture. I’ll start with Marz, as his is deceptively ‘simple’ in its approach, with three layers – speed, batch and serving. Marz worked on the initial BackType / Twitter engine and ‘wrote the book’ for Manning, so I’m inclined to treat him as an ‘expert’.
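The three-layer idea can be sketched, very loosely, in a few lines of Python (the function names and toy data are mine, not Marz’s): a batch view is precomputed over the full, immutable dataset, a speed view covers only recent events, and the serving layer merges the two at query time.

```python
def batch_view(events):
    """Precomputed aggregate over the (immutable) master dataset."""
    counts = {}
    for user, n in events:
        counts[user] = counts.get(user, 0) + n
    return counts

def speed_view(recent_events):
    """Incremental aggregate over events arrived since the last batch run."""
    counts = {}
    for user, n in recent_events:
        counts[user] = counts.get(user, 0) + n
    return counts

def serve(user, batch, speed):
    """Serving layer: merge batch and speed views at query time."""
    return batch.get(user, 0) + speed.get(user, 0)

master = [("alice", 2), ("bob", 1), ("alice", 3)]   # full history
recent = [("alice", 1)]                              # since last batch
total = serve("alice", batch_view(master), speed_view(recent))  # 6
```

The point of the pattern is that the batch layer can be slow and exhaustive while the speed layer stays small and fast, and neither needs to know about the other.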


Marz’s book obviously goes into much more detail, but the simplicity of the diagram above pervades his approach. MapR’s ‘Zeta’ architecture applied to Google is here:

I know next to nothing about what Google actually does on the inside, but I’ll trust that Jim Scott from MapR does, or he wouldn’t put this out in public, would he?

What this is telling me is that the ‘redesign’ of Enterprise Architecture by the web giants and what is now the ‘Big Data’ ecosystem is here to stay, and is being ‘democratised’ via the IaaS / PaaS providers, including Google themselves, via Cloud access available to anyone, at a price per instance or unit per second, hour, day or month.

There are then the ‘new’ companies like MapR that will deliver this new architecture to the Enterprise that may not want to go to the Cloud for legal or strategic reasons. Set against this are the ‘traditional’ Enterprise technology vendors – Oracle, IBM, SAS – which I’ll return to elsewhere, for reasons of brevity as well as of the limits of my knowledge.

Big Data has evolved rapidly from something that five years ago was the exclusive preserve of the Web Giants to a set of tools that any company or enterprise can utilise now. Rather than being BYO, ‘Big Data’ tool-kits and solutions are available on a service or rental model from a variety of vendors in the Infrastructure or Platform as-a-Service space, from ‘specialists’ such as Hortonworks, MapR or Cloudera, to the ‘generic’ IaaS cloud platforms such as AWS, Azure or Google.

As well as this democratisation, one of the chief changes in character has been the move from ‘batch’ to ‘non-batch’ in terms of architecture, latency and the applications this can then solve or support. ‘Big Data’ must also be ‘Fast Data’ now, which leads straight into Stream or Event processing frameworks.

Stream Processing

Other developments focus on making this faster, primarily around Spark and related stream or event processing. Even as a non-developer, I particularly like the book series – for instance Nathan Marz’s ‘Big Data‘, Andrew Psaltis’s ‘Streaming Data‘ and Marko Bonaci’s ‘Spark in Action‘ – and I also appreciated talking with Rene Houkstra at Tibco regarding their own StreamBase CEP product.

In technical terms this is well illustrated in the evolution from a batch data store and analytics process based on Hadoop HDFS / MapReduce / Hive towards stream or event processing based on more ‘molecular’ and ‘real-time’ architectures using frameworks and tools such as Spark / Storm / Kafka / MemSQL / Redis and so on. The Web PaaS giants have developed their own ‘flavours’ as part of their own bigger Cloud services based on internal tools or products, for example Amazon Kinesis and Google Cloud Dataflow.
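For a non-developer’s intuition of what the ‘micro-batch’ style popularised by Spark Streaming feels like, here is a toy sketch in plain Python (no Spark involved, all names mine): events are cut into small batches, and a rolling per-key count is maintained over a sliding window of the last few batches.

```python
from collections import Counter, deque

def micro_batches(events, batch_size):
    """Cut a continuous event sequence into small fixed-size batches."""
    batch = []
    for e in events:
        batch.append(e)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def windowed_counts(events, batch_size=3, window_batches=2):
    """Per-key counts over a sliding window of the last N micro-batches."""
    window = deque(maxlen=window_batches)   # old batches fall off the end
    for batch in micro_batches(events, batch_size):
        window.append(Counter(batch))
        totals = Counter()
        for c in window:
            totals += c
        yield dict(totals)

results = list(windowed_counts(["a", "b", "a", "c", "a", "b"]))
# first batch sees {"a": 2, "b": 1}; the window then widens to both batches
```

The real frameworks distribute this across a cluster and add fault tolerance, but the latency win over a nightly MapReduce job comes from exactly this shape: small increments, continuously merged.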

As with many ‘big things’ there is an important evolution to bear in mind, and a question of how different vendors and tools fit into it. For example, at Sports Alliance we’ve just partnered with Tibco for their ‘entry’ SOA / ESB product BusinessWorks. I’ve discussed the Event Processing product with Tibco, but only for later reference or future layering on top. This product has an evolution inside Tibco of over a decade – ‘Event’ or ‘Stream’ processing was not necessarily invented in 2010 by Yahoo! or Google, and the enterprise software giants have been working in this area for a decade or more, driven primarily by industrial operations and financial services. Tibco use a set of terms including ‘Complex Event Processing’ and ‘Business Optimization’, which work on the basis of an underlying event stream sourced from disparate SOA systems via the ESB, and an In-Memory ‘Rules Engine’, where the state-machine or ‘what-if’ rules for pattern recognition are (or may be) Analyst-defined – an important exception to the ‘Machine Learning’ paradigm below – and applied within the ‘Event Cloud’ via a correlation or relationship engine.

The example below is for an ‘Airline Disruption Management’ system, applying Analyst-defined rules over a 20,000-events-per-second ‘cloud’ populated by the underlying SOA systems. Whether it’s a human-identified pattern or not, I’m still reassured that the Enterprise Software market can do this sort of thing in real time, in the ‘real world’.
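To make the Analyst-defined-rule idea concrete, here is a toy correlation in Python, in the spirit of the CEP products above (the event shapes and the rule are invented for illustration, not Tibco’s API): events are grouped by a correlation key, and a flight is flagged when the analyst’s pattern is seen among its events.

```python
def correlate(events, rule):
    """Group events by a correlation key and apply a predicate per group."""
    by_key = {}
    for e in events:
        by_key.setdefault(e[rule["key"]], []).append(e)
    return [k for k, evs in by_key.items() if rule["predicate"](evs)]

# Analyst-defined pattern: both a gate change AND a crew delay on one flight
disruption_rule = {
    "key": "flight",
    "predicate": lambda evs: {"gate_change", "crew_delay"}
                             <= {e["type"] for e in evs},
}

stream = [
    {"flight": "BA123", "type": "gate_change"},
    {"flight": "BA123", "type": "crew_delay"},
    {"flight": "IB456", "type": "gate_change"},
]
flagged = correlate(stream, disruption_rule)   # ["BA123"]
```

A real engine does this incrementally and in memory over a bounded time window rather than over a complete list, but the ‘pattern over a correlated group of events’ shape is the same.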


The enterprise market for this is summarised as ‘perishable insights’ and is well evaluated by Mike Gualtieri at Forrester – see his “The Forrester Wave™: Big Data Streaming Analytics Platforms, Q3 2014“. Apart from the Enterprise software vendors such as IBM, I’ll link very briefly to DataTorrent as an example of a hybrid batch / tuple model, with Google’s MillWheel also apparently something similar(?).

In-Memory Computing

Supporting this scale at speed also means In-Memory Computing. I don’t personally know a lot about this, so this is the briefest of brief mentions. See for example the list of contributors at the In-Memory Computing Summit in SF in June this year here. Reading through the vendors’ ‘case studies’ is enough to show the ‘real world’ applications that work in this way. It also touches on some of the wider debates, such as ‘scale-up’ v ‘scale-out’, and what larger hardware or infrastructure companies such as Intel and Pivotal are doing.

Machine Learning at Speed: Berkeley BDAS and Spark!

So we’re back to where we started. One of the main issues with ‘Machine Learning’ at either scale or speed, in many guises, is the scalability of algorithms and the non-linearity of performance, particularly over clustered or distributed systems. I’ve worked alongside statisticians working in R on a laptop, and we’ve had to follow rules to sample, limit, condense and compress in order not to overload or time out.

In the Enterprise world one answer to this has been to ‘reverse engineer’ and productise accordingly, with the investment required to keep this proprietary and closely aligned with complementary products in your portfolio. I’m thinking mainly of Tibco and their Spotfire / TERR products, which I understand to be ‘Enterprise-speed’ R.

Another approach is to compare the evolution of competing solutions within the Apache ecosystem. Mahout was initially known to be ‘slow’ to scale – see for instance an earlier post from 2012 by Ted Dunning on the potential for scaling a k-nn clustering algorithm inside the MapR Mahout implementation. Scroll forward a few years to now and this looks like competitive territory between separate branded vendors ‘pushing’ their version of speed at scale. I couldn’t help noticing this as a Spark MLlib v Mahout bout in a talk from Xiangru Meng of Databricks (Spark as a Service), showing not only the improvements in their MLlib 1.3 over 1.2 (yellow line v red line) but ‘poor old Mahout’ top left in blue making a bad job of scaling at all, for a ‘benchmark’ of an ALS algorithm on Amazon Reviews:
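For the curious, ALS itself is conceptually simple: it alternates two least-squares solves, fixing the item factors to solve for user factors and then the reverse. A minimal dense NumPy sketch of that alternation follows (toy data, my own variable names; the real MLlib works on sparse, partitioned matrices, which is exactly where the scaling battle in the benchmark above is fought):

```python
import numpy as np

def als_step(R, U, V, reg=0.1):
    """One ALS alternation: solve for U given V, then for V given U."""
    I = np.eye(U.shape[1])
    U = R @ V @ np.linalg.inv(V.T @ V + reg * I)
    V = R.T @ U @ np.linalg.inv(U.T @ U + reg * I)
    return U, V

rng = np.random.default_rng(0)
R = rng.random((6, 4))                     # toy user x item ratings matrix
U, V = rng.random((6, 2)), rng.random((4, 2))  # rank-2 factors
err0 = np.linalg.norm(R - U @ V.T)         # error before fitting
for _ in range(20):
    U, V = als_step(R, U, V)
err = np.linalg.norm(R - U @ V.T)          # should be well below err0
```

Each half-step is an embarrassingly parallel batch of small linear solves, which is why the algorithm distributes well in principle – and why the implementations differ so much in practice.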


So one valid answer to ‘So how do I actually do Machine Intelligence at Speed?’ seems to be ‘Spark!’, and Databricks has cornered the SaaS market for this.

The Databricks performance metrics quoted are impressive, even to a novice such as myself. The ecosystem in evolution – from technologies and APIs to partners and solutions providers – looks great from a distance. There are APIs, pipeline and workflow tools, and a whole set more.

Databricks is a child of AMPLab in Berkeley. The Berkeley Data Analytics Stack (BDAS) provides us with another (third) version of ‘architecture’ for both Big Data and Machine Learning at Speed.


BDAS already has a set of ‘In-house Apps’ or projects working, which is a good sign, or at least a direction towards ‘application’. One example is the Cancer Genomics Application ADAM, providing an API and CLI for manipulation of genomic data, running underneath on Parquet and Spark.

Velox, one of the most recent initiatives, is for model management and serving within the stack. It proposes to help deliver ‘real-time’ or low-latency model interaction with the data stream it is ingesting – a form of ‘self-learning’, in the shape of iterative model lifecycle management and adaptive feedback. Until recently, only the large-scale ‘Web giants’ had developed their own approaches to manage this area.

AMPLabs Velox Example 1

This is particularly exciting, as it provides a framework for testing, validation and ongoing lifecycle adjustments that should allow Machine Intelligence model implementation and deployment to adapt to changing behaviours ‘online’ and not become obsolete over time, or at least not as quickly, before they require another round of ‘offline’ training and redeployment.
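The ‘online adjustment’ idea can be caricatured in a few lines (invented for illustration – Velox’s actual split of offline and online state is far richer): a model trained offline serves a base score, and a small per-user correction is nudged after each observed interaction, so the served predictions track changing behaviour between full retrains.

```python
class OnlineModel:
    def __init__(self, base_scores, lr=0.5):
        self.base = base_scores   # scores from the last offline training run
        self.bias = {}            # online per-(user, item) correction
        self.lr = lr              # how aggressively feedback moves the model

    def predict(self, user, item):
        key = (user, item)
        return self.base.get(key, 0.0) + self.bias.get(key, 0.0)

    def observe(self, user, item, rating):
        """Online step: move the correction toward the observed error."""
        err = rating - self.predict(user, item)
        key = (user, item)
        self.bias[key] = self.bias.get(key, 0.0) + self.lr * err

m = OnlineModel({("u1", "songA"): 3.0})   # offline model thinks 3.0
m.observe("u1", "songA", 5.0)             # feedback arrives online
score = m.predict("u1", "songA")          # now 4.0, without retraining
```

The offline retrain then periodically folds the accumulated corrections back into a fresh base model, which is the lifecycle-management loop the diagrams above describe.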

AMPLabs Velox Example

The examples given (for instance, above, for a music recommender system) are relatively constrained, but they show the power of this not only to make model lifecycle management more efficient, but also to help drive the creation of applications that rely on multiple or chained models – and thus a higher degree of complexity in model lifecycle management – or on models involving radically different data types or behavioural focus, which I’m going to look at later. And all at speed!


Stadium Migrations – a ‘Once in a Lifetime’ Journey

Professional sports is, as I’ve written before and will probably write again, a funny old business. The customer is not a customer but a supporter. The Club is a place where the sporting spectacle unfolds, live and un-intermediated, never to be repeated but available, subject to sporting competition format and culture, once a week or so on average if you missed the last one. The bond or affiliation between the supporter and the Club runs deep and is, generally, exclusive and for life, forming an important component of social and individual identity. Supporters are in general a loyal and tribal bunch. They wear the shirt, they sing the song, they share in the journey, and many choose to get married, divorced and have their ashes scattered in the ‘temple’ and centre of activity – the Stadium.

I’m not, as you may have already perceived, a particularly strongly affiliated sports fan or supporter in any private way. Which is lucky, as the company I work in has over 50 individual or separate clients and I feel morally and socially unequipped to deal with the levels, tiers and currents of emotional attachment, reattachment and guilt that such manifest polygamy might entail.

I’ve been privileged, however, to participate in my own way in a number of ‘Stadium Migrations’ in my time at Sports Alliance. ‘Stadium Migration’ refers to the process undergone when a Club chooses, whether by accident or design, to move home from one location to another. This need not be a significant geographical translation – many Stadiums are re-developed ‘in situ’ or adjacent to the original. But whatever the distance, the process is, for most supporters and the Club staff involved, a ‘Once in a Lifetime’ experience.

Clubs choose to ‘migrate’ for a variety of reasons. The old place is looking tacky, falling down, not big enough or not the right balance in terms of levels of product and service they can offer, or it may have been only a ‘temporary’ home due to external circumstances and a short-term sharing arrangement has come to an end. This side of the Atlantic the ‘Club’ at least will be an original organisation with probably over 100 years of history, culture and attachment. They don’t ‘invent’ Clubs in Europe, at least as a rule, as they do in a franchise system. So for the supporters involved it is a ‘migration’ from one place to another; the focus is on moving existing supporters seamlessly, not on attracting new ones. Here at Sports Alliance we do data and marketing, so we’re involved in the ‘Customer Experience’ and very definitely nothing to do with concrete, steel or grass.

My ‘first’ migration was in many senses the biggest, for Arsenal in the move from Highbury 1 or 2 miles down the road to what is now The Emirates in North London. Here is an example from the 2005 Brochure for allocation of seats to existing supporters.


Since then I’ve worked on or am currently involved in projects for RCD Espanyol, Athletic Bilbao and Barcelona in Spain, Swansea City in Wales, and Saracens Rugby, Tottenham and West Ham in London. Each of these is, of course, different to each other and unique in some way or another. Here I want to talk about the shared or in common elements.


What’s involved? First, we look at Ranking, Relationships and Relocation…

Virtually all migrations we’ve worked on involve creating a ‘Queue’ based on a Ranking system. The Queue needs ordering or ranking according to some agreed criteria – how long you’ve been a member for, how much money you paid for your existing or old seat, whether or not you have an interest or shareholding where applicable, or a more genteel approach so that elders, or women and children, go first.
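The kind of composite ranking that orders the queue can be sketched as a sort key over the agreed criteria. The specific criteria and their ordering below are invented examples of the sort of rules a club might agree, not any particular client’s scheme:

```python
def rank_key(supporter):
    """Queue order: shareholders first, then tenure, then seat spend.

    Negation makes 'bigger is better' fields sort to the front.
    """
    return (
        -int(supporter.get("shareholder", False)),
        -supporter.get("years_member", 0),
        -supporter.get("seat_price", 0.0),
    )

queue = sorted([
    {"name": "A", "years_member": 12, "seat_price": 400.0},
    {"name": "B", "years_member": 30, "seat_price": 250.0,
     "shareholder": True},
    {"name": "C", "years_member": 30, "seat_price": 600.0},
], key=rank_key)

order = [s["name"] for s in queue]   # ["B", "C", "A"]
```

Tuple sort keys make the precedence of the agreed criteria explicit, which matters when the ranking has to be explained and defended to supporters.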

Relocation relates to the physical location of a seat occupied as part of a season ticket or membership product. This often involves animated discussion and analysis of ‘equivalency’ between two very often very different stadium layouts. These can differ massively according to the number of seats and, more importantly, the specific layouts – tiers, blocks and rows, and where exits and other stadium features interrupt or not – in both ‘old’ and ‘new’ locations.

Here is an example of a stadium layout that shows ‘missing’ seats from an old to a new stadium – seats that ‘disappear’ when a notion of equivalence has been applied. Blue shows seats that do not have directly equivalent seats, here mainly due to the different layout of the aisles and ‘vomitorios’ or exits.
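One toy way to make ‘equivalence’ concrete (a sketch under my own simplifying assumptions, not our actual algorithm): normalise each seat to its relative position in the layout, map each old seat to the nearest new seat, and report old seats whose nearest match is too far away – the ‘missing’ blue seats in the diagram.

```python
def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def relative(seat, rows, cols):
    """Normalise (row, col) to fractions of the layout's extent."""
    return (seat[0] / (rows - 1), seat[1] / (cols - 1))

def equivalence(old_seats, new_seats, old_dims, new_dims, tol=0.2):
    """Map old seats to nearest new seats; flag seats with no close match."""
    mapping, missing = {}, []
    for s in old_seats:
        rs = relative(s, *old_dims)
        best = min(new_seats, key=lambda n: dist(rs, relative(n, *new_dims)))
        if dist(rs, relative(best, *new_dims)) <= tol:
            mapping[s] = best
        else:
            missing.append(s)
    return mapping, missing

old = [(0, 0), (0, 1), (1, 0), (1, 1)]   # toy 2x2 'old' layout
new = [(0, 0), (0, 1), (1, 0)]           # 'new' layout lost a corner seat
mapping, missing = equivalence(old, new, (2, 2), (2, 2))
```

In practice the notion of equivalence also has to account for tiers, sightlines and price bands, which is where the animated discussion comes in.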


The criteria decided for individual entitlement or ranking must also work for ‘groups’ of supporters who choose to apply or sit together, either so that new people are encouraged to apply and join in, or so that existing supporters do not feel that their entitlement is being diluted. Groups may be inferred, based on existing sales or behavioural data for seating or family. Here is an example of a stadium layout representation looking at Age Band. The colour scheme or palette has been chosen to represent ‘age bands’, where blue is ‘youth’, green is ‘adult’, red ‘senior’, and purple the transition between youth and adult. For each age band, a darker shade or hue is intended to show an increase in age.


The next example shows an additional dimension of ‘grouping’ based on inferred relationships between adjacent seated customers. The groups are based on grades of proximity or closeness of relationship – from degrees or probability of family relationships to neighbourhood location.
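A simple flavour of how such groups can be inferred (the signals here – shared surname or postcode in adjacent seats – are invented stand-ins for the real behavioural and family signals): walk the seats in order and chain adjacent, apparently related supporters into the same group.

```python
def infer_groups(seats):
    """seats: list of (row, seat_no, surname, postcode) tuples.

    Adjacent seats whose occupants share a surname or postcode are
    chained into one inferred group.
    """
    seats = sorted(seats, key=lambda s: (s[0], s[1]))
    groups, current = [], [seats[0]]
    for prev, cur in zip(seats, seats[1:]):
        adjacent = cur[0] == prev[0] and cur[1] == prev[1] + 1
        related = cur[2] == prev[2] or cur[3] == prev[3]
        if adjacent and related:
            current.append(cur)
        else:
            groups.append(current)
            current = [cur]
    groups.append(current)
    return groups

seats = [
    ("A", 1, "Jones", "SW1"),
    ("A", 2, "Jones", "SW1"),   # chained with seat 1: same surname
    ("A", 3, "Smith", "N7"),    # adjacent but unrelated: new group
]
groups = infer_groups(seats)    # two groups, of sizes 2 and 1
```

Real inference also grades the relationship (family v neighbour, as in the example above) rather than treating it as all-or-nothing.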


Lurking behind any ‘public’ criteria of ranking will be a Club’s opportunity or desire to ‘up sell’ customers to premium products, if they so desire.

How long does it take? From a few months to a few years… The longest completed from the list above was for Arsenal, taking place over a two-year period in the run-up to the Emirates opening. The shortest has probably been Athletic Bilbao, which took a turn for the worse or better when the Club shortened the timeline to the coming September, having realised that the physical build was advancing sufficiently. Here is a view of the shiny new San Mamés stadium from across the river.


Once the ‘rules’ for customer or supporter engagement have been agreed, it’s then time to turn this into a sales and marketing plan. Because this involves a seat selection in what is often a non-existent venue, it almost always involves a face-to-face visit to a specially constructed or administered sales centre, allowing the supporter or group of supporters to confirm their choice of seat, and for reservations and deposits to be taken at the same time. It’s also important as an ‘experience’ that for many is cherished and downright emotional. At Bilbao, for example, the ‘eldest’ socio (member or season ticket holder) was ceremoniously invited to begin the process. Cue flag waving and tears, and for good reason!

Our role here is also to prepare and furnish a sales and appointment management system that interfaces with existing Club systems for ticket or seat booking and reservation and any marketing or internet applications for communication or grouping assignments. We usually do this in a customised version of an ‘off the shelf’ CRM system such as Microsoft Dynamics, using where possible existing functionality for contacts, groups, service resources and service appointments, and the marketing list and outbound communications that result from this.

We’re currently in the process of planning for 3 new proposed migrations in the UK and Europe, so more will follow.

Marketing at Speed in Professional Sport

We’ve recently – as in the last six months recently – completed our first ever pair of ‘Enterprise Software’ partnerships. Like London buses, you wait ten years for one, then suddenly two come along at once… etc. etc.

One partnership is with Adobe Campaign, a Marketing Automation solution, and the other with Tibco for their BusinessWorks Enterprise Application Service and Messaging product. Both are envisaged over the same sort of term – three years or more – and both are at the heart of a new effort to provide ‘Enterprise Class’ sales and marketing services to our clients. Together they represent a combination of solutions to handle both data integration and end-to-end processes for ‘Marketing at Speed’ in Professional Sport.

Tibco helps connect the dots and provide ‘real time’ data.

Adobe Campaign provides an automation layer for marketing communications that are personalised, timely (thanks, Tibco) and relevant.

Use cases focus on two areas – ‘Bricks’ – for in stadium / on location, and then ‘Clicks’ for ‘the internet’.

For ‘bricks’, we need to know when a supporter is ‘present’, and be able to identify who they are at the same time. In the ‘real world’ this relies on Access control for perimeter access notification and then ‘loyalty’ schemes for EPOS / Till systems linked to a membership card.

For ‘clicks’ we’d like to know when a customer signs in or downloads an app and links this to their membership. In principle an app can also access GPS for physical location, or go further in-store for NFC or iBeacon level of location information.

From then on, Tibco helps applications exchange information in real time, and Adobe furnishes these applications with the right marketing content, relevant to the individual supporter and the ‘context’ – including the club’s offer or promotion catalogue.
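The ‘bricks’ flow can be sketched end to end in a few lines (all event and profile shapes here are invented for illustration, not our production schema): a turnstile scan carries a card ID, which is resolved to a supporter profile, and downstream handlers can then react while the supporter is still on site.

```python
# Toy lookup standing in for the membership / loyalty data behind the scenes
profiles = {"card-881": {"supporter_id": 42, "segment": "season-ticket"}}

def on_entry(event, handlers):
    """Resolve an access-control event to a supporter and notify handlers."""
    profile = profiles.get(event["card_id"])
    if profile is None:
        return None                      # unknown card: no action taken
    for handler in handlers:
        handler(profile, event)
    return profile["supporter_id"]

seen = []
sid = on_entry(
    {"card_id": "card-881", "gate": "North 3"},
    [lambda p, e: seen.append((p["segment"], e["gate"]))],
)
```

In the real architecture the handler list is replaced by messaging (Tibco) fanning the identified event out to the marketing layer (Adobe), but the identify-then-notify shape is the same.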

Behind both is the Sports Alliance ‘data model’ that powers both the identification of the supporter and their history with the Club.

It’s been ten years in the waiting, so it’s worth explaining the background to this. We’re a funny kind of company working in a funny kind of market.

First, the market. It’s fragmented, disjointed and, at times, possibly schizophrenic… and, underlying this, limited in terms of the resource and expertise required to tame software and data for ‘enterprise class’ marketing.

To the average observer or ‘supporter’ – the generic term for customer in the industry – a Professional Sports Club looks shiny and big from the outside, with extensive media exposure, a smattering of celebrity and a generous and tangible physical asset with a rectangle of grass sitting in the middle. Commercially, however, they tend to be both extremely fragmented and yet also extremely limited in capabilities in many areas.

The direction and flow of money through the business is public knowledge, the majority inflow coming from higher-level media or sponsorship deals intermediated by leagues or other bodies exclusively designed to market and sell these rights, and flowing down and out directly almost straight through to the players, and their agents, who provide the product.  The infrastructure and organisation remaining inevitably has to focus on what it takes to underpin the ‘main event’ – the matchday – and the other ongoing customer-focussed businesses such as retail, hospitality, or community struggle for attention and resources from what is left over.

The marketing side of the business is primarily all about retention, structured around subscription or membership products (season tickets), and focussed on monetising the big concrete and steel asset with grass in the middle 2 or 3 times a month. In these terms, even the ‘biggest’ brands are local or regional at best in terms of the customer base who will come to the venue and experience the product and pay for it on a regular basis.

Secondly, then, back to us. Sports Alliance has been defined by the clear tension between these realities or limitations on one side and the expectations and ambitions on the other, spread and shared across many individual clients. As a guide to our client network, we have over 50 individual clients. In terms of size, the best proxy is unique customers or supporters under management. Our largest clients will have more than a million, the smallest down to a few tens of thousands. Pooled together we have nearly 15mm customers under management from the UK and mainland Europe. And yet both these ends of the scale, largest and smallest, are in the same business, with the same supporter needs or expectations, and the same ‘customer experience’ to try and deliver.

Sports Alliance as a company has been formed and evolved around these twin realities – part technology, part services, providing an extremely broad set of functions and applications from ‘back end’ systems integration and marketing data warehousing, to ‘front end’ multiple line of business sales and marketing applications or solutions, and then a further layer of services on top for clients that also want them. Within all of this is a focus on a data model that makes sense and can be replicated and evolved within and between clients.

For the ‘back end’ technologies, until now we’ve been firmly in the ‘build your own’ camp, based primarily on Microsoft technologies – SQL Server in the middle, with some C# / .NET and web tech from ASP to MVC and WCF SOAP/XML. The ‘front end’ is more hybrid, split between ‘lower level’ partnering and some more of the ‘build your own’. Where the market has evolved in terms of SaaS providers we’ve partnered or integrated with the obvious leaders – for CRM this means Dynamics or Salesforce, for ESP a broader set of suppliers, and for BI Tableau. In any of these, we find ourselves primarily concentrating on the data schema and configuration required for replicating a data model, and then letting the standard application functionality work around this. In some areas, for example Marketing Campaign Management and Loyalty, we’ve carried on and built complete ‘front end’ solutions, mainly because 3rd-party products in these areas were either too expensive or not fully fit for purpose.

We’ve watched as in recent years occasionally one or two of the larger clubs in our universe have taken a direct sales route to ‘enterprise’ software and solutions, often as part of a sponsorship package, and we believe it would be fair to say that the success rate or return on investment where this has happened would be marginal or arguable at best. The skills and resources required not only to implement but then maintain and evolve these solutions are often underestimated.

Anyway, enough about us or ‘me’. It takes two to tango, or to do a deal, and ‘Enterprise Software’ vendors had, in our niche, remained resolutely ‘Enterprise’ in approach to this. As noted above, where there was the possibility of a ‘big catch’ with a traditional, direct capital expenditure model, the software sales team would carry on in this vein if there was any possibility of a good old fashioned commission on sale. Build it, market it, get one or two ‘reference’ big fishes, and surely they will come.

Sports Alliance, even in pooling together networked resources, has financially been unable or unwilling to go for anything resembling a traditional ‘Enterprise’ solution licence independently. Pooled together, our 15mm customers look even more suspiciously like an ‘Enterprise’, and any traditional licence would in itself be many multiples of our total company turnover.

So, what’s changed? Essentially, it looks like we’ve met in the middle.

First, the Enterprise software approach has, at least in our case, shifted towards a more accessible quarterly subscription model, billed in arrears, and layered to allow for a ‘slow start’, with clear tiers going up as business or usage grows. That was probably the most significant and necessary component. I spent a number of months with another Enterprise Application vendor who offered a subscription model – but billed, it turned out, yearly in advance. It’s hard to fund that in our business. Why has this changed? Overall, I believe it is a realisation that the ‘Enterprise’ market model simply doesn’t work out in the world of SMEs/SMBs, and some money for software that has very little other cost of sale is simply better than none.

Secondly, Sports Alliance has to recognise that we can’t continue to spread ourselves so thinly and improve on what we do for our clients significantly. We’d continue, as one client said to me recently, to be ‘stuck in third gear’ (a metaphor that will rapidly become meaningless with self-driving cars. I presume he meant a 5 speed car gearbox as well, and not a 10 speed rear cassette on a bicycle).

The use-cases we’re looking at focus on ensuring that the ‘full’ customer profile and history is visible in two massively important areas – when the customer is on site, and at the other end, on the internet. Remember, we’re fortunate to work in an industry where the nature of the customer affiliation is such that the more we can show we know about them, the better. Other brands might look spooky or raise privacy concerns when showing knowledge of a customer relationship that can span decades of sales history, and involve close personal and family relationships across generations.

For on-site, this revolves around the ‘real-time’ physical presence of the customer at the stadium or venue, and ensuring that the customer experience is as delightful as it could possibly be. ‘Real-time’ is difficult to achieve in an industry that is heavily siloed or disparate in systems and applications, and where integration tends to be batch based. We need to identify the location of the customer onsite either by access or entry to the perimeter or at a point of sale via a loyalty scheme or programme identifier.

For the internet, there is also the additional hurdle of identification in an application in a market where many Clubs do not control their own rights or properties and where Ecommerce partners will have separate, standalone applications for sales. Single Sign On, and the opportunity to treat a customer consistently across different applications or touch points, is still very much a work in progress.

So, once identified and we know where they are – in the stadium or shop, or in an application – it’s over to the marketing department and the campaign or offer catalogue:

  1. Personalised in-store loyalty promotion
  2. In application games ‘points boost’ for behaviour
  3. Prompt to redeem loyalty points in newly opened hospitality area exclusively for Season Ticket Holders
  4. Special event local to the supporter on the following weekend, for Soccer Schools participants from the previous year who have had a birthday party at the Club
  5. And so on…
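The sort of logic behind the list above can be sketched as a catalogue of offers, each carrying an eligibility predicate over the supporter profile and the context, with the first match winning. All of the offer names, fields and rules below are invented for illustration:

```python
catalogue = [
    {"offer": "hospitality-points-redeem",
     "eligible": lambda s, ctx: s["season_ticket"] and s["points"] >= 500
                                and ctx["location"] == "stadium"},
    {"offer": "in-store-loyalty-promo",
     "eligible": lambda s, ctx: ctx["location"] == "store"},
    {"offer": "app-points-boost",
     "eligible": lambda s, ctx: ctx["location"] == "app"},
]

def select_offer(supporter, context):
    """Return the first catalogue offer the supporter is eligible for."""
    for entry in catalogue:
        if entry["eligible"](supporter, context):
            return entry["offer"]
    return None

choice = select_offer({"season_ticket": True, "points": 800},
                      {"location": "stadium"})
# choice == "hospitality-points-redeem"
```

In a real deployment the catalogue lives with the marketing team and the predicates are configured, not coded – but first-match-wins over an ordered catalogue is a common, explainable starting point.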

Programmatic – Online, TV and Beyond?

I once worked as a media planner in a small digital agency in Central London (cue fairy-tale intro…). I’ve been speaking recently to ex-colleagues who still work in the advertising and media world, to catch up on all of the interesting things that have changed. Chief among these is programmatic or ‘real time’ advertising, and, on the flip side, the remaining intransigence or resistance of the TV market to allowing itself to be infiltrated by this invidious new practice.

I know first hand the massive inefficiencies and vested interests at play in the ‘traditional’ media planning and buying process, where lunches, personal relationships, buying commission and the surreal practicalities of panel-based viewing as an attribution mechanism would only occasionally be interrupted by Excel, or by the econometrics department wheeled in with a chart. (This is intentionally facetious and exaggerated, and yet….) At the same time I’ve heard first hand how suitably robust and acceptable indirect attribution can be, and is being, produced between budget and spots (advertising on TV) and response (primarily digital or online, in terms of ecommerce or customer acquisition).

What isn’t there yet is any ability to link the individual – the ‘user’ or ‘customer’ – across channels or media in a suitably seamless manner. This much anyway is familiar to me in my Sports Alliance role.

I’m writing this explicitly not as any market expert, but because of the example this represents for the application of machine or software-driven decision making in a large and yet still immature market area or space. Particularly one dominated by the online web giants – Facebook, Yahoo, Google – the resources they have invested in their own ‘Artificial Intelligence’, and how this is or will be integrated into their product or application stack.

The RTB exchange system, and its continuing evolution into a fully-fledged trading floor with supporting mechanics for futures and options, is here to stay, thank god. I’ve seen some of the numbers from friends for the transactions or queries per second the market as it currently stands can process or accommodate, and it’s impressive, or at least it is to me.

What I’m less sure about is how ‘intelligent’ the system is in its ability to predict the desired event (a click) or to allocate and optimise resources (advertising of different formats, and the associated budget). I’ll come back to this later.
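As background to the exchange mechanics mentioned above, here is a minimal sketch of the second-price (Vickrey) auction on which most RTB exchanges are built: the highest bidder wins the impression but pays the greater of the second-highest bid and the floor price. The function name, bidder ids and floor-price parameter are illustrative, not any specific exchange’s API.

```python
def run_auction(bids, floor_price=0.0):
    """Second-price auction sketch.

    `bids` maps bidder id -> bid price (CPM). The highest eligible bidder
    wins but pays max(second-highest eligible bid, floor price).
    Returns (winner, clearing_price), or (None, None) if no bid clears.
    """
    # Drop bids below the publisher's floor.
    eligible = {b: p for b, p in bids.items() if p >= floor_price}
    if not eligible:
        return None, None
    # Rank remaining bids from highest to lowest.
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner, _top_bid = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else floor_price
    return winner, max(runner_up, floor_price)
```

The point of the second-price design is that each bidder’s best strategy is simply to bid their true valuation, which is part of why the mechanism has scaled to the transaction volumes mentioned above.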

Graph Relationships – Social and Otherwise – in Professional Sport

Good ‘old-fashioned’ Customer Relationships are important in the sector that I work in – Professional Sport. Usually the focus is on the relationship between the brand, venue or club and the customer or supporter (note the flexible terminology here). Retention is core in an industry where the affiliation is often ‘for life’, and churn is interpreted as gradations or levels of involvement rather than ‘defection’ to a rival brand. Loyalty, and the club’s recognition of it, is therefore massively important in reinforcing the ‘tribal’ relationship the supporter feels towards the club in question.

Lately, we’ve been doing a lot more analysis of relationships between supporters themselves, more along the lines of good ‘new-fashioned’ social or graph relations. This has been driven partly by the requirements of some of the projects we have been working on – particularly large-scale ‘once in a lifetime’ stadium migrations – and partly by the availability of and access to the technologies that enable it. Examples include underlying technology such as SPARQL / RDF, as well as the ability to communicate or visualise the results in tools such as D3 or Tableau.

What it doesn’t yet mean is ‘social media’ in terms of Facebook, Twitter or other mainstream consumer products. We’ve been hampered here by the continuing limitations on many clubs’ ability to link or identify a social media ID with a real-world ID – digital or otherwise. I’ll comment on this elsewhere in more detail.

In terms of Graph Analytics, we’ve been particularly interested in the patterns that represent groupings or shared activities, which help reinforce the ‘real world’ relationships otherwise hidden from a club. Examples in the real world (bricks, not clicks) would include who sits next to whom in a stadium seating plan, who arrives with whom and at what time in terms of stadium or venue access, and what we can infer about the nature of the real-world relationships these patterns represent.
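As an illustration of the kind of inference just described, here is a minimal, hypothetical sketch (the field names, data shape and ten-minute threshold are invented for the example, not our production logic): supporters in adjacent seats on the same row who also arrive within a few minutes of each other are grouped as likely real-world companions.

```python
from collections import defaultdict

def infer_groups(attendance, max_gap_minutes=10):
    """Infer likely real-world groups from seating and arrival patterns.

    `attendance` is a list of tuples (supporter_id, row, seat_number,
    arrival_minute). Supporters are grouped when they occupy consecutive
    seats in the same row AND arrive within `max_gap_minutes` of their
    seat neighbour. Returns a list of sets of supporter ids (size >= 2).
    """
    by_row = defaultdict(list)
    for supporter, row, seat, arrival in attendance:
        by_row[row].append((seat, arrival, supporter))

    groups = []
    for seats in by_row.values():
        seats.sort()  # order along the row by seat number
        current = [seats[0]]
        for prev, nxt in zip(seats, seats[1:]):
            adjacent = nxt[0] == prev[0] + 1
            together = abs(nxt[1] - prev[1]) <= max_gap_minutes
            if adjacent and together:
                current.append(nxt)
            else:
                if len(current) > 1:
                    groups.append({s for _, _, s in current})
                current = [nxt]
        if len(current) > 1:
            groups.append({s for _, _, s in current})
    return groups
```

In practice the same idea would be expressed as edges in a graph store (or SPARQL queries over RDF triples, as mentioned above) and accumulated over many fixtures before any relationship is asserted; a single co-arrival proves nothing on its own.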

I’ll be updating this with examples later.

Fingerprints in the Archives

I spent a lot of time, when researching a PhD on Elizabethan politics and culture focused on an individual diplomat and politician called Robert Beale, looking at handwritten letters in various libraries and archives, mostly in the UK but also in Europe and North America. I completed the PhD in 2000 but never got it published: I had already started work at an internet advertising agency, and I then got married and started to have children, so I tucked it all away in an archive of my own.

I’ve gone back to this material only recently, partly spurred on by a private desire to connect my academic past to my professional present and future, and partly by an interest in applying Machine Learning or Artificial Intelligence tools to different domains.

The emergence of ‘Deep Learning’ (more here later) in recent years as a technique to aid classification in a broad variety of areas was something I, along with many others, was familiar with. I was aware enough of the techniques involved to surmise that an application that recognised the identity of a handwriting sample, as well as ‘reading’ it, should be possible.

The presentation here – Mark Taviner Handwriting Identification and Applied AI Talk June 2015 – is what I have written this month on the subject. The working title ‘Fingerprints in the Archives’ should be fairly self-explanatory; in the talk I reference the FBI’s NGI fingerprint system in the USA, as well as Facebook’s ‘DeepFace’ research.

Thanks to Professor Cathy Shrank for her support when I suggested this to her; her video message on why this would be a valuable research tool for historical, cultural and literary research is hosted on YouTube here. I very much appreciate the subsequent confirmation from Jeremy Howard that my ‘guess’ was correct, and his reference to a dissertation by Luiz Gustavo Hafemann that showed ‘state of the art’ results against a Brazilian author handwriting reference database.

There remains much work to be done…


A long time coming, but this is a record of, and notes on, the things I’m thinking about. It’s new because:

1. I have an uneasy relationship with the invidious first-person singular pronoun ‘I’

2. I’ve been working in a company and industry that is relatively ‘internalised’ in our approach to marketing and confidentiality or transparency

For a bit of background, see the ‘About Me’ page. The rest will be written as a set of blog entries.

I don’t do social media as a private individual. LinkedIn clearly has its uses in terms of professional contacts and search, but the idea of publicising my private life in the form of text or images on Facebook, Twitter or Tumblr fills me with great dread.