Machine Learning, Professional Sports and Customer Marketing in the UK and Europe

Professional Sports Customer Marketing is driven primarily by two key product lines or revenue areas: subscriptions or memberships, and seat or ticket products. The two are ‘combined’ in the classic or evergreen ‘Season Ticket’ packaged product, which is essentially the first tier in a membership programme and the base on which other ‘loyalty’ programmes or schemes can function.

This post looks at the application of ‘Machine Learning’ in the form of both supervised and unsupervised methods to Customer Marketing in Professional Sport.

I’ll start with an example of an unsupervised approach, using a ‘standard’ k-means algorithm to identify clusters of Professional Sports Club Customers based on features or attributes that describe, as broadly as possible, Customer profile or behaviour over time. These features were built or sourced from an underlying ‘data model’ that broadly covers the following areas:

  1. Sales transactions – baskets or product items purchased by a customer over time, broken down by product area into Season Tickets, Match Tickets, Retail (Merchandise), Memberships and Content subscriptions
  2. Socio-Demographic – relating to individual identity, gender and geography, and also to relationships with other supporters in the data
  3. Marketing Behaviour – engagement and response to outbound and inbound marketing content over time

We wanted to create a ‘UK Behavioural Model’ that would be representative of UK Sports Clubs, so we drew a sample, in proportion to overall Club or client size, from a set of 25 Clubs in the UK across Football, Rugby and Cricket. The sample consisted of 300k supporters from an overall base of approximately 10 million. The input or feature selection was normalised for all Clubs. We experimented with different iterations, varying the number of clusters and their sizing.
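As a rough illustration of the kind of run described above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It assumes that ‘normalised’ means each feature is scaled to zero mean and unit variance, which is only one possible reading, and the three feature columns (ticket spend, membership spend, email opens) and their values are made up for the example. In practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice.

```python
import math
import random


def standardise(rows):
    """Z-score each feature column so no single feature dominates distances."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def inertia(rows, labels, centroids):
    """Within-cluster sum of squares, useful when comparing different k."""
    return sum(sq_dist(row, centroids[lab]) for row, lab in zip(rows, labels))


# Hypothetical toy data: [ticket_spend, membership_spend, email_opens]
features = [[900, 60, 12], [850, 55, 10], [920, 65, 14],
            [40, 0, 1], [35, 0, 0], [50, 5, 2]]
scaled = standardise(features)
for k in (2, 3):
    labels, centroids = kmeans(scaled, k)
    print(k, labels, round(inertia(scaled, labels, centroids), 3))
```

Comparing the inertia across several values of k is one simple way to ‘experiment with cluster numbers’; it always falls as k rises, so the interesting point is where the improvement flattens out.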

The exhibit below shows a version with 15 different clusters, numbered 0-14 and coloured across the first row. The row headers are the different features or feature groups, and the cluster colours persist through the rows for each feature. The first row shows cluster size. In every other row, the width of the bar for each cluster is the average per customer for that feature; comparing it with the width in the first row is intended to provide a visual guide to differences between clusters.
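The numbers behind an exhibit of this kind can be sketched as a per-cluster profile: cluster size plus the average per customer for each feature. The snippet below is an assumption about how such a table might be computed, not the code behind the exhibit, and the feature names and values are hypothetical.

```python
from collections import defaultdict


def cluster_profile(customers, labels, features):
    """Return {cluster: {"size": n, feature: average per customer, ...}}."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for customer, label in zip(customers, labels):
        sizes[label] += 1
        for feature in features:
            sums[label][feature] += customer[feature]
    return {
        label: {"size": sizes[label],
                **{f: sums[label][f] / sizes[label] for f in features}}
        for label in sizes
    }


# Hypothetical customers and cluster labels
customers = [
    {"ticket_revenue": 100.0, "membership_revenue": 0.0},
    {"ticket_revenue": 300.0, "membership_revenue": 50.0},
    {"ticket_revenue": 0.0, "membership_revenue": 20.0},
]
labels = [8, 8, 6]
profile = cluster_profile(customers, labels,
                          ["ticket_revenue", "membership_revenue"])
for cluster, row in sorted(profile.items()):
    print(cluster, row)
```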


Revenue £££s in the bottom 3 rows is generated from a handful of clusters only:

  • Purple Cluster 8 and Mauve Cluster 9 dominate Ticketing revenue
  • Red Cluster 6 and Mauve Cluster 9 dominate Memberships revenue
  • Grey Cluster 14 contributes to Merchandise revenue

Pretty much all of the ‘NonUK’ supporters have been allocated to Light Blue Cluster 1.

Interestingly, Gender (M/F) or Age (Kids, Adults, Seniors) don’t seem to discriminate much between Clusters. See the exhibit below that plots ‘Maleness’ on the Y axis and ‘Age’ on the X axis.


Cluster 4 is ‘Old Men’, Cluster 14 is ‘Ladies of a Certain Age’ but the majority of Clusters (circle diameter proportional to size or number of Customers) aren’t really discriminated by these dimensions or features. We concluded that behaviour of kids ‘followed’ or emulated adults in terms of key features for attendance and membership.

The next section looks at a supervised approach, using a decision-tree classification algorithm to identify ‘propensity’ for members or subscribers to renew or churn, and then the converse for non-members or subscribers to ‘convert’ to become a member, based on similarity to previous retention or acquisition events.
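To make the idea of ‘propensity’ from a decision tree concrete, here is a minimal CART-style sketch in plain Python. It splits on Gini impurity and stores the observed renewal rate in each leaf as the propensity score. The splitting criterion, the depth limit, the two features (matches attended, seasons held) and the data are all assumptions for illustration; the post does not say which tree algorithm or settings were used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold), or None if no split exists."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of "value <= threshold" are non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; leaves hold the observed renewal rate."""
    rate = sum(labels) / len(labels)
    split = best_split(rows, labels) if depth < max_depth and 0 < rate < 1 else None
    if split is None or split[0] >= gini(labels):
        return {"propensity": rate, "n": len(labels)}
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def propensity(tree, row):
    """Walk the tree to a leaf and return its renewal rate."""
    while "feature" in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["propensity"]


# Hypothetical history: [matches_attended, seasons_held]; 1 = renewed, 0 = churned
rows = [[18, 5], [16, 3], [15, 6], [17, 4], [3, 1], [4, 2], [2, 1], [5, 1]]
renewed = [1, 1, 1, 1, 0, 0, 0, 0]
tree = build_tree(rows, renewed)
print(propensity(tree, [17, 2]), propensity(tree, [1, 5]))
```

The same structure serves the acquisition case by swapping the outcome: 1 for a non-member who converted, 0 for one who did not.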

Our work in this area began tentatively in 2010 using an outside consultant (Hello Knut!) from a large software vendor, working on a single membership churn project at a large North London football Club. Over the past 5 years, we’ve taken the approach ‘in house’, ‘democratised’ it in a certain way and applied the techniques over and over to different clubs (Hello Emanuela!). We’ve tried to make this as efficient as possible by engineering a common feature set across all clubs and seasons based on a common data model.

For the retention model, we’ve continued to build and train a model for each club AND for each season, as we saw greater predictive accuracy over time, also as a result of including features that encapsulated ‘history’ for each customer up until that season as fully as possible.

For the acquisition model, we have modified the approach slightly, using a single input of all acquisition events regardless of season, but still only one club at a time. This was based on the belief or observation that people became members for roughly the same reasons regardless of season, whilst people ‘churned’ from member to non-member based more on season-to-season performance and issues.

Decision trees are often cited as being on the more ‘open’ or ‘transparent’ end of the scale of classification techniques or approaches. However, we’ve succeeded in operationalising the retention model into Club Sales and Marketing systems and programmes only by using the features or variables that ‘float to the top’ of the decision tree to construct a ‘risk factor’ matrix based on observed ‘real’ behaviour and change in these over time for the customer.
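A ‘risk factor’ matrix of this kind might look something like the sketch below: one row per customer, one flag per factor, plus a simple count. The three rules, their thresholds and the field names are hypothetical stand-ins for whatever features ‘float to the top’ of a given club's tree; they are not taken from any real model.

```python
# Hypothetical rules: each compares a customer's previous and current season.
RISK_RULES = {
    "attendance_halved": lambda prev, cur:
        cur["matches_attended"] < 0.5 * prev["matches_attended"],
    "no_email_opens": lambda prev, cur:
        cur["email_opens"] == 0,
    "retail_stopped": lambda prev, cur:
        prev["retail_spend"] > 0 and cur["retail_spend"] == 0,
}


def risk_matrix(customers):
    """customers: {id: (previous_season, current_season)}.

    Returns {id: {factor: bool, ..., "risk_score": number of factors flagged}}.
    """
    matrix = {}
    for customer_id, (prev, cur) in customers.items():
        flags = {name: bool(rule(prev, cur)) for name, rule in RISK_RULES.items()}
        flags["risk_score"] = sum(flags.values())
        matrix[customer_id] = flags
    return matrix


# Hypothetical customers
customers = {
    "c1": ({"matches_attended": 20, "email_opens": 8, "retail_spend": 120.0},
           {"matches_attended": 6, "email_opens": 0, "retail_spend": 0.0}),
    "c2": ({"matches_attended": 18, "email_opens": 5, "retail_spend": 0.0},
           {"matches_attended": 17, "email_opens": 4, "retail_spend": 0.0}),
}
for customer_id, row in risk_matrix(customers).items():
    print(customer_id, row)
```

The appeal of this form is that each flag is an observable change in behaviour that Sales or Marketing staff can act on, rather than an opaque model score.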

Here’s an example of feature or variable correlation for the STH (Season Ticket Holder) Acquisition model:


What was particularly interesting here was the importance of the ‘Half Season Ticket’ holding in the previous season and then the ‘groupings’ represented by the other Half Season Ticket holders who lived with the same supporter. This points very clearly to the inter-relationships that we ‘know’ are important between individual supporters, and leads us towards a more Graph-based analytical approach to identify and analyse relationships at play at specific points in the customer life-cycle, life-stage and buying relationship with the Club.
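One simple starting point for a graph view is to treat supporters as nodes, link those who share a relationship (for example the same address), and take connected components as groups. The sketch below does this with a small union-find and then counts, for each supporter, how many other Half Season Ticket holders sit in the same group. The supporter ids, links and holder set are hypothetical, and this is only one of many ways such relationships could be modelled.

```python
from collections import defaultdict


def connected_groups(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {node: node for node in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for node in nodes:
        groups[find(node)].add(node)
    return list(groups.values())


def linked_half_season_holders(groups, half_season_holders):
    """For each supporter, count the OTHER half season ticket holders in their group."""
    counts = {}
    for group in groups:
        holders = group & half_season_holders
        for supporter in group:
            counts[supporter] = len(holders - {supporter})
    return counts


# Hypothetical supporters; an edge means "lives at the same address"
supporters = ["a", "b", "c", "d", "e"]
same_address = [("a", "b"), ("b", "c"), ("d", "e")]
groups = connected_groups(supporters, same_address)
print(linked_half_season_holders(groups, {"a", "b", "d"}))
```

A count like this could then be fed back into the acquisition model as one more feature.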

Our industry or sector is still dominated by the ‘Season Ticket’, a ‘hero product’ that continues, like fine wine or an ageing Hollywood A-lister, to defy the years and live on to snaffle a majority of our clients’ time and attention and share of revenue. The more that we can do to understand the ‘patterns’ behind this, the better.