Going Through the Soccer Analytics Handbook – Part 1

Link to Devin Pleuler’s Soccer Analytics Handbook

While we are all stuck at home, it’s all we can do to stay sane. At this point, if you’re like me, you have run through just about every form of entertainment and are scraping the bottom of the barrel. Recently I discovered that Devin Pleuler, the Director of Analytics for Toronto FC, has created a handbook for soccer analytics. To keep myself honest, I’m going to post write-ups to accompany my progress.

First things first: we will be using Python to go through this book. Fortunately, I have a few years’ experience with the language and use it both personally and professionally. As he mentions in his handbook, Python is a very well-documented language and there are seemingly endless websites to learn from. I am a self-taught Python coder and learn something new every day. Devin is also kind enough to point us to free soccer data, namely from StatsBomb and Metrica, to accompany our learning as we go through the book.

The first lesson of the book starts with the basics, naturally. Using the Python packages requests and pandas, we learn how to pull this data down from the web. I am pretty familiar with these packages and would consider myself intermediate to advanced in their use. However, Devin is clearly well-versed in tidy coding. He accomplishes in one line what I might do in two or three, or with a sloppy concatenation.

How I would attempt the code before:

base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"

def parse_data(competition_id, season_id):
    matches = requests.get(url=base_url + "matches/" + str(competition_id) + "/" + str(season_id) + ".json").json()

How he completed the code:

base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"
comp_url = base_url + "matches/{}/{}.json"

def parse_data(competition_id, season_id):
    matches = requests.get(url=comp_url.format(competition_id,season_id)).json()

Although he uses an extra line, defining comp_url as a template keeps the URL structure visible in one place, which makes the code easier to reuse and to audit. Like I said, I’m learning every day.
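
To see why the template approach pays off, here is a minimal sketch of reusing one template for several URLs. The named placeholders and the build_match_url helper are my own illustration, not something from the handbook, and the ID values are only examples.

```python
base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"
# Named placeholders are an assumption for readability; the handbook uses positional {}.
comp_url = base_url + "matches/{competition_id}/{season_id}.json"


def build_match_url(competition_id, season_id):
    """Hypothetical helper: fill in the template for one competition and season."""
    return comp_url.format(competition_id=competition_id, season_id=season_id)


# Example IDs only; look up real ones in the open-data repository before using them.
urls = [build_match_url(43, season_id) for season_id in (3, 106)]
for url in urls:
    print(url)
```

Because the URL layout lives in a single string, changing it later means editing one line rather than hunting down every concatenation.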

He goes on to create a function that pulls down all the shot data in every match of the World Cup. He accomplishes this by looping through every event that happened in each match and keeping only the shot events. I’m going to take a little detour here because I’m curious just how many event types there are. I accomplish this by making a few modifications to his existing script:

import requests

base_url = "https://raw.githubusercontent.com/statsbomb/open-data/master/data/"
comp_url = base_url + "matches/{}/{}.json"
match_url = base_url + "events/{}.json"

def get_all_event_types(competition_id, season_id):
    # Fetch the list of matches for this competition and season
    matches = requests.get(comp_url.format(competition_id, season_id)).json()
    match_ids = [m['match_id'] for m in matches]
    all_events = []

    for match_id in match_ids:
        # Each match has its own JSON file of events
        events = requests.get(match_url.format(match_id)).json()
        all_events.append(events)

    # Flatten the per-match lists into one list, then collect the unique type names
    all_events_list = [item for sublist in all_events for item in sublist]
    event_types = {event['type']['name'] for event in all_events_list}
    return event_types


And what we get in return:

{'Bad Behaviour',
'Ball Receipt*',
'Ball Recovery',
'Camera On',
'Camera off',
'Dribbled Past',
'Foul Committed',
'Foul Won',
'Goal Keeper',
'Half End',
'Half Start',
'Injury Stoppage',
'Own Goal Against',
'Own Goal For',
'Player Off',
'Player On',
'Referee Ball-Drop',
'Starting XI',
'Tactical Shift'}

That is quite a lot of event types that StatsBomb offers us. I’m looking forward to playing around with this data some more!
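
Once the events are downloaded, counting how often each type occurs, or keeping only the shots as Devin does, takes just a few lines. Below is a rough sketch that runs on a tiny hand-made sample rather than real StatsBomb data. The sample events and the function names are my own; only the event['type']['name'] layout is taken from the script above.

```python
from collections import Counter

# Hand-made sample that mimics the nested layout used above: event['type']['name'].
# These rows are made up for illustration and are not real match data.
sample_events = [
    {"type": {"name": "Pass"}},
    {"type": {"name": "Shot"}},
    {"type": {"name": "Pass"}},
    {"type": {"name": "Foul Committed"}},
    {"type": {"name": "Shot"}},
    {"type": {"name": "Pass"}},
]


def count_event_types(events):
    """Hypothetical helper: tally how many events there are of each type."""
    return Counter(event["type"]["name"] for event in events)


def keep_shots(events):
    """Hypothetical helper: keep only the events whose type name is 'Shot'."""
    return [event for event in events if event["type"]["name"] == "Shot"]


counts = count_event_types(sample_events)
shots = keep_shots(sample_events)
print(counts.most_common())
print(len(shots))
```

The same two functions should work on the flattened list built inside get_all_event_types, assuming every event carries a type name.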
