
Expected goals - question



This might be one for @Davefevs 

What percentage of results of, say, Championship football matches last season were accurately reflected by the ‘expected goals’ metric?
I know this metric can be/is used to show how well a team has performed in a game but I’m wondering how accurately - or otherwise - it reflects outcome as well.

It would be interesting to see the stats for both overall result and margin of victory.


I don’t know. Individual results will be skewed by each one being a 90-minute sample, i.e. very small.

Experimental 361 are a good site to look at….but look away quickly!

https://experimental361.com/2021/05/09/expected-goals-table-championship-2020-21/

The difficulty is evaluating a draw, i.e. how close each team’s xG needs to be to its opponent’s to count as a draw. E361 massively over-predicts draws: the values only have to be within a third of each other. I suspect that if they narrowed that margin, more results would be turned on their head, so a balance has to be found.
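To make the draw rule concrete, here is a minimal Python sketch. It assumes "within a third of each other" means the gap between the two xG totals is at most one third of the larger total; E361's exact rule isn't spelled out in this thread, so the interpretation, the function name and the figures are all illustrative.

```python
def xg_result(home_xg: float, away_xg: float, tolerance: float = 1 / 3) -> str:
    """Classify a match as 'H' (home win), 'D' (draw) or 'A' (away win) from xG.

    Assumption: a "draw" means the gap between the totals is at most
    `tolerance` times the larger total (one reading of "within 1/3rd").
    """
    larger = max(home_xg, away_xg)
    if larger == 0 or abs(home_xg - away_xg) <= tolerance * larger:
        return "D"
    return "H" if home_xg > away_xg else "A"


print(xg_result(1.5, 1.1))                 # gap 0.4 is within 0.5, so 'D'
print(xg_result(1.5, 1.1, tolerance=0.2))  # gap 0.4 exceeds 0.3, so 'H'
```

Lowering `tolerance` turns some of those "draws" into wins, which is the balance described above.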


39 minutes ago, firstdivision said:

This might be one for @Davefevs 

What percentage of results of, say, Championship football matches last season were accurately reflected by the ‘expected goals’ metric?
I know this metric can be/is used to show how well a team has performed in a game but I’m wondering how accurately - or otherwise - it reflects outcome as well.

It would be interesting to see the stats for both overall result and margin of victory.

The book I have on the subject, "Football Hackers" by C. Biermann, broadly summarises that "...the team creating the better quality of chances only ends up winning the game two thirds of the time...". However, it doesn't give a quantitative overall "accuracy" figure such as you seem to be looking for.
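One way to see why the better-chance side doesn't always win is to turn each side's shots into a full probability of win, draw or defeat. A minimal sketch, assuming every shot is an independent event that scores with probability equal to its xG (a simplification real matches don't obey); the shot values and function names are hypothetical, and this is not how Biermann or any data provider arrived at their figures:

```python
from itertools import product


def goal_distribution(shot_xgs: list[float]) -> list[float]:
    """P(exactly k goals) for k = 0..len(shot_xgs), treating every shot as an
    independent chance that scores with probability equal to its xG."""
    dist = [1.0]  # before any shots: 0 goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot missed
            new[k + 1] += prob * p    # this shot scored
        dist = new
    return dist


def outcome_probs(home_shots: list[float], away_shots: list[float]) -> tuple[float, float, float]:
    """Return (home win, draw, away win) probabilities."""
    home, away = goal_distribution(home_shots), goal_distribution(away_shots)
    win = draw = loss = 0.0
    for (h, ph), (a, pa) in product(enumerate(home), enumerate(away)):
        if h > a:
            win += ph * pa
        elif h == a:
            draw += ph * pa
        else:
            loss += ph * pa
    return win, draw, loss


# Hypothetical shot lists: the home side has more, and better, chances
home = [0.4, 0.3, 0.15, 0.1, 0.05]  # 1.0 xG in total
away = [0.3, 0.08, 0.05]            # 0.43 xG in total
print([round(x, 3) for x in outcome_probs(home, away)])
```

Even with more than double the xG, the home side's win probability stays well short of certainty, because a draw or an away win remains quite possible over one match.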

14 minutes ago, Davefevs said:

I don’t know. Individual results will be skewed by each one being a 90-minute sample, i.e. very small.

Experimental 361 are a good site to look at….but look away quickly!

https://experimental361.com/2021/05/09/expected-goals-table-championship-2020-21/

The difficulty is evaluating a draw, i.e. how close each team’s xG needs to be to its opponent’s to count as a draw. E361 massively over-predicts draws: the values only have to be within a third of each other. I suspect that if they narrowed that margin, more results would be turned on their head, so a balance has to be found.

A couple of seasons ago E361 comfortably beat the bookies in terms of the overall accuracy of his pre-season predictions. He seemingly hasn't published that in recent seasons, possibly because, I believe, he's since been employed by said bookies.


So how much notice should we take of xG? It seems to have become a go-to mantra for post-match analysis, but there seems to be enough variation to make it a bit clunky. (Although my eyes told me we were close to being the worst side in the Championship last season, if not the worst.)

The notion of an ‘expected goal’ is a curious one. There are so many variables in each chance compared with previous chances from the same position, e.g. how tired the player is, the relative strength of his left and right feet, his confidence levels, the exact positioning of the opponents, the weather, the state of the game (a player is likely to be more anxious at 0-1 than at 3-0), the exact weight of the pass or cross, the quality of the pitch. I could go on. And on. My point is that no two chances are really alike.

Unless I’ve completely misunderstood what an expected goal is.

My understanding is that it’s essentially based on all available data for a chance from any given position.

Please feel free to correct.
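That understanding matches the simplest possible kind of xG model: for each chance, look up how often similar past chances were converted. A minimal sketch, assuming chances are grouped only by a coarse pitch zone; the zone names and the tiny shot history are invented, and real models use far more variables than location:

```python
from collections import defaultdict


def zone_xg(shots: list[tuple[str, bool]]) -> dict[str, float]:
    """Naive xG: the share of past shots from each zone that were scored."""
    attempts: dict[str, int] = defaultdict(int)
    goals: dict[str, int] = defaultdict(int)
    for zone, scored in shots:
        attempts[zone] += 1
        goals[zone] += scored  # True counts as 1, False as 0
    return {zone: goals[zone] / attempts[zone] for zone in attempts}


# Invented history: (zone, was it a goal?)
history = [("six_yard_box", True), ("six_yard_box", False),
           ("penalty_area", False), ("penalty_area", True),
           ("penalty_area", False), ("penalty_area", False),
           ("outside_box", False), ("outside_box", False)]
print(zone_xg(history))
# {'six_yard_box': 0.5, 'penalty_area': 0.25, 'outside_box': 0.0}
```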


10 minutes ago, firstdivision said:

So how much notice should we take of xG? It seems to have become a go-to mantra for post-match analysis, but there seems to be enough variation to make it a bit clunky. (Although my eyes told me we were close to being the worst side in the Championship last season, if not the worst.)

The notion of an ‘expected goal’ is a curious one. There are so many variables in each chance compared with previous chances from the same position, e.g. how tired the player is, the relative strength of his left and right feet, his confidence levels, the exact positioning of the opponents, the weather, the state of the game (a player is likely to be more anxious at 0-1 than at 3-0), the exact weight of the pass or cross, the quality of the pitch. I could go on. And on. My point is that no two chances are really alike.

Unless I’ve completely misunderstood what an expected goal is.

My understanding is that it’s essentially based on all available data for a chance from any given position.

Please feel free to correct.

You’ve got it sussed.

I use it for two things:

1. How good a chance it was on a purely individual basis.  People say it was a half-chance, when the xG is something like 0.1!!!

2. I use E361 or Wyscout xG timelines to show the flow of chances and when they take place, e.g. in clusters, regularly, or in City’s case, rarely!!!

I never use them to suggest a team should win or lose.  I might suggest a team was a bit lucky or unlucky to get a result based on chances created.

It’s probably a bit better than looking purely at shots on target.
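An xG timeline like the ones described above is essentially a running total of shot xG against the match clock, so clusters of chances show up as steep steps. A minimal sketch, assuming each shot is recorded as a (minute, xG) pair; the shot data is invented:

```python
def xg_timeline(events: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Turn (minute, shot xG) events into a running cumulative-xG series."""
    total = 0.0
    series = []
    for minute, xg in sorted(events):  # sort by minute in case input is unordered
        total += xg
        series.append((minute, round(total, 2)))
    return series


# Invented shots: a cluster just after half-time, little else
shots = [(12, 0.05), (48, 0.3), (50, 0.12), (53, 0.4), (88, 0.03)]
for minute, cumulative in xg_timeline(shots):
    print(f"{minute:>2}' {cumulative:.2f}")
```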


I've always found the expected goals nonsense to be, well, nonsense. 

Who gives a **** what some algorithm thinks should be the case..?! 

The beauty of football is that it's not easy to predict. 

It's a typical modern-day need that it panders to. 

I don't understand it and have absolutely no interest in finding out more about it. 


Over the past few seasons, comparing the XG between Brentford and us was interesting.

It showed we were batting above our average so to speak, and that Brentford were under achieving.

Comparing the stats with what you saw with your own eyes, it was obvious Brentford were in an upward spiral and that we were in a massive downturn, even though the tables didn't show it.

It eventually caught up last season. The trends continued.

XG isn't perfect, but it does give you an insight into which way a club is progressing.


1 minute ago, spudski said:

Over the past few seasons, comparing the XG between Brentford and us was interesting.

It showed we were batting above our average so to speak, and that Brentford were under achieving.

Comparing the stats with what you saw with your own eyes, it was obvious Brentford were in an upward spiral and that we were in a massive downturn, even though the tables didn't show it.

It eventually caught up last season. The trends continued.

XG isn't perfect, but it does give you an insight into which way a club is progressing.

Yes, as is often the case with statistics, it's the trend that is informative.
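Tracking that kind of trend can be as simple as a rolling average of goals minus xG. A minimal sketch, assuming per-match goals and xG for one team are available; the numbers and the five-match window are illustrative choices, not anything from the posts above:

```python
def rolling_over_performance(goals: list[int], xg: list[float], window: int = 5) -> list[float]:
    """Rolling mean of (goals - xG) per match over the last `window` games.

    Persistently positive values suggest a side is scoring more than its
    chances warrant; persistently negative values suggest the reverse.
    """
    diffs = [g - x for g, x in zip(goals, xg)]
    return [
        round(sum(diffs[i - window + 1 : i + 1]) / window, 2)
        for i in range(window - 1, len(diffs))
    ]


# Invented ten-match run for one team
goals = [2, 1, 2, 0, 3, 1, 0, 0, 1, 0]
xg = [0.9, 0.8, 1.1, 0.7, 1.2, 1.0, 0.9, 0.8, 1.1, 0.7]
print(rolling_over_performance(goals, xg))
```

Here the run starts well above its xG and ends well below it, the kind of shift that a league table alone can hide for a while.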


12 minutes ago, Banned User said:

xG can only be as accurate as the data guys who create the algorithm. Who are they to say what should be a goal or not?

They don't decide. It is based on a massive amount of data covering what actually happens over time.


1 hour ago, firstdivision said:

So how much notice should we take of xG? It seems to have become a go-to mantra for post-match analysis, but there seems to be enough variation to make it a bit clunky. (Although my eyes told me we were close to being the worst side in the Championship last season, if not the worst.)

The notion of an ‘expected goal’ is a curious one. There are so many variables in each chance compared with previous chances from the same position, e.g. how tired the player is, the relative strength of his left and right feet, his confidence levels, the exact positioning of the opponents, the weather, the state of the game (a player is likely to be more anxious at 0-1 than at 3-0), the exact weight of the pass or cross, the quality of the pitch. I could go on. And on. My point is that no two chances are really alike.

Unless I’ve completely misunderstood what an expected goal is.

My understanding is that it’s essentially based on all available data for a chance from any given position.

Please feel free to correct.

 

1 hour ago, Davefevs said:

You’ve got it sussed.

I use it for two things:

1. How good a chance it was on a purely individual basis.  People say it was a half-chance, when the xG is something like 0.1!!!

2. I use E361 or Wyscout xG timelines to show the flow of chances and when they take place, e.g. in clusters, regularly, or in City’s case, rarely!!!

I never use them to suggest a team should win or lose.  I might suggest a team was a bit lucky or unlucky to get a result based on chances created.

It’s probably a bit better than looking purely at shots on target.

 

I've read a few bits on XG etc., and I think I'm right in saying the current system doesn't take into account defenders' positioning. I have a vague recollection of someone saying they were looking into adding something; I imagine that would be a massive task, seeing that each chance is its own entity. 

I think they are like any stats: used wisely, they can give you a guide. Possession was thrown up a lot previously, but some teams don't play a possession-based game and are happy to let the other team keep the ball for spells. I remember Brum won one game 5-1 recently with 30% possession. Same with XG: it can be fun to browse for a different perspective, but when all's said and done, if you've won 2-1, you've won 2-1.


12 minutes ago, chinapig said:

They don't decide. It is based on a massive amount of data covering what actually happens over time.

Why though..? 

I honestly don't get it..! 

It's not even a factual analysis, like possession. It's utterly pointless & irrelevant, as far as I can see. 


6 minutes ago, Banned User said:

And they create the algorithm, so they decide essentially 

I think you may be too focused on the word algorithm. I could create some calculations now just using Excel formulas and call it an algorithm to make it sound clever but it really isn't.

The calculations are simply statistical probabilities based, as I say, on massive historic data sets of what actually happens.


4 minutes ago, Bar BS3 said:

Why though..? 

I honestly don't get it..! 

It's not even a factual analysis, like possession. It's utterly pointless & irrelevant, as far as I can see. 

The facts here are the data sets of actual games, about as factual as you can get. 

But if it's pointless we had better let every major and not so major club in the world know they can sack their analysts. 🙂


21 minutes ago, chinapig said:

I think you may be too focused on the word algorithm. I could create some calculations now just using Excel formulas and call it an algorithm to make it sound clever but it really isn't.

The calculations are simply statistical probabilities based, as I say, on massive historic data sets of what actually happens.

They’re really not, I’m a computer scientist myself(ish), their code isn’t even open source which I find odd.

For a single goal there are hundreds of variables, let alone, say, 10,000 goals for them to try to compute what is universally expected.

There’s no science with a hard S, so to say they know exactly what an xG is... just isn’t that believable.

I’d be interested to see how they work it out.


3 minutes ago, Banned User said:

They’re really not, I’m a computer scientist myself, their code isn’t even open source which I find odd.

For a single goal there are hundreds of variables, let alone, say, 10,000 goals for them to try to compute what is universally expected.

There’s no science with a hard S, so to say they know exactly what an xG is... just isn’t that believable.

I’d be interested to see how they work it out.

So we're coming from different perspectives, you as a computer scientist me as an analyst.

Calculation of statistical probability based on so called big data is common across all sorts of businesses, sciences and public bodies. Although some of those who do it like to create a mystique to big themselves up it really isn't rocket science. Nor does it require any particularly sophisticated software.

I would find it rather a dull thing to analyse but it would be easy enough to do in say Python or R. Though I wouldn't want to stump up for the data!

Possibly they don't publish their code for commercial reasons though I see that as a lame excuse personally. But as the saying goes, big data is the new oil!
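As a rough illustration of the "easy enough in Python or R" point, here is a toy xG model: logistic regression on shot distance and the angle the goal mouth presents, fitted by plain gradient descent. This is a common textbook starting point for public xG models, not the method of any particular provider. The coordinate system, feature scaling and the synthetic training data (generated from an invented rule so the example runs without real shot data) are all assumptions:

```python
import math
import random

GOAL_WIDTH = 7.32  # metres between the posts


def shot_features(x: float, y: float) -> list[float]:
    """[distance in tens of metres, angle subtended by the goal mouth in radians].

    Assumed coordinates: x = metres out from the goal line,
    y = metres across from the centre of the goal.
    """
    half = GOAL_WIDTH / 2
    angle = math.atan2(GOAL_WIDTH * x, x * x + y * y - half * half)
    return [math.hypot(x, y) / 10, angle]


def sigmoid(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def predict(w: list[float], feats: list[float]) -> float:
    """Probability of a goal; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * fj for wj, fj in zip(w[1:], feats)))


def fit_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 1500) -> list[float]:
    """Plain batch gradient descent on the average log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for feats, label in zip(X, y):
            err = predict(w, feats) - label
            grad[0] += err
            for j, fj in enumerate(feats):
                grad[j + 1] += err * fj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Synthetic training data from an invented "true" rule, purely so the
# example runs; a real model would be fitted to recorded shots.
rng = random.Random(42)
X, y = [], []
for _ in range(300):
    feats = shot_features(rng.uniform(3, 35), rng.uniform(-20, 20))
    true_p = sigmoid(-0.5 - 1.5 * feats[0] + 2.0 * feats[1])
    X.append(feats)
    y.append(1 if rng.random() < true_p else 0)

w = fit_logistic(X, y)
print(round(predict(w, shot_features(6, 0)), 2))    # close and central
print(round(predict(w, shot_features(30, 10)), 2))  # long range, off-centre
```

The model never decides what "should" be a goal; it only learns how conversion rates vary with the chosen inputs, and adding variables is a matter of adding features.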


1 hour ago, Banned User said:

xG can only be as accurate as the data guys who create the algorithm. Who are they to say what should be a goal or not?

They don't say what should and shouldn't be a goal, I think you're misunderstanding.

They build a probabilistic model based off data which then gives the chance of it being a goal - they're not "saying" anything. You could argue they're selecting which statistics are important or not, but that's different from them writing something that says it's a goal or not.

27 minutes ago, Banned User said:

They’re really not, I’m a computer scientist myself(ish), their code isn’t even open source which I find odd.

For a single goal there are hundreds of variables, let alone, say, 10,000 goals for them to try to compute what is universally expected.

There’s no science with a hard S, so to say they know exactly what an xG is... just isn’t that believable.

I’d be interested to see how they work it out.

There are many, many papers on xG across a wide variety of sports if you're interested. Generally, reading someone's code isn't a great way of working out what it does and why.

If you're expecting it to always be right and know "exactly what an xG is" you're approaching it from the wrong angle. It's a tool to determine probability, that's all.
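Because xG gives probabilities rather than yes/no calls, it is usually judged with a proper scoring rule rather than by counting "right" answers, which also goes some way to answering the opening question about accuracy. A minimal sketch of two standard scores, the Brier score and log loss, on invented shot values; a useful model should beat a baseline that gives every shot the same average probability:

```python
import math


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between predicted probability and what happened (0 or 1).
    Lower is better; 0 would mean a perfect, fully confident model."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def log_loss(probs: list[float], outcomes: list[int], eps: float = 1e-15) -> float:
    """Average negative log-likelihood; punishes confident misses heavily."""
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if o else -math.log(1 - p)
    return total / len(probs)


# Invented shots: model xG values and whether each was scored
xg = [0.76, 0.05, 0.3, 0.1, 0.02, 0.5]
scored = [1, 0, 0, 0, 0, 1]
baseline = [sum(scored) / len(scored)] * len(scored)  # same guess for every shot

print(brier_score(xg, scored), brier_score(baseline, scored))
print(log_loss(xg, scored), log_loss(baseline, scored))
```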


8 minutes ago, chinapig said:

So we're coming from different perspectives, you as a computer scientist me as an analyst.

Calculation of statistical probability based on so called big data is common across all sorts of businesses, sciences and public bodies. Although some of those who do it like to create a mystique to big themselves up it really isn't rocket science. Nor does it require any particularly sophisticated software.

I would find it rather a dull thing to analyse but it would be easy enough to do in say Python or R. Though I wouldn't want to stump up for the data!

Possibly they don't publish their code for commercial reasons though I see that as a lame excuse personally. But as the saying goes, big data is the new oil!

I should add that I'm with you on variables. Which is why I stress to clients that statistical analysis is an aid to decision making not a magic bullet. You cannot remove the need for expert knowledge and judgement, however much the client wants the analysis to make their decision.

As the saying goes:

Data is not information

Information is not knowledge

Knowledge is not wisdom.


I have already detailed my issues with xG.

It's a population measurement tool that works well with large samples but has little relevance to a single shot taken by an individual who needs to beat an individual keeper.

If you look at life expectancy, an actuary working in an insurance company can work out to a high degree of accuracy what the life expectancy is for a sample of 1,000 60-year-old men. If their estimates are too low, their premiums will be too high to be competitive, and if they are too high, they will lose a fortune. No matter what they predict, the chance of any individual dying on the actual appointed day is vanishingly small.
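The actuary point can be put into numbers. Assuming shots are independent and each scores with probability equal to its xG, the expected total grows with the number of shots while the spread of actual goals grows much more slowly, so the relative uncertainty falls as the sample grows. A minimal sketch with hypothetical 0.1 xG chances:

```python
import math


def goals_spread(shot_xgs: list[float]) -> tuple[float, float]:
    """Expected goals and the standard deviation of actual goals, treating
    each shot as an independent yes/no event with probability = its xG."""
    mean = sum(shot_xgs)
    sd = math.sqrt(sum(p * (1 - p) for p in shot_xgs))
    return mean, sd


for n_shots in (10, 100, 1000):
    mean, sd = goals_spread([0.1] * n_shots)  # hypothetical 0.1 xG chances
    print(f"{n_shots:5d} shots: expect {mean:.0f} goals, sd {sd:.2f} ({sd / mean:.0%} of the mean)")
```

With these inputs the standard deviation is roughly 95% of the expected total for 10 shots, 30% for 100 and under 10% for 1,000: informative over a season, nearly meaningless for a single shot.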


22 minutes ago, Maltshoveller said:

All a total waste of time

xG or whatever they are called count for duck all

REAL goals are the ONLY goals that matter

I don't agree. As we saw last season, for example, performances and results can vary, and the more ways a team has to measure that, the more it can point to ways to improve.

xG is simply another analysis tool to help interpret a performance and result. It shouldn't be used in isolation in my opinion, or seen to be "right" or "wrong" on an individual basis.


31 minutes ago, chinapig said:

I should add that I'm with you on variables. Which is why I stress to clients that statistical analysis is an aid to decision making not a magic bullet. You cannot remove the need for expert knowledge and judgement, however much the client wants the analysis to make their decision.

You're both right that there are a huge number of variables, but from what I've read, the data indicates that their impact on whether a shot becomes a goal drops off quite quickly, as you'd expect. Just because there are 10,000 variables doesn't mean you need to use even 100 to get a very accurate model.

I think people making some of these criticisms would be surprised how much data they take into account. StatsBomb, for example, use things like the height of the pass to the attacker and the height of the ball when it is subsequently struck, and those aren't just blindly applied; they're all weighted depending on the other factors as well, I believe.

edit: If people want some interesting(!) links:

https://www.fantasyfootballfix.com/blog-index/how-we-calculate-expected-goals-xg/

https://statsbomb.com/2018/05/the-dual-life-of-expected-goals-part-2/

https://www.thesignificantgame.com/portfolio/do-naive-xg-models-underestimate-expected-goals-for-top-teams/

https://theanalyst.com/eu/2021/06/what-are-expected-goals-xg/

And as @Banned User was asking about code, here's a decent example of how you can get some way towards creating your own xG model: https://www.datofutbol.cl/xg-model/ (the associated code is here I believe if that's your thing: https://github.com/Dato-Futbol/xg-model)


2 minutes ago, IAmNick said:

You're both right that there are a huge number of variables, but from what I've read, the data indicates that their impact on whether a shot becomes a goal drops off quite quickly, as you'd expect. Just because there are 10,000 variables doesn't mean you need to use even 100 to get a very accurate model.

I think people making some of these criticisms would be surprised how much data they take into account. StatsBomb, for example, use things like the height of the pass to the attacker and the height of the ball when it is subsequently struck, and those aren't just blindly applied; they're all weighted depending on the other factors as well, I believe.

 

True that it might not be too difficult to eliminate variables that have little or no impact. My point was more that you need to be transparent about what you have done, how and why. And, importantly, what you cannot do.

Then make sure the client understands the strengths and weaknesses of your analysis.

As I'm on a roll with cliches, all models are wrong but some are useful!*

*For the sceptics this just means you cannot eliminate uncertainty but you can reduce and quantify it.
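Quantifying uncertainty can be as simple as putting an interval around a conversion rate. A minimal sketch using the Wilson score interval, one standard choice for a binomial proportion (the thread doesn't specify a method); the counts are hypothetical, chosen to show the same 76% rate with ten times the data:

```python
import math


def wilson_interval(goals: int, shots: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a conversion rate."""
    if shots == 0:
        return 0.0, 1.0  # no data: anything is possible
    p = goals / shots
    denom = 1 + z * z / shots
    centre = (p + z * z / (2 * shots)) / denom
    half = z * math.sqrt(p * (1 - p) / shots + z * z / (4 * shots * shots)) / denom
    return centre - half, centre + half


# Same 76% conversion rate, ten times the data
print(wilson_interval(76, 100))
print(wilson_interval(760, 1000))
```

The interval narrows considerably with ten times as many shots, which is the "reduce" half of the footnote; the interval itself is the "quantify" half.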


4 hours ago, 1960maaan said:

 

I've read a few bits on XG etc., and I think I'm right in saying the current system doesn't take into account defenders' positioning. I have a vague recollection of someone saying they were looking into adding something; I imagine that would be a massive task, seeing that each chance is its own entity. 

I think they are like any stats: used wisely, they can give you a guide. Possession was thrown up a lot previously, but some teams don't play a possession-based game and are happy to let the other team keep the ball for spells. I remember Brum won one game 5-1 recently with 30% possession. Same with XG: it can be fun to browse for a different perspective, but when all's said and done, if you've won 2-1, you've won 2-1.

⬇️⬇️⬇️

4 hours ago, Banned User said:

They’re really not, I’m a computer scientist myself(ish), their code isn’t even open source which I find odd.

For a single goal there are hundreds of variables, let alone, say, 10,000 goals for them to try to compute what is universally expected.

There’s no science with a hard S, so to say they know exactly what an xG is... just isn’t that believable.

I’d be interested to see how they work it out.

Different companies use different data variables to others.  Some are more sophisticated in what variables they use.  That can cause differences.

To give a bad example, Kasey Palmer’s goal from the corner v Swansea was <0.01 in most xG models, i.e. very unlikely. Another company scored it really high (0.33) because they had a very limited number of events that they classed as shots from that spot: of the 3 they tracked, 1 was a goal.
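A common remedy for tiny samples like "1 goal from 3 tracked shots" is to shrink the raw rate toward a sensible prior, so a handful of shots can't drag the estimate far. A minimal sketch of additive smoothing; the prior rate of 0.01 and the weight of 20 pseudo-shots are hypothetical choices, not what any xG provider is known to use:

```python
def smoothed_rate(goals: int, shots: int, prior_rate: float, prior_weight: float = 20) -> float:
    """Conversion rate pulled toward `prior_rate`, as if `prior_weight`
    extra shots at that rate had been observed (additive smoothing).
    prior_rate and prior_weight are illustrative assumptions."""
    return (goals + prior_rate * prior_weight) / (shots + prior_weight)


# Hypothetical: 1 goal from 3 tracked shots in a rare spot, with an assumed
# prior that similar spots convert at about 1%
print(round(1 / 3, 2))                       # raw rate
print(round(smoothed_rate(1, 3, 0.01), 3))   # pulled toward the prior
```

With these assumptions the raw 0.33 is pulled down to about 0.05, while with plenty of data (say 100 goals from 1,000 shots) the observed rate dominates.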

3 hours ago, IAmNick said:

They don't say what should and shouldn't be a goal, I think you're misunderstanding.

They build a probabilistic model based off data which then gives the chance of it being a goal - they're not "saying" anything. You could argue they're selecting which statistics are important or not, but that's different from them writing something that says it's a goal or not.

There are many, many papers on xG across a wide variety of sports if you're interested. Generally, reading someone's code isn't a great way of working out what it does and why.

If you're expecting it to always be right and know "exactly what an xG is" you're approaching it from the wrong angle. It's a tool to determine probability, that's all.

Correct.

The best example is a penalty. Most xG models use 0.76, i.e. of 100 penalties in a sample, 76 end up as goals.
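With one fixed value per penalty the arithmetic is straightforward: xG adds up across kicks, and the spread of actual goals follows a binomial distribution. A minimal sketch, assuming independent kicks that each score with probability 0.76 (real takers and keepers vary):

```python
from math import comb

PEN_XG = 0.76  # the per-penalty value quoted above


def penalty_goals_pmf(n: int, p: float = PEN_XG) -> list[float]:
    """P(exactly k goals from n penalties), assuming independent kicks."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


pmf = penalty_goals_pmf(5)
print(round(5 * PEN_XG, 2))  # xG from five penalties: 3.8
print(round(pmf[5], 3))      # chance all five go in: about 0.254
```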


5 hours ago, Bar BS3 said:

Why though..? 

I honestly don't get it..! 

It's not even a factual analysis, like possession. It's utterly pointless & irrelevant, as far as I can see. 

That's why it's used by every single professional club? xG reliably predicted our downturn last season. Even when we were at the top at the start, xG made it apparent that we were over-achieving massively. The downturn was inevitable, because the xG statistic highlights a more general picture of the game.


8 hours ago, Davefevs said:

The best example is a penalty. Most xG models use 0.76, i.e. of 100 penalties in a sample, 76 end up as goals.

Probably the one example that can be used as a sort of baseline: always from 12 yards, no defenders between the shot taker and the goal, and one goalkeeper to beat. In open play, by contrast, there is an almost infinite number of variables. The simple fact that there isn't any "industry standard" shows there must be several ways to approach it.

Good, interesting fun though.
