Adrien Ickowicz, Ben Raymond
It is always an interesting exercise to try and predict how performances at youth and junior level translate into senior level. It is also interesting to have an idea of the main features of volleyball that are expected in a given competition, from historical data.
Our question here is not so much one of predictive analytics. Rather, we want to identify whether there is indeed a distinction between the different age groups, competitions and continents and, if so, what it is. To assess that, we run a statistical analysis called Principal Component Analysis on a number of competitions, using a number of indicators that reflect how each competition is played, and see whether the competitions cluster into groups.
We used as many scouted files from the games as possible, as per Table 1 below. Unfortunately we could not get our hands on any African volleyball data, but we do have coverage of many age groups for South America, North America, Asia, Europe and World competitions, and many games for each category. While we were generally unable to get access to every game in every competition, we have only included competitions for which we had a reasonable fraction of games, so as to be confident that the data gave a reasonable reflection of the standard and characteristics of play.
The table describes the different competitions we were able to include in the analysis, the number of teams that participated, and the number of games we accessed for each competition.
Table 1. The analysed competitions
Summary of games and teams analysed through the shared scouted data. Not all games within the competitions had available scouted files.

| Confederation | Competition | Age group | Gender | Continent | Year | Games | Teams |
|---|---|---|---|---|---|---|---|
| AVC | AVC Senior Men 2022 | Senior | Men | Asia & Oceania | 2022 | 20 | 19 |
| AVC | AVC U16 Men 2023 | U16 | Men | Asia & Oceania | 2023 | 16 | 11 |
| AVC | AVC U18 Men 2018 | U18 | Men | Asia & Oceania | 2018 | 7 | 9 |
| AVC | AVC U20 Men 2022 | U20 | Men | Asia & Oceania | 2022 | 27 | 17 |
| CEV | CEV Senior Men 2021 | Senior | Men | Europe | 2021 | 16 | 20 |
| CEV | CEV U17 Men 2021 | U17 | Men | Europe | 2021 | 17 | 17 |
| CEV | CEV U18 Men 2022 | U18 | Men | Europe | 2022 | 18 | 16 |
| CEV | CEV U20 Men 2022 | U20 | Men | Europe | 2022 | 20 | 18 |
| CSV | CSV Senior Men 2021 | Senior | Men | South America | 2021 | 9 | 6 |
| CSV | CSV U17 Men 2023 | U17 | Men | South America | 2023 | 9 | 6 |
| CSV | CSV U19 Men 2022 | U19 | Men | South America | 2022 | 10 | 10 |
| CSV | CSV U21 Men 2022 | U21 | Men | South America | 2022 | 8 | 11 |
| FIVB | FIVB U19 Men 2021 | U19 | Men | World | 2021 | 29 | 20 |
| FIVB | FIVB U19 Men 2023 | U19 | Men | World | 2023 | 45 | 20 |
| FIVB | FIVB U21 Men 2021 | U21 | Men | World | 2021 | 20 | 20 |
| FIVB | FIVB U21 Men 2023 | U21 | Men | World | 2023 | 31 | 22 |
| FIVB | FIVB Senior Men 2018 | Senior | Men | World | 2018 | 94 | 25 |
| FIVB | FIVB Senior Men 2022 | Senior | Men | World | 2022 | 37 | 25 |
| NORCECA | NORCECA Senior Men 2021 | Senior | Men | North America | 2021 | 13 | 9 |
| NORCECA | NORCECA Senior Men 2022 | Senior | Men | North America | 2022 | 26 | 13 |
| NORCECA | NORCECA Senior Men 2023 | Senior | Men | North America | 2023 | 17 | 12 |
| NORCECA | NORCECA U19 Men 2023 | U19 | Men | North America | 2023 | 11 | 8 |
| NORCECA | NORCECA U21 Men 2023 | U21 | Men | North America | 2023 | 13 | 9 |

Source: Courtesy of Lionel Bonnaure and the many scouts worldwide for sharing their scouted files. Sourced from VB Canada’s scout share server.
The data only goes back to 2018 (two competitions), with the rest from 2021 to 2023. The repeated FIVB competitions (two U19, two U21 and two senior editions) should also allow us to evaluate how consistent these competitions are in terms of the key statistical descriptors.
Anyone familiar with volleyball, or sport in general, knows that many numbers can be derived from any game. These numbers are then used in a number of ways, Moneyball being one example. At Science Untangled, among our suite of apps for analysing a given game, there is one you can access at https://apps.untan.gl/dvrr/ that provides a range of metrics. The Analysis summary tab in particular provides a number of statistical metrics that we are going to use in this article. The key statistics that we decided to include in the analysis are:
The tab actually provides more than 70 different metrics. We deliberately only focus on the 12 indicators described above to keep it high level. You can check the values for the different competitions in Table 2.
While we are by no means covering every aspect of the game, this should give us a good idea of general tendencies. Further analysis can then be performed following the direction provided by this early approach.
Even though we’ve restricted ourselves to 12 measurements, this is still way too many to visualize directly. Ideally we’d like to summarize this data down to two or three dimensions, so that it can be plotted and the differences between the competitions visualized more easily. To do this we use a method called principal components analysis. This is a well-known statistical technique for capturing the most important information in high-dimensional data. It relies on the fact that many of the variables will be correlated with each other (i.e. when one increases, another tends to increase or decrease with it). We can combine those correlated variables into new composite variables, called “principal components”, leaving us with a smaller number of variables that still capture the main patterns in the original data. For a technical explanation of how this is done, see e.g. the Wikipedia page.
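For readers who prefer to see the idea in code, here is a minimal sketch of principal components analysis using NumPy. It is not the code used for this article: the data are synthetic stand-ins (23 rows and 12 columns, mirroring the number of competitions and indicators), the function name `pca` is ours, and standardizing each indicator before the decomposition is an assumption about a sensible default rather than a statement of what was done here.

```python
import numpy as np

def pca(data):
    """Return (scores, components, explained_ratio) for a rows-by-variables array."""
    # Standardize each column so indicators on different scales contribute equally.
    centered = data - data.mean(axis=0)
    standardized = centered / data.std(axis=0, ddof=1)
    # Singular value decomposition: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(standardized, full_matrices=False)
    scores = u * s                          # coordinates of each row on each component
    explained_ratio = s**2 / np.sum(s**2)   # share of total variance per component
    return scores, vt, explained_ratio

# Synthetic stand-in data (NOT the article's data): 23 "competitions" x 12 "indicators",
# generated from three hidden factors plus noise so that the columns are correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(23, 3))
mixing = rng.normal(size=(3, 12))
data = latent @ mixing + 0.3 * rng.normal(size=(23, 12))

scores, components, explained_ratio = pca(data)
print(np.round(explained_ratio[:3], 2))     # shares carried by the first three components
```

Because the synthetic indicators are driven by three hidden factors, most of the variance ends up in the first three components, which is the same kind of compression the article relies on.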
In the histogram above, we can see that the quantity of information provided by the components steadily decreases. The first component explains about 43% of the differences between competitions, the second about 24%, the third about 15%, and so on. The components are calculated so that each explains a different aspect of the information, so when you keep several components you can add up the percentages of information they cover. The first three components can be considered the most significant, since together they contain almost 81% of the total information in the data. The remaining components each add only a small additional amount of information, so we stay with the first three components as a balance between explaining as much as possible and keeping it simple.
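The adding-up of percentages can be written out as a short sketch. The shares below are the rounded values quoted in the text; the helper `components_needed` and the 80% threshold are illustrative choices of ours, not a rule used in the analysis. The rounded figures sum to 82%, which agrees with the "almost 81%" quoted above to within rounding.

```python
from itertools import accumulate

def components_needed(shares, threshold):
    """Smallest number of leading components whose summed share reaches threshold."""
    for count, total in enumerate(accumulate(shares), start=1):
        if total >= threshold:
            return count
    return None  # threshold not reached with the shares provided

# Approximate shares (percent) quoted in the text for the first three components.
shares = [43, 24, 15]
cumulative = list(accumulate(shares))   # running totals: [43, 67, 82]

for k, (share, total) in enumerate(zip(shares, cumulative), start=1):
    print(f"Component {k}: {share}% alone, {total}% cumulative")

print(components_needed(shares, 80))    # number of components needed to reach 80%
```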
Having kept three components, we also need to understand what they represent (remember that each component is some combination of our original variables). This is what the figure below helps to do. Within each circle, the original key statistics are represented, and their coordinates mark their relative importance.
Three main pieces of information can be observed from the axis plot below:
The goal of the third visualization is to determine how much each variable is represented in a given component.
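One common way to quantify how each variable relates to each component is the correlation between the original variable and the component scores, which gives the coordinates in a correlation circle; the squared correlation is often used as a measure of how well a variable is represented on a component. The sketch below computes both on synthetic data. Whether the figures in this article were built from exactly these quantities is an assumption, and the function name and data are ours.

```python
import numpy as np

def variable_component_correlations(data, n_components=3):
    """Correlation of each original variable with each of the leading components."""
    n = data.shape[0]
    standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(standardized, full_matrices=False)
    # For standardized data, corr(variable j, component k) = vt[k, j] * s[k] / sqrt(n - 1).
    return (vt[:n_components].T * s[:n_components]) / np.sqrt(n - 1)

# Synthetic stand-in data (NOT the article's data): 23 rows, 12 correlated columns.
rng = np.random.default_rng(1)
data = rng.normal(size=(23, 3)) @ rng.normal(size=(3, 12)) + 0.3 * rng.normal(size=(23, 12))

loadings = variable_component_correlations(data)   # shape: 12 variables x 3 components
cos2 = loadings**2                                  # quality of representation per component

# Variables with the largest squared correlation on the first component.
top = np.argsort(cos2[:, 0])[::-1][:3]
print("Best represented on component 1 (column indices):", top)
```

A variable whose point lies close to the edge of the circle is well described by the two plotted components, while one near the centre is mostly explained by components that are not shown.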
With these explanations, here is what can be said about each component:
Having established our principal components, we can project each of our competitions onto these new axes, and examine how they compare to each other. This is shown in the figure below (these are two-dimensional plots, so we plot axes 1 and 2 together in the first plot and axes 1 and 3 in the second plot). The figure is telling us that:
We can also present the same information on a single 3D plot, rather than two 2D plots. Below is a 3D plot to help explore the differences in three dimensions. Click and drag the plot to rotate it, and scroll to zoom.
This analysis is not the right way to go about evaluating a given team’s performance in a competition, in particular when there is no full round robin in which everyone gets a chance to play everyone else. It does, however, tell us how different the overall standards of play can be between competitions, and how much variability can be expected across competitions. Bear in mind that the key statistics for a team are influenced by what their opponents do, and so drawing conclusions about the ‘quality’ of one competition over another is not warranted. But one can say which aspects of the game seem to have priority in the different confederations.
The next article will dig into the within-competition variability.