A new R package for simulating volleyball matches

Announcing a new addition to the project: an open-source R package for simulating volleyball matches.

It provides functions for simulating sets and matches of volleyball. Coaches and analysts might find it useful for planning training sessions or match strategies, exploring potential roster changes, or general interest.

The simulation model can be parameterized in two ways:

A simple “sideout” parameterization. We simply specify the sideout rate for each team. This can be a constant value (i.e. their average sideout rate) or can be provided as a function, allowing the sideout rate to vary depending on whatever factors you think might be appropriate. (Function-based parameterization is experimental: see help("vs_simulate_set") for more information.)
A more detailed “phase” parameterization. In this model, we specify the rates for each team for:
- serve_ace (serve ace rate)
- serve_error (serve error rate)
- rec_set_error (error rate on reception-phase sets)
- rec_att_error (error rate on reception-phase attacks)
- rec_att_kill (kill rate on reception-phase attacks)
- rec_block (block kill rate against reception-phase attacks)
- trans_set_error, trans_att_error, trans_att_kill, trans_block - as for the rec_* parameters, but in transition phase (i.e. everything after the reception-phase attack)

R users can install it from GitHub:

## install.packages("remotes") ## if not already installed
remotes::install_github("openvolley/volleysim")

Some examples, taken from the project’s documentation:

Example 1

library(volleysim)
library(datavolley)
library(dplyr)

## read an example file
x <- dv_read(dv_example_file())

## calculate the rates we need to simulate
rates <- list(vs_estimate_rates(x, target_team = home_team(x)),
              vs_estimate_rates(x, target_team = visiting_team(x)))

vs_simulate_match(rates, simple = TRUE)

## $pwin
## [1] 0.988752
## 
## $scores
## $scores$`3-0`
## [1] 0.7399898
## 
## $scores$`3-1`
## [1] 0.2120071
## 
## $scores$`3-2`
## [1] 0.03675516
## 
## $scores$`2-3`
## [1] 0.008013609
## 
## $scores$`1-3`
## [1] 0.002363415
## 
## $scores$`0-3`
## [1] 0.0008709839

So given the performances of the two teams during that match, we expect that the home team should have won, with 3-0 being the most likely scoreline. Compare this to the actual match result:

summary(x)

## Match summary:
## Date: 2015-01-25
## League: Finale mladinke
## Teams: Braslovče (JERONČIČ ZORAN/MIHALINEC DAMIJANA)
##        vs
##        Nova KBM Branik (HAFNER MATJAŽ)
## Result: 3-0 (25-16, 25-14, 25-22)
## Duration: 67 minutes

Example 2: exploring match options

Let’s say we have two teams with the following season-average parameters:

library(dplyr)
rates <- tribble(~team, ~serve_ace, ~serve_error, ~rec_set_error, ~rec_att_error, ~rec_att_kill, ~trans_set_error, ~trans_att_error, ~trans_att_kill, ~sideout, ~rec_block, ~trans_block,
"My Team",    0.062, 0.156, 0.009, 0.071, 0.499, 0.018, 0.082, 0.452, 0.668, 0.075, 0.079,
"Other Team", 0.069, 0.190, 0.014, 0.063, 0.523, 0.021, 0.102, 0.435, 0.683, 0.083, 0.109)
knitr::kable(rates)

team	serve_ace	serve_error	rec_set_error	rec_att_error	rec_att_kill	trans_set_error	trans_att_error	trans_att_kill	sideout	rec_block	trans_block
My Team	0.062	0.156	0.009	0.071	0.499	0.018	0.082	0.452	0.668	0.075	0.079
Other Team	0.069	0.190	0.014	0.063	0.523	0.021	0.102	0.435	0.683	0.083	0.109

“My Team” is due to play “Other Team” in our next match. If we assume that both teams play to their season-average parameters, what outcome do we expect?

vs_simulate_match(rates, simple = TRUE)

## $pwin
## [1] 0.4866795
## 
## $scores
## $scores$`3-0`
## [1] 0.1201833
## 
## $scores$`3-1`
## [1] 0.1826209
## 
## $scores$`3-2`
## [1] 0.1838752
## 
## $scores$`2-3`
## [1] 0.1909978
## 
## $scores$`1-3`
## [1] 0.192379
## 
## $scores$`0-3`
## [1] 0.1299437

Looks like we expect a close match, but My Team is most probably going to lose.

Perhaps as the coach of My Team there are some adjustments I could make — in choosing the players in our starting lineup, or in our match tactics. Can simulation help us explore which option might be most beneficial?

Let’s say that one of our options is to substitute in a different pass-hitter: Betty, who has a more aggressive serve and attack but who is a weaker passer than our starting pass-hitter Agnes. With Betty in the lineup instead of Agnes, we guesstimate our new team parameters to be:

rates2 <- rates

## increase my team's serve aces and errors by 1% each, and attack kills by 2%
rates2[1, c("serve_ace", "serve_error", "rec_att_kill", "trans_att_kill")] <-
    rates2[1, c("serve_ace", "serve_error", "rec_att_kill", "trans_att_kill")] + c(0.01, 0.01, 0.02, 0.02)

## increase opposition serve aces by 1%
rates2[2, c("serve_ace")] <- rates2[2, c("serve_ace")] + 0.01

vs_simulate_match(rates2, simple = TRUE)

## $pwin
## [1] 0.5233379
## 
## $scores
## $scores$`3-0`
## [1] 0.1334328
## 
## $scores$`3-1`
## [1] 0.195746
## 
## $scores$`3-2`
## [1] 0.1941591
## 
## $scores$`2-3`
## [1] 0.180478
## 
## $scores$`1-3`
## [1] 0.1792539
## 
## $scores$`0-3`
## [1] 0.1169302

That makes a slight improvement. Our second option is to change our serving tactics: we will serve more aggressively in order to put more pressure on Other Team’s reception but accepting that we will make more serve errors in doing so:

rates3 <- rates

## increase my team's serve aces by 2% and errors by 5%
rates3[1, c("serve_ace", "serve_error")] <- rates3[1, c("serve_ace", "serve_error")] + c(0.02, 0.05)

## decrease opposition reception kills by 5% due to their expected poorer passing
rates3[2, c("rec_att_kill")] <- rates3[2, c("rec_att_kill")] - 0.05
vs_simulate_match(rates3, simple = TRUE)

## $pwin
## [1] 0.565142
## 
## $scores
## $scores$`3-0`
## [1] 0.1557209
## 
## $scores$`3-1`
## [1] 0.2158291
## 
## $scores$`3-2`
## [1] 0.193592
## 
## $scores$`2-3`
## [1] 0.1770885
## 
## $scores$`1-3`
## [1] 0.1591584
## 
## $scores$`0-3`
## [1] 0.09861113

This looks like it might be a better option (assuming, of course, that we have estimated the changes in rates correctly).

Example 3

Let’s look at another match: the 2020 Austrian Women’s Volley Cup played between Hartberg and UVC Graz (the dvw file was downloaded from https://www.volleynet.at/dvdownload/information/f-Damen/ and is bundled with the volleysim package). UVC Graz won the match 3-1:

x <- dv_read(vs_example_file())
summary(x)

## Match summary:
## Date: 2020-11-08
## League: Austrian Volley Cup Women 2020/21 - DCup
## Teams: Hartberg w Cup (BEINSEN Birgit/ALMER Katharina)
##        vs
##        UVC Graz w Cup (PACK Matthias/APPEL Martin)
## Result: 1-3 (21-25, 21-25, 25-20, 15-25)
## Duration: 88 minutes

Let’s see what result we expected given the team’s actual performances during the match:

rates <- list(vs_estimate_rates(x, target_team = home_team(x)),
              vs_estimate_rates(x, target_team = visiting_team(x)))
sim_result <- vs_simulate_match(rates = rates, simple = TRUE)
sim_result

## $pwin
## [1] 0.2592628
## 
## $scores
## $scores$`3-0`
## [1] 0.04582808
## 
## $scores$`3-1`
## [1] 0.08828443
## 
## $scores$`3-2`
## [1] 0.1251503
## 
## $scores$`2-3`
## [1] 0.1916859
## 
## $scores$`1-3`
## [1] 0.2842663
## 
## $scores$`0-3`
## [1] 0.2647851

The simulations suggest that Hartberg had a 26% chance of winning, with the most likely scoreline being 1-3 — consistent with the actual result.

Now let’s say that the two teams will be playing again soon, and the Hartberg coach thinks that their first-ball attack could be improved by improving their passing. What difference might we expect in the match outcome for an improvement in this area?

First let’s get a handle on the relevant performance parameters. “Positive” passes here means passes that were rated as perfect or positive.

## extract the play-by-play data
xp <- plays(x)

## identify first-ball attacks
fba <- xp %>% dplyr::filter(skill == "Attack" & phase == "Reception") %>%
    mutate(made_attack = TRUE, fbso = evaluation == "Winning attack") %>%
    dplyr::select(point_id, team, made_attack, fbso)

## join that back to the full play-by-play data
xp <- left_join(xp, fba, by = c("point_id", "team")) %>%
    mutate(fbso = if_else(is.na(fbso), FALSE, fbso),
           made_attack = if_else(is.na(made_attack), FALSE, made_attack))

## and pass quality on each rally
pq <- xp %>% dplyr::filter(skill == "Reception") %>% group_by(point_id) %>%
    dplyr::summarize(pass_quality = case_when(n() == 1 ~ evaluation))
xp <- left_join(xp, pq, by = "point_id") %>%
    mutate(pass_quality = case_when(grepl("Perfect|Positive", pass_quality) ~ "Positive",
                                    grepl("Error", pass_quality) ~ "Error",
                                    TRUE ~ "Other"))

## finally summarize the first-ball attacks by pass quality
fb_summary <- xp %>% dplyr::filter(skill == "Reception" & team == home_team(x)) %>%
    group_by(pass_quality) %>% dplyr::summarize(N = n(),
                                                `Attack %` = mean(made_attack)*100,
                                                `Attack kill %` = mean(fbso[made_attack])*100,
                                                `FBSO%` = mean(fbso)*100)
fb_summary

## # A tibble: 3 x 5
##   pass_quality     N `Attack %` `Attack kill %` `FBSO%`
##   <chr>        <int>      <dbl>           <dbl>   <dbl>
## 1 Error            3        0             NaN       0  
## 2 Other           26       61.5            18.8    11.5
## 3 Positive        49       87.8            41.9    36.7

Hartberg made 49 positive passes (62.8% positive pass rate). They made an actual attack on 87.8% of those positive passes, with a kill rate of 41.9% on those attacks. Their overall first-ball sideout rate on positive passes was 36.7%.

On other passes (excluding pass errors), Hartberg’s first-attack sideout rate was 11.5%, with only 61.5% of those passes leading to an attack, and a kill rate of 18.8% on those attacks.

Let’s say that with some focused training, the Hartberg coach thinks that their positive pass rate can be substantially increased, from 62.8% to 75%. This would change their reception attack kill rate, because more attacks would be made from positive passes. The expected reception attack kill rate would be the weighted average of the kill rates on positive and other passes (where the weights are the relative numbers of positive and other passes).

new_positive_pass_rate <- 0.75
attack_rate_pos <- new_positive_pass_rate * 0.878
attack_rate_other <- (1 - new_positive_pass_rate) * 0.615
new_rec_att_kill <- (attack_rate_pos * 0.419 + ## positive pass rate multiplied by their corresponding kill rate
                     attack_rate_other * 0.188 ## attack rate on other passes * their kill rate
                     ) / (attack_rate_pos + attack_rate_other)
new_rec_att_kill

## [1] 0.3752742

That is, with the hypothesized better passing performance we expect only a modest increase in the overall reception attack kill rate to 37.5% (up from 35.6%).

Armed with that estimate, we can explore what effect that might have on a re-match:

rates[[1]]$rec_att_kill <- new_rec_att_kill
new_sim_result <- vs_simulate_match(rates = rates, simple = TRUE)
new_sim_result

## $pwin
## [1] 0.3038112
## 
## $scores
## $scores$`3-0`
## [1] 0.05728925
## 
## $scores$`3-1`
## [1] 0.1056127
## 
## $scores$`3-2`
## [1] 0.1409092
## 
## $scores$`2-3`
## [1] 0.1957914
## 
## $scores$`1-3`
## [1] 0.268356
## 
## $scores$`0-3`
## [1] 0.2320415

Giving a match win probability of 30.4% (up from 25.9%).

A new R package for simulating volleyball matches

2020-11-14

Example 1

Example 2: exploring match options

Example 3