Monday, 25 August 2014

Pre-Match Salivary IgA in Soccer Players From the 2014 World Cup Qualifying Campaign


Purpose: To monitor resting salivary secretory immunoglobulin A (SIgA) levels in international soccer players during the short-term training period that precedes international match play.

Methods: In a repeated measure design, saliva samples were obtained from thirteen outfield soccer players who participated in the training camps preceding 7 games (5 home and 2 away) of the 2014 FIFA World Cup qualifying campaign. Samples were obtained daily for four days preceding each game (and analyzed for SIgA using the IPRO Oral Fluid Collection System) at match day (MD) minus 1 (MD-1), minus 2 (MD-2), minus 3 (MD-3) and minus 4 (MD-4).

Results: SIgA displayed a progressive decline (P = 0.01) during the 4-day training period (MD-4: 365 ± 127 μg.mL-1; MD-3: 348 ± 154 μg.mL-1; MD-2:  290 ± 138 μg.mL-1; MD-1:  256 ± 90 μg.mL-1) such that MD-1 values were significantly lower (P = 0.01) than both MD-4 and MD-3. Ninety-five % confidence intervals for the differences between MD-1 and MD-4 (95% CI = -191 to - 26) and MD-1 and MD-3 (95% CI = -155 to - 28).

Conclusions: Data demonstrate that a short-term soccer training camp in preparation for international competition induces detectable perturbations to mucosal immunity. Future studies should monitor SIgA (as a practical and non-invasive measurement of immunity) alongside internal and external measures of training load in an attempt to strategically individualize training and nutritional strategies that may support optimal preparation for high-level competition.

Sunday, 15 June 2014

Predictive Analytics in Microsoft Excel – Part 1: Correlation & Linear Regression

Before we start, you can find a brief introduction to this predictive analytics article series by clicking HERE.

Part 1 of the series is correlation & linear regression. When two variables are related to one another they are said to be correlated. A very simple example in a sporting context is player anthropometrics. When plotted on a scatter diagram the height and weight of your squad should correlate: i.e. taller people tend to weigh more.

If you can quantify the strength and direction of a correlation, you can then start to use your knowledge of one variable to predict another. For example, if you know the height of your player you can predict their weight. The process you use to assess the strength of a relationship is called correlation analysis; however the process you use to predict one variable from another is called regression analysis. The strength of your correlation analysis will determine the accuracy of your regression analysis.

To demonstrate correlation analysis and linear regression I will use something a little more interesting than height and weight. The majority of top-level teams will be using technology to monitor their players in training (GPS) and match play (Prozone). If your aim is to collate all data into one database, data taken from different sources needs to be accurate and comparable. We know from research (Harley et al., 2010) that caution must be taken when using these systems interchangeably, as GPS reported higher values for total distance covered (TD) during match play when compared to Prozone. One possible solution to allow the use of both data sets in one database is to create a linear regression model to predict GPS TD from Prozone match data.

In order to do this we need to evaluate the strength of the relationship (correlation) between GPS and Prozone TD data. It is important to firstly visualize the relationship using a scatter diagram. Below is a screenshot of what this should look like in Excel for Mac 2011. Please note this is simulated data, generated for the purpose of this article and does not accurately represent the true relationship between GPS and Prozone data.

Figure 1.1 Visualisation of the correlation between GPS and Prozone total distance match data.

When visualizing the data like this it is clear to see a positive, nearly perfect correlation (which is to be expected). In order to objectify this relationship we can calculate Pearson’s Correlation Coefficient. This can be easily done using Excel as long as your data is accurately input into a list, as shown in Figure 1.1.

To do this use the Excel formula =CORREL()

Figure 1.2 shows the correlation coefficient for our dataset is 0.98, this can be categorized as nearly perfect (for more information on categorizing the strength of your correlation see Will Hopkins – A New View of Statistics). To calculate this we used the formula =CORREL(A2:A31,B2:B31).

Figure 1.2 Correlation between GPD and Prozone data using =CORREL() function

So how do we get from a correlation to a prediction? In simple terms, when conducting a regression you forecast one variable from another. The equation is as follows:

The regression coefficient, b is obtained as follows:
  1. Find rxy, the correlation between the predictor variable x and the forecast variable y. 
  2. Multiple rxy by the standard deviation of the forecast variable y (Sy).
  3. Divide the result by the standard deviation of the predictor variable x (Sx).

Besides the regression coefficient you need to know the intercept, denoted as a, which you can calculate as follows:
  1. Multiple the regression coefficient b, by the mean of x.
  2. Subtract the result of step 1 from the mean of y.

So the equation for the intercept a is as follows:

a = y - bx

Figure 1.3 shows how this all comes together in an Excel worksheet.

Figure 1.3 Calculation of Linear Regression

Having calculated the regression coefficient and the intercept we can now forecast GPS TD based on Prozone TD. For Example:

y = bx + a

GPS TD = Regression Coefficient * Prozone TD + Intercept

GPS TD = E10 * 12000 + E11

This returns 12524. Given the observed relationship between Prozone and GPS total distance data, we can predict that 12000 metres derived from Prozone data would equal 12524 metres in GPS data.

Conducting this analysis manually helps to understand the underlying equations that are used in a linear regression and helps to show how the correlation is related to the regression formula. I would suggest trying to calculate your regression this way at least once to gain an understanding of these calculations. However, this is a waste of time for everyday work. Luckily there is an Excel function that can calculate this for you =LINEST(). Figure 1.4 shows how you can use this function on this data set.

Figure 1.4 Calculation of Linear Regression using =LINEST() function.

The =LINEST() function can be used to bypass all the intermediate calculations shown in Figure 1.3. This function calculates the regression coefficient and intercept directly. They appear in cells D2:E2 of Figure 1.4.

It must be noted the =LINEST() function is an Array Formula this means instead of simply pressing Enter, you hold down Ctrl and the Shift keys as you press Enter.

So in the case of our dataset in Figure 1.4, the sequence of events is as follows:
  1. Select cells D2:E2. 
  2. Type the formula =LINEST(A2:A31,B2:B31)
  3. Hold down Ctrl and Shift, and press Enter.

You should now see the regression coefficient appear in cell D2 and the intercept in E2. If you look in the formula box, you should see the array-entered formula surrounded by curled brackets, as shown in cell F2. Excel will automatically create these brackets when an array formula is entered you do not need to type these.

Just as in Figure 1.3, you can use the regression coefficient and intercept to create a forecast value based on a predictor value. The formula to do that is shown in cell F5.

Using this formula in cell E5, you can enter any Prozone TD data in cell D5 and a Predicted GPS TD will be returned in cell E5.

Excel does make all of this even easier, the =TREND() function puts everything into one step. It calculates the regression coefficient and intercept and applies them to one or more predictor values and displays a forecast value.

=TREND() will be covered in more detail later in this article series. I have not covered it in this section based on the guidance of Conrad Carlsberg in his book (See Introduction for a link). Conrad states, “By looking at the equations and some of the other results that pertain to the equations, you can do a better job of evaluating whether and how you want to use it.

That wraps up Part 1 of the Predictive Analytics article series. Part 2 will cover how to apply the techniques we have learnt to conduct multiple regression analysis, where you can use more than one predictor variable to return a forecast variable.

It would be good to hear how people would use this correlation and linear regression analysis in practice, please get in touch via the comments section or Twitter.

Click to download the Correlation & Linear Regression Workbook.

Friday, 13 June 2014

Predictive Analytics in Microsoft Excel – Introduction

This is a quick introduction to a series of articles that will cover a range of topics and techniques that can be used by soccer practitioners to better understand the data they are collecting on players.

Predictive Analytics: Microsoft Excel is an excellent book by Conrad Carlberg detailing many statistical methods that can be easily performed in Excel or with plug-ins such as Solver. However, this book is very business orientated, I will attempt to demonstrate these statistical methods using Excel with simulated data and look to answer the questions that are generated in a team sport environment.

This series of articles will give background information into various statistical methods such as, when to use them, what data is appropriate and what questions can be answered from these methods. Articles will also include step by step guidelines for how to conduct these statistical methods in Excel. Articles will include: -
  • Linear regression 
  • Multiple regression
  • Forecasting with moving averages 
  • Forecasting a time series: smoothing 
  • Forecasting a time series: regression 
  • Logistic regression 
  • Principal components analysis 
  • Box-Jenkins ARIMA models 
  • Varimax factor rotation

My aim is to publish at least one article per week. However, this is a learning experience for myself as well. As I work my way through the book and create practical examples related to soccer. More complex methods my take longer. Hopefully by the end this will be a great series of articles that can be used by soccer coaches, sport scientists, S&C coaches and physiotherapists to improve our understanding of data and how it can impact performance.

Look out for Part 1: Linear Regression in the coming days.