Course on machine learning

Last week the Centre for Mathematical Sciences and the Impact Lab organized a week-long course on machine learning. Every day from 13:30 to 17:30 there was a mixture of lectures and computer lab sessions. The audience was a mix of researchers from the University and people from industry.

The following topics were covered:

  • Introduction to Big Data
  • Introduction to Neural Networks
  • Gaussian process
  • Machine learning
  • Introduction to Optimisation for AI

Paper in Journal of Financial Econometrics.

The paper “The threshold GARCH model: estimation and density forecasting for financial returns” has been accepted by the Journal of Financial Econometrics.

The paper is co-authored by Yuzhi Cai, Department of Accounting and Finance, School of Management, Swansea University, and Julian Stander.
Yuzhi Cai used to work at the University of Plymouth.


The paper presents a new method for modelling and predicting financial returns, and for understanding the uncertainty associated with predictions.  This method is applied to the Hang Seng and S&P 500 daily closing indices, and outperforms existing prediction techniques.

Paper in Statistics in Medicine.

The paper “Analysis of paediatric visual acuity using Bayesian copula models with sinh-arcsinh marginal densities” has been accepted by the journal Statistics in Medicine.

The authors of the paper from the University of Plymouth are Julian Stander and Luciana Dalla Valle. The other authors are Charlotte Taglioni, Brunero Liseo, Angie Wade and Mario Cortina-Borja.

Charlotte spent one of her undergraduate years with us in Plymouth under the Erasmus scheme and then returned on an Erasmus Plus Traineeship.  After completing her PhD at the University of Padova, Italy, Charlotte took a post at the Food and Agriculture Organization of the United Nations.

Brunero Liseo is a professor at the University of Rome La Sapienza, Italy.  Angie Wade and Mario Cortina-Borja are professors at the Great Ormond Street Institute of Child Health, University College London, UK.

The summary of the paper is:

It is important that children who are visually impaired are identified early enough to start corrective treatment.  Visual acuity, which quantifies each eye’s spatial resolution capacity, was measured in the left and right eyes of over 2700 children, together with each child’s age.  A flexible bivariate statistical model was developed to understand how visual acuity changes with age in each eye separately, and how the dependence between left and right eye visual acuity is related to age.  It was found that as age increases, visual acuity improves and the dependence between visual acuity in the left and right eyes becomes stronger, meaning that children’s eyes become better and more similar with age.  The bivariate statistical model also allowed the identification of children with unusual sight, distinguishing those who are atypical in both eyes when they are considered together from those who are outliers in one or both eyes when they are considered separately, as in current practice.  This yields an innovative tool that enables clinicians to recognise children with unusual sight who may otherwise be missed.  

For any given value of age, the above-mentioned statistical model takes the form of a specially shaped bivariate probability density function which provides a mathematical description of the probability that left and right visual acuity measurements take values in specified ranges.  It is based on a class of bivariate probability density functions known as copulas that are defined using a parameter, related to correlation, which controls the strength of dependence between the variables.  The bivariate copula models the dependence between left and right visual acuity variables by joining in an appropriate way the separate distributions for each variable.  An advantage of this copula-based approach is that it splits marginal (each variable separately) from dependence (two variables together) modelling.  The user has the freedom to choose the copula from a large number of possibilities and to specify suitable univariate probability descriptions, for example normal distributions, for each variable separately.
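As a minimal sketch of this construction, the Python code below joins two marginal distributions with a copula to obtain a bivariate density. The Gaussian copula, the normal marginals and every numerical value are assumptions chosen purely for illustration; the summary above does not say which copula the paper finally selects, and the function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used inside the Gaussian copula


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Density c(u, v; rho) of the Gaussian copula for 0 < u, v < 1."""
    x = STD_NORMAL.inv_cdf(u)
    y = STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus_r2)
    return exp(exponent) / sqrt(one_minus_r2)


def joint_density(left, right, left_marginal, right_marginal, rho):
    """Bivariate density = copula density times the two marginal densities."""
    u = left_marginal.cdf(left)    # marginal modelling: left eye on its own
    v = right_marginal.cdf(right)  # marginal modelling: right eye on its own
    dependence = gaussian_copula_density(u, v, rho)  # dependence modelling
    return dependence * left_marginal.pdf(left) * right_marginal.pdf(right)


# Hypothetical marginals and dependence parameter, not estimates from the paper.
left_eye = NormalDist(mu=0.10, sigma=0.15)
right_eye = NormalDist(mu=0.12, sigma=0.15)
density = joint_density(0.05, 0.20, left_eye, right_eye, rho=0.7)
print(f"joint density at (0.05, 0.20): {density:.4f}")
```

With `rho = 0` the copula density equals 1 everywhere, so the joint density reduces to the product of the two marginal densities. This is the sense in which the copula carries all of the dependence while the marginals are specified separately.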

The normal distribution is defined using two parameters, the mean (location) and standard deviation (spread).  However, it turns out that there are more visual acuity values in the tails and fewer around the centre of the distribution than can be accounted for by this normality assumption.  Therefore, a four-parameter generalization of the normal, the sinh-arcsinh distribution, which allows tails that are heavier (or lighter) than those of the normal, is used.  Seventeen different types of copula function were considered for joining these sinh-arcsinh marginal distributions, and a goodness-of-fit method for selecting the best-performing copula was verified.  The resulting statistical model is, for each value of age, a specially shaped bivariate probability density function for left and right visual acuity with nine parameters (four each for the left and right marginal distributions, plus the copula dependence parameter).  These parameters are assumed to be related to age by means of flexible smooth curves called natural cubic splines.
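To make the role of the extra parameters concrete, here is a short Python sketch of a sinh-arcsinh density. It assumes the parameterisation of Jones and Pewsey (location, scale, skewness and tail weight); the paper may use a different but equivalent one, and the function and argument names are hypothetical. With skewness 0 and tail weight 1 the density is exactly the normal density, and a tail weight below 1 gives heavier tails.

```python
from math import asinh, cosh, exp, pi, sinh, sqrt


def sinh_arcsinh_pdf(x, location=0.0, scale=1.0, skewness=0.0, tail_weight=1.0):
    """Sinh-arcsinh density in the (assumed) Jones-Pewsey parameterisation.

    tail_weight < 1 gives heavier tails than the normal, tail_weight > 1 lighter.
    """
    z = (x - location) / scale
    t = tail_weight * asinh(z) - skewness
    s = sinh(t)
    c = cosh(t)
    return tail_weight * c * exp(-0.5 * s * s) / (scale * sqrt(2.0 * pi * (1.0 + z * z)))


# Compare the density three scale units from the centre for two tail weights.
normal_like = sinh_arcsinh_pdf(3.0)                # tail_weight = 1: normal density
heavy_tailed = sinh_arcsinh_pdf(3.0, tail_weight=0.6)
print(f"normal: {normal_like:.5f}, heavy-tailed: {heavy_tailed:.5f}")
```

Two such marginals contribute four parameters each, and a one-parameter copula adds the ninth, which matches the parameter count given above.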

Statistical inference concerns estimating all the unknown parameters of a statistical model from data and quantifying the uncertainty associated with these estimates.  There are two main approaches to statistical inference referred to as ‘frequentist’ and ‘Bayesian’.  The Bayesian approach is used here for the copula-based model because it allows parameter estimation uncertainty to be fully determined.  Comparisons are also provided with an established and computationally faster frequentist alternative that does not quantify uncertainty fully.
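The idea of quantifying uncertainty in the Bayesian way can be illustrated with a toy example. The Python sketch below computes the posterior distribution of a single dependence parameter (the correlation `rho` of a standard bivariate normal) on a grid, under a uniform prior, and reads off a 95% credible interval. Everything here is an assumption made for illustration: the data are simulated, the model has one parameter rather than nine, and the grid approximation is not claimed to be the computational method used in the paper.

```python
import random
from math import exp, log, sqrt


def log_likelihood(rho, pairs):
    """Log-likelihood of rho for standard bivariate normal pairs (constants dropped)."""
    one_minus_r2 = 1.0 - rho * rho
    total = 0.0
    for x, y in pairs:
        quad = x * x - 2.0 * rho * x * y + y * y
        total += -0.5 * log(one_minus_r2) - quad / (2.0 * one_minus_r2)
    return total


def grid_posterior(pairs, grid_size=399):
    """Posterior probabilities of rho on an even grid in (-1, 1), uniform prior."""
    grid = [-1.0 + 2.0 * (i + 1) / (grid_size + 1) for i in range(grid_size)]
    log_post = [log_likelihood(r, pairs) for r in grid]
    peak = max(log_post)  # subtract the peak before exponentiating, for stability
    weights = [exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def credible_interval(grid, probs, level=0.95):
    """Equal-tailed credible interval from the cumulative grid probabilities."""
    tail = (1.0 - level) / 2.0
    cumulative, lower, upper, lower_found = 0.0, grid[0], grid[-1], False
    for r, p in zip(grid, probs):
        cumulative += p
        if not lower_found and cumulative >= tail:
            lower, lower_found = r, True
        if cumulative >= 1.0 - tail:
            upper = r
            break
    return lower, upper


# Simulated data with a hypothetical true correlation; not data from the paper.
rng = random.Random(1)
true_rho = 0.6
pairs = []
for _ in range(200):
    x = rng.gauss(0.0, 1.0)
    y = true_rho * x + sqrt(1.0 - true_rho ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((x, y))

grid, probs = grid_posterior(pairs)
posterior_mean = sum(r * p for r, p in zip(grid, probs))
lower, upper = credible_interval(grid, probs)
print(f"posterior mean {posterior_mean:.3f}, 95% credible interval ({lower:.3f}, {upper:.3f})")
```

The output is a whole distribution for `rho` rather than a single number, so the interval describes directly how uncertain the estimate is. A grid is only practical for one or two parameters; models with many parameters usually need simulation-based methods instead.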