Unconventional regularization for efficient and sustainable machine learning
Classic algorithm design is based on penalizing or imposing explicit constraints to an empirical objective function, which is eventually optimized. In practice, however, a number of different algorithmic solutions are employed. Their effect on final performance is hard to assess a priori and typically done empirically.
In this talk, I will consider a nonparametric linear least squares framework and will take a regularization perspective to understand the effect of two commonly used ideas: sketching and iterative optimization. This analysis will highlight the role and the interplay of different algorithmic choices, including training time, step-size, mini-batch size, and the choice of sketching, among others. Indeed, one can view all these choices as implementing a form of “algorithmic regularization”. The obtained results provide new and practical guidelines for algorithm design. They suggest that optimal statistical accuracy can be achieved while dramatically improving computational efficiency. Theoretical findings will be illustrated in the context of large scale kernel methods, where we developed the first solvers able to scale to millions of training points.
Efficient Approaches to Changepoint Problems with Dependence Across Segments
Changepoint detection is an increasingly important problem across a
range of applications. It is most commonly encountered when analysing
time-series data, where changepoints correspond to points in time where
some feature of the data, for example its mean, changes abruptly. Often
there are important computational constraints when analysing such data,
with the number of data sequences and their lengths meaning that only
very efficient methods for detecting changepoints are practically feasible.
A natural way of estimating the number and location of changepoints is
to minimise a cost that trades-off a measure of fit to the data with the
number of changepoints fitted. There are now some efficient algorithms
that can exactly solve the resulting optimisation problem, but they are
only applicable in situations where there is no dependence of the mean
of the data across segments. Using such methods can lead to a loss of
statistical efficiency in situations where e.g. it is known that the
change in mean must be positive.
This talk will present a new class of efficient algorithms that can
exactly minimise our cost whilst imposing certain constraints on the
relationship of the mean before and after a change. These algorithms
have links to recursions that are seen for discrete-state hidden Markov
Models, and within sequential Monte Carlo. We demonstrate the usefulness
of these algorithms on problems such as detecting spikes in calcium
imaging data. Our algorithm can analyse data of length 100,000 in less
than a second, and has been used by the Allen Brain Institute to analyse
the spike patterns of over 60,000 neurons.
This is joint work with Toby Hocking, Sean Jewell, Guillem Rigaill and
Stein's Method in Computational Statistics
This is a talk on computational Bayesian statistics; in particular, we
aim to show how the judicious use of Stein's method enables generation
of low-discrepancy point sets for posterior approximation, as well as
rate-optimal approximation of posterior integrals of Sobolev functions.