
Louis Aslett
Privacy and Security in Bayesian Inference
There has been substantial progress in the development of statistical methods which are amenable to computation with modern cryptographic techniques, such as homomorphic encryption. This has enabled fitting and/or prediction of models in areas ranging from classification and regression through to genome-wide association studies. However, these techniques were devised to address specific models in specific settings, and the broader challenge of an approach to inference for arbitrary models and arbitrary data sets has received less attention. This talk will discuss work on an approach which enables theoretically arbitrary low-dimensional Bayesian models to be fitted fully encrypted, keeping the model and prior secret from data owners and vice versa. There are several illustrative examples, together with a discussion of some initial theoretical results on the behaviour of the new methodology.

Adrien Saumard
Overcrossvalidation
Cross-validation, possibly V-fold (abbreviated VFCV in the following), is a versatile tool for hyperparameter tuning in statistical inference, and it is particularly popular in the machine learning community. Its success stems from a relatively low computational cost combined with good efficiency and wide applicability. Indeed, the rationale behind cross-validation relies on little more than the assumption that the sample consists of (nearly) independent and identically distributed random variables.
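As a concrete illustration of V-fold cross-validation used for hyperparameter tuning, here is a minimal sketch. It is not taken from the talk: the piecewise-constant (regressogram) estimator, the synthetic data, the candidate grid, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def fit_regressogram(xs, ys, n_bins):
    """Least-squares piecewise-constant fit on [0, 1) with equal-width bins."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(levels, x):
    """Evaluate the fitted step function at x."""
    n_bins = len(levels)
    return levels[min(int(x * n_bins), n_bins - 1)]


def v_fold_cv_risk(xs, ys, n_bins, n_folds=5, seed=0):
    """Average held-out squared error over n_folds folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    fold_risks = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        levels = fit_regressogram(
            [xs[i] for i in train], [ys[i] for i in train], n_bins
        )
        errors = [(ys[i] - predict(levels, xs[i])) ** 2 for i in held_out]
        fold_risks.append(sum(errors) / len(errors))
    return sum(fold_risks) / n_folds


def select_n_bins(xs, ys, candidates, n_folds=5, seed=0):
    """Pick the number of bins minimizing the V-fold criterion."""
    risks = {m: v_fold_cv_risk(xs, ys, m, n_folds, seed) for m in candidates}
    return min(risks, key=risks.get), risks


# Hypothetical data: random design on [0, 1) with heteroscedastic noise.
rng = random.Random(1)
xs = [rng.random() for _ in range(300)]
ys = [math.sin(2 * math.pi * x) + 0.2 * (1 + x) * rng.gauss(0, 1) for x in xs]

best_bins, cv_risks = select_n_bins(xs, ys, candidates=[1, 2, 4, 8, 16, 32, 64])
print(best_bins, cv_risks[best_bins])
```

The only structural assumption the procedure uses is that the held-out points behave like fresh draws from the same distribution as the training points, which is why the same seed and fold split are reused for every candidate.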
Cross-validation of the risks of a collection of M-estimators for model selection can be viewed through the prism of penalization. It then becomes transparent that, at least for a fixed number of folds V, VFCV is asymptotically suboptimal. It is also reasonable to expect that it can be improved in the nonasymptotic regime.
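The penalization viewpoint can be written out as follows. The notation is an assumption borrowed from the standard model selection literature and does not appear in the abstract itself: $\gamma$ is the contrast, $P$ the data distribution, $P_n$ the empirical distribution, and $\hat{s}_m$ the M-estimator on model $m$.

```latex
% Assumed notation (not from the abstract): contrast \gamma, true law P,
% empirical law P_n, M-estimator \hat{s}_m on model m.
\[
  \hat{m} \in \operatorname*{arg\,min}_{m}
    \bigl\{ P_n \gamma(\hat{s}_m) + \mathrm{pen}(m) \bigr\},
  \qquad
  \mathrm{pen}_{\mathrm{id}}(m) = (P - P_n)\,\gamma(\hat{s}_m).
\]
% With the ideal penalty the criterion equals the true risk:
\[
  P_n \gamma(\hat{s}_m) + \mathrm{pen}_{\mathrm{id}}(m) = P \gamma(\hat{s}_m).
\]
```

In this reading, a cross-validation criterion is a data-driven choice of $\mathrm{pen}(m)$, and its quality can be judged by how it compares with $\mathrm{pen}_{\mathrm{id}}(m)$.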
More precisely, the main drawback of VFCV is that it provides a biased estimate of the ideal penalty. Interestingly, however, debiasing this estimate does not give substantially better performance in practice (it actually tends to deteriorate the results). This is due to a genuine second-order effect that favors a slight overestimation of the ideal penalty. This phenomenon is sometimes called the overpenalization problem in the model selection literature, and it has so far lacked theoretical understanding.
In this talk, we will first give a precise mathematical description of the overpenalization problem, through a formalism involving multiple (pseudo-)testing. Then we will propose a possible modification of VFCV and compare its theoretical guarantees with those of the classical VFCV on a nonparametric regression problem, with random design and heteroscedastic noise. At the heart of our analysis and algorithms, we derive and use some concentration inequalities for the excess risks of M-estimators. Such results require working at a finer scale than (minimax) rates of convergence and are obtained through representation formulas for the excess risks in terms of maximizers of local suprema of the underlying empirical process. We will conclude by discussing encouraging experimental results and stating some open problems.
This talk is based on joint work with Amandine Dubois and Fabien Navarro.
