A Comparison of Regularization Methods for Gaussian Processes
Abstract
Gaussian Processes (GPs) are classical probabilistic models for representing the results of experiments on grids of points. They have numerous applications, in particular in nonlinear global optimization when the experiments (typically PDE simulations) are costly. GPs require the inversion of a covariance matrix. In many situations, optimization in particular, the density of experiments becomes higher in some regions of the search space, which makes the covariance matrix ill-conditioned, an issue generally handled through regularization techniques. The need to better understand and improve regularization remains.
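The ill-conditioning mentioned above can be illustrated with a minimal sketch. It assumes a squared-exponential kernel with unit length scale, which is a choice made here for illustration only; the helper names `sq_exp_kernel` and `condition_number` and the sample points are hypothetical.

```python
import numpy as np

def sq_exp_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential covariance between two 1-D point sets (assumed kernel)."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def condition_number(x):
    """2-norm condition number of the covariance matrix built on the points x."""
    return np.linalg.cond(sq_exp_kernel(x, x))

# Evenly spread design versus a design with two nearly coincident points.
spread = np.array([0.0, 1.0, 2.0, 3.0])
clustered = np.array([0.0, 1.0, 2.0, 2.0 + 1e-4])

cond_spread = condition_number(spread)
cond_clustered = condition_number(clustered)
print("condition number, spread points:   ", cond_spread)
print("condition number, clustered points:", cond_clustered)
```

In this sketch the two nearly coincident points make two rows of the covariance matrix almost identical, so its smallest eigenvalue approaches zero and the condition number grows by several orders of magnitude.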
The two most classical regularization methods are i) the pseudoinverse (PI) and ii) adding a small positive constant to the main diagonal (called nugget regularization). This work provides new algebraic insights into PI and nugget regularizations. It is proven that pseudoinverse regularization averages the output values and makes the variance zero at redundant points. In contrast, nugget regularization lacks interpolation properties but preserves a non-zero variance at every point. However, the two regularization techniques become similar as the nugget value decreases. A new distribution-wise GP is then introduced, which interpolates Gaussian distributions instead of data points and mitigates the drawbacks of pseudoinverse and nugget regularized GPs.
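The contrast between the two regularizations can be checked numerically on a small example. The sketch below is not the paper's implementation: it assumes a zero-mean GP with unit process variance and a squared-exponential kernel, and the function names, data values, and the `rcond` threshold are hypothetical choices. The design contains one redundant point (x = 1 observed twice, with outputs 1 and 3).

```python
import numpy as np

def sq_exp_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential covariance between two 1-D point sets (assumed kernel)."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, nugget=None):
    """Zero-mean GP prediction with unit process variance (assumed model).

    nugget=None  -> pseudoinverse (PI) regularization
    nugget=tau   -> nugget regularization, i.e. inverse of K + tau * I
    """
    K = sq_exp_kernel(x_train, x_train)
    k_star = sq_exp_kernel(x_train, x_test)
    if nugget is None:
        # Explicit cutoff so the zero singular value of the duplicate is discarded.
        K_inv = np.linalg.pinv(K, rcond=1e-10)
    else:
        K_inv = np.linalg.inv(K + nugget * np.eye(len(x_train)))
    weights = K_inv @ k_star                       # shape (n_train, n_test)
    mean = weights.T @ y_train
    var = 1.0 - np.sum(k_star * weights, axis=0)   # prior variance is 1
    return mean, var

# x = 1.0 appears twice with conflicting outputs 1.0 and 3.0.
x_train = np.array([0.0, 1.0, 1.0, 2.0])
y_train = np.array([0.0, 1.0, 3.0, 0.5])
x_test = np.array([1.0])

mean_pi, var_pi = gp_predict(x_train, y_train, x_test)
mean_nug, var_nug = gp_predict(x_train, y_train, x_test, nugget=1e-2)
mean_small, var_small = gp_predict(x_train, y_train, x_test, nugget=1e-6)

print("PI:            mean", mean_pi[0], "variance", var_pi[0])
print("nugget 1e-2:   mean", mean_nug[0], "variance", var_nug[0])
print("nugget 1e-6:   mean", mean_small[0], "variance", var_small[0])
```

Under these assumptions, the PI prediction at the redundant point equals the average of the two outputs (2.0, up to floating-point error) with zero variance, the nugget prediction misses that average slightly while keeping a strictly positive variance, and shrinking the nugget moves its prediction toward the PI one.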