An analysis of covariance parameters in Gaussian Process-based optimization
Abstract
Numerical optimization problems are at the core of many real-world applications in which the function to be optimized stems from proprietary and computationally intensive simulation software. It is then preferable to handle the problem as black-box optimization and to approximate the objective function by a surrogate. Among the methods developed for solving such problems, the Efficient Global Optimization (EGO) algorithm is regarded as a state-of-the-art algorithm. The surrogate model used in EGO is a Gaussian Process (GP) conditioned on the data points where the value of the objective function has already been computed.

The most important control on the efficiency of the EGO algorithm is the GP covariance function (or kernel), as it shapes the GP from which the optimization iterates are generated. Traditionally, a parameterized family of covariance functions (e.g., squared exponential, Matérn) is considered, whose parameters are usually estimated by maximum likelihood. However, the effect of these parameters on the performance of EGO has not been properly studied and needs further investigation.

In this paper, we theoretically and empirically analyze the effect of the covariance parameters, the so-called "characteristic length-scale" and "nugget", on the designs of experiments generated by EGO and on the associated optimization performance. More precisely, the behavior of EGO when the covariance parameters are kept fixed is compared to the standard setting where they are estimated, with a special focus on very small or very large characteristic length-scales. This approach provides a deeper understanding of how these parameters influence the EGO iterates and addresses, from a combined practical and theoretical point of view, questions that are relevant to EGO users. For instance, our numerical experiments show that choosing a "small" nugget is preferable to estimating it by maximum likelihood. We prove that, when the length-scale tends to 0, the iterates remain at the best observed point. Conversely, when the length-scale tends to infinity, we prove that EGO degenerates into a minimization of the GP mean prediction which, in turn, tends to the Lagrange interpolation polynomial if the GP kernel is sufficiently differentiable. Overall, this study contributes to a better understanding of a key optimization algorithm, EGO.
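As a minimal illustration of the setting studied in the paper, the sketch below runs a toy 1-D EGO loop in which the squared-exponential kernel's length-scale and nugget are held fixed rather than estimated by maximum likelihood. It is not the authors' implementation: the objective function, the candidate grid, and all function names are illustrative assumptions.

```python
# Toy 1-D EGO loop with a *fixed* squared-exponential kernel
# (length-scale and nugget held constant, not estimated).
# Illustrative sketch only, not the paper's code.
import numpy as np
from scipy.stats import norm


def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential covariance between 1-D point sets a and b."""
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_cand, x_obs, y_obs, length_scale, nugget):
    """GP posterior mean and standard deviation at candidate points."""
    K = sq_exp_kernel(x_obs, x_obs, length_scale) + nugget * np.eye(len(x_obs))
    k = sq_exp_kernel(x_cand, x_obs, length_scale)
    alpha = np.linalg.solve(K, y_obs)
    mean = k @ alpha
    v = np.linalg.solve(K, k.T)
    var = np.clip(1.0 - np.sum(k * v.T, axis=1), 1e-12, None)
    return mean, np.sqrt(var)


def expected_improvement(mean, sd, y_best):
    """Expected improvement acquisition for minimization."""
    z = (y_best - mean) / sd
    return (y_best - mean) * norm.cdf(z) + sd * norm.pdf(z)


def ego_fixed_hyperparameters(f, bounds, n_init=4, n_iter=10,
                              length_scale=0.2, nugget=1e-6, seed=0):
    """EGO iterations with fixed covariance parameters on a candidate grid."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x_obs = rng.uniform(lo, hi, n_init)
    y_obs = np.array([f(x) for x in x_obs])
    x_cand = np.linspace(lo, hi, 500)
    for _ in range(n_iter):
        mean, sd = gp_posterior(x_cand, x_obs, y_obs, length_scale, nugget)
        ei = expected_improvement(mean, sd, y_obs.min())
        x_new = x_cand[np.argmax(ei)]          # next design point
        x_obs = np.append(x_obs, x_new)
        y_obs = np.append(y_obs, f(x_new))
    return x_obs[np.argmin(y_obs)], y_obs.min()


# Example run on an assumed toy objective. Intuitively, a very small
# length-scale keeps new iterates close to the incumbent best point,
# while a very large one drives the search toward the minimizer of the
# GP mean prediction, in line with the limits discussed in the abstract.
x_star, y_star = ego_fixed_hyperparameters(lambda x: np.sin(3 * x) + x ** 2,
                                           bounds=(-2.0, 2.0))
print(x_star, y_star)
```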
Domains
Modeling and Simulation

Origin
Files produced by the author(s)