Tutorial
This tutorial describes how to solve an optimization problem with Fides. It also provides performance tips for computationally expensive objective functions.
Input - a Function to Minimize
Fides requires a function to minimize, its gradient and optionally its Hessian. In this tutorial, we use the nonlinear Rosenbrock function:
\[f(x_1, x_2) = (1.0 - x_1)^2 + 100.0(x_2 - x_1^2)^2\]
The objective function is expected to take a vector input and return a scalar:
function f(x)
return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
Where x may be either a Vector or a ComponentVector from ComponentArrays.jl. Fides also requires a gradient function, and optionally a Hessian function. In this example, for convenience we compute both via automatic differentiation using ForwardDiff.jl:
using ForwardDiff
grad! = (g, x) -> ForwardDiff.gradient!(g, f, x)
hess! = (H, x) -> ForwardDiff.hessian!(H, f, x)
Both the gradient and Hessian functions are expected to be in-place, of the form grad!(g, x) and hess!(H, x).
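As a quick sanity check, these functions can be called directly before setting up the optimization (a minimal sketch; the ComponentVector line assumes ComponentArrays.jl is available):
using ComponentArrays
x = [2.0, 2.0]
f(x)                                     # 401.0
f(ComponentVector(x1 = 2.0, x2 = 2.0))   # same value with a ComponentVector input
g = zeros(2)       # preallocated gradient, filled in-place by grad!
H = zeros(2, 2)    # preallocated Hessian, filled in-place by hess!
grad!(g, x)
hess!(H, x)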
Optimization with a Hessian Approximation
Given an objective function and its gradient, the optimization is performed in a two-step procedure. First, a FidesProblem is created:
using Fides
lb = [-2.0, -2.0]
ub = [ 2.0, 2.0]
x0 = [ 2.0, 2.0]
prob = FidesProblem(f, grad!, x0; lb = lb, ub = ub)
FidesProblem with 2 parameters to estimate
Where x0 is the initial guess for parameter estimation, and lb and ub are the lower and upper parameter bounds (defaulting to -Inf and Inf if unspecified). The problem is then minimized by calling solve. When the Hessian is unavailable or too expensive to compute, a Hessian approximation is provided during this step:
sol = solve(prob, Fides.BFGS())
FidesSolution
---------------- Summary ---------------
min(f) = 2.16e-13
Parameters estimated = 2
Optimiser iterations = 22
Runtime = 2.2e-02s
Several Hessian approximations are supported (see the API), and BFGS generally performs well. Additional tuning options can be set by providing a FidesOptions struct via the options keyword in solve; a full list of available options can be found in the API documentation.
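For example, a minimal sketch of passing options (the specific keyword names maxiter and fatol used here are assumptions; consult the API documentation for the exact keywords accepted by FidesOptions):
# Option names below are illustrative assumptions; see the API documentation
options = Fides.FidesOptions(; maxiter = 500, fatol = 1e-10)
sol = solve(prob, Fides.BFGS(); options = options)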
Optimization with a User-Provided Hessian
If the Hessian (or a suitable approximation, such as the Gauss–Newton approximation) is available, providing it can improve convergence. To provide a Hessian function to FidesProblem, do:
prob = FidesProblem(f, grad!, x0; hess! = hess!, lb = lb, ub = ub)
Then, when solving the problem, use the Fides.CustomHessian() Hessian option:
sol = solve(prob, Fides.CustomHessian())
FidesSolution
---------------- Summary ---------------
min(f) = 3.59e-22
Parameters estimated = 2
Optimiser iterations = 17
Runtime = 1.3e-02s
Performance Tip: Computing the Objective and Derivatives Simultaneously
Internally, the objective function and its derivatives are computed simultaneously by Fides. Hence, runtime can be reduced if it is possible to reuse intermediate quantities between the objective and derivative computations. To take advantage of this, a FidesProblem can be created with a function that computes the objective and gradient (and optionally the Hessian) for a given input. For example, when only the gradient is available:
function fides_obj(x)
obj = f(x)
g = ForwardDiff.gradient(f, x)
return (obj, g)
end
prob = FidesProblem(fides_obj, x0; lb = lb, ub = ub)
sol = solve(prob, Fides.BFGS())
FidesSolution
---------------- Summary ---------------
min(f) = 2.16e-13
Parameters estimated = 2
Optimiser iterations = 22
Runtime = 2.2e-02s
When a Hessian function is available, do:
function fides_obj(x)
obj = f(x)
g = ForwardDiff.gradient(f, x)
H = ForwardDiff.hessian(f, x)
return (obj, g, H)
end
prob = FidesProblem(fides_obj, x0; lb = lb, ub = ub)
sol = solve(prob, Fides.CustomHessian())
FidesSolution
---------------- Summary ---------------
min(f) = 3.59e-22
Parameters estimated = 2
Optimiser iterations = 17
Runtime = 1.3e-02s
In this simple example, no runtime benefit is obtained as no quantities are reused between objective and derivative computations. However, if quantities can be reused (for example, when gradients are computed for ODE models), runtime can be noticeably reduced.
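To illustrate the pattern, the sketch below (not part of the example above; the names residuals, fides_obj_gn, and prob_gn are introduced here purely for illustration) rewrites the Rosenbrock function as a sum of squared residuals, so that the residual Jacobian computed once can be reused for both the gradient and a Gauss–Newton Hessian approximation:
using ForwardDiff
# Residuals r(x) such that f(x) = sum(abs2, r(x))
residuals(x) = [1.0 - x[1], 10.0 * (x[2] - x[1]^2)]
function fides_obj_gn(x)
    r = residuals(x)
    J = ForwardDiff.jacobian(residuals, x)  # computed once, reused below
    obj = sum(abs2, r)
    g = 2.0 .* (J' * r)   # gradient reuses J
    H = 2.0 .* (J' * J)   # Gauss–Newton Hessian approximation reuses J
    return (obj, g, H)
end
prob_gn = FidesProblem(fides_obj_gn, x0; lb = lb, ub = ub)
sol_gn = solve(prob_gn, Fides.CustomHessian())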