Symbolic differentiation is what we learned in school, and tools like SymPy, Wolfram|Alpha, and Mathematica handle it well. But I want to differentiate the following:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

x = np.linspace(-2, 2, 100)
plt.plot(x, [f(x_) for x_ in x])
Mathematically, it's something like:

$$f(x) = \sin(x + \sin(x + \sin(x + \dots + \sin(x))))$$

Good luck differentiating that and getting a nice closed form. If this is not complicated enough for you, feel free to add some if statements.
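To see why the symbolic route bogs down, here's a small sketch (assuming SymPy is available) that differentiates just a few levels of the nesting and counts the operations in the resulting expression — it swells quickly as the depth grows, and we have 100 levels:

```python
import sympy

x = sympy.Symbol('x')

def nested_expr(depth):
    # build sin(x + sin(x + ... sin(x))) symbolically
    expr = sympy.Integer(0)
    for _ in range(depth):
        expr = sympy.sin(expr + x)
    return expr

# operation count of d/dx for a few nesting depths
for depth in [2, 4, 6, 8]:
    derivative = sympy.diff(nested_expr(depth), x)
    print(depth, sympy.count_ops(derivative))
```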
We can use autograd, an automatic differentiation package, to compute exact, pointwise derivatives.
from autograd import grad
from autograd import numpy as np  # autograd's drop-in wrapper around NumPy

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

grad(f)(1.)
# magic!
-0.26242653107144726
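As a sanity check on that number, we can compare against a plain central finite difference (the step size 1e-6 here is a judgment call, not from the original) — it agrees with autograd's exact derivative to many decimal places, without needing autograd at all:

```python
import numpy as np

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

# central finite difference around x = 1
h = 1e-6
approx = (f(1. + h) - f(1. - h)) / (2 * h)
print(approx)  # ≈ -0.2624265..., matching grad(f)(1.) above
```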
Of course, you can string together these pointwise derivatives into a function:
grad_f = grad(f)
x = np.linspace(-2, 2, 100)
plt.plot(x, [grad_f(x_) for x_ in x])
To use this in our optimization routines:
T = (np.random.exponential(size=1000) / 1.5) ** 2.3
E = np.random.binomial(1, 0.95, size=1000)

def cumulative_hazard(params, t):
    lambda_, rho_ = params
    return (t / lambda_) ** rho_

def log_hazard(params, t):
    lambda_, rho_ = params
    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))

def log_likelihood(params, t, e):
    return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))

def negative_log_likelihood(params, t, e):
    return -log_likelihood(params, t, e)
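One quick consistency check on these definitions (a sketch, independent of autograd): the hazard should be the time-derivative of the cumulative hazard, $h(t) = \frac{dH(t)}{dt}$. For this Weibull-style model, $H(t) = (t/\lambda)^\rho$, so a finite difference in t should match exp(log_hazard). The parameter values below are arbitrary illustrative choices:

```python
import numpy as np

def cumulative_hazard(params, t):
    lambda_, rho_ = params
    return (t / lambda_) ** rho_

def log_hazard(params, t):
    lambda_, rho_ = params
    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))

params = np.array([2.0, 1.5])  # arbitrary (lambda, rho) for illustration
t, dt = 1.3, 1e-6

# finite-difference derivative of H(t) vs. the hazard implied by log_hazard
fd = (cumulative_hazard(params, t + dt) - cumulative_hazard(params, t - dt)) / (2 * dt)
print(fd, np.exp(log_hazard(params, t)))  # the two agree closely
```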
So grad(negative_log_likelihood) will return the gradient of negative_log_likelihood with respect to its first argument, params.
grad_negative_log_likelihood = grad(negative_log_likelihood)
print(grad_negative_log_likelihood(np.array([1., 1.]), T, E))
[-199.44295882 2906.3619735 ]
from scipy.optimize import minimize

results = minimize(negative_log_likelihood,
                   x0=np.array([1.0, 1.0]),
                   method=None,
                   args=(T, E),
                   jac=grad_negative_log_likelihood,
                   bounds=((0.00001, None), (0.00001, None)))
print(results)
fun: 243.1558229286153
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
jac: array([ 0.00101051, -0.00293405])
message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 11
nit: 7
status: 0
success: True
x: array([0.4693 , 0.4273751])
Even better, autograd's value_and_grad returns both the objective value and its gradient from a single call, which scipy's minimize can consume directly when jac=True:

from autograd import value_and_grad

results = minimize(
    value_and_grad(negative_log_likelihood),
    x0=np.array([1.0, 1.0]),
    method=None,
    args=(T, E),
    jac=True,  # notice this is set to True now
    bounds=((0.00001, None), (0.00001, None)))
results
fun: 243.1558229286153
hess_inv: <2x2 LbfgsInvHessProduct with dtype=float64>
jac: array([ 0.00101051, -0.00293405])
message: b'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 11
nit: 7
status: 0
success: True
x: array([0.4693 , 0.4273751])
Let's continue this analytical train 🚂 to Part 4!