Symbolic differentiation is what we learned in school, and software like SymPy, Wolfram|Alpha and Mathematica does this well. But I want to differentiate the following:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

x = np.linspace(-2, 2, 250)
plt.plot(x, [f(x_) for x_ in x]);
Mathematically, it's something like:

$$f(x) = \sin(x + \sin(x + \sin(x + \dots \sin(x) \dots)))$$

Good luck differentiating that and getting a nice closed form. If this is not complicated enough for you, feel free to add some if statements.
We can use autograd, an automatic differentiation package, to compute exact, pointwise derivatives.
from autograd import grad
from autograd import numpy as np

def f(x):
    y = 0.
    for i in range(100):
        # this np. is now from autograd - important!
        y = np.sin(y + x)
    return y

grad_f = grad(f)
grad_f(1.)
# magic!
At a high level, autograd keeps a lookup table of elementary functions and their derivatives, and repeatedly applies the chain rule (plus the other rules of calculus) to compose them into the derivative of your whole program.
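To make that concrete, here is the chain-rule bookkeeping done by hand for a toy function (a plain-NumPy sketch of the idea, not autograd's actual internals): for $g(x) = \sin(x^2)$, the table says the derivative of $\sin$ is $\cos$ and the derivative of $x^2$ is $2x$, so the chain rule gives $g'(x) = \cos(x^2)\cdot 2x$. A central finite difference confirms the hand-derived result:

```python
import numpy as np

def g(x):
    return np.sin(x ** 2)

def g_prime(x):
    # chain rule: outer derivative cos(x**2) times inner derivative 2x
    return np.cos(x ** 2) * 2 * x

x0 = 1.5
h = 1e-6
# two-point central finite difference approximation of g'(x0)
fd = (g(x0 + h) - g(x0 - h)) / (2 * h)
print(fd, g_prime(x0))
```

autograd automates exactly this composition, so you never write `g_prime` by hand.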
Of course, you can string together these pointwise values into a function:
x = np.linspace(-2, 2, 250)
plt.plot(x, [grad_f(x_) for x_ in x])
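As a quick sanity check on these pointwise values (a self-contained sketch that redefines `f` with plain NumPy, no autograd needed): at $x = 0$ every iterate $y$ stays at $0$, so each of the 100 iterations contributes $\cos(0)\cdot(y' + 1) = y' + 1$ to the derivative, and the chain rule gives $f'(0) = 100$ exactly. A central finite difference agrees:

```python
import numpy as np

def f(x):
    y = 0.
    for i in range(100):
        y = np.sin(y + x)
    return y

def central_diff(fn, x, h=1e-6):
    # two-point central finite difference approximation of fn'(x)
    return (fn(x + h) - fn(x - h)) / (2 * h)

# should be very close to 100
print(central_diff(f, 0.0))
```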
To use this in our optimization routines:
T = (np.random.exponential(size=1000)/1.5) ** 2.3
E = np.random.binomial(1, 0.95, size=1000)

def cumulative_hazard(params, t):
    lambda_, rho_ = params
    return (t / lambda_) ** rho_

def log_hazard(params, t):
    lambda_, rho_ = params
    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))

def log_likelihood(params, t, e):
    return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))

def negative_log_likelihood(params, t, e):
    return -log_likelihood(params, t, e)
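Before trusting an automatic gradient of this likelihood, a plain-NumPy central-difference check is a useful sanity test (a self-contained sketch: it simulates its own data with a seeded generator and redefines the functions above without autograd):

```python
import numpy as np

rng = np.random.default_rng(0)
T = (rng.exponential(size=1000) / 1.5) ** 2.3
E = rng.binomial(1, 0.95, size=1000)

def cumulative_hazard(params, t):
    lambda_, rho_ = params
    return (t / lambda_) ** rho_

def log_hazard(params, t):
    lambda_, rho_ = params
    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))

def negative_log_likelihood(params, t, e):
    ll = np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))
    return -ll

def finite_diff_grad(fn, params, t, e, h=1e-7):
    # central differences, one coordinate at a time
    g = np.zeros_like(params)
    for i in range(len(params)):
        bump = np.zeros_like(params)
        bump[i] = h
        g[i] = (fn(params + bump, t, e) - fn(params - bump, t, e)) / (2 * h)
    return g

print(finite_diff_grad(negative_log_likelihood, np.array([1.0, 1.0]), T, E))
```

The autograd gradient below should match these numbers (up to finite-difference error) on the same data.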
So grad(negative_log_likelihood) will find the gradient of negative_log_likelihood with respect to its first argument, params.
grad_negative_log_likelihood = grad(negative_log_likelihood)
print(grad_negative_log_likelihood(np.array([1., 1.]), T, E))
from scipy.optimize import minimize

results = minimize(negative_log_likelihood,
                   x0=np.array([1.0, 1.0]),
                   method=None,
                   args=(T, E),
                   jac=grad_negative_log_likelihood,
                   bounds=((0.00001, None), (0.00001, None)))
print(results)
from autograd import value_and_grad

results = minimize(
    value_and_grad(negative_log_likelihood),  # computes both the value and its gradient
    x0=np.array([1.0, 1.0]),
    method=None,
    args=(T, E),
    jac=True,  # notice this is set to True now.
    bounds=((0.00001, None), (0.00001, None)))
results
Let's continue this analytical-train 🚂 to Part 4!