Read 15.1 to 15.2.5 (inclusive) of “Machine Learning: A Probabilistic Perspective” by Kevin Murphy.
You can find a PDF on the internet.
The goal is to infer the parameters of a simple Gaussian Process model.
Our model is \[ y_i = f(x_i) + \epsilon_i, \] where \(\epsilon_i \sim N(0,\sigma^2_y)\).
To keep things simple, let’s assume \(x_i \in \mathbb{R}\).
The function values get a Gaussian process prior: \[ (f(x_1),f(x_2),\ldots,f(x_N))' \equiv f(x) \sim N(0,K), \;\; K_{ij} = \kappa(x_i,x_j). \] Then, \[ Y \,|\, x \sim N(0,K_y), \;\; K_y = K + \sigma^2_y \, I. \] Now let \(\kappa\) depend on some parameters, \(\kappa(x_i,x_j;\gamma)\), so that we have
\[ K_y = K(\gamma) + \sigma^2_y \, I. \]
The example in Murphy is the squared-exponential kernel \[ \kappa(x_i,x_j;\sigma_f,l) = \sigma^2_f \, \exp\left(-\frac{1}{2l^2} \, (x_i-x_j)^2\right), \]
so \(\gamma = (\sigma_f,l)\).
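As a concrete sketch, here is one way to build \(K_y(\theta)\) for 1-d inputs under this kernel (the helper name `build_Ky` is illustrative, not from Murphy):

```python
import numpy as np

def build_Ky(x, sigma_f, ell, sigma_y):
    """K_y(theta) = K(gamma) + sigma_y^2 I for 1-d inputs x,
    using the squared-exponential kernel."""
    diff = x[:, None] - x[None, :]                    # all pairwise x_i - x_j
    K = sigma_f**2 * np.exp(-0.5 * diff**2 / ell**2)  # K_ij = kappa(x_i, x_j)
    return K + sigma_y**2 * np.eye(len(x))
```

Note the diagonal is \(\sigma_f^2 + \sigma_y^2\), since \(\kappa(x_i,x_i) = \sigma_f^2\).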
Letting \(\theta = (\gamma,\sigma_y)\), we have \[ K_y(\theta) = K(\gamma) + \sigma^2_y \, I. \]
Then \(Y \, |\, x \sim N(0,K_y(\theta))\) and \[ \log(p(y \, | \, x,\theta)) = - \frac{1}{2} y' K_y(\theta)^{-1} y - \frac{1}{2} \log(|K_y(\theta)|) - \frac{N}{2} \log(2 \pi), \]
which is Equation 15.2.2 of Murphy.
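This log likelihood can be coded up directly, term by term. A naive \(O(N^3)\) sketch (fine for small \(N\); Algorithm 15.1’s Cholesky route, discussed further down, is preferred numerically):

```python
import numpy as np

def log_marginal_likelihood(y, Ky):
    """log p(y | x, theta) = -1/2 y' Ky^{-1} y - 1/2 log|Ky| - N/2 log(2 pi)."""
    N = len(y)
    _, logdet = np.linalg.slogdet(Ky)   # log|Ky| without under/overflow
    alpha = np.linalg.solve(Ky, y)      # Ky^{-1} y
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * N * np.log(2 * np.pi)
```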
Now the idea of the project is to infer \(\theta\) from data.
For example, partition \(\theta = (\theta_i,\theta_{-i})\), fix \(\theta_{-i}\), and then evaluate \[ \log(p(\theta_i \,|\, \theta_{-i},y)) \propto \log(p(y \,|\, x, \theta)) + \log(p(\theta)) \]
on a grid of values for \(\theta_i\), where \(p(\theta)\) is the prior. You could then use coordinate descent or a griddy Gibbs sampler.
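Here is a minimal, self-contained sketch of the grid evaluation, profiling the length-scale \(l\) with \(\sigma_f\) and \(\sigma_y\) held fixed; the flat prior (\(\log p(\theta) = 0\)) and the function name are assumptions for illustration:

```python
import numpy as np

def logpost_on_grid(x, y, ell_grid, sigma_f, sigma_y, log_prior=lambda l: 0.0):
    """Evaluate log p(l | sigma_f, sigma_y, y) up to an additive constant
    on a grid of length-scale values, holding the other parameters fixed."""
    N = len(y)
    diff = x[:, None] - x[None, :]
    out = []
    for ell in ell_grid:
        # K_y(theta) for this grid point
        Ky = sigma_f**2 * np.exp(-0.5 * diff**2 / ell**2) + sigma_y**2 * np.eye(N)
        _, logdet = np.linalg.slogdet(Ky)
        loglik = (-0.5 * y @ np.linalg.solve(Ky, y)
                  - 0.5 * logdet - 0.5 * N * np.log(2 * np.pi))
        out.append(loglik + log_prior(ell))
    return np.array(out)
```

The grid of log-posterior values could then drive a coordinate-descent update (take the argmax) or a griddy Gibbs draw (exponentiate, normalize, sample).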
Or you could use stochastic gradient descent or ….
Note that Algorithm 15.1 just says: to compute \(\log(p(y \, | \, x,\theta))\), let \(L\) be the Cholesky root of \(K_y(\theta)\).
See page 228 of Murphy for a discussion of the “backslash operator” he uses in Algorithm 15.1.
This basically says: to solve \(y = Xb\) for \(b\), solve the normal equations \(X'X b = X'y\) using the \(QR\) decomposition of \(X\). (When the matrix is triangular, as with the Cholesky factor \(L\), the backslash solve is just forward or back substitution.)
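A small NumPy sketch of the QR route (the data here are made up for the demo): with \(X = QR\), the normal equations \(X'Xb = X'y\) reduce to \(Rb = Q'y\), a triangular solve.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))        # tall matrix, full column rank
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true                      # exact system, so b is recoverable

Q, R = np.linalg.qr(X)              # X = Q R, R upper triangular
b = np.linalg.solve(R, Q.T @ y)     # R b = Q' y  <=>  X'X b = X' y
```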
Then use the Cholesky to compute \(|K_y(\theta)|\) and \(K_y(\theta)^{-1} y \equiv \alpha\).
That is, if \(K_y = LL'\), then
\[ \alpha = K_y^{-1} y = (LL')^{-1} y = (L^{-1})' L^{-1} y \] and \[ \log(|K_y|^{1/2}) = \sum_i \log(L_{ii}). \]
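Putting these two identities together gives a Cholesky-based evaluation of the log likelihood, in the spirit of Algorithm 15.1 (the function name is ours):

```python
import numpy as np

def gp_loglik_chol(y, Ky):
    """log p(y | x, theta) via the Cholesky factorization K_y = L L'."""
    L = np.linalg.cholesky(Ky)                 # lower triangular, Ky = L L'
    # alpha = (L')^{-1} (L^{-1} y); a dedicated triangular solver
    # (e.g. scipy.linalg.solve_triangular) would exploit the structure better.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    half_logdet = np.sum(np.log(np.diag(L)))   # log|Ky|^{1/2} = sum_i log L_ii
    return -0.5 * y @ alpha - half_logdet - 0.5 * len(y) * np.log(2 * np.pi)
```

Note the determinant term enters as \(-\tfrac{1}{2}\log|K_y| = -\log|K_y|^{1/2}\), hence the single `half_logdet` above.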
Use real and simulated data to see how your procedure works.
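For the simulated-data check, one way is to draw \(f\) from the GP prior at known parameter values and add noise; the “true” \(\theta\) below is an arbitrary choice for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x = np.sort(rng.uniform(0.0, 10.0, N))
sigma_f, ell, sigma_y = 1.0, 1.5, 0.1      # "true" theta, chosen for the demo

diff = x[:, None] - x[None, :]
K = sigma_f**2 * np.exp(-0.5 * diff**2 / ell**2)
# small jitter keeps the covariance numerically positive definite
f = rng.multivariate_normal(np.zeros(N), K + 1e-9 * np.eye(N))
y = f + rng.normal(0.0, sigma_y, N)        # observed data
```

You can then run your inference procedure on \((x, y)\) and see whether the estimates land near the known truth.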