Using real (e.g. the cars data with y=price, X=[1,mileage,year]) or simulated data, compute the least squared regression coefficents using the QR decomposition and check that you get the same thing as software such as R (e.g. the lm function) or sklearn (e.g. sklearn.linear_model.LinearRegression).
How long does it take to compute the QR Decompostion?
Vary the dimension of \(X\) and see how the time to compute the QR decompostion of \(X\) depends on the dimension.
Using the cars data with y=price, X=[1,mileage,year]), do the following linear regressions.
How does the coefficient for the residuals in R4 compare to the coefficient for mileage in R2?
How does standard error for the coefficient for the residuals in R4 compare to the coefficient for mileage in R2?
In R2, check that the vector of residuals is orthogonal to each column of [1,mileage,year].
How does the coefficient for mileage from R1 and R2 compare?
Use
p = 5
rho = .8
Sigma = matrix(rep(rho,p^2),ncol=p)
diag(Sigma) = 1.0
Sigma
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0 0.8 0.8 0.8 0.8
## [2,] 0.8 1.0 0.8 0.8 0.8
## [3,] 0.8 0.8 1.0 0.8 0.8
## [4,] 0.8 0.8 0.8 1.0 0.8
## [5,] 0.8 0.8 0.8 0.8 1.0
Let, \[ Y \sim N(0,\Sigma) \]
What is the marginal distribution of \((Y_1,Y_2)\) ?
What is the conditional dist of \((Y_1,Y_2) \,|\, Y_3=.23, Y_4=-.65,Y_5=-.3\) ?
What is \(L\) in \(\Sigma = LL'\), \(L\) lower triangular ?
What is \(L^{-1}\) ?
What is \(A\) in \(A = PD^{1/2}\) ?
Simulate 10,000 observations, iid, \(N(0,\Sigma)\). compute the mle of \(\mu\) and \(\Sigma\), are they close to the true values ?
Simulate 50 observations, iid, \(N(0,\Sigma)\). Compute the mle of \(\mu\) and \(\Sigma\), are they close to the true values ?
What is the mle of \(L\) where \(\Sigma = LL'\), \(L\) lower triangular ?
This code is supposed to evaluate the log likelihood for the iid multivariate normal model.
logL = function(Y,mu,Sigma) {
n = nrow(Y); p=ncol(Y)
dS = det(Sigma)
retval = -.5*n*log(dS)
Si = solve(Sigma)
for(i in 1:n) {
yi = matrix(Y[i,]-mu,ncol=1)
retval = retval - .5 * t(yi) %*% Si %*% yi
}
return(retval)
}
Is the code correct ?
This code will be slow for large \(n\) because of the loop over observations.
“Vectorize” the code by writing everything using matrix operations. Check that your code is faster than the simple code above.
Again simulate data but just use \(p=2\) so that we have 5 parameters (2 in \(\mu\) and 3 in \(\Sigma\)). Plot the log likelihood versus each of the 5 parameters. in \(\mu\) and \(\Sigma\) one at a time keeping the other 4 parameters fixed at the mle values.
You want to see that the max is indeed obtained at the mle.
Note that when you vary an element of \(\hat{\Sigma}\), keeping the others fixed, you need to limit the range so that \(\Sigma\) stays positive definite.
n=100
x = seq(from=-2,to=2,length.out=100)
fx = x^3
plot(x,fx)
ii = 76:85
points(x[ii],fx[ii],col="red",pch=16)
Suppose you did not know the function \(f\). You observe \(f(x_i)\) for \(i\) in \(1:75 \; \cup \; 86:100 = Io\).
What do you know about \(f(x_i)\) for \(i\) in \(76:85 = Iu\) ?
We let \[ F = (f(x_1),f(x_2),...f(x_{100})) \sim N(\mu,\Sigma). \]
We then let \(X = F[Io]\) and \(Y = F[Iu]\) and compute \(Y \;|\; X\).
The key is what are \(\mu\) and \(\Sigma\) ?
In this example, lets suppose we let \(\mu=0\).
\[ \Sigma_{ij} = \sigma^2_f \; \exp(-\frac{1}{(2l^2)}(x_i - x_j)^2) \] See equation 15.10 of “Machine Learning” by Kevin Murphy. This is the “squared exponential kernel”.
Choose reasonable values for \((\sigma_f, l)\) and display Y | X.
How does Y | X compare to Y?