Computational Statistics, STP 540, Spring 2021



Course Information

Instructor/TA:

Instructor: Robert McCulloch, robert.mcculloch@asu.edu
Office hours (online): Thursdays, 10am
TA: Xiangwei Peng, Xiangwei.Peng@asu.edu


Syllabus


Where are we and what should I be doing?

where and what


Homework

How_to_Submit_Homework_in_Canvas.pdf

Homework 1, Due February 2

Homework 2, Due February 12

Homework 3, Due February 23
   Murphy-Kevin_Machine-Learning.pdf
   Solution in R for problem 3 on GP
   Plot of the estimate of f for problem 3 on GP

sgd

Homework 4, Due March 11


Homework 5, Due April 8
Homework 5, R solutions
     Discretization and MH for normal data with non-conjugate prior on mean



R and Python

Information on R

Information on Python


Notes

Why are R and Python Slow? Vectorization

Advanced R, Wickham, Section 24.5.
".. vectorization means finding the existing R function that is implemented in C
and most closely applied to your problem."
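
For a concrete (made-up) illustration of that point, compare an explicit R loop to the vectorized version of the same computation:

n <- 1e6
x <- rnorm(n)

## explicit R loop
loop_ss <- function(x) {
  s <- 0
  for (xi in x) s <- s + xi^2
  s
}

## vectorized: ^ and sum() are the C-level functions Wickham is talking about
vec_ss <- function(x) sum(x^2)

system.time(loop_ss(x))   # slow: the loop runs in interpreted R
system.time(vec_ss(x))    # fast: the loop runs in C
all.equal(loop_ss(x), vec_ss(x))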

Simple Logistic Regression, basic notes: Logistic Regression
Simple Logistic Regression: Computing the likelihood for simple logistic regression.

R code: Logit Example in R
Python code: Logit Example in Python
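
A minimal sketch of the vectorized -log L for simple logistic regression (illustrative variable names and simulated data, not the code in the linked examples):

## negative log-likelihood for simple logistic regression, vectorized
nll <- function(beta, y, x) {
  eta <- beta[1] + beta[2] * x          # linear predictor
  -sum(y * eta - log(1 + exp(eta)))     # -log L
}

## quick check against glm on simulated data
set.seed(34)
x <- rnorm(200)
y <- rbinom(200, 1, 1 / (1 + exp(-(0.5 + 2 * x))))
optim(c(0, 0), nll, y = y, x = x)$par
coef(glm(y ~ x, family = binomial))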


C++ code:
cmLL.cpp, the C++ code
Makefile, the Makefile to compile C++ code
mLL.R, the R code (which calls the C++ code)
do.R, the R code to show how to use the C++/R code and compare to simple vectorized R
data used in do.R


Rcpp frequently asked questions
looks like a nice webpage for getting started with Rcpp

Calling C++ from R using RStudio to create an R package
C++ code: cmll.cpp, cmll.h, test.cpp, Makefile, test.R
movie showing how it is done in RStudio
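
If you only want to experiment (without a package or Makefile), Rcpp::cppFunction will compile a small C++ function directly from an R session. A minimal sketch, reusing the logit -log L as an example (assumes the Rcpp package and a compiler are installed):

## the same logit -log L written in C++ and called from R
library(Rcpp)

cppFunction('
double cnll(NumericVector y, NumericVector x, double b0, double b1) {
  double s = 0.0;
  for (int i = 0; i < y.size(); i++) {
    double eta = b0 + b1 * x[i];
    s += y[i] * eta - std::log(1.0 + std::exp(eta));
  }
  return -s;
}
')

## compare to the vectorized R version on simulated data
set.seed(34)
x <- rnorm(1000); y <- rbinom(1000, 1, 1 / (1 + exp(-x)))
cnll(y, x, 0.0, 1.0)
-sum(y * x - log(1 + exp(x)))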

Simple Parallel computing in R

example R script
gettingstartedParallel.pdf
more detailed documentation
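
A minimal sketch of the basic pattern from the parallel package (the toy task and settings are made up):

## simple parallel computing with the parallel package
library(parallel)

## toy task: replications of a sample mean
one_rep <- function(i) mean(rnorm(1e5))

ncore <- detectCores() - 1
cl <- makeCluster(ncore)               # start a cluster of worker processes
res <- parLapply(cl, 1:100, one_rep)   # run the replications in parallel
stopCluster(cl)

## on Mac/Linux the forking version is even simpler:
## res <- mclapply(1:100, one_rep, mc.cores = ncore)
mean(unlist(res))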

Simple Parallel computing in Python, thanks Alex!!

demo 1
demo 2


Note:
General GPU on ASU's Research Computing Cluster, February 1, 2021 2:00pm - 3:00pm
This workshop will describe different low- and high-level approaches to accelerating
existing or developing research codes through the use of Graphics Processing Units (GPUs) on the ASU High Performance Computing cluster.

In preparation for the workshop all attendees are encouraged to obtain an account on Agave if they do not already have one.

Register Here.

Matrix Decompositions in Statistics

Quick Review of Some Key Ideas in Linear Algebra
   What Really IS a Matrix Determinant?

QR Matrix Factorization, Least Squares, and Computation (with R and C++)

The Multivariate Normal and the Choleski and Eigen Decompositions
   Look at cholesky and spectral in R
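
As a quick illustration of how the Cholesky factor gets used in practice, here is a minimal sketch for drawing from a bivariate normal (the mean and covariance are made up):

## drawing multivariate normals with the Cholesky decomposition
mu <- c(1, 2)
Sigma <- matrix(c(1, .8, .8, 1), 2, 2)

L <- t(chol(Sigma))          # chol() returns upper triangular R with R'R = Sigma
z <- matrix(rnorm(2 * 5000), nrow = 2)
x <- mu + L %*% z            # each column is a draw from N(mu, Sigma)

cov(t(x))                    # should be close to Sigma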


Singular Value Decomposition

simple example of svd in python

Hi Dr. McCulloch,
In class today you were talking about how in R, the lm() function does a QR decomposition under the hood,
and you were wondering how sklearn's LinearRegression object fits the model.
I was also curious about this, so I took a quick look and wanted to share what I learned.
Quick summary is that it's an SVD under the hood.
Longer summary:
sklearn.LinearRegression calls scipy.linalg.lstsq (here's the line of code where that happens).
scipy.linalg.lstsq in turn calls one of three LAPACK drivers (see here).
The default driver is gelsd, which according to the MKL LAPACK documentation uses SVD
to solve least squares problems (via Householder transformations, so not entirely unlike the R approach).
So unless you provide sklearn with other constraints (like positive coefficients),
it will by default call a LAPACK routine that solves the least squares problem using SVD.

Best,
Drew
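
A small base-R check that the QR route (what lm() uses) and the SVD route give the same least squares coefficients (simulated data, illustrative only):

## least squares three ways: lm (QR under the hood), explicit QR, explicit SVD
set.seed(99)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(1, 2, -1) + rnorm(n))

coef(lm(y ~ X - 1))                    # lm: QR under the hood
qr.solve(X, y)                         # explicit QR solve
s <- svd(X)                            # SVD: beta = V diag(1/d) U'y
drop(s$v %*% ((t(s$u) %*% y) / s$d))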


The EM Algorithm

The EM Algorithm
   See Chapter 11 of Murphy.
   See Chapter 4 of Givens and Hoeting.
   See Chapter 8.5 of Hastie, Tibshirani, and Friedman.
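
As a reminder of the basic E step / M step pattern, here is a minimal EM sketch for a two-component normal mixture with the variances fixed at 1 (made-up data, not an example from the texts above):

## EM for a two-component normal mixture: unknown means and mixing weight
set.seed(14)
y <- c(rnorm(150, 0), rnorm(50, 4))

p <- 0.5; mu1 <- -1; mu2 <- 1           # starting values
for (it in 1:100) {
  ## E step: responsibilities for component 2
  d1 <- (1 - p) * dnorm(y, mu1, 1)
  d2 <- p * dnorm(y, mu2, 1)
  g  <- d2 / (d1 + d2)
  ## M step: weighted means and mixing weight
  p   <- mean(g)
  mu1 <- sum((1 - g) * y) / sum(1 - g)
  mu2 <- sum(g * y) / sum(g)
}
c(p = p, mu1 = mu1, mu2 = mu2)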


Monte Carlo

Monte Carlo
   See Chapter 6 of Givens and Hoeting.
   Geweke paper
   R script to try various truncated normal draws
     Rmarkdown version of the R script to try various truncated normal draws
   R script to try various importance sampling approaches for prior sensitivity
     Rmarkdown version of the R script to try various importance sampling approaches for prior sensitivity
   Prior based on odds ratio
   SIR R script
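
A minimal importance sampling sketch (a standard tail-probability example, not one of the linked scripts): estimate P(Z > 3) for Z ~ N(0,1) using a shifted exponential proposal.

## importance sampling: P(Z > 3) with a shifted exponential proposal
set.seed(7)
n <- 1e5
x <- 3 + rexp(n)                       # proposal draws, density exp(-(x-3)) on (3, Inf)
w <- dnorm(x) / exp(-(x - 3))          # importance weights
mean(w)                                # IS estimate
pnorm(3, lower.tail = FALSE)           # exact value for comparison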


Introduction to Bayesian Statistics

   Introduction to Bayesian Statistics and the Beta/Bernoulli Inference
   Normal Mean Given Standard Deviation
   Normal Standard Deviation Given Mean
   Multinomial outcomes with the Dirichlet conjugate prior
   Introduction to Bayesian Regression
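
A minimal Beta/Bernoulli sketch in R (made-up data and a uniform prior): the posterior is just another Beta distribution.

## prior Beta(a, b), data y_1..y_n iid Bernoulli(theta),
## posterior is Beta(a + sum(y), b + n - sum(y))
a <- 1; b <- 1                          # uniform prior
y <- c(1, 0, 1, 1, 0, 1, 1, 1)          # made-up data
apost <- a + sum(y); bpost <- b + length(y) - sum(y)

curve(dbeta(x, apost, bpost), 0, 1, xlab = "theta", ylab = "posterior density")
qbeta(c(.025, .975), apost, bpost)      # 95% posterior interval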


MCMC: Markov Chain Monte Carlo

See Chapter 7 of Givens and Hoeting.
   Markov Chains
     Simple Example of a Markov Chain
   Gibbs Sampling
   Gibbs Sampling for hierarchical means
     Note: Hoff refers to "A First Course in Bayesian Statistical Methods" by Peter Hoff
   Reversible Markov Chains
   The Metropolis Algorithm
     MH example for normal data with a non-conjugate prior on the mean
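
A minimal random-walk Metropolis sketch along the same lines (normal data, a non-conjugate Cauchy prior on the mean; this is an illustration, not the code in the linked example):

## random-walk Metropolis: y_i ~ N(mu, 1), Cauchy prior on mu
set.seed(66)
y <- rnorm(30, 2, 1)
logpost <- function(mu) sum(dnorm(y, mu, 1, log = TRUE)) + dcauchy(mu, log = TRUE)

niter <- 10000; step <- 0.5
draws <- numeric(niter); mu <- 0
for (i in 1:niter) {
  prop <- mu + step * rnorm(1)                     # symmetric proposal
  if (log(runif(1)) < logpost(prop) - logpost(mu)) mu <- prop
  draws[i] <- mu
}
mean(draws[-(1:1000)])                             # posterior mean after burn-in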


Optimization

Optimization

See chapters 4 and 8 of "Deep Learning" by Goodfellow, Bengio, and Courville.

Lectures
   Optimization, Introduction
   Gradient and Hessian for Logistic Regression
   Taylor's Theorem and Local Minima
   Gradient and Stochastic Gradient Descent
   Momentum, RMSprop, and Adam
   Newton's Method and Iteratively reweighted least squares
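
A minimal gradient descent sketch for simple logistic regression, using the gradient X'(p - y) of -log L (simulated data and step size are illustrative):

## gradient descent on -log L for simple logistic regression
set.seed(5)
n <- 500; x <- rnorm(n)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.5 + 2 * x))))
X <- cbind(1, x)

beta <- c(0, 0); lr <- 0.5
for (it in 1:5000) {
  p <- 1 / (1 + exp(-drop(X %*% beta)))            # fitted probabilities
  beta <- beta - lr * drop(t(X) %*% (p - y)) / n   # step along the average gradient
}
beta
coef(glm(y ~ x, family = binomial))                # should be close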


State Space Models and FFBS


   Hotels Problem
   intro to state space models
   FFBS
   R code for hotels example
   Forward Filtering for Simple Hotels model
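
A minimal forward filtering (Kalman filter) sketch for a generic local-level model, y_t = x_t + v_t, x_t = x_{t-1} + w_t, with made-up variances (an illustration of the recursions, not the hotels example itself):

## forward filtering for the local-level model
set.seed(3)
nT <- 100; sv2 <- 1; sw2 <- 0.1
x <- cumsum(rnorm(nT, 0, sqrt(sw2)))        # simulated states
y <- x + rnorm(nT, 0, sqrt(sv2))            # simulated observations

m <- numeric(nT); P <- numeric(nT)          # filtered means and variances
mprev <- 0; Pprev <- 100                    # vague prior on the initial state
for (t in 1:nT) {
  Ppred <- Pprev + sw2                      # predict: x_t | y_{1:t-1}
  K <- Ppred / (Ppred + sv2)                # Kalman gain
  m[t] <- mprev + K * (y[t] - mprev)        # update: x_t | y_{1:t}
  P[t] <- (1 - K) * Ppred
  mprev <- m[t]; Pprev <- P[t]
}
plot(y, col = "grey"); lines(m, col = "blue")   # data and filtered means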


The Bootstrap

(Efron and Hastie, chapters 10 and 11)
   The Bootstrap
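
A minimal nonparametric bootstrap sketch (made-up data): resample the data with replacement and look at the distribution of the statistic.

## bootstrap standard error of the sample median
set.seed(42)
y <- rexp(50, rate = 1)

B <- 2000
med <- replicate(B, median(sample(y, replace = TRUE)))
sd(med)                                # bootstrap standard error
quantile(med, c(.025, .975))           # percentile interval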


Suggested Projects

Mixture Modeling with EM and MCMC
   Details on the mixture Gibbs sampler


Inference for the parameters of a Gaussian Process

Monte Carlo EM