Computational Statistics, STP 540, Spring 2026

sgd

Basic Course Information

The final project should just be a 5 to 10 page write-up of what you did.
Explain to me what you did and what simulations and/or data you used
and what basic issues you explored.
Then show me some nice plots and tables showing your results.
HAVE FUN!!
Don't include code, except as an appendix.
Of course you should use AI to help you code, but
let's agree that you should write the text yourself.
There are a lot of choices to make and AI can't do that for you!!!

Final project due May 6.
Just email the PDF directly to me with all group members' names clearly on the first page.

Class time and place

29745, STP 540 Computational Statistics, McCulloch, Tu Th 12:00 PM - 1:15 PM, 1/12/26 - 5/1/26, Tempe WXLR A309

Instructor: Robert McCulloch, robert.mcculloch@asu.edu

TA: Alejandro Vidales, avidales@asu.edu
TA office hours:??

How-to-use-Canvas-Discussions.pdf

Syllabus: Syllabus

Some useful books: books

Miscellaneous

Note that for scientific computing with C++ there is the GSL (GNU Scientific Library).
Also note the Eigen and Armadillo linear algebra libraries, which are heavily supported in R/RStudio.

A student mentioned this, looks very interesting:
Cuda for Numpy

Random number generation in numpy

What Exactly Is Codon?

Codon is a high-performance Python compiler that translates your Python code into 
native machine code: no runtime overhead, no interpreter, 
no GIL (Global Interpreter Lock) choking your multithreading dreams.

Rob code choices:
Talking to claude about C++/R/python for Rob

R packages by Wickham
pybind11 and Python packages

Where we are and what I should be doing?

where and what

R and Python

Information on R

Information on Python

Suggested Projects

Inference for the parameters of a Gaussian Process
See Murphy chapter 15.1 to 15.2.5. Murphy.
See also Rasmussen-and-Williams.pdf.
See also chapter 5 of the book "Surrogates" by Robert Gramacy.

Learning a single layer neural network
   See section 10.7 Fitting a Neural Network in "An Introduction to Statistical Learning", second edition
    by James, Witten, Hastie, and Tibshirani.
   Simple Chain Rule Gradient Computation for a Single Layer
   Single Layer Neural networks, complete notes from Applied Machine Learning

EM algorithm for a mixture of normals

Some old projects:
Comparing the EM algorithm with the Gibbs sampler for univariate normal mixtures
Gaussian Processes
Comparing the EM algorithm with Gibbs for univariate mixtures
Monte Carlo EM algorithm
Creating a Single Layer Neural Network From Scratch

Homework

How_to_Submit_Homework_in_Canvas.pdf

Homework 1
Homework can be done in groups.
Due February 11.

Homework 2, Due March 16.
GP solution picture.
hw2 solutions.

Homework 3
logit-funs.R
Due April 3.

Notes

A first look at simple logistic regression

Let's review a basic nonlinear model in statistics: simple logistic regression.
We will write simple code to compute the likelihood.
We will look at the idea of vectorization, which applies in both R and Python.
Later we will go into more details on how the likelihood is optimized.
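
To make the vectorization idea concrete, here is a minimal sketch in Python (not the course scripts linked below). It assumes the usual simple logistic model with linear predictor eta_i = beta0 + beta1 * x_i, so the log-likelihood is the sum over i of y_i * eta_i - log(1 + exp(eta_i)). The function names and the simulated data are made up for illustration.

```python
import numpy as np

def loglik_loop(beta0, beta1, x, y):
    """Log-likelihood computed one observation at a time."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        total += yi * eta - np.log1p(np.exp(eta))
    return total

def loglik_vec(beta0, beta1, x, y):
    """Same log-likelihood using whole-vector operations."""
    eta = beta0 + beta1 * x
    # logaddexp(0, eta) = log(1 + exp(eta)), computed stably
    return np.sum(y * eta - np.logaddexp(0.0, eta))

# Simulated data (hypothetical parameter values, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
p = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * x)))
y = rng.binomial(1, p)

ll_loop = loglik_loop(-1.0, 2.0, x, y)
ll_vec = loglik_vec(-1.0, 2.0, x, y)
print(ll_loop, ll_vec)
```

The two functions should agree up to floating point error; the vectorized one pushes the loop down into compiled code inside numpy.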

Simple vectorized summing in python
jupyter notebook version

Basic notes on Logistic regression:
Simple Logistic Regression Likelihood, script

Simple logit in R and python:
Simple example of logit in R, Rmd
Simple example of logit in Python, notebook
The default data is available at:
ISLR-Default.csv

Scripts to compute the log-likelihood:
R code: Logit Example in R
html version of logit likelihood example
Python code: Logit Example in Python
Simple Logistic Regression Likelihood, html
Simple Logistic Regression Likelihood, ipynb

Plot logit likelihood using color palettes (e.g. viridis) in R

Advanced R, Wickham, Section 24.5.
"... vectorization means finding the existing R function that is implemented in C
and most closely applies to your problem."

Just C

Of course, if you code directly in a lower level language like C++ you get the speed:

Files to compare pure C++ with vectorized R: in-cpp.zip

Calling C++ out of R with a Makefile

Calling C++ out of R using Rcpp, a Makefile, and SHLIB

Calling C++ out of R with rstudio

Calling C++ out of R using Rcpp using rstudio

More detail on Rcpp in rstudio:
steps to make an R package: steps.txt
R script to test: do.R
output from do.R: output-from-do.txt

Cython

Learning to cython mLL from claude
time cython
zip file with all the files needed for the installed version

Matrix Decompositions in Statistics

Quick Review of Some Key Ideas in Linear Algebra
   Simple Python script to compare sklearn LinearRegression with (X'X)^{-1} X'y
   Simple R script to compare lm with (X'X)^{-1} X'y
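
A rough sketch of the comparison the two scripts above make, written with numpy only so it is self-contained. Instead of sklearn or lm, it checks the normal-equations solution (X'X)^{-1} X'y against numpy's least squares solver; the simulated data and coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
# Design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])  # made-up coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations: solve (X'X) b = X'y rather than forming the inverse
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares via numpy's SVD-based solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```

The two answers should match to many digits here. With a badly conditioned X the normal equations can lose accuracy, which is one reason software tends to use a decomposition of X instead.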

The Multivariate Normal and the Choleski and Eigen Decompositions
   Look at cholesky and spectral in R
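
As a small illustration (a sketch in Python, not the R script above): if Sigma = L L' with L from the Cholesky decomposition and z is a vector of iid N(0,1) draws, then mu + L z is N(mu, Sigma). The spectral decomposition gives another matrix A with A A' = Sigma. The values of mu and Sigma below are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Cholesky: lower triangular L with Sigma = L L'
L = np.linalg.cholesky(Sigma)

# Each row of Z is an iid N(0, I) draw; each row of X is mu + L z
Z = rng.standard_normal(size=(100_000, 2))
X = mu + Z @ L.T
sample_cov = np.cov(X, rowvar=False)

# Spectral (eigen) decomposition: Sigma = P diag(vals) P'
vals, P = np.linalg.eigh(Sigma)
A = P * np.sqrt(vals)  # scales column j of P by sqrt(vals[j]), so A A' = Sigma

print(sample_cov)
```

The sample covariance of the draws should be close to Sigma, and either L or A can be used as the "square root" when simulating.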

Singular Value Decomposition

simple example of svd in python

do_image-svd-approx.py
image approximation with SVD in R, thanks to Andrew Ritchey.
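
A minimal sketch of the low-rank approximation idea behind the image examples above. It does not read an image file; a random low-rank matrix plus noise stands in for a grayscale image, and the helper name rank_k_approx is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in for a grayscale image: rank-5 signal plus a little noise
A = (rng.normal(size=(60, 5)) @ rng.normal(size=(5, 40))
     + 0.01 * rng.normal(size=(60, 40)))

# Thin SVD: A = U diag(s) Vt, singular values sorted in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approx(U, s, Vt, k):
    """Keep only the k largest singular values and their vectors."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A5 = rank_k_approx(U, s, Vt, 5)
rel_err = np.linalg.norm(A - A5) / np.linalg.norm(A)
print(rel_err)
```

The Frobenius norm of the error equals the square root of the sum of the squared discarded singular values (the Eckart-Young result), so looking at s tells you how many terms you need.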

A Deep Dive Into How R Fits a Linear Model

Optimization

Optimization

See chapters 4 and 8 of "Deep Learning" by Goodfellow, Bengio, and Courville.

Simple notes on single layer neural net: Single Layer

Section 3 recording, logit derivatives: recording
Section 4 recording, Taylor's Theorem: recording
Sections 8 and 9 recording, Momentum and Newton's method: recording

An Overview of Gradient Descent Algorithms
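
A toy sketch of two of the methods mentioned in the recordings above, gradient descent with momentum and Newton's method, on a simple quadratic f(w) = 0.5 w'Aw - b'w. The matrix A, the vector b, the step size, and the momentum value are arbitrary choices for illustration, not values from the notes.

```python
import numpy as np

# Quadratic objective f(w) = 0.5 w'Aw - b'w with A positive definite
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b

def gd_momentum(w0, lr=0.1, beta=0.9, n_iter=500):
    """Gradient descent with (heavy ball) momentum."""
    w = w0.copy()
    v = np.zeros_like(w)
    for _ in range(n_iter):
        v = beta * v - lr * grad(w)  # velocity: decayed past steps plus new gradient step
        w = w + v
    return w

w_star = np.linalg.solve(A, b)  # exact minimizer, solves Aw = b
w_gd = gd_momentum(np.zeros(2))

# One Newton step from w = 0: w - H^{-1} grad(w), where the Hessian H is A
w0 = np.zeros(2)
w_newton = w0 - np.linalg.solve(A, grad(w0))

print(w_star, w_gd, w_newton)
```

For a quadratic, Newton's method lands on the minimizer in one step because the second order Taylor expansion is exact; gradient descent needs many small steps.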

Mixture Models and the EM Algorithm

The EM Algorithm
   See Chapter 11 of Murphy.
   See Chapter 4 of Givens and Hoeting.
   See Chapter 8.5 of Hastie, Tibshirani, and Friedman.

Recordings:
The General EM algorithm and mixtures of multivariate normals
More on the EM algorithm
The EM algorithm and missing values
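
A minimal sketch of EM for a two-component univariate normal mixture, the simplest case of the mixture models above. The starting values, iteration count, and simulated data are assumptions of this example, and the function names are made up.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_two_normals(x, n_iter=200):
    # Crude starting values (an assumption of this sketch)
    pi1 = 0.5
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior probability each point came from component 1
        d0 = (1.0 - pi1) * normal_pdf(x, mu[0], sigma[0])
        d1 = pi1 * normal_pdf(x, mu[1], sigma[1])
        loglik.append(np.sum(np.log(d0 + d1)))
        r = d1 / (d0 + d1)
        # M-step: weighted proportion, means, and standard deviations
        pi1 = r.mean()
        mu = np.array([np.sum((1.0 - r) * x) / np.sum(1.0 - r),
                       np.sum(r * x) / np.sum(r)])
        sigma = np.sqrt(np.array([
            np.sum((1.0 - r) * (x - mu[0]) ** 2) / np.sum(1.0 - r),
            np.sum(r * (x - mu[1]) ** 2) / np.sum(r),
        ]))
    return pi1, mu, sigma, np.array(loglik)

# Simulated data: 30% from N(4, 1), 70% from N(0, 1)
rng = np.random.default_rng(4)
z = rng.random(2000) < 0.3
x = np.where(z, rng.normal(4.0, 1.0, 2000), rng.normal(0.0, 1.0, 2000))

pi1_hat, mu_hat, sigma_hat, loglik = em_two_normals(x)
print(pi1_hat, mu_hat, sigma_hat)
```

A useful check when you code EM yourself: the observed-data log-likelihood should never go down from one iteration to the next.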

The Bootstrap

(Efron and Hastie, chapters 10 and 11)
The Bootstrap
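
A short sketch of the nonparametric bootstrap, here used to get a standard error and a percentile interval for a sample median. The data, the choice of statistic, and the number of bootstrap samples B are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=200)  # made-up data

B = 2000
# Each row of idx is one resample of the indices, drawn with replacement
idx = rng.integers(0, len(x), size=(B, len(x)))
boot_medians = np.median(x[idx], axis=1)

se_hat = boot_medians.std(ddof=1)                         # bootstrap standard error
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])  # percentile interval

print(np.median(x), se_hat, (ci_low, ci_high))
```

Drawing all the resample indices at once as a B by n array is the vectorized version of looping over bootstrap samples.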

Thompson Sampling

Tutorial on Thompson Sampling
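
A minimal sketch of Thompson sampling for a Bernoulli bandit with independent Beta(1, 1) priors on each arm. The success probabilities in true_p and the number of rounds are hypothetical; this is just the basic idea, not code from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(6)
true_p = np.array([0.2, 0.5, 0.7])  # hypothetical arm success probabilities
n_arms = len(true_p)
successes = np.zeros(n_arms)
failures = np.zeros(n_arms)
pulls = np.zeros(n_arms, dtype=int)

for t in range(5000):
    # Draw one value per arm from its Beta posterior
    theta = rng.beta(1.0 + successes, 1.0 + failures)
    # Play the arm whose draw is largest
    arm = int(np.argmax(theta))
    reward = int(rng.random() < true_p[arm])
    # Conjugate update of the played arm's posterior
    successes[arm] += reward
    failures[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)
```

Arms that look bad still get played occasionally while their posteriors are wide, which is how the method balances exploring and exploiting.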

BART

Introduction to BART
Bayesian Additive Regression Trees, Computational Approaches
chapter in Computational Statistics in Data Science

Short course on BART given at BYU, June 2023