Business Statistics 41000, Summer 2025
General Information
Instructor:
Robert McCulloch
email: Robert.McCulloch@chicagobooth.edu
TA:
Percy Zhai, email: percy.zhai@chicagobooth.edu
Syllabus:
Syllabus, 41000-81, Summer 2025
Important dates:
We have 9 weeks starting the week of June 17 (Tuesday).
Tests will be available in Canvas over a range of days.
You can start a test any time in the range, but once you start, you must finish in a fixed amount of time.
Test Dates:
Midterm: available as a Canvas quiz the week of July 22 (6th week).
Final: available as a Canvas quiz during exam week.
Quizzes:
quiz 1: after week 2.
quiz 2: week 4.
quiz 3: week 8.
There will be no class the week of the midterm.
Where are we and what should I be doing?
where and what
Notes
Section 1
Notes
Introduction, Probability Concepts and Decisions
Videos
Discrete Random Variables
Conditional Joint and Marginal Distributions
Two-way Tables, Conditionals from Joints, Independence, IID
Bayes Theorem
More than two variables (22 min)
Making Decisions (14 min)
Mean and Variance of a Random Variable (12 min)
Covariance and Correlation (23 min)
Linear Combinations of Two Random Variables (16 min)
Linear Combinations of Several Random Variables (21 min)
Continuous Random Variables and the Normal Distribution (22 min)
More on the Normal, the Uniform (20 min)
R scripts
Random_Sampling.R
mean-and-variance.R
Normal-distribution-in-R.R
simple-data-analysis.R
A nice essay on the importance of thinking about noise:
Bias Is a Big Problem. But So Is ‘Noise.’
Section 2
Notes:
Learning from Data: Estimation, Confidence Intervals, and Testing Hypotheses
Videos:
IID Normal (20 min)
Confidence Interval and Standard Error for a Normal Mean, part 1, 12 minutes
Confidence Interval and Standard Error for a Normal Mean, part 2, 20 minutes
Confidence Interval and Standard Error for a Bernoulli p, 14 minutes
Hypothesis Tests, part 1, 10 minutes
Hypothesis Tests, part 2, 13 minutes
p values, 15 minutes
R scripts
iid-normal
Ron Wasserstein on "Beyond p-values":
Moving to a world beyond p less than 0.05
Beyond p-values, pptx
Beyond p-values, pdf
Section 3
Notes:
Simple Linear Regression
Videos:
The Simple Linear Regression Model
Simple Linear Regression Model: Estimates and Plug-in Prediction
Confidence Intervals, Predictive Intervals, and Hypothesis Tests, 25 minutes
Correlations and Regressions, Portfolios, 13 minutes
R scripts
Simple Linear Regression in R
Section 4
Notes
Multiple Regression
Videos:
Multiple Regression, Model and Estimates (15 Minutes)
Confidence Intervals and Hypothesis Tests (7 Minutes)
Fits, Resids, and R-squared (14 Minutes)
Dummy for a Binary Categorical Variable (12 Minutes)
Dummies for Multi Level Categorical Variables (19 Minutes)
R scripts
Simple Multiple Regression in R
Multiple Linear Regression in R
plot-midcity-with-N-and-B
3D spin
check estimation of sigma in multiple regression
Excel
Video of multiple regression in excel with a dummy variable
spreadsheet from the video
Section 5
Notes
Topics in Regression
Videos:
Understanding Multiple Regression, sales and prices (9 Minutes)
Understanding Multiple Regression, beer data (18 Minutes)
Regression Model Assumptions (9 Minutes)
Residual Plots
Nonlinearity
Interactions
The Log, standardized residuals, outliers (10 Minutes)
trees1
trees2
R scripts
Trees in R
Searching for Dusty Corners
Finance research using trees
Neural Net fit to Used Cars data
x=mileage, y=price, with neural nets and keras
python script to do x=mileage, y=price with neural nets and keras
Naive Bayes
Naive Bayes
Hotels
Hotels Example
Homework
Old Tests
Data
Data used in the notes and homeworks is available at:
link to data
Typically the data is a CSV file, which can be read directly into R using something like the following
(for the file Housedata.csv):
> hdat = read.csv("http://www.rob-mcculloch.org/data/Housedata.csv")
> hdat
   Size Price
1   0.8    70
2   0.9    83
3   1.0    74
4   1.1    93
5   1.4    89
6   1.4    58
7   1.5    85
8   1.6   114
9   1.8    95
10  2.0   100
11  2.4   138
12  2.5   111
13  2.7   124
14  3.2   161
15  3.5   172
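Once the data frame is loaded, a first look usually means a summary, a scatter plot, and a simple linear regression. The sketch below is one possible way to do that, not the course's official script. To stay self-contained it retypes the 15 rows printed above instead of downloading the file, and the names `fit` and `pred` are illustrative choices.

```r
# Retype the Housedata.csv rows shown above so the example runs offline.
hdat <- data.frame(
  Size  = c(0.8, 0.9, 1.0, 1.1, 1.4, 1.4, 1.5, 1.6, 1.8, 2.0,
            2.4, 2.5, 2.7, 3.2, 3.5),
  Price = c(70, 83, 74, 93, 89, 58, 85, 114, 95, 100,
            138, 111, 124, 161, 172)
)

# Basic checks: dimensions and per-column summaries.
dim(hdat)
summary(hdat)

# Simple linear regression of Price on Size.
fit <- lm(Price ~ Size, data = hdat)
print(coef(fit))

# Plug-in prediction at a size not in the data (2.2 is an arbitrary choice).
pred <- predict(fit, newdata = data.frame(Size = 2.2))
print(pred)

# Scatter plot with the fitted line, drawn only in an interactive session.
if (interactive()) {
  plot(hdat$Size, hdat$Price, xlab = "Size", ylab = "Price")
  abline(fit, col = "red")
}
```

With the real file, the `data.frame(...)` call would simply be replaced by the `read.csv` line shown above.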
Excel
Introduction to statistics in Excel
Simple Regression and Scatter plot in Excel
Mean and Variance of discrete in R and excel
Plot a normal pdf in excel
Video of multiple regression in excel with a dummy variable
spreadsheet from the video
Learn R
Nice R References
Swirl
Hands on Programming with R
Introduction to Data Science, by Rafael A. Irizarry
Introduction to Data Science: Data Wrangling and Visualization with R (Chapman & Hall/CRC Data Science Series) 2nd Edition
by Rafael A. Irizarry
R for Data Science
R for Data Science: Import, Tidy, Transform, Visualize, and Model Data 2nd Edition
by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund
Installing R
Most people will want to install R and Rstudio.
R is the basic software and Rstudio is a graphical interface to R with many useful add-on tools.
Here is the Rstudio install page: Rstudio/R install
Here is another page from the Rstudio "Hands-On Programming with R": Hands on Programming with R
This site
https://swirlstats.com/students.html
also tells you how to install R and Rstudio.
It has links to simple videos.
swirl is a very nice site for learning R, but swirl teaches you more about R than you need for Business stats.
On the Mac, I found this YouTube video helpful: install R on mac.
It points out that you need a different file for the M1 chip vs the Intel chip.
First install R, then install Rstudio.
To install R:
- go to https://www.r-project.org/
- click on CRAN
- pick a mirror (0-cloud, the first one is fine).
- click on Download R for ... (Linux, Windows or Mac)
To install Rstudio
- go to Rstudio
- Download
- desktop free
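After both installs, a quick check in the Rstudio console can confirm that R is working. This is only a suggested sanity check, not part of the official install instructions. The commented lines show how the swirl package mentioned above could be installed from CRAN, which needs an internet connection.

```r
# Print the version of R that Rstudio is using.
print(R.version.string)

# Run a small calculation to confirm the console evaluates code.
x <- c(1, 2, 3, 4, 5)
print(mean(x))
print(var(x))

# Optional: install and start swirl (uncomment to run; requires internet).
# install.packages("swirl")
# library(swirl)
# swirl()
```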
Notes on R
Simple Data Analysis in R, plot and Simple Linear Regression
Simple data analysis, pdf
Simple data analysis, html
Simple data analysis, Rmd
simple-data-analysis.R
A First Look at R
A first look at R
4 videos going through "A first look at R"
video 1, video 2, video 3, video 4
Data in R, Vectors, Lists, and Data Frames
R and Data, pdf
R and Data, html
R and Data, Rmd
Simple Multiple Regression in R
Simple Multiple Regression in R
Rob's "Hello world in R" from his Machine Learning class:
Hello world data analysis in R
Rob's "Hello world in python" from his Machine Learning class:
Hello world data analysis in python
Some R References
In general, the Rstudio help is actually pretty good.
The official Rstudio intro to R is Hands-On Programming with R
Check out /help/Cheat Sheets in RStudio.
As mentioned in the syllabus, a nice simple R intro book is:
R for Data Analysis in easy steps, Mike McGrath (also pretty cheap).
R for Data Analysis
A great stats/data science book with lots of R is
Introduction to Data Science: Data Analysis and Prediction Algorithms with R (Chapman & Hall/CRC Data Science Series)
by Rafael A. Irizarry, Nov 8, 2019
Introduction to Data Science
Here are two relatively simple R cheat sheets:
Rcommands.pdf
r_cheat_sheet.pdf
Rob's simple R pdf
A nice way to learn R is swirl: swirl
And datacamp has a free R intro: Data Camp R intro
More advanced information on R is available at:
Rob's R info page
R Markdown
In data science, "dynamic documents," in which code, math, and the output from code are combined,
have become very popular.
This seems like a nice tutorial from R bloggers: Getting Started with R Markdown — Guide and Cheatsheet
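To make the idea concrete, here is a minimal sketch of a dynamic document. It writes a tiny R Markdown file from R and shows, in a comment, how it could be rendered. The file name `hello.Rmd` and its contents are hypothetical, and rendering assumes the rmarkdown package is installed. The chunk fence is built with `strrep` only so that the example does not contain a literal fence line.

```r
# Hypothetical output path; any file name ending in .Rmd would do.
rmd_path <- file.path(tempdir(), "hello.Rmd")

# Three backticks, the delimiter R Markdown uses for code chunks.
fence <- strrep("`", 3)

# A YAML header, some text with math, and one R code chunk.
rmd_lines <- c(
  "---",
  "title: \"A tiny dynamic document\"",
  "output: html_document",
  "---",
  "",
  "Text and math can be mixed, for example $y = a + b x$.",
  "",
  paste0(fence, "{r}"),
  "size <- c(0.8, 0.9, 1.0, 1.1)",
  "mean(size)",
  fence
)

writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")

# To turn the file into HTML (requires the rmarkdown package), uncomment:
# rmarkdown::render(rmd_path)
```

In Rstudio the same result comes from opening the .Rmd file and clicking Knit.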