Business Statistics 41000, Summer 2023

General Information

Instructor:

Robert McCulloch
email: Robert.McCulloch@chicagobooth.edu

TA:

Percy Zhai, email: percy.zhai@chicagobooth.edu


Syllabus:
Syllabus, 41000-81/85, Summer 2023


Output for tests

training.pdf

IQ histogram and sequence plot
scatter plots of (x1,y1), (x2,y2)



Important dates:

Official Booth Calendar

Classes run for 9 weeks, from Friday/Saturday, June 16/17, through Friday/Saturday, August 11/12.
There are no classes during midterm week, July 21/22.

Tests will be available in Canvas over a range of days.
You can start a test any time in the range, but once you start, you must finish in a fixed amount of time.


midterm: July 21/22.
final: August 18/19.

quizzes:
quiz 1: Tuesday - Thursday after week 2, June 27 - 29.
quiz 2: Tuesday - Thursday after week 4, July 11 - 13.
quiz 3: Tuesday - Thursday after week 8, August 8 - 10.


Where are we and what should I be doing?

where and what


Notes

Section 1

Notes
Introduction, Probability Concepts and Decisions

Videos
   Discrete Random Variables
   Conditional Joint and Marginal Distributions
   Two-way Tables, Conditionals from Joints, Independence, IID
   Bayes Theorem
   More than two variables (22 min)
   Making Decisions (14 min)
   Mean and Variance of a Random Variable (12 min)
   Covariance and Correlation (23 min)
   Linear Combinations of Two Random Variables (16 min)
   Linear Combinations of Several Random Variables (21 min)
   Continuous Random Variables and the Normal Distribution (22 min)
   More on the Normal, the Uniform (20 min)


R scripts
   Random_Sampling.R
   mean-and-variance.R    Normal-distribution-in-R.R    simple-data-analysis.R


A nice essay on the importance of thinking about noise:
   Bias Is a Big Problem. But So Is ‘Noise.’


Section 2

Notes:
Learning from Data: Estimation, Confidence Intervals, and Testing Hypotheses

Videos:
   IID Normal (20 min)
   Confidence Interval and Standard Error for a Normal Mean, part 1, 12 minutes
   Confidence Interval and Standard Error for a Normal Mean, part 2, 20 minutes
   Confidence Interval and Standard Error for a Bernoulli p, 14 minutes
   Hypothesis Tests, part 1, 10 minutes
   Hypothesis Tests, part 2, 13 minutes
   p values, 15 minutes


R scripts
   iid-normal


Section 3

Notes:
Simple Linear Regression

Videos:
   The Simple Linear Regression Model
   Simple Linear Regression Model: Estimates and Plug-in Prediction
   Confidence Intervals, Predictive Intervals, and Hypothesis Tests, 25 minutes
   Correlations and Regressions, Portfolios, 13 minutes


R scripts
   Simple Linear Regression in R



Section 4

Notes
Multiple Regression

Videos:
   Multiple Regression, Model and Estimates (15 Minutes)
   Confidence Intervals and Hypothesis Tests (7 Minutes)
   Fits, Resids, and R-squared (14 Minutes)
   Dummy for a Binary Categorical Variable (12 Minutes)
   Dummies for Multi Level Categorical Variables (19 Minutes)


R scripts
   Simple Multiple Regression in R    Multiple Linear Regression in R
   plot-midcity-with-N-and-B


Excel
   Video of multiple regression in Excel with a dummy variable
   spreadsheet from the video



Section 5

Notes
Topics in Regression

Videos:
   Understanding Multiple Regression, sales and prices (9 Minutes)
   Understanding Multiple Regression, beer data (18 Minutes)
   Regression Model Assumptions (9 Minutes)
   Residual Plots
   Nonlinearity
   Interactions
   The Log, standardized residuals, outliers (10 Minutes)
   trees1
   trees2


R scripts
   Trees in R


Searching for Dusty Corners    Finance research using trees

Neural Net fit to Used Cars data
  x=mileage, y=price, with neural nets and keras
  python script to do x=mileage, y=price with neural nets and keras


Naive Bayes

Naive Bayes


Hotels

Hotels Example


Homework

Section 1 Homework

Section 2 Homework

Section 3 Homework

Section 4 Homework

Section 5 Homework




Old Tests

Summer 2023, midterm, solutions    2023 midterm

Summer 2023, Quiz 1, Solutions,    Quiz 1 2023, Problems

Summer 2023, Quiz 2, Solutions,    Quiz 2 2023, Problems

Summer 2023, Quiz 3, Solutions,    Quiz 3 2023, Problems

Summer 2022, Quiz 1

Summer 2022, Quiz 2

Summer 2022, Quiz 3 problems, Info for summer 22 , quiz 3
Summer 2022, Quiz 3, solutions

Summer 2020, midterm    solutions

07 Final      07 Final, Solutions    Diagram for Problem 10

12 Final      12 Final, Solutions      12 Final, Solutions, Handwritten      12 Final, Solutions, Handwritten

2017 Midterm    2017 Midterm, Solutions

2016 Midterm    2016 Midterm, Solutions

2015, Quiz 1    solution
2015, Quiz 2    solution
2015, Quiz 3    solution

2013, Quiz 1    solution
2013, Quiz 2    solution
2013, Quiz 3    solution



Data

Data used in the notes and homeworks is available at: link to data

Typically the data is a csv file, which can be read directly into R from its URL. For example, for the file Housedata.csv:
> hdat = read.csv("http://www.rob-mcculloch.org/data/Housedata.csv")
> hdat
   Size Price
1   0.8    70
2   0.9    83
3   1.0    74
4   1.1    93
5   1.4    89
6   1.4    58
7   1.5    85
8   1.6   114
9   1.8    95
10  2.0   100
11  2.4   138
12  2.5   111
13  2.7   124
14  3.2   161
15  3.5   172
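
Once the data frame is loaded, a typical first pass might look something like the sketch below. It rebuilds the 15 rows printed above by hand so that it runs without a network connection (an assumption made purely for illustration; normally you would keep the read.csv call). It then summarizes the data, plots Price against Size, and fits the kind of simple linear regression covered in Section 3. The name fit is illustrative only.

```r
# Rebuild the 15 rows printed above so the example runs offline.
# (Assumption for illustration: normally hdat comes from read.csv.)
hdat <- data.frame(
  Size  = c(0.8, 0.9, 1.0, 1.1, 1.4, 1.4, 1.5, 1.6, 1.8, 2.0,
            2.4, 2.5, 2.7, 3.2, 3.5),
  Price = c(70, 83, 74, 93, 89, 58, 85, 114, 95, 100,
            138, 111, 124, 161, 172)
)

str(hdat)      # column names and types
summary(hdat)  # min, quartiles, mean, and max of each column

# Scatter plot of Price against Size
plot(hdat$Size, hdat$Price, xlab = "Size", ylab = "Price")

# Simple linear regression of Price on Size
fit <- lm(Price ~ Size, data = hdat)
abline(fit)    # add the fitted line to the scatter plot
coef(fit)      # estimated intercept and slope
```

Calling summary(fit) prints the standard errors, t statistics, and R-squared for the same fit.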

Excel

Introduction to statistics in Excel

Simple Regression and Scatter plot in Excel

Mean and Variance of a discrete random variable in R and Excel

Plot a normal pdf in Excel

Video of multiple regression in Excel with a dummy variable
spreadsheet from the video



Learn R

Installing R

Most people will want to install R and RStudio.
R is the basic software, and RStudio is a graphical interface to R with many useful add-on tools.

Here is the RStudio install page: RStudio/R install
Here is another page, from the RStudio book "Hands-On Programming with R": Hands-On Programming with R

This site https://swirlstats.com/students.html also tells you how to install R and RStudio.
It has links to simple videos.
swirl is a very nice site for learning R, but it teaches you more about R than you need for business statistics.

On the Mac, I found this YouTube video helpful: install R on mac.
It points out that you need a different file for the M1 chip vs. the Intel chip.

First install R, then install RStudio.

To install R:
To install RStudio:

Notes on R

Simple Data Analysis in R, plot and Simple Linear Regression

Simple data analysis, pdf
Simple data analysis, html
Simple data analysis, Rmd
simple-data-analysis.R


A First Look at R

A first look at R
4 videos going through "A first look at R"
   video 1,   video 2,   video 3,   video 4,  

Data in R, Vectors, Lists, and Data Frames

R and Data, pdf
R and Data, html
R and Data, Rmd


Simple Multiple Regression in R

Simple Multiple Regression in R

Rob's "Hello world in R" from his Machine Learning class: Hello world data analysis in R

Rob's "Hello world in python" from his Machine Learning class: Hello world data analysis in python


Some R References

In general, the RStudio help is actually pretty good.
The official RStudio intro to R is Hands-On Programming with R
Check out Help > Cheat Sheets in RStudio.

As mentioned in the syllabus, a nice simple R intro book is:
R for Data Analysis in easy steps, Mike McGrath (also pretty cheap). R for Data Analysis
A great stats/data science book with lots of R is:
Introduction to Data Science: Data Analysis and Prediction Algorithms with R, Rafael A. Irizarry (Chapman & Hall/CRC Data Science Series, 2019). Introduction to Data Science


Here are two relatively simple R cheat sheets:
   Rcommands.pdf
   r_cheat_sheet.pdf


Rob's simple R pdf

A nice way to learn R is swirl: swirl

And DataCamp has a free R intro: Data Camp R intro

More advanced information on R is available at: Rob's R info page

R Markdown

In data science, "dynamic documents", which combine code, math, and the output from the code, have become very popular.

This seems like a nice tutorial from R bloggers: Getting Started with R Markdown — Guide and Cheatsheet