We often like to think of our model as giving a distribution for a potentially observable \(y\) given parameters \(\theta\).
\[ p(y | \theta) \]
We often also have an additional set of information “\(x\)” that we condition on
\[ p(y | x, \theta) \]
A basic example is the Bernoulli model.
\(y=(y_1,y_2,\ldots,y_n)\), with
\[ Y_i \sim \text{Bernoulli}(p), \,\, \text{IID} \]
So here “\(\theta\) is \(p\)”.
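As a quick illustration (a minimal sketch; the values of \(n\) and \(p\) below are just illustrative choices), we can simulate data from this model in R:

# a minimal sketch: simulate n IID Bernoulli(p) observations (n and p illustrative)
set.seed(99)
n = 100; p = .26
y = rbinom(n,size=1,prob=p) # each y_i is 0 or 1
mean(y) # the sample proportion, a natural estimate of p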
An example with conditioning information would be linear regression with normal error.
\[ Y \sim N(X \beta,\sigma^2 I) \]
Here, \(\theta = (\beta,\sigma)\).
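Again as a sketch (the particular \(n\), \(\beta\), and \(\sigma\) below are made-up illustrative values), simulating from this model looks like:

# a minimal sketch: simulate y ~ N(X beta, sigma^2 I) with illustrative values
set.seed(99)
n = 100
X = cbind(1,rnorm(n)) # design matrix: intercept plus one covariate
beta = c(1,2); sigma = .5
y = X %*% beta + sigma*rnorm(n) # add iid normal errors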
The beauty of the Bayesian approach is that it stems from basic probability ideas.
Given a model for observables, \(p(y | \theta)\), we think about our probabilistic beliefs about \(\theta\) and capture them with a prior distribution \(p(\theta)\).
We then have a joint model for \(y\) and \(\theta\)
\[ p(y,\theta) = p(\theta) \, p(y | \theta) \]
Note that we are reusing the symbol \(p\) and its meaning is determined by its arguments.
Given we observe \(y\), our beliefs about \(\theta\) are updated in light of this information by conditioning on \(y\):
\[ p(\theta | y) \propto p(y,\theta) = p(\theta) \, p(y | \theta) \]
The object \(p(y | \theta)\) viewed as a function of \(\theta\) for a fixed \(y\) is called the likelihood function.
\[ L(\theta) \propto p(y | \theta) \]
Giving us the famous equation,
\[ p(\theta | y) \propto p(\theta) \, L(\theta). \]
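Computationally, this equation says we can get the posterior (up to normalization) by multiplying prior and likelihood pointwise. Here is a minimal grid sketch, with an illustrative Beta prior and binomial data (7 successes in 10 trials; all numbers made up for illustration):

# sketch: evaluate prior x likelihood on a grid of theta values, then normalize
grid = seq(from=.001,to=.999,length.out=1000)
unpost = dbeta(grid,2,2) * dbinom(7,size=10,prob=grid) # prior times likelihood
post = unpost/(sum(unpost)*(grid[2]-grid[1])) # normalize so it integrates to 1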
Let’s suppose the Leafs have scored on 26 of their last 100 powerplays and we believe that whether or not they score on each powerplay is IID Bernoulli.
Given the data and our model, what are our beliefs about \(\theta\), where \(Y_i \sim \text{Bernoulli}(\theta)\) (that is, \(\theta\) is \(p\))?
To specify a prior on \(\theta \in (0,1)\) we use the Beta distribution
\[ \theta \sim \text{Beta}(\alpha,\beta) \]
\[ p(\theta | \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} \theta^{\alpha-1} (1-\theta)^{\beta-1}, \;\; \theta \in (0,1). \]
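A standard way to read these parameters: the prior mean is
\[ E[\theta] = \frac{\alpha}{\alpha+\beta}, \]
and \(\alpha + \beta\) behaves like a prior sample size, with \(\alpha\) counting prior “ones” and \(\beta\) prior “zeros”.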
What is the likelihood?
\(y = (y_1,y_2,\ldots, y_n)\), \(y_i \in \{0,1\}\).
\[ p(y | \theta) = \prod_{i=1}^n \, \theta^{y_i} (1-\theta)^{1-y_i} = \theta^S \, (1-\theta)^{n-S} \]
where \(S = \sum_{i=1}^n y_i\) is the number of ones.
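In code, the likelihood depends on the data only through \(S\) and \(n\). A small sketch using the Leafs numbers from above (\(S=26\), \(n=100\)):

# sketch: Bernoulli log-likelihood as a function of theta via the sufficient statistics
S = 26; n = 100
loglik = function(theta) S*log(theta) + (n-S)*log(1-theta)
curve(loglik,from=.01,to=.99,xlab='theta',ylab='log-likelihood')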
So,
\[ p(\theta | y) \propto [\theta^{\alpha-1} (1-\theta)^{\beta-1}] \, [\theta^S \, (1-\theta)^{n-S}] = \theta^{\alpha + S -1} \, (1-\theta)^{\beta + n-S -1} \]
So,
\[ \theta | y \sim \text{Beta}(\alpha + S, \beta+ (n-S)) \]
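A useful consequence of conjugacy is that the posterior mean
\[ E[\theta | y] = \frac{\alpha + S}{\alpha + \beta + n} = \frac{\alpha+\beta}{\alpha+\beta+n} \cdot \frac{\alpha}{\alpha+\beta} \, + \, \frac{n}{\alpha+\beta+n} \cdot \frac{S}{n} \]
is a weighted average of the prior mean and the sample proportion, with the weights determined by the relative sizes of \(\alpha+\beta\) and \(n\).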
For the Leafs data let’s use \(\alpha = 3\), \(\beta = 10\).
The data give \(S = 26\), \(n-S = 74\).
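Plugging in, the posterior is \(\text{Beta}(3 + 26,\, 10 + 74) = \text{Beta}(29, 84)\), with posterior mean \(29/113 \approx 0.257\), sitting between the prior mean \(3/13 \approx 0.231\) and the sample proportion \(0.26\).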
Let’s plot the prior and posterior.
# beta parameters for our prior
apri = 3; bpri = 10
# beta parameters for our posterior
apost = apri + 26
bpost = bpri + 74
# plot prior and posterior at theta values in tvec
tvec = seq(from=.0001,to=.99999,length.out=1000)
plot(tvec,dbeta(tvec,apost,bpost),type='l',col='blue',xlab='prob',ylab="")
lines(tvec,dbeta(tvec,apri,bpri),col='red')
legend('topright',legend=c("prior of theta","posterior of theta"),col=c('red','blue'),lty=c(1,1),bty="n")
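Beyond the picture, numeric summaries of the posterior are one-liners (a quick sketch using the objects defined above):

# posterior mean and a 95% equal-tail interval for theta
apost/(apost+bpost) # posterior mean
qbeta(c(.025,.975),apost,bpost) # 95% posterior interval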
The difference between the prior and posterior is a great way to understand the information in the data.
Suppose I want to know what the data tells me about the odds
\[ \gamma = \frac{\theta}{1-\theta} \]
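For reference, the exact density of \(\gamma\) follows from a change of variables: with \(\theta = \gamma/(1+\gamma)\) and \(d\theta/d\gamma = (1+\gamma)^{-2}\),
\[ p(\gamma) = p_\theta\!\left(\frac{\gamma}{1+\gamma}\right) \frac{1}{(1+\gamma)^2} \propto \frac{\gamma^{\alpha-1}}{(1+\gamma)^{\alpha+\beta}}, \]
using whichever Beta density (prior or posterior) we are transforming. But we don’t need the algebra.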
We can see our prior and posterior for \(\gamma\) using Monte Carlo.
nd=5000
set.seed(99)
## prior of gamma
tpriD = rbeta(nd,apri,bpri) ## iid draws from the prior of theta = p
gpriD = tpriD/(1-tpriD) ## iid draws from the prior of gamma = the odds
## post of gamma
tpostD = rbeta(nd,apost,bpost) ## iid draws from the posterior of theta = p
gpostD = tpostD/(1-tpostD) ## iid draws from the posterior of gamma = the odds
par(mfrow=c(2,2))
hist(tpriD,freq=F,main='prior of theta')
hist(tpostD,freq=F,main='posterior of theta')
lines(tvec,dbeta(tvec,apost,bpost),type='l',col='blue')
hist(gpriD,freq=F,main='prior of gamma')
hist(gpostD,freq=F,main='posterior of gamma')
Let’s plot the prior and posterior of \(\gamma\) using a density smooth of the draws.
dspri = density(gpriD)
dspost = density(gpostD)
plot(dspri,xlim=c(0,1),ylim=range(c(dspri$y,dspost$y)),col='red',main="prior and posterior of gamma")
lines(dspost,col='blue')
legend('topright',legend=c("prior of gamma","posterior of gamma"),col=c('red','blue'),lty=c(1,1),bty="n")
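With the draws in hand, posterior summaries of \(\gamma\) are simple Monte Carlo averages (a quick sketch using the draws generated above):

mean(gpostD) # Monte Carlo estimate of E(gamma | y)
quantile(gpostD,c(.025,.975)) # 95% posterior interval for the odds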