4.3 Following up on Bernoulli

Suppose \(X\) is binomially distributed:

\[X\sim Bin(n,\theta) \\ f(x|\theta) = {n\choose x} \theta^x (1-\theta)^{n-x}\]

From the warmup, using the fact that \(E(X)=n\theta\), we know that

\[\begin{aligned} I(\theta) & = -E_{\theta} \left[ \frac{d^2 \log f(x|\theta)}{d\theta^2}\right] \\ & = E_{\theta} \left[ \frac{x}{\theta^2} + \frac{n-x}{(1-\theta)^2}\right] \\ & = \frac{n\theta}{\theta^2} + \frac{n(1-\theta)}{(1-\theta)^2} \\ & = \frac{n}{\theta (1-\theta)} \end{aligned}\]
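A quick numerical check of this expectation is possible with a minimal sketch along the following lines (assuming `numpy` and `scipy` are available; the function name and the test values of \(n\) and \(\theta\) are arbitrary choices): sum \(-\frac{d^2}{d\theta^2}\log f(x|\theta)\) against the binomial pmf and compare with the closed form.

```python
import numpy as np
from scipy.stats import binom

def fisher_info(n, theta):
    """Numerically compute -E[d^2/dtheta^2 log f(x|theta)] for X ~ Bin(n, theta)."""
    x = np.arange(n + 1)
    pmf = binom.pmf(x, n, theta)
    # second derivative of the log pmf: -x/theta^2 - (n - x)/(1 - theta)^2
    d2 = -x / theta**2 - (n - x) / (1 - theta)**2
    return -np.sum(pmf * d2)

n, theta = 10, 0.3
print(fisher_info(n, theta))        # numerical expectation, ~47.62
print(n / (theta * (1 - theta)))    # closed form n / (theta * (1 - theta)), ~47.62
```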

Then \(\pi_J(\theta) \propto I(\theta)^{\frac{1}{2}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}\), so the Jeffreys prior is a \(Beta\left(\frac{1}{2},\frac{1}{2}\right)\) density. Below, we can see the densities of a \(Beta\left(\frac{1}{2},\frac{1}{2}\right)\) and a \(Beta(1, 1)\), or flat, prior.
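A plot like the one referred to here can be reproduced with a short sketch such as the following (assuming `scipy` and `matplotlib`; the grid and styling are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

# grid avoids the endpoints, where the Beta(1/2, 1/2) density is unbounded
theta = np.linspace(0.001, 0.999, 500)

plt.plot(theta, beta.pdf(theta, 0.5, 0.5), label="Jeffreys prior: Beta(1/2, 1/2)")
plt.plot(theta, beta.pdf(theta, 1.0, 1.0), label="Flat prior: Beta(1, 1)")
plt.xlabel(r"$\theta$")
plt.ylabel("density")
plt.legend()
plt.show()
```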

Here, we see that the Jeffreys prior compensates for the likelihood by placing extra weight on the extremes. Under the likelihood, data generated when \(\theta\) is near \(0.5\) has the least effect on the posterior, while data suggesting a true \(\theta\) of \(0\) or \(1\) has the greatest effect. The Jeffreys prior is noninformative in the sense that it weights the parameter space in the opposite way to the likelihood, which a flat prior does not. In this case the Jeffreys prior also happens to be a conjugate prior, though this is not true in general.
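To see the conjugacy concretely, multiplying the \(Beta\left(\frac{1}{2},\frac{1}{2}\right)\) prior by the binomial likelihood gives

\[\pi(\theta|x) \propto \theta^{x}(1-\theta)^{n-x} \cdot \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}} = \theta^{x-\frac{1}{2}}(1-\theta)^{n-x-\frac{1}{2}},\]

which is the kernel of a \(Beta\left(x+\frac{1}{2},\, n-x+\frac{1}{2}\right)\) density, so the posterior remains in the Beta family.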