4.3 Following up on Bernoulli

Suppose X is binomially distributed:

X\sim Bin(n,\theta) \\ f(x|\theta) = {n\choose x} \theta^x (1-\theta)^{n-x}

From the warmup, given that E(X)=n\theta, we know that

\begin{aligned} I(\theta) & = -E_{\theta} \left[ \frac{d^2 \log f(x|\theta)}{d\theta^2}\right] \\ & = \frac{n}{\theta (1-\theta)} \end{aligned}
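Spelling out the differentiation, with E(X)=n\theta used in the expectation step:

\begin{aligned} \frac{d^2 \log f(x|\theta)}{d\theta^2} & = \frac{d}{d\theta} \left( \frac{x}{\theta} - \frac{n-x}{1-\theta} \right) = -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2} \\ -E_{\theta}\left[ \frac{d^2 \log f(x|\theta)}{d\theta^2} \right] & = \frac{n\theta}{\theta^2} + \frac{n - n\theta}{(1-\theta)^2} = \frac{n}{\theta} + \frac{n}{1-\theta} = \frac{n}{\theta(1-\theta)} \end{aligned}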

Then \pi_J(\theta) \propto I(\theta)^{\frac{1}{2}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}, so the Jeffreys prior is a Beta\left(\frac{1}{2},\frac{1}{2}\right) density. Below, we can see the densities of a Beta\left(\frac{1}{2},\frac{1}{2}\right) and of a Beta(1, 1), the flat prior.
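If you want to reproduce that comparison yourself, here is a minimal Python sketch (using scipy and matplotlib; the grid and labels are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

# Grid over the open interval (0, 1); the Beta(1/2, 1/2) density is unbounded at 0 and 1.
theta = np.linspace(0.005, 0.995, 500)

plt.plot(theta, beta.pdf(theta, 0.5, 0.5), label="Jeffreys prior: Beta(1/2, 1/2)")
plt.plot(theta, beta.pdf(theta, 1, 1), label="Flat prior: Beta(1, 1)")
plt.xlabel(r"$\theta$")
plt.ylabel("density")
plt.legend()
plt.show()
```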

Here, we see that the Jeffreys prior compensates for the likelihood by weighting the extremes. Under the likelihood, data generated around \theta = 0.5 have the least effect on the posterior, while data pointing to a true \theta near 0 or 1 have the greatest effect; this is the same pattern captured by the Fisher information \frac{n}{\theta(1-\theta)}. The Jeffreys prior is noninformative because it weights \theta to balance this behavior of the likelihood, whereas a flat prior does not. In this case, the Jeffreys prior also happens to be a conjugate prior, though this is not always true.
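To see the conjugacy concretely, multiplying the binomial likelihood by the Beta\left(\frac{1}{2},\frac{1}{2}\right) prior gives

\begin{aligned} \pi(\theta | x) & \propto \theta^x (1-\theta)^{n-x} \cdot \theta^{-\frac{1}{2}} (1-\theta)^{-\frac{1}{2}} \\ & = \theta^{x - \frac{1}{2}} (1-\theta)^{n - x - \frac{1}{2}}, \end{aligned}

which is the kernel of a Beta\left(x + \frac{1}{2},\, n - x + \frac{1}{2}\right) posterior.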