4.2 Invariance to parametrization

To show why the Jeffreys prior is preferable to a prior that is commonly, but mistakenly, treated as “noninformative”, such as \(Beta(1,1)\), we show that the Jeffreys prior is invariant to reparametrization. The argument below proves the invariance principle, following Jeffreys (1946) and Jordan (2010).

Suppose \(\pi_J(\theta)\) is the Jeffreys prior on \(\theta\), and define a new parameter \(\phi = h(\theta)\) as a smooth, one-to-one function of \(\theta\). The question is whether \(\pi_J(\theta)\), after a change of variables, is the same as the Jeffreys prior \(\pi_J(\phi)\) derived directly in the new parametrization. To prove that it is, we first calculate the Fisher information of \(\phi\):

\[\begin{align} I(\phi) & = -E\left[\frac{d^2 \log f(X|\phi)}{d\phi^2}\right] \\ & = -E\left[\frac{d}{d\phi}\frac{d\log f(X|\phi)}{d\phi}\right] \\ & = -E\left[\frac{d}{d\phi}\frac{d\log f(X|\theta(\phi))}{d\phi}\right] \\ & = -E\left[\frac{d}{d\phi}\left(\frac{d\log f(X|\theta)}{d\theta}·\frac{d\theta}{d\phi}\right)\right] \end{align}\]

where \(\theta(\phi) = h^{-1}(\phi)\), and the last step applies the chain rule.

Let’s stop here for a second. Here’s where the product rule works its magic.

\[(A·B)' = A'·B + A·B'\]

If we think of \(\frac{d\log f(X|\theta)}{d\theta}\) as the \(A\) part of the product rule formula, and \(\frac{d\theta}{d\phi}\) as the \(B\) part, we have something we can work with:

\[\begin{align} - E\left[\frac{dA}{d\phi}·B+\frac{dB}{d\phi}·A\right] & = -E\left[\frac{d^2 \log f(X|\theta(\phi))}{d\theta\, d\phi}·\frac{d\theta}{d\phi} + \frac{d^2\theta}{d\phi^2}·\frac{d\log f(X|\theta)}{d\theta}\right] \\ & = -E\left[\frac{d^2 \log f(X|\theta)}{d\theta\, d\theta}·\frac{d\theta}{d\phi}·\frac{d\theta}{d\phi} + \frac{d\log f(X|\theta)}{d\theta}\frac{d^2\theta}{d\phi^2}\right] \\ & = -E\left[\frac{d^2 \log f(X|\theta)}{d\theta^2}·\left(\frac{d\theta}{d\phi}\right)^2 + \frac{d\log f(X|\theta)}{d\theta}\frac{d^2\theta}{d\phi^2}\right] \;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (2) \end{align} \]

From Section 2 Equation (1), we know that

\[E\left[\frac{d\log f(X|\theta)}{d\theta}\right] = 0\]

Substituting this result into Equation (2), we see that the second term vanishes, leaving

\[I(\phi) = I(\theta)\left(\frac{d\theta}{d\phi}\right)^2 \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (3)\]
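
To see Equation (3) in action, take a Bernoulli likelihood with success probability \(\theta\) and the log-odds reparametrization \(\phi = \log\frac{\theta}{1-\theta}\) (this model and transformation are chosen purely for illustration; the result holds for any likelihood):

\[I(\theta) = \frac{1}{\theta(1-\theta)}, \;\;\;\; \theta(\phi) = \frac{e^\phi}{1+e^\phi}, \;\;\;\; \frac{d\theta}{d\phi} = \theta(1-\theta)\]

\[I(\phi) = I(\theta)\left(\frac{d\theta}{d\phi}\right)^2 = \frac{1}{\theta(1-\theta)}·\big(\theta(1-\theta)\big)^2 = \theta(1-\theta)\]

which matches what we get by differentiating \(\log f(x|\phi) = x\phi - \log(1+e^\phi)\) twice in \(\phi\) and taking the negative expectation.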

Taking the square root of both sides of Equation (3), we get

\[\sqrt{I(\phi)} = \sqrt{I(\theta)} \left|{\frac{d\theta}{d\phi}}\right| \;\; \text{OR} \;\; \pi_J(\phi) = \pi_J(\theta) \left|{\frac{d\theta}{d\phi}}\right| \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; (4)\]

The right-hand side of Equation (4) is exactly the change-of-variables formula for a density: transforming \(\pi_J(\theta)\) to the \(\phi\) scale produces the same prior as deriving the Jeffreys prior directly from \(I(\phi)\). The two routes agree, which is the invariance we set out to prove.

\(Q.E.D.\)
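
As a final sanity check, here is a small numerical sketch of Equation (4). It assumes the same illustrative Bernoulli/log-odds setup as above (the section itself does not fix a model) and approximates each Fisher information by a finite-difference second derivative of the expected log-likelihood:

```python
import numpy as np

# Numerical sketch of Equation (4): sqrt(I(phi)) = sqrt(I(theta)) * |dtheta/dphi|.
# Illustrative assumptions (not fixed by the text): Bernoulli likelihood, phi = logit(theta).

def fisher_info(expected_loglik, param, eps=1e-5):
    """I(param) = minus the second derivative of the expected log-likelihood (central differences)."""
    return -(expected_loglik(param + eps)
             - 2 * expected_loglik(param)
             + expected_loglik(param - eps)) / eps**2

theta0 = 0.3                                  # "true" parameter used for the expectation
phi0 = np.log(theta0 / (1 - theta0))          # the same point on the log-odds scale

def expected_loglik_theta(t):
    # E_theta0[ log f(X | t) ] for a single Bernoulli observation X
    return theta0 * np.log(t) + (1 - theta0) * np.log(1 - t)

def expected_loglik_phi(p):
    # Same expected log-likelihood, reparametrized through theta(phi) = 1 / (1 + e^{-phi})
    t = 1.0 / (1.0 + np.exp(-p))
    return theta0 * np.log(t) + (1 - theta0) * np.log(1 - t)

I_theta = fisher_info(expected_loglik_theta, theta0)   # ~ 1 / (theta0 * (1 - theta0))
I_phi = fisher_info(expected_loglik_phi, phi0)         # ~ theta0 * (1 - theta0)
dtheta_dphi = theta0 * (1 - theta0)                    # derivative of the inverse logit at phi0

# Both sides of Equation (4) should agree up to finite-difference error (~0.458 here).
print(np.sqrt(I_phi), np.sqrt(I_theta) * abs(dtheta_dphi))
```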