3.3 Warm-up: \(I(\theta)\)
In Statistics, Fisher Information measures the amount of information that an observable random variable \(X\) carries about an unknown parameter \(\theta\) of a distribution that models \(X\).
Let \(f(X; \theta)\) denote the probability density function (pdf) or probability mass function (pmf) of \(X\) conditional on the value of \(\theta\). How \(f(X; \theta)\) changes as \(\theta\) varies indicates how much information the data \(X\) provide about the parameter \(\theta\).
Formally, the score is defined as the partial derivative with respect to \(\theta\) of the natural logarithm of \(f(X;\theta)\). If \(\theta_0\) is the true parameter, then under regularity conditions that allow the order of differentiation and integration to be exchanged, the expected value of the score evaluated at \(\theta_0\) is \(0\):
\[\begin{align} E\left[\frac{\partial}{\partial \theta} \log f(X;\theta) \,\Big|\, \theta = \theta_0 \right] & = \int \frac{\frac{\partial}{\partial \theta} f(x;\theta)\big|_{\theta=\theta_0}}{f(x;\theta_0)}\, f(x;\theta_0)\, dx \\ & = \int \frac{\partial}{\partial \theta} f(x;\theta)\,\Big|_{\theta=\theta_0}\, dx \\ & = \frac{\partial}{\partial \theta} \int f(x;\theta)\, dx\,\Big|_{\theta=\theta_0} \\ & = \frac{\partial}{\partial \theta} 1 = 0 \tag{1} \end{align}\]
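To make this concrete, consider a worked example (a Bernoulli model, added here purely for illustration). If \(X \sim \text{Bernoulli}(\theta)\) with pmf \(f(x;\theta) = \theta^x (1-\theta)^{1-x}\), the score is
\[\frac{\partial}{\partial \theta} \log f(x;\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta},\]
and its expectation at the true parameter \(\theta_0\) is \(\theta_0 \cdot \frac{1}{\theta_0} - (1-\theta_0) \cdot \frac{1}{1-\theta_0} = 1 - 1 = 0\), in agreement with (1).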
Since the expected value of the score is \(0\) at the true parameter, the Fisher Information is defined as the variance of the score:
\[I(\theta) = E\left[ \left(\frac{\partial}{\partial \theta} \log f(X;\theta)\right)^2 \,\Big|\, \theta\right] = \int \left( \frac{\partial}{\partial \theta} \log f(x;\theta)\right)^2 f(x;\theta)\, dx\]
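Continuing the Bernoulli example, the squared score is \(1/\theta^2\) when \(x = 1\) (probability \(\theta\)) and \(1/(1-\theta)^2\) when \(x = 0\) (probability \(1-\theta\)), so
\[I(\theta) = \theta \cdot \frac{1}{\theta^2} + (1-\theta) \cdot \frac{1}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)},\]
which is smallest at \(\theta = 1/2\) and grows without bound as \(\theta\) approaches \(0\) or \(1\).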
If \(\log f(X;\theta)\) is twice differentiable with respect to \(\theta\), then under certain regularity conditions we can also write the Fisher Information as:
\[I(\theta) = -E\left[ \frac{\partial^2}{\partial \theta^2} \log f(X;\theta) \,\Big|\, \theta\right]\]
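A minimal numerical sketch (Python with NumPy; the Bernoulli model, the parameter value, and the sample size are illustrative choices) can check that the variance of the score and the negative expected second derivative both agree with the analytic \(1/(\theta(1-\theta))\):

```python
import numpy as np

rng = np.random.default_rng(0)

theta0 = 0.3           # true Bernoulli parameter (illustrative choice)
n = 1_000_000          # Monte Carlo sample size (illustrative choice)
x = rng.binomial(1, theta0, size=n)

# Score: d/dtheta log f(x; theta) = x/theta - (1 - x)/(1 - theta)
score = x / theta0 - (1 - x) / (1 - theta0)

# Second derivative: d^2/dtheta^2 log f(x; theta) = -x/theta^2 - (1 - x)/(1 - theta)^2
d2 = -x / theta0**2 - (1 - x) / (1 - theta0)**2

print("mean of score              :", score.mean())  # ~ 0, as in (1)
print("variance of score          :", score.var())   # ~ I(theta0)
print("-E[second derivative]      :", -d2.mean())    # ~ I(theta0)
print("analytic 1/(theta(1-theta)):", 1 / (theta0 * (1 - theta0)))
```

For \(\theta_0 = 0.3\), both Monte Carlo estimates should land near the analytic value \(1/(0.3 \cdot 0.7) \approx 4.76\).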