Information geometry aims to apply the techniques of differential geometry to statistics. Often it is useful to think of a family of probability distributions as a statistical manifold. For example, normal (Gaussian) distributions form a 2-dimensional manifold, parameterised by $(\mu, \sigma)$, the mean and standard deviation. On such manifolds there are notions of Riemannian metric, connection, curvature, and so on, all of statistical relevance.
More precisely,
Kullback-Leibler information, or relative entropy, features as a measure of divergence (not quite a metric, because it’s asymmetric), and Fisher information takes the role of curvature. One useful aspect of information geometry is that it gives a means to prove results about statistical models, simply by considering them as well-behaved geometrical objects. For instance, it’s basically a tautology to say that a manifold is not changing much in the vicinity of points of low curvature, and changing greatly near points of high curvature. Stated more precisely, and then translated back into probabilistic language, this becomes the Cramér-Rao inequality, that the variance of a parameter estimator is at least the reciprocal of the Fisher information. (Shalizi)
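As a small illustration of the two roles described in the quote, the following sketch uses the standard closed form for the Kullback-Leibler divergence between two univariate Gaussians. It checks that the divergence is asymmetric, and that for nearby parameters it is approximated by half the quadratic form of the Fisher information, which in the coordinates $(\mu, \sigma)$ is $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. The function names `kl_gaussian` and `fisher_quadratic_form` are chosen here for illustration and do not come from any of the references.

```python
import math


def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL divergence D(N(mu1, sigma1^2) || N(mu2, sigma2^2)), closed form."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1**2 + (mu1 - mu2) ** 2) / (2.0 * sigma2**2)
        - 0.5
    )


def fisher_quadratic_form(sigma, d_mu, d_sigma):
    """Half the Fisher quadratic form at (mu, sigma) for a displacement (d_mu, d_sigma).

    Uses the Fisher information of the Gaussian family in (mu, sigma)
    coordinates, diag(1/sigma^2, 2/sigma^2).
    """
    return 0.5 * (d_mu**2 + 2.0 * d_sigma**2) / sigma**2


# Asymmetry: swapping the arguments changes the value.
forward = kl_gaussian(0.0, 1.0, 1.0, 2.0)
backward = kl_gaussian(1.0, 2.0, 0.0, 1.0)
print("D(p||q) =", forward, " D(q||p) =", backward)

# Local behaviour: for a small displacement the divergence is close to
# half the Fisher quadratic form.
exact = kl_gaussian(0.0, 1.0, 0.01, 1.01)
approx = fisher_quadratic_form(1.0, 0.01, 0.01)
print("KL =", exact, " quadratic approximation =", approx)
```

The agreement in the second comparison holds only to leading order in the displacement; the discrepancy is of third order.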
The founders of the systematic theory are N. N. Chentsov and Shun-ichi Amari.
For $X$ a measurable space let $S$ be (a subspace of) the space of probability measures on $X$, equipped with the structure of a smooth manifold.
The Fisher metric on $S$ is the Riemannian metric given on two vector fields $v,w \in T S$ by

$$ g(v,w)_s := E_s\big(v(\log s)\, w(\log s)\big) $$

where $E_s(\cdots)$ denotes the expectation value under the measure $s \in S$ of the function $x \mapsto v(\log s)_x \, w(\log s)_x$ on $X$.
See for instance (Amari, Section 2.1).
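To make the definition concrete, here is a minimal numerical sketch for the Gaussian family, assuming each measure $s$ is represented by its density with respect to Lebesgue measure on $X = \mathbb{R}$ and taking $v, w$ to be the coordinate vector fields $\partial/\partial\mu$ and $\partial/\partial\sigma$. Then $v(\log s)$ is a partial derivative of the log-density, and the expectation is approximated by a plain Riemann sum. The helper names and the integration parameters `half_width` and `n` are illustrative choices, not part of the cited treatment.

```python
import math


def log_density(x, mu, sigma):
    """Log of the N(mu, sigma^2) density at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def score(x, mu, sigma):
    """Partial derivatives of the log-density with respect to (mu, sigma)."""
    z = (x - mu) / sigma
    return (z / sigma, (z * z - 1.0) / sigma)


def fisher_metric(mu, sigma, half_width=10.0, n=4001):
    """Approximate g_ij = E_s[d_i(log s) d_j(log s)] by a Riemann sum.

    The integration range is mu +/- half_width * sigma, beyond which the
    Gaussian density is negligible.
    """
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / (n - 1)
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n):
        x = lo + k * dx
        weight = math.exp(log_density(x, mu, sigma)) * dx
        s = score(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                g[i][j] += s[i] * s[j] * weight
    return g


g = fisher_metric(mu=0.0, sigma=2.0)
print(g)  # compare with diag(1/sigma^2, 2/sigma^2)
```

For this family the metric depends only on $\sigma$, and the result can be compared with the known closed form $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$.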
See also Fisher metric, where the Fisher metric in other contexts and its quantum generalizations are treated. See also quantum information.
Textbooks providing the big picture
For a series of articles, see
Lecture notes include
See also
Hông Vân Lê, Statistical manifolds are statistical models, Journal of Geometry 84(1-2), March 2006, pp. 83-93.
Blog post.
A brief introduction with more references is
Several people have noted an equivalence between statistical inference as parametric model selection and statistical mechanics on a statistical manifold, e.g.,
The interpretation of a quantum field theory as a probability distribution on the space of field configurations, which allows techniques from information geometry to be carried over to analogous measures of proximity between QFTs, is in
A treatment of collective statistical inference as resulting in the partition function of a non-linear sigma model is in
More in the context of quantum field theory: