The notion of relative entropy of states is a generalization of the notion of entropy to a situation where the entropy of one state is measured “relative to” another state.
Relative entropy is also called Kullback-Leibler divergence, information divergence, or information gain.
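For two finite probability distributions $p = (p_i)$ and $q = (q_i)$, the relative entropy of $q$ relative to $p$ is

$$ S(p/q) = \sum_i q_i \left(\log q_i - \log p_i\right), $$

with the conventions $0 \log 0 = 0$ and $S(p/q) = +\infty$ if $q_i > 0$ for some $i$ with $p_i = 0$.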
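The finite formula $S(p/q) = \sum_i q_i (\log q_i - \log p_i)$ can be evaluated directly. The following is a minimal Python sketch, assuming natural logarithms (another base only rescales the result) and the argument order used here, in which the first distribution is the reference; the function name `relative_entropy` is hypothetical.

```python
import math

def relative_entropy(p, q):
    """S(p/q) = sum_i q_i (log q_i - log p_i): entropy of q relative to the reference p.

    Terms with q_i = 0 contribute 0 (convention 0 log 0 = 0), and the result is
    +inf if q_i > 0 for some i with p_i = 0.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if q_i == 0:
            continue  # convention 0 log 0 = 0
        if p_i == 0:
            return math.inf  # q puts mass where the reference p has none
        total += q_i * (math.log(q_i) - math.log(p_i))
    return total

print(relative_entropy([0.5, 0.5], [0.9, 0.1]))  # ≈ 0.3681
print(relative_entropy([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

Many libraries use the opposite argument order: for instance, `scipy.stats.entropy(pk, qk)` computes `sum(pk * log(pk / qk))`, so `relative_entropy(p, q)` matches `scipy.stats.entropy(q, p)` for normalized inputs.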
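Similarly, for two density matrices $\rho$ and $\sigma$, the relative entropy of $\sigma$ relative to $\rho$ is

$$ S(\rho/\sigma) = \operatorname{Tr} \sigma \left(\log \sigma - \log \rho\right), $$

which is $+\infty$ unless the support of $\sigma$ is contained in that of $\rho$. When $\rho$ and $\sigma$ are diagonal in a common basis, with diagonal entries $p_i$ and $q_i$, this reduces to the formula for probability distributions.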
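For full-rank density matrices the matrix logarithms can be computed from an eigendecomposition. The sketch below assumes NumPy and positive-definite inputs (the infinite case is not handled); the helper names `hermitian_log` and `quantum_relative_entropy` are hypothetical. It computes $S(\rho/\sigma) = \operatorname{Tr} \sigma (\log \sigma - \log \rho)$ with $\rho$ as the reference state, and for diagonal inputs it reproduces the classical value.

```python
import numpy as np

def hermitian_log(m):
    """Matrix logarithm of a Hermitian positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return eigvecs @ np.diag(np.log(eigvals)) @ eigvecs.conj().T

def quantum_relative_entropy(rho, sigma):
    """S(rho/sigma) = Tr sigma (log sigma - log rho), with rho as the reference state.

    Assumes rho and sigma are full-rank density matrices, so both logarithms exist.
    """
    value = np.trace(sigma @ (hermitian_log(sigma) - hermitian_log(rho)))
    return float(value.real)  # the imaginary part is zero up to rounding

# Diagonal (commuting) inputs reproduce the classical value 0.9 log 1.8 + 0.1 log 0.2
print(quantum_relative_entropy(np.diag([0.5, 0.5]), np.diag([0.9, 0.1])))  # ≈ 0.3681

# A non-commuting pair of density matrices
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.5, 0.1j], [-0.1j, 0.5]])
print(quantum_relative_entropy(rho, sigma))
```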
For $X$ a measurable space and $P$ and $Q$ two probability measures on $X$ such that $Q$ is absolutely continuous with respect to $P$, their relative entropy is the integral

$$ S(P/Q) = \int_X \log\left(\frac{d Q}{d P}\right) d Q = \int_X \frac{d Q}{d P} \log\left(\frac{d Q}{d P}\right) d P, $$

where $d Q / d P$ is the Radon-Nikodym derivative of $Q$ with respect to $P$.
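When $P$ and $Q$ have densities $p$ and $q$ on the real line, $dQ/dP = q/p$ and $S(P/Q) = \int \log(dQ/dP)\, dQ = \int q(x) (\log q(x) - \log p(x))\, dx$. As a sanity check, the Python sketch below approximates this integral with a midpoint rule and compares it with the standard closed form for two normal distributions (writing $m$ for means and $s$ for standard deviations), $S(P/Q) = \log(s_P/s_Q) + \frac{s_Q^2 + (m_Q - m_P)^2}{2 s_P^2} - \frac{1}{2}$ for $P = N(m_P, s_P^2)$ and $Q = N(m_Q, s_Q^2)$. The truncation interval, step count, and function names are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def relative_entropy_numeric(p_pdf, q_pdf, lo, hi, n=100_000):
    """Midpoint-rule approximation of S(P/Q) = ∫ q(x) (log q(x) - log p(x)) dx on [lo, hi].

    Assumes Q has negligible mass outside [lo, hi] and p(x) > 0 wherever q(x) > 0.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        q = q_pdf(x)
        if q > 0.0:  # points where q underflows to 0 contribute nothing
            total += q * (math.log(q) - math.log(p_pdf(x))) * h
    return total

def normal_relative_entropy(p_mean, p_std, q_mean, q_std):
    """Closed form of S(P/Q) for P = N(p_mean, p_std^2) and Q = N(q_mean, q_std^2)."""
    return (math.log(p_std / q_std)
            + (q_std ** 2 + (q_mean - p_mean) ** 2) / (2 * p_std ** 2)
            - 0.5)

# Reference P = N(0, 1), measured Q = N(1, 0.5^2)
numeric = relative_entropy_numeric(lambda x: normal_pdf(x, 0.0, 1.0),
                                   lambda x: normal_pdf(x, 1.0, 0.5), -20.0, 20.0)
exact = normal_relative_entropy(0.0, 1.0, 1.0, 0.5)
print(numeric, exact)  # both ≈ 0.8181
```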
Let $A$ be a von Neumann algebra and let $\phi, \psi : A \to \mathbb{C}$ be two faithful normal states on it (normalized positive linear functionals).
The relative entropy $S(\phi/\psi)$ of $\psi$ relative to $\phi$ is

$$ S(\phi/\psi) = -\langle \Psi, \log(\Delta_{\Phi,\Psi}) \Psi \rangle, $$
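where $\Phi$ and $\Psi$ are any cyclic and separating vector representatives of $\phi$ and $\psi$ (so that $\phi(a) = \langle \Phi, a \Phi \rangle$ and $\psi(a) = \langle \Psi, a \Psi \rangle$ for $a \in A$), and $\Delta_{\Phi,\Psi} = \bar{S}_{\Phi,\Psi}^* \bar{S}_{\Phi,\Psi}$ is their relative modular operator, $\bar{S}_{\Phi,\Psi}$ being the closure of the antilinear map $a\Psi \mapsto a^*\Phi$. This definition is due to Araki and is independent of the choice of these representatives.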
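In the case that $A$ is finite-dimensional and $\rho_\phi$ and $\rho_\psi$ are the density matrices of $\phi$ and $\psi$, respectively, this reduces to the density-matrix definition above:

$$ S(\phi/\psi) = \operatorname{Tr} \rho_\psi \left(\log \rho_\psi - \log \rho_\phi\right). $$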
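In finite dimensions this reduction can be checked numerically. The sketch below assumes $A = M_n(\mathbb{C})$ acting on itself by left multiplication, with the Hilbert-Schmidt inner product $\langle X, Y \rangle = \operatorname{Tr}(X^* Y)$ and the vector representatives $\Psi = \rho_\psi^{1/2}$, $\Phi = \rho_\phi^{1/2}$, for which the relative modular operator is $\Delta_{\Phi,\Psi} X = \rho_\phi X \rho_\psi^{-1}$. It builds $\Delta_{\Phi,\Psi}$ as an $n^2 \times n^2$ matrix, evaluates $-\langle \Psi, \log(\Delta_{\Phi,\Psi}) \Psi \rangle$, and compares the result with $\operatorname{Tr} \rho_\psi (\log \rho_\psi - \log \rho_\phi)$. It assumes NumPy and faithful (full-rank) states; all helper names are hypothetical.

```python
import numpy as np

def hermitian_func(m, f):
    """Apply the scalar function f to a Hermitian matrix m via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(m)
    return eigvecs @ np.diag(f(eigvals)) @ eigvecs.conj().T

def vec(x):
    """Column-stacking vectorization: vec(a @ x @ b) == np.kron(b.T, a) @ vec(x)."""
    return x.reshape(-1, order="F")

def araki_relative_entropy(rho_phi, rho_psi):
    """-<Psi, log(Delta_{Phi,Psi}) Psi> for faithful states on M_n(C).

    Psi = rho_psi^{1/2} represents psi in the Hilbert-Schmidt space, and
    Delta_{Phi,Psi} X = rho_phi @ X @ inv(rho_psi) is written as a matrix on vec(X).
    """
    delta = np.kron(np.linalg.inv(rho_psi).T, rho_phi)
    log_delta = hermitian_func(delta, np.log)
    psi_vec = vec(hermitian_func(rho_psi, np.sqrt))
    # np.vdot conjugates its first argument, matching <X, Y> = Tr(X^* Y)
    return float(-np.vdot(psi_vec, log_delta @ psi_vec).real)

def density_matrix_relative_entropy(rho_phi, rho_psi):
    """Tr rho_psi (log rho_psi - log rho_phi), the finite-dimensional formula."""
    diff = hermitian_func(rho_psi, np.log) - hermitian_func(rho_phi, np.log)
    return float(np.trace(rho_psi @ diff).real)

rho_phi = np.array([[0.7, 0.2], [0.2, 0.3]])
rho_psi = np.array([[0.5, 0.1j], [-0.1j, 0.5]])
print(araki_relative_entropy(rho_phi, rho_psi))
print(density_matrix_relative_entropy(rho_phi, rho_psi))  # agrees up to rounding
```

Building the full $n^2 \times n^2$ matrix is only sensible for small $n$; it is used here because it mirrors the definition of $\Delta_{\Phi,\Psi}$ as an operator on the Hilbert space rather than exploiting the fact that left and right multiplications commute.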
Machine learning, specifically the learning procedure for Boltzmann machines, has been characterized as a process of minimizing relative entropy (Ackley, Hinton and Sejnowski 1985).
Relative entropy of states on von Neumann algebras was introduced in:
A characterization of relative entropy on finite-dimensional $C^*$-algebras is given in:
A survey of entropy in operator algebras is in:
A characterization of machine learning as a process minimizing relative entropy is proposed in: