Probability theory is concerned with mathematical models of phenomena that exhibit randomness, or more generally phenomena about which one has incomplete information.
Its central mathematical model is based mostly on measure theory. From a purely mathematical viewpoint, probability theory today could therefore be characterized as the study of measure spaces whose total volume is finite and normalized to $1$.
Broader perspectives may stress the relevance of other concepts from pure mathematics for probability theory, or include aspects of how the mathematical results are interpreted phenomenologically; the latter naturally makes contact with the field of statistics.
Notice that in this respect probability theory has a status similar to that of (other(?!)) theories of physics: there is a mathematical model (measure theory here as the model for probability theory, or for instance symplectic geometry as a model for classical mechanics) which can be studied in its own right, and then there is in addition a more or less concrete idea of how one may deduce from that model statements about the observable world (the average outcome of a dice roll using probability theory, or the observability of the next solar eclipse using Hamiltonian mechanics). The step from the mathematical model to its use as a tool for making statements about the observable world is subtle, maybe a subject of philosophy, but in any case outside the realm of mathematics. In probability theory the meaning of this step is traditionally a cause of debate; the two main antagonistic schools of thought are the frequentist interpretation and the Bayesian perspective on how probability theory relates to the observable world.
Random variables are defined typically in terms of probability spaces, cf. the basic entries on measure space, probability space, conditional probability. The modern point of view emphasises that many facts about random variables do not depend much on the choice of the probability spaces; the random variables are also often identified with their distributions.
Some argue that in the study of measure and probability, one should start not only with a sigma-algebra of measurable sets but also with a collection of null sets. This is captured abstractly by the approach via commutative von Neumann algebras.
(…)
(…)
Families of probability distributions often form statistical models, that is, submanifolds of the space of all probability measures on a sample space. Techniques from differential geometry may be applied in a theory known as information geometry.
We describe here some perspectives on (parts of) probability theory from the categorical point of view (see nPOV). This perspective mainly applies to the study of situations involving Markov kernels and the Chapman-Kolmogorov property.
Prakash Panangaden in Probabilistic Relations defines the category $SRel$ (stochastic relations) to have as objects sets equipped with a $\sigma$-field. Morphisms are conditional probability densities or stochastic kernels. So, a morphism from $(X, \Sigma_X)$ to $(Y, \Sigma_Y)$ is a function $h: X \times \Sigma_Y \to [0, 1]$ such that (1) for every $B \in \Sigma_Y$, the function $x \mapsto h(x, B)$ is measurable, and (2) for every $x \in X$, the function $B \mapsto h(x, B)$ is a subprobability measure on $\Sigma_Y$.
If $k$ is a morphism from $Y$ to $Z$, then $k \cdot h$ from $X$ to $Z$ is defined as $(k \cdot h)(x, C) = \int_Y k(y, C)h(x, d y)$.
This is based on earlier work by Michele Giry, see Giry's monad.
Panangaden’s definition differs from Giry’s in the second clause where subprobability measures, rather than ordinary probability measures, are allowed.
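On finite sets the integral in the composition formula above reduces to a finite sum, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding of kernels, the function names `compose` and `is_subprobability_kernel`, and the example data are hypothetical and are not taken from Panangaden's or Giry's work.

```python
from fractions import Fraction as F

def compose(k, h):
    """Return k . h, where h is a kernel X -> Y and k is a kernel Y -> Z.

    Kernels on finite sets are nested dicts: h[x][y] is the mass that
    h(x, -) assigns to the point y. The integral over Y becomes a sum.
    """
    result = {}
    for x, row in h.items():
        out = {}
        for y, p_xy in row.items():
            for z, p_yz in k[y].items():
                out[z] = out.get(z, 0) + p_yz * p_xy
        result[x] = out
    return result

def is_subprobability_kernel(h):
    """Check that every h(x, -) is non-negative with total mass at most 1."""
    return all(
        all(p >= 0 for p in row.values()) and sum(row.values()) <= 1
        for row in h.values()
    )

# Hypothetical example data: X = {"a", "b"}, Y = {0, 1}, Z = {"u", "v"}.
# h("b", -) has total mass 1/4, so h is a subprobability kernel only.
h = {"a": {0: F(1, 2), 1: F(1, 2)}, "b": {0: F(1, 4)}}
k = {0: {"u": F(1)}, 1: {"u": F(1, 3), "v": F(2, 3)}}

kh = compose(k, h)
```

In this encoding the composite of two subprobability kernels is again a subprobability kernel, which is the finite shadow of the fact that composition in $SRel$ is well defined.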
Panangaden emphasises that the mechanism is similar to the way that the category of relations can be constructed from the power set functor. Just as the category of relations is the Kleisli category of the powerset monad on the category of sets Set, $SRel$ is the Kleisli category of a monad on the category of measurable spaces and measurable functions. The underlying functor of this monad sends a measurable space $X$ to the measurable space of subprobability measures on $X$.
What is gained by the move from probability measures to subprobability measures? One motivation seems to be to model probabilistic processes from $X$ to a coproduct $X + Y$. Such a process can be iterated to form a process which records where in $Y$ one eventually ends up. This relates to $SRel$ being traced.
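One way to read the iteration described here, on finite sets, is as repeatedly feeding the mass that is still in $X$ back into the process and collecting whatever has landed in $Y$. The sketch below is an interpretation under assumptions, not Panangaden's construction of the trace: the tagging of outcomes by `"X"` and `"Y"`, the function name `iterate_exit`, and the one-state example are all hypothetical.

```python
from fractions import Fraction as F

def iterate_exit(step, start, n_rounds):
    """Approximate where in Y a process X -> X + Y eventually exits.

    step[x] maps tagged outcomes ("X", x2) or ("Y", y) to probabilities.
    Returns the mass that has exited to each y within n_rounds steps.
    Mass still circulating in X is not counted, so the result is in
    general a subprobability measure on Y.
    """
    in_x = {start: F(1)}   # mass currently inside X
    exited = {}            # mass that has already landed in Y
    for _ in range(n_rounds):
        next_in_x = {}
        for x, mass in in_x.items():
            for (tag, point), p in step[x].items():
                target = next_in_x if tag == "X" else exited
                target[point] = target.get(point, 0) + mass * p
        in_x = next_in_x
    return exited

# Hypothetical one-state example: stay in X with probability 1/2,
# otherwise exit to "y0" or "y1" with probability 1/4 each.
step = {"s": {("X", "s"): F(1, 2), ("Y", "y0"): F(1, 4), ("Y", "y1"): F(1, 4)}}

after_3 = iterate_exit(step, "s", 3)
```

After any finite number of rounds the collected mass is strictly less than $1$ in this example, so the intermediate results are subprobability measures even though each single step is a probability measure on $X + Y$.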
There is a monad $1 + -: Meas \to Meas$ on the category $Meas$ of measurable spaces. A probability measure on $1 + X$ is the same as a subprobability measure on $X$. Panangaden’s monad is a composite of Giry’s and $1 + -$.
The opposite of the Kleisli category of Giry's monad has as morphisms $X \to Y$ linear maps from bounded functions on $X$ to bounded functions on $Y$ which send the characteristic function of $X$ (the constant function $1$) to the characteristic function of $Y$.
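For finite $X$ the correspondence between subprobability measures on $X$ and probability measures on $1 + X$ amounts to assigning the missing mass to the extra point. A minimal sketch, assuming measures are stored as dictionaries and that the extra point is represented by `None` (so `None` must not itself be an element of $X$); the names `BOTTOM`, `to_probability` and `to_subprobability` are hypothetical:

```python
from fractions import Fraction as F

BOTTOM = None  # hypothetical name for the single point of the summand 1

def to_probability(sub):
    """Extend a finite subprobability measure on X to a probability measure on 1 + X."""
    total = sum(sub.values())
    if any(p < 0 for p in sub.values()) or total > 1:
        raise ValueError("not a subprobability measure")
    full = dict(sub)
    full[BOTTOM] = 1 - total  # the missing mass goes to the extra point
    return full

def to_subprobability(prob):
    """Restrict a probability measure on 1 + X to X by forgetting the extra point."""
    return {x: p for x, p in prob.items() if x is not BOTTOM}

sub = {"x0": F(1, 2), "x1": F(1, 4)}
prob = to_probability(sub)
```

The two functions are mutually inverse on valid inputs, which is the finite version of the bijection stated above.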
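In the finite case this can be made concrete: a kernel from $Y$ to $X$ acts on functions on $X$ by averaging, producing functions on $Y$. The following sketch assumes the same dictionary encoding of kernels as nested maps of point masses; the name `pull_back` and the example data are hypothetical.

```python
from fractions import Fraction as F

def pull_back(h, f):
    """Given a kernel h from Y to X and a function f on X, return the
    function on Y sending y to the sum over x of f(x) * h(y, {x})."""
    return {y: sum(f[x] * p for x, p in row.items()) for y, row in h.items()}

# Hypothetical example: Y = {"a", "b"}, X = {0, 1}, each h(y, -) has mass 1.
h = {"a": {0: F(1, 2), 1: F(1, 2)}, "b": {1: F(1)}}

f = {0: 2, 1: 4}
one_on_x = {0: 1, 1: 1}

averaged = pull_back(h, f)
image_of_one = pull_back(h, one_on_x)
```

Because every row of a probability kernel has total mass $1$, the constant function $1$ on $X$ is sent to the constant function $1$ on $Y$; with subprobability kernels this would fail in general.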
For more details on Giry’s monad and its variants see probability monad.
Quantum mechanics studies complex probability amplitudes whose absolute square can be interpreted as the usual probability in the process of measurement, i.e. quantum reduction. An alternative approach via Wigner's function yields real-valued quasi-probabilities, which may however lie outside $[0,1]$.
Relatedly, noncommutative von Neumann algebras may be interpreted as a noncommutative measure theory, analogous to the role that C*-algebras play in noncommutative geometry, see at quantum probability.
The free probability theory of Voiculescu and others is another noncommutative generalization, with physical applications related to random matrix theory.
The modern formalization of probability theory in measure theory originates around
Lecture notes include
Alexander Grigoryan, Measure theory and probability, 2008 pdf
Terence Tao, A review of probability theory, 2010 (web)
just as the natural numbers can be defined abstractly without reference to any numeral system (e.g. by the Peano axioms), core concepts of probability theory, such as random variables, can also be defined abstractly, without explicit mention of a measure space; we will return to this point when we discuss free probability later in this course.
Terence Tao, Free probability, 2010 (web)
Amir Dembo, Probability theory, 2012 (pdf)
For references related to Giry's monad and variants see there.
Formulation in category theory:
Tobias Fritz, A synthetic approach to Markov kernels, conditional independence, and theorems on sufficient statistics, (arXiv:1908.07021)
Kirk Sturtz, Categorical probability theory (arXiv:1406.6030)
Formulation in topos theory:
For a more convenient setting for ‘higher-order’ probability theory, that is, one which admits higher-order functions, the following article uses the cartesian closed category of quasi-Borel spaces rather than the category of measurable spaces:
For the big picture in probability theory see answers to
An instance of “categorical thinking” (in a generalized sense) in solving probability problems is a solution to Buffon’s noodle problem (wikipedia) discussed by Tom Leinster at nCafe here.
Klain, Gian-Carlo Rota, Introduction to geometric probability
John C. Baez, Jacob D. Biamonte, A course on quantum techniques for stochastic mechanics, pdf
Discussion from a perspective of formal logic/type theory is in
Mikhail Gromov on possible generalizations/modifications of probability theory (especially probability theory seen as, fundamentally, a “functor” from a “complex category” to a “simple category”), as well as applications of probability within and without pure mathematics:
Probability regarded as Euclidean quantum field theory:
(…) see references at conformal field theory
In relation to the mass gap problem in lattice gauge theory:
Sourav Chatterjee, Yang-Mills for probabilists, in: Probability and Analysis in Interacting Physical Systems, PROMS 283 (2019) Springer (arXiv:1803.01950, doi:10.1007/978-3-030-15338-0)
Sourav Chatterjee, A probabilistic mechanism for quark confinement, Comm. Math. Phys. 2020 (arXiv:2006.16229)