Mixture Models: Theory, Estimation, and Recent Advances

Authors: Wilfried Seidel (Helmut Schmidt University, Germany) and Miodrag Lovric (Radford University, USA)

This article provides a thorough and accessible exposition of mixture models, a foundational tool in statistical modeling used to account for heterogeneity in data. A mixture distribution is a convex combination of component distributions, each associated with a different subpopulation or data-generating mechanism. While mixture models are historically rooted in clustering, their applications span nearly all areas of statistics and data science.

The article begins with a formal definition of finite mixtures, where each observation is assumed to arise from one of k subpopulations with specified mixing weights. It also introduces the concept of mixing distributions, extending the model beyond finite settings. Identifiability is discussed as a key challenge: different parameter configurations may yield the same mixture distribution, complicating inference.
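As a compact reminder of the objects involved, the finite mixture density and its extension to a general mixing distribution can be written as below. The notation (p_j for the mixing weights, \theta_j for the component parameters, P for the mixing distribution) is an assumption chosen for illustration and may differ from the symbols used in the full article.

```latex
% Finite mixture with k components
f(x \mid \psi) = \sum_{j=1}^{k} p_j \, f(x \mid \theta_j),
\qquad p_j \ge 0, \quad \sum_{j=1}^{k} p_j = 1,
\qquad \psi = (p_1, \dots, p_k, \theta_1, \dots, \theta_k)

% General mixture with mixing distribution P;
% the finite case puts mass p_j on the point \theta_j
f(x \mid P) = \int f(x \mid \theta) \, dP(\theta)

% Simplest identifiability failure (k = 2): relabeling the components
p \, f(x \mid \theta_1) + (1 - p) \, f(x \mid \theta_2)
  = (1 - p) \, f(x \mid \theta_2) + p \, f(x \mid \theta_1)
```

The last line shows that the parameter vectors (p, \theta_1, \theta_2) and (1 - p, \theta_2, \theta_1) describe the same distribution, so component labels can only ever be recovered up to a permutation.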

The authors review major estimation strategies, including the method of moments, distance minimization, and especially likelihood-based approaches. They detail the Expectation-Maximization (EM) algorithm, which iteratively increases the likelihood by treating component membership as latent data. Strengths and weaknesses of EM are addressed: while the likelihood never decreases from one iteration to the next, the algorithm may converge to a local rather than a global maximum, or drift toward singularities of the likelihood.
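The EM iteration described above can be sketched for the simplest case, a two-component univariate Gaussian mixture. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function name `em_two_gaussians`, the initialisation at the sample extremes, the floor `min_sigma` on the standard deviations, and the simulated data are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=50, min_sigma=1e-3):
    """EM for a two-component univariate Gaussian mixture.

    Returns the fitted weights, means, standard deviations, and the
    log-likelihood of the parameters entering each iteration.
    """
    # Crude initialisation (an assumption): means at the sample extremes.
    p = [0.5, 0.5]
    mu = [min(data), max(data)]
    sigma = [1.0, 1.0]
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component j.
        resp = []
        loglik = 0.0
        for x in data:
            dens = [p[j] * normal_pdf(x, mu[j], sigma[j]) for j in range(2)]
            total = dens[0] + dens[1]
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        loglik_trace.append(loglik)
        # M-step: responsibility-weighted updates of all parameters.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            p[j] = nj / len(data)
            mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, data)) / nj
            # The floor keeps a component from collapsing onto a single point.
            sigma[j] = max(math.sqrt(var), min_sigma)
    return p, mu, sigma, loglik_trace


# Simulated data (hypothetical): two well-separated subpopulations.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(5.0, 1.0) for _ in range(200)]

weights, means, sigmas, trace = em_two_gaussians(data)
print("weights:", weights)
print("means:", means)
print("sigmas:", sigmas)
```

Inspecting `trace` shows the monotone behaviour mentioned above. The floor on `sigma` is one simple guard against the singular solutions; it is a pragmatic choice for this sketch rather than a recommendation from the article, and rerunning from several starting values is the usual defence against local maxima.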

Robustness and model misspecification are addressed, with discussions of log-concave mixtures and Bayesian formulations. The latter allow for incorporating prior knowledge and modeling uncertainty but present computational challenges such as label switching and the lack of natural conjugate priors.

A key issue in practice is determining the number of components. The article evaluates model selection techniques such as AIC, BIC, and likelihood ratio tests, noting their limitations in non-regular settings. Bootstrap-based inference is suggested for improved accuracy.
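For reference, the two information criteria take the standard forms below, where \hat{\ell}_k is the maximized log-likelihood of the k-component model, d_k its number of free parameters, and n the sample size. These symbols are assumptions introduced here for illustration. The parameter count in the last line is a worked example for a univariate Gaussian mixture with unrestricted means and variances.

```latex
\mathrm{AIC}(k) = -2\,\hat{\ell}_k + 2\,d_k
\qquad
\mathrm{BIC}(k) = -2\,\hat{\ell}_k + d_k \log n

% Univariate Gaussian mixture, free means and variances:
% (k - 1) weights + k means + k variances
d_k = (k - 1) + k + k = 3k - 1
```

Both criteria are minimized over k. The non-regularity arises because a k-component model sits on the boundary of the (k+1)-component parameter space (one weight equal to zero) and is not identifiable there, so the usual chi-square approximation for the likelihood ratio statistic does not apply. This is what motivates simulating its null distribution by bootstrap.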

The article closes by highlighting recent developments in mixture modeling, including:

Integration with deep learning (e.g., deep mixture models)
Bayesian nonparametrics such as Dirichlet Process Mixtures
Software tools in R (mixtools, mclust) and Python (scikit-learn)

For formal derivations, simulation examples, and a rich bibliography, refer to the full article in the International Encyclopedia of Statistical Science.