Authors: Wilfried Seidel (Helmut Schmidt University, Germany) and Miodrag Lovric (Radford University, USA)
This article provides a thorough and accessible exposition of mixture models, a foundational tool in statistical modeling used to account for heterogeneity in data. A mixture distribution is a convex combination of component distributions, each associated with a different subpopulation or data-generating mechanism. Although mixture models are historically rooted in clustering, their applications span nearly all areas of statistics and data science.
The article begins with a formal definition of finite mixtures, in which each observation is assumed to arise from one of k subpopulations, selected with probabilities given by the mixing weights. It then introduces general mixing distributions, which extend the model beyond the finite case. Identifiability is discussed as a key challenge: different parameter configurations may yield the same mixture distribution, which complicates inference.
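To make the finite-mixture definition concrete, here is a minimal Python sketch (standard library only) that assumes univariate Gaussian components; the weights, means, and standard deviations are hypothetical illustration values, not taken from the article. It evaluates the density f(x) = π_1 f_1(x) + ... + π_k f_k(x) and samples in two stages: first a latent subpopulation label is drawn with probabilities π_j, then x is drawn from that component. The last lines show the simplest identifiability issue: swapping the component labels leaves the density unchanged.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Finite mixture density f(x) = sum_j pi_j * f_j(x), a convex combination."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def sample_mixture(n, weights, mus, sigmas, rng):
    """Draw n points: pick a latent subpopulation label, then sample from that component."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    xs = [rng.gauss(mus[j], sigmas[j]) for j in labels]
    return xs, labels

# Hypothetical two-component example with mixing weights 0.3 and 0.7.
weights, mus, sigmas = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]
xs, labels = sample_mixture(10_000, weights, mus, sigmas, random.Random(0))

# Identifiability caveat: relabeling the components gives the same density.
print(mixture_pdf(0.5, weights, mus, sigmas))
print(mixture_pdf(0.5, weights[::-1], mus[::-1], sigmas[::-1]))
```

The two printed values agree, so the parameters are identifiable at best up to a permutation of the component labels.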
The authors review the major estimation strategies, including the method of moments, minimum-distance methods, and, especially, likelihood-based approaches. They describe the Expectation-Maximization (EM) algorithm in detail; it maximizes the likelihood iteratively by treating each observation's unobserved component membership as missing data. Both strengths and weaknesses of EM are covered: each iteration is guaranteed not to decrease the likelihood, but the algorithm may converge to a local optimum or to a singularity of the likelihood.
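The EM iteration described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under assumptions not taken from the article: univariate Gaussian components, user-supplied starting values, and a hypothetical variance floor (`min_sigma`) added as a heuristic guard against the singularities mentioned above, where one component collapses onto a single observation. The E-step computes each observation's posterior membership probabilities; the M-step re-estimates weights, means, and standard deviations from them in closed form. The returned `trace` lets you check that the log-likelihood never decreases (the guarantee holds as long as the floor is not triggered).

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, weights, mus, sigmas):
    """Observed-data log-likelihood of a univariate Gaussian mixture."""
    return sum(math.log(sum(w * normal_pdf(x, m, s)
                            for w, m, s in zip(weights, mus, sigmas)))
               for x in xs)

def em(xs, weights, mus, sigmas, max_iter=200, tol=1e-8, min_sigma=1e-3):
    """EM for a univariate k-component Gaussian mixture from given starting values."""
    n, k = len(xs), len(weights)
    weights, mus, sigmas = list(weights), list(mus), list(sigmas)
    trace = [log_likelihood(xs, weights, mus, sigmas)]
    for _ in range(max_iter):
        # E-step: responsibilities r[i][j] = P(x_i came from component j | x_i).
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: closed-form weighted maximum-likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            # Heuristic floor (an assumption): keeps a component from collapsing
            # onto one point, where the likelihood is unbounded.
            sigmas[j] = max(math.sqrt(var), min_sigma)
        trace.append(log_likelihood(xs, weights, mus, sigmas))
        if trace[-1] - trace[-2] < tol:
            break
    return weights, mus, sigmas, trace

# Hypothetical simulated data: 150 points from N(-2, 0.5^2), 350 from N(2, 1).
rng = random.Random(1)
data = [rng.gauss(-2.0, 0.5) for _ in range(150)] + [rng.gauss(2.0, 1.0) for _ in range(350)]
w_hat, mu_hat, sd_hat, trace = em(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
print(w_hat, mu_hat, sd_hat, len(trace))
```

With a single starting point the result depends on the initialization; running EM from several starts and keeping the fit with the highest log-likelihood is a common safeguard against local optima.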
The article also considers robustness and model misspecification, discussing log-concave mixtures and Bayesian formulations. Bayesian approaches allow prior knowledge to be incorporated and uncertainty to be quantified, but they bring computational challenges such as label switching and the lack of natural conjugate priors.
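Label switching arises because the mixture likelihood is invariant under any relabeling of the components, so with a symmetric prior the posterior is invariant under all k! permutations and MCMC draws can swap labels between iterations. The minimal sketch below assumes univariate Gaussian components and hypothetical data and parameter values; it checks that invariance numerically and then applies one simple post-processing remedy, an ordering constraint on the means. This remedy is a swapped-in illustration, not necessarily the article's method, and such constraints are known to behave poorly when components overlap.

```python
import math
from itertools import permutations

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, weights, mus, sigmas):
    """Observed-data log-likelihood of a univariate Gaussian mixture."""
    return sum(math.log(sum(w * normal_pdf(x, m, s)
                            for w, m, s in zip(weights, mus, sigmas)))
               for x in xs)

def permute(params, order):
    """Relabel components: new component i is old component order[i]."""
    weights, mus, sigmas = params
    return ([weights[j] for j in order], [mus[j] for j in order], [sigmas[j] for j in order])

def relabel_by_mean(params):
    """Ordering constraint mu_1 < mu_2 < ...: sort the components by their means."""
    mus = params[1]
    order = sorted(range(len(mus)), key=lambda j: mus[j])
    return permute(params, order)

# Hypothetical data and a single parameter draw, e.g. one state of an MCMC sampler.
xs = [-2.1, -1.7, 0.3, 1.2, 2.5]
params = ([0.3, 0.5, 0.2], [-2.0, 1.0, 3.0], [0.5, 1.0, 0.8])

# All 3! labelings give the same likelihood, up to floating-point rounding.
for order in permutations(range(3)):
    print(order, log_likelihood(xs, *permute(params, order)))

# The ordering constraint maps every labeling back to one representative.
print(relabel_by_mean(permute(params, (2, 0, 1))))
```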
A key practical issue is determining the number of components. The article evaluates model selection techniques such as AIC, BIC, and likelihood ratio tests, noting their limitations in this non-regular setting (for example, the usual chi-squared limiting distribution of the likelihood ratio statistic does not apply). Bootstrap-based inference is suggested as a more accurate alternative.
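A minimal sketch of both ideas for univariate Gaussian mixtures follows: AIC and BIC computed from the maximized log-likelihood, and a parametric bootstrap of the likelihood ratio statistic for k = 1 versus k = 2, which replaces the unreliable textbook reference distribution with a simulated one. Assumptions not taken from the article: Gaussian components with p = 3k - 1 free parameters, a single deterministic EM start obtained by splitting the sorted data into k chunks (multiple starts are advisable in practice), and hypothetical helper names (`fit_mixture`, `bootstrap_lrt`). The bootstrap simulates data from the fitted one-component model, refits both models to each simulated sample, and counts how often the simulated statistic reaches the observed one.

```python
import math
import random

def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def fit_mixture(xs, k, n_iter=100, tol=1e-8, min_sigma=1e-3):
    """Fit a univariate k-component Gaussian mixture by EM; return (loglik, params)."""
    n = len(xs)
    # Single deterministic start (an assumption): split the sorted data into k chunks.
    s = sorted(xs)
    chunks = [s[j * n // k:(j + 1) * n // k] for j in range(k)]
    weights = [len(c) / n for c in chunks]
    mus = [sum(c) / len(c) for c in chunks]
    sigmas = [max(math.sqrt(sum((x - m) ** 2 for x in c) / len(c)), min_sigma)
              for c, m in zip(chunks, mus)]
    prev = -math.inf
    for it in range(n_iter + 1):
        # E-step via log-sum-exp; also yields the log-likelihood of the current parameters.
        resp, loglik = [], 0.0
        for x in xs:
            lp = [math.log(w) + normal_logpdf(x, m, sd) for w, m, sd in zip(weights, mus, sigmas)]
            top = max(lp)
            lse = top + math.log(sum(math.exp(v - top) for v in lp))
            loglik += lse
            resp.append([math.exp(v - lse) for v in lp])
        if it == n_iter or loglik - prev < tol:
            break  # loglik corresponds to the parameters being returned
        prev = loglik
        # M-step: weighted maximum-likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return loglik, (weights, mus, sigmas)

def n_params(k):
    """Free parameters: (k - 1) weights + k means + k standard deviations."""
    return 3 * k - 1

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * n_params(k)

def bic(loglik, k, n):
    return -2.0 * loglik + n_params(k) * math.log(n)

def bootstrap_lrt(xs, n_boot=99, seed=0):
    """Parametric bootstrap p-value for H0: k = 1 versus H1: k = 2."""
    rng = random.Random(seed)
    n = len(xs)
    l1, (_, mus1, sds1) = fit_mixture(xs, 1)
    l2 = fit_mixture(xs, 2)[0]
    observed = 2.0 * (l2 - l1)
    exceed = 0
    for _b in range(n_boot):
        # Simulate from the fitted null (one-component) model and refit both models.
        sim = [rng.gauss(mus1[0], sds1[0]) for _ in range(n)]
        stat = 2.0 * (fit_mixture(sim, 2)[0] - fit_mixture(sim, 1)[0])
        if stat >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)
```

A smaller AIC or BIC favors the corresponding k. `bootstrap_lrt` returns the observed statistic and a p-value computed as (1 + number of exceedances) / (n_boot + 1); in pure Python it is slow, so small `n_boot` values are suitable only for experimentation.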
The article closes by highlighting recent developments in mixture modeling, including software implementations in R (mixtools, Mclust) and Python (scikit-learn). For formal derivations, simulation examples, and a rich bibliography, refer to the full article in the International Encyclopedia of Statistical Science.