# Advantages of Monte Carlo over numerical quadrature for integration in low-dimensional MLE?

by Kenneth Ball   Last Updated April 24, 2017 16:19 PM

I am estimating the parameters of a two-level model via maximum likelihood. My MLE optimization procedure requires multiple numerical integrations over a three-dimensional space at each step to estimate the mean and covariance of a multivariate normal distribution.

My MLE problem involves a marginalized likelihood $$(\hat{\mu},\hat{\Sigma}) = \arg\max_{\mu,\Sigma} \prod_{i=1}^n\int_\theta \prod_{t}f(x_i(t);y_i(t;\theta))\,\mathcal{N}(\theta;\mu,\Sigma)\,d\theta$$ where $f$ is the probability of an observation $x$ given expected behavior $y$ generated by model parameters $\theta$ at time $t$. The data is longitudinal/panel, with $n$ individuals, and each individual's expected behavior $y_i$ is derived from a single vector $\theta$ drawn from a normal distribution with mean $\mu$ and covariance $\Sigma$.
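Written out, the quantity being optimized is the log of that product, and a quadrature rule replaces each integral with a weighted sum. The nodes $\theta_k$ and weights $w_k$ are notation introduced here for illustration only. For a composite Simpson's 3/8 rule with spacing $h$, the one-dimensional weights follow the pattern $\tfrac{3h}{8}(1,3,3,2,3,3,2,\dots,3,3,1)$, presumably combined as a tensor product over the three dimensions:

$$\ell(\mu,\Sigma) = \sum_{i=1}^n \log \int_\theta \prod_{t}f(x_i(t);y_i(t;\theta))\,\mathcal{N}(\theta;\mu,\Sigma)\,d\theta \;\approx\; \sum_{i=1}^n \log \sum_{k} w_k \prod_{t}f(x_i(t);y_i(t;\theta_k))\,\mathcal{N}(\theta_k;\mu,\Sigma)$$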

I have written a function that returns not only the approximate log-likelihood but also analytic gradients with respect to the mean and covariance, derived for the numerical quadrature scheme used for the integration (I'm using a version of Simpson's 3/8 rule). This allows fast application of quasi-Newton methods (BFGS) or stochastic gradient descent. Those would otherwise be quite slow: the parameter space of the optimization problem is 9-dimensional (3 for the mean vector, 6 for the covariance matrix), and estimating the gradient numerically requires many more function evaluations (and hence numerical integrations) at each optimization step. With analytic gradients I need essentially one pass of numerical integration per step.
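To make the idea of differentiating through the quadrature rule concrete, here is a minimal sketch in Python (not the Octave code from the question), reduced to one dimension with a scalar standard deviation `sigma`. The toy model `y(t; theta) = theta * t`, the Gaussian noise in `cond_lik`, and every function name are illustrative assumptions. The sketch also assumes the grid is held fixed while differentiating, which may differ from a scheme whose nodes move with the current estimate.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def cond_lik(theta, times, obs, noise_sd=0.5):
    """Hypothetical f: Gaussian noise around the toy model y(t; theta) = theta * t."""
    out = 1.0
    for t, x in zip(times, obs):
        out *= normal_pdf(x, theta * t, noise_sd)
    return out

def simpson38_grid(a, b, panels):
    """Nodes and weights of the composite Simpson 3/8 rule on [a, b]."""
    n = 3 * panels                      # the rule needs a multiple of 3 intervals
    h = (b - a) / n
    nodes, weights = [], []
    for k in range(n + 1):
        if k == 0 or k == n:
            c = 1.0
        elif k % 3 == 0:
            c = 2.0
        else:
            c = 3.0
        nodes.append(a + k * h)
        weights.append(3.0 * h / 8.0 * c)
    return nodes, weights

def loglik_and_grad(times, obs, mu, sigma, nodes, weights):
    """Quadrature log-likelihood of one individual and its exact gradient
    with respect to (mu, sigma), with the grid treated as constant."""
    integral = d_mu = d_sigma = 0.0
    for theta, w in zip(nodes, weights):
        term = w * cond_lik(theta, times, obs) * normal_pdf(theta, mu, sigma)
        r = theta - mu
        integral += term
        d_mu += term * r / sigma ** 2                       # d log N / d mu
        d_sigma += term * (r * r / sigma ** 3 - 1.0 / sigma)  # d log N / d sigma
    return math.log(integral), (d_mu / integral, d_sigma / integral)

# Made-up data for a single individual.
times, obs = [0.5, 1.0, 1.5], [0.6, 1.1, 1.4]
nodes, weights = simpson38_grid(-1.0, 3.0, panels=60)
value, (g_mu, g_sigma) = loglik_and_grad(times, obs, 1.0, 0.3, nodes, weights)
```

Because only the normal density depends on `mu` and `sigma` when the grid is fixed, the gradient reuses the same node evaluations as the likelihood itself, which is why one integration pass per step is enough.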

I like that my gradients are analytic, but I'm concerned that my quadrature may introduce some bias/error (although it does at least adapt to the current estimate of the mean and covariance). On the other hand, I suspect Monte Carlo-style approaches would be relatively slow here, especially since I was able to vectorize (in Octave) most of my code to evaluate the likelihood with direct quadrature.
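The trade-off can be illustrated with a small one-dimensional sketch in Python (again not the Octave code from the question). The quadrature estimate is deterministic, so any error it carries is systematic and identical on every run. A plain Monte Carlo estimate that samples `theta` from the normal prior is unbiased, but it is noisy, with a standard error shrinking like one over the square root of the number of draws. The toy model `y(t; theta) = theta * t`, the noise level, the data, and all function names are assumptions made for illustration. The sampler is simple Monte Carlo, not Metropolis-Hastings.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def cond_lik(theta, times, obs, noise_sd=0.5):
    """Hypothetical f: Gaussian noise around the toy model y(t; theta) = theta * t."""
    out = 1.0
    for t, x in zip(times, obs):
        out *= normal_pdf(x, theta * t, noise_sd)
    return out

def quadrature_marginal(times, obs, mu, sigma, panels=60, width=6.0):
    """Composite Simpson 3/8 estimate on [mu - width*sigma, mu + width*sigma]."""
    n = 3 * panels                      # the rule needs a multiple of 3 intervals
    a = mu - width * sigma
    h = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n + 1):
        if k == 0 or k == n:
            c = 1.0
        elif k % 3 == 0:
            c = 2.0
        else:
            c = 3.0
        theta = a + k * h
        total += c * cond_lik(theta, times, obs) * normal_pdf(theta, mu, sigma)
    return 3.0 * h / 8.0 * total

def monte_carlo_marginal(times, obs, mu, sigma, draws=20000, seed=0):
    """Average of the conditional likelihood over draws theta ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += cond_lik(rng.gauss(mu, sigma), times, obs)
    return total / draws

# Made-up data for a single individual.
times, obs = [0.5, 1.0, 1.5], [0.6, 1.1, 1.4]
quad = quadrature_marginal(times, obs, mu=1.0, sigma=0.3)
mc_a = monte_carlo_marginal(times, obs, mu=1.0, sigma=0.3, seed=0)
mc_b = monte_carlo_marginal(times, obs, mu=1.0, sigma=0.3, seed=1)
print(quad, mc_a, mc_b)
```

In this toy setting the two Monte Carlo runs scatter around the quadrature value. Whether that noise or the quadrature's systematic error matters more for the optimizer is exactly the open question.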

My question is: are there advantages to Monte Carlo integration techniques, besides tractability in high dimensions, that should make me consider something like Metropolis-Hastings MCMC instead of my direct numerical integration approach, which is workable in my lower-dimensional problem?
