# Bayesian information criterion (BIC) on KDE?

by davipatti   Last Updated October 29, 2018 16:19 PM

Consider two datasets, $$X$$ and $$Y$$, both 2-dimensional, with $$a$$ and $$b$$ samples respectively.

I would like to test whether a single kernel density estimate (KDE) fitted to the concatenated data ($$XY$$, of shape $$(a+b, 2)$$) is a better model of the data than two separate KDEs fitted to the individual datasets. (I'll denote a KDE trained on dataset $$X$$ as $$kde_X$$.)

Using something like the Bayesian information criterion (BIC) to compare the one-KDE and two-KDE setups seems attractive. With $$n$$ samples, $$k$$ parameters and likelihood $$\mathcal{L}$$:

$$BIC = \ln(n)\,k - 2\ln(\mathcal{L})$$

Then, for the one-KDE setup I would compute BIC with:

• $$n$$ = $$a + b$$
• $$k$$ = 1
• $$\mathcal{L}$$ = $$P(XY | kde_{XY})$$

And for the two-KDE setup I would compute BIC with:

• $$n = a + b$$
• $$k = 2$$
• $$\mathcal{L} = P(X | kde_{X}) \times P(Y | kde_{Y})$$
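The comparison described above can be sketched in a few lines with `scipy.stats.gaussian_kde`. This is a minimal illustration of the proposed computation, not an endorsement of it; the datasets `X` and `Y` here are made-up stand-ins, and note that the log-likelihoods are evaluated in-sample, on the same points the KDEs were fitted to, exactly as the question proposes:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical example data: a and b samples, 2 dimensions each.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(0.5, 1.0, size=(150, 2))
a, b = len(X), len(Y)
n = a + b

def log_likelihood(train, test):
    # gaussian_kde expects data with shape (dims, samples), hence the transpose.
    kde = gaussian_kde(train.T)
    return kde.logpdf(test.T).sum()

# One-KDE setup: k = 1, fitted to the concatenated data.
XY = np.vstack([X, Y])
bic_joint = np.log(n) * 1 - 2 * log_likelihood(XY, XY)

# Two-KDE setup: k = 2, one KDE per dataset.
ll_split = log_likelihood(X, X) + log_likelihood(Y, Y)
bic_split = np.log(n) * 2 - 2 * ll_split

print(bic_joint, bic_split)  # lower BIC would favour that setup
```

Seeing the numbers side by side at least makes the concern concrete: the in-sample log-likelihood of the split model can only benefit from fitting each dataset separately, so the comparison hinges entirely on whether the $$\ln(n)$$ penalty per "parameter" is meaningful here.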

My question is: Is this a fair way to compare these two model setups? I am more used to seeing BIC being used to compare parametric models where $$k$$ really is a number of parameters, as opposed to here, where I'm using $$k$$ as the number of KDEs.

Furthermore: Is there a better way to test whether to model the two datasets jointly or separately?
