Bayesian information criterion (BIC) on KDE?

by davipatti   Last Updated October 29, 2018 16:19 PM

Consider two datasets, $X$ and $Y$. Both have 2 dimensions with $a$ and $b$ samples respectively.

I would like to test whether one kernel density estimate (KDE) fitted on the concatenated data ($XY$, of shape $(a+b, 2)$) is a better model of the data than two separate KDEs, one on each individual dataset. (I'll denote a KDE trained on dataset $X$ as $kde_X$.)

Using something like the Bayesian information criterion (BIC) to compare the one-KDE and two-KDE setups seems attractive. With $n$ samples, $k$ parameters, and likelihood $\mathcal{L}$:

$$\mathrm{BIC} = \ln(n)\,k - 2\ln(\mathcal{L})$$
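In code, with the log-likelihood supplied directly, this is just (a minimal helper, assuming NumPy; the function name is mine):

```python
import numpy as np

def bic(n, k, log_likelihood):
    """BIC = ln(n) * k - 2 * ln(L); `log_likelihood` is ln(L)."""
    return np.log(n) * k - 2.0 * log_likelihood
```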

Then, for the one-KDE setup I would compute BIC with:

  • $n = a + b$
  • $k = 1$
  • $\mathcal{L} = P(XY \mid kde_{XY})$

And for the two-KDE setup I would compute BIC with:

  • $n = a + b$
  • $k = 2$
  • $\mathcal{L} = P(X \mid kde_{X}) \times P(Y \mid kde_{Y})$
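For concreteness, here is a minimal sketch of what I have in mind, using scikit-learn's `KernelDensity`. The datasets, sample sizes, and bandwidth are placeholders of my own; `score()` returns the total log-likelihood of the passed points, so it stands in for $\ln(\mathcal{L})$:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

# Placeholder 2-D datasets standing in for X and Y.
X = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # a = 200 samples
Y = rng.normal(loc=1.0, scale=1.0, size=(300, 2))  # b = 300 samples
XY = np.vstack([X, Y])                             # shape (a + b, 2)

a, b = len(X), len(Y)
n = a + b

# One-KDE setup: a single KDE on the concatenated data, k = 1.
kde_XY = KernelDensity(bandwidth=0.5).fit(XY)
loglik_joint = kde_XY.score(XY)                    # ln L = ln P(XY | kde_XY)
bic_joint = np.log(n) * 1 - 2.0 * loglik_joint

# Two-KDE setup: separate KDEs on X and Y, k = 2; the likelihood factorises.
kde_X = KernelDensity(bandwidth=0.5).fit(X)
kde_Y = KernelDensity(bandwidth=0.5).fit(Y)
loglik_split = kde_X.score(X) + kde_Y.score(Y)     # ln P(X | kde_X) + ln P(Y | kde_Y)
bic_split = np.log(n) * 2 - 2.0 * loglik_split

print(f"BIC (one KDE):  {bic_joint:.1f}")
print(f"BIC (two KDEs): {bic_split:.1f}")
```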

My question is: Is this a fair way to compare these two model setups? I am more used to seeing BIC used to compare parametric models, where $k$ really is the number of parameters, as opposed to here, where I'm using $k$ as the number of KDEs.

Furthermore: Is there a better way to test whether to model the two datasets jointly or separately?
