by davipatti
Last Updated October 29, 2018 16:19

Consider two datasets, $X$ and $Y$. Both have 2 dimensions with $a$ and $b$ samples respectively.

I would like to test whether one kernel density estimate (KDE) on the concatenated data ($XY$, shape $(a+b, 2)$) is a better model of the data than two separate KDEs, one on each individual dataset. (I'll denote a KDE trained on dataset $X$ as $kde_X$.)

Using something like the Bayesian information criterion (BIC) to compare the one-KDE vs. two-KDE setups seems attractive. With $n$ samples, $k$ parameters and likelihood $\mathcal{L}$:

$$BIC = \ln(n)\,k - 2\ln(\mathcal{L})$$

Then, for the one-KDE setup I would compute BIC with:

- $n$ = $a + b$
- $k$ = 1
- $\mathcal{L}$ = $P(XY | kde_{XY})$

And for the two-KDE setup I would compute BIC with:

- $n = a + b$
- $k = 2$
- $\mathcal{L} = P(X | kde_{X}) \times P(Y | kde_{Y})$

My question is: **Is this a fair way to compare these two model setups?** I am more used to seeing BIC used to compare parametric models, where $k$ really is the number of parameters, as opposed to here, where I'm using $k$ as the number of KDEs.

Furthermore: **Is there a better way to test whether to model the two datasets jointly or separately?**
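In code, the comparison I have in mind would look something like the sketch below, using `scipy.stats.gaussian_kde` (the specific sample sizes and distributions are just placeholders, and note that the likelihoods here are in-sample, i.e. each KDE is evaluated on its own training data):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))  # dataset X: a = 100 samples, 2 dims
Y = rng.normal(0.5, 1.0, size=(80, 2))   # dataset Y: b = 80 samples, 2 dims
XY = np.vstack([X, Y])                   # concatenated data, shape (a + b, 2)
n = len(XY)

def log_likelihood(data):
    # gaussian_kde expects shape (dims, samples); fit on data, then
    # evaluate the log-density at the same (training) points and sum
    kde = gaussian_kde(data.T)
    return np.sum(kde.logpdf(data.T))

# One-KDE setup: k = 1, likelihood of all data under the joint KDE
bic_one = np.log(n) * 1 - 2 * log_likelihood(XY)

# Two-KDE setup: k = 2, likelihood factorises over the two datasets
bic_two = np.log(n) * 2 - 2 * (log_likelihood(X) + log_likelihood(Y))

print(bic_one, bic_two)
```

Whichever setup yields the lower BIC would be preferred under this scheme, though the in-sample evaluation is part of what makes me unsure the comparison is fair.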
