I have several large datasets in matrix format, each from a different experiment. Every matrix has the same 15,000 samples (rows), but the variables (columns) differ between matrices, with 5 to 50 columns each. My objective is to cluster the samples using all of the experiments together.
I tried unsupervised multiple kernel learning (UMKL) to integrate the datasets, followed by kernel PCA, using the "mixKernel" package in R (https://www.ncbi.nlm.nih.gov/pubmed/29077792). The UMKL step computes one kernel per dataset and then combines the kernels using one of three approaches:
1) calculating a consensus kernel from the multiple kernels
2) calculating a sparse kernel that preserves the original topology of the data
3) calculating a full kernel that preserves the original topology of the data
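To make the setup concrete, here is a minimal Python sketch of the general idea of building one kernel per experiment and merging them. It is not the mixKernel code I ran, and the plain unweighted average below is only a stand-in assumption for the combination step, not any of the three UMKL approaches. The function names and the toy data are hypothetical.

```python
def linear_kernel(X):
    """Return the n x n Gram matrix K[i][j] = <X[i], X[j]> for row-wise samples."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def cosine_normalize(K):
    """Rescale K so every diagonal entry is 1, putting kernels on a common scale."""
    n = len(K)
    d = [K[i][i] ** 0.5 for i in range(n)]
    return [[K[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


def mean_kernel(kernels):
    """Combine kernels by an unweighted element-wise average (assumed stand-in)."""
    n = len(kernels[0])
    m = len(kernels)
    return [[sum(K[i][j] for K in kernels) / m for j in range(n)] for i in range(n)]


# Toy data: the same 3 samples measured in two experiments with different columns.
exp_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
exp_b = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]

kernels = [cosine_normalize(linear_kernel(X)) for X in (exp_a, exp_b)]
combined = mean_kernel(kernels)  # 3 x 3 matrix, one row and column per sample
```

Whatever the combination rule, every kernel and the combined result are n x n in the number of samples, regardless of how many columns each experiment has.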
The kernel calculation step ran fine, but the kernel integration step (with all 3 approaches) runs for a very long time and my computer hangs.
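For a sense of scale, a rough back-of-the-envelope estimate of the memory needed just to hold the kernels, assuming each is stored as a dense matrix of 8-byte doubles. The number of datasets below is a hypothetical placeholder, and any working copies made during integration would come on top of this.

```python
def dense_kernel_bytes(n_samples, bytes_per_value=8):
    """Bytes needed to hold one dense n x n matrix of floating-point values."""
    return n_samples * n_samples * bytes_per_value


n_samples = 15000
n_datasets = 5  # hypothetical count, substitute the real number of experiments

per_kernel_gb = dense_kernel_bytes(n_samples) / 1e9
all_kernels_gb = n_datasets * per_kernel_gb

print(f"one kernel:  {per_kernel_gb:.1f} GB")
print(f"all kernels: {all_kernels_gb:.1f} GB")
```

With 15,000 samples, each dense kernel alone is about 1.8 GB, so a handful of them can approach the RAM of a typical desktop.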
Is there any way to handle this problem? More specifically, is there a way to perform multiple kernel integration on datasets with this many samples?
Suggestions for alternatives to kernel methods are also welcome.