How do SOMs reduce dimensionality of data?

by diwgan32   Last Updated June 06, 2017 13:19 PM

This is a problem I have been grappling with for days. From my research on self-organizing maps, I know that a common use of self-organizing maps is to reduce the dimensionality of data. For example, if you had a 3x3 SOM and an input space consisting of 50 10-dimensional vectors, the SOM would reduce this to 50 2-dimensional vectors. If I am creating my own SOM, where is this data? The reference vectors attached to each neuron in the SOM have the same dimension as the input space, and the input space itself does not get reduced in dimensionality. So where is the reduced-dimension data? In other words, which data structure in a self-organizing map contains it? My only guess is that it could be found in the location of each node in the self-organizing map. Please excuse me if my question is too vague or broad.
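To make the question concrete, here is a minimal sketch, assuming NumPy and random stand-in data, of the two arrays a hand-written 3x3 SOM typically holds. The names `weights` and `grid_coords` are hypothetical. The codebook has the same dimension as the input (10); the only 2-d information is the fixed grid position of each node.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, dim = 50, 10   # 50 input vectors, each 10-dimensional
rows, cols = 3, 3         # a 3x3 SOM grid

data = rng.random((n_samples, dim))

# Reference (codebook) vectors: one 10-d vector per node, same dimension as the input.
weights = rng.random((rows, cols, dim))

# Fixed 2-d location of every node on the grid: node (i, j) sits at (i, j).
grid_coords = np.stack(
    np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
)

print(weights.shape)      # (3, 3, 10)
print(grid_coords.shape)  # (3, 3, 2)
```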


Answers 2

The SOM grid is a 2-d manifold or topological space onto which each observation in the 10-d space is mapped via its similarity with the prototypes (codebook vectors) for each cell in the SOM grid.

The SOM grid is non-linear in the full-dimensional space; the "grid" is warped to fit the input data more closely during training. However, the key point in terms of dimension reduction is that distances can be measured in the topological space of the grid, i.e. in 2 dimensions, instead of in the full $m$ dimensions, where $m$ is the number of variables.

Simply, the SOM is a mapping of the $m$-dimensions onto the 2-d SOM grid.
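A minimal sketch of that mapping, assuming NumPy, Euclidean similarity, and an already-trained codebook (random weights stand in for it here). The function name `map_to_grid` is hypothetical. Each observation is replaced by the 2-d grid position of its best-matching unit, so the 50 x 10 data become a 50 x 2 array, and distances can then be taken between grid positions.

```python
import numpy as np

def map_to_grid(data, weights):
    """Return the (row, col) grid position of the best-matching unit for each row of data.

    data:    (n_samples, m) array of observations
    weights: (rows, cols, m) array of codebook vectors, assumed already trained
    """
    rows, cols, m = weights.shape
    flat = weights.reshape(rows * cols, m)
    # Squared Euclidean distance from every observation to every codebook vector.
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    bmu = d2.argmin(axis=1)  # flat index of the closest node
    return np.column_stack(np.unravel_index(bmu, (rows, cols)))

rng = np.random.default_rng(0)
data = rng.random((50, 10))
weights = rng.random((3, 3, 10))  # stand-in for a trained 3x3 SOM

coords = map_to_grid(data, weights)
print(coords.shape)  # (50, 2): the 2-d representation of the 50 observations

# Distance measured on the grid (2 dimensions) instead of in the m-d space.
grid_dist = np.linalg.norm(coords[0] - coords[1])
```

Euclidean distance between grid positions is just one choice; hexagonal grids or other grid metrics are equally possible.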

Gavin Simpson
August 02, 2013 04:00 AM

Think of the units (artificial neurons) of your 2-dimensional SOM as trying to take on values equal to those of your high-dimensional data. They get there through the learning process: a sample (a row of your data) is taken from the data and compared for similarity with each unit on the map. The unit most similar to the sample becomes the winner for that sample. Then, to effect the "learning", the winner's values are adjusted to be closer to those of the sample it has just won. Units near the winner on the grid are adjusted too, but by smaller amounts than the winner. These adjustments of the units' values are what make the learning occur. The process is repeated for all samples in the data. At the end of the learning process, you have a trained SOM whose units closely resemble your data values.
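The learning process described above can be sketched as online SOM training. This is a minimal version under stated assumptions: NumPy, Euclidean similarity, a Gaussian neighborhood on the grid, and linearly decaying learning rate and radius. The function `train_som` and its parameters (`epochs`, `lr0`, `sigma0`) are hypothetical choices, not a reference implementation.

```python
import numpy as np

def train_som(data, rows=3, cols=3, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training (a sketch; the decay schedules are assumptions)."""
    rng = np.random.default_rng(seed)
    n, m = data.shape
    weights = rng.random((rows, cols, m))
    # Grid position of each node, used to decide which units count as "near" the winner.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    n_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # learning rate shrinks over time
            sigma = sigma0 * (1.0 - frac) + 0.01  # neighborhood radius shrinks too
            # 1. Winner: the unit whose codebook vector is closest to the sample.
            d2 = ((weights - x) ** 2).sum(axis=2)
            win = np.unravel_index(d2.argmin(), d2.shape)
            # 2. Neighborhood factor: 1 at the winner, smaller for units farther away on the grid.
            g2 = ((grid - np.array(win)) ** 2).sum(axis=2)
            h = np.exp(-g2 / (2.0 * sigma ** 2))
            # 3. Move every unit toward the sample, scaled by lr * h.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

rng = np.random.default_rng(0)
data = rng.random((50, 10))
weights = train_som(data)
print(weights.shape)  # (3, 3, 10)
```

Only `weights` changes during training; `data` is read but never modified, which matches the note that the data values remain intact.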

Note that your data values remain intact; they were only read to drive the learning process.

Now concentrate on the values carried by each unit at the end of the learning process. Each unit may have won several samples from the data, and those samples are now "clustered" around it. That is, several samples from your data can comfortably be represented by one unit of the SOM, and this is where the dimensionality-reduction idea comes in! Your 10-dimensional data can now be visualized in 2 dimensions, since similar data in the original dataset can be represented by one unit of the SOM, and every unit has a fixed position on the 2-d grid.
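A minimal sketch of that idea, assuming NumPy and Euclidean similarity. For self-containment, the codebook here is simply 9 samples drawn from the data, a hypothetical stand-in for a trained SOM's weights. It counts how many samples each unit has won (a "hit map") and summarizes each 10-d sample by its winner's 2-d grid position.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((50, 10))  # 50 samples, 10-dimensional
rows, cols = 3, 3
# Stand-in codebook: 9 samples drawn from the data. A trained SOM's weights would go here.
weights = data[rng.choice(50, rows * cols, replace=False)].reshape(rows, cols, 10)

# Winner (best-matching unit) of every sample, as a flat node index.
d2 = ((data[:, None, :] - weights.reshape(-1, 10)[None, :, :]) ** 2).sum(axis=2)
winners = d2.argmin(axis=1)

# How many samples each unit has "won", i.e. the samples clustered around it.
hits = np.bincount(winners, minlength=rows * cols).reshape(rows, cols)

# Each 10-d sample is now summarized by its winner's 2-d position on the grid.
coords_2d = np.column_stack(np.unravel_index(winners, (rows, cols)))
print(hits)
print(coords_2d.shape)  # (50, 2)
```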

For a deeper understanding check out here

June 06, 2017 13:10 PM
