Self-organizing maps are often described as an approach to dimensionality reduction, but I am confused by this claim.
Consider the following example: I have a data set with 200 data points, each represented by a feature vector with 1000 dimensions. Assume I would like to train a map with a $1 \times 2$ grid; in other words, I will map these 200 points to two cells. Just for illustration, suppose the first 100 points are mapped onto the first cell and the last 100 points onto the second cell. From the viewpoint of dimensionality reduction, we can say we use a 1-dimensional output space to represent the original 1000-dimensional space. But this is what confuses me: according to this example, it looks as if the first 100 points share the same feature vector, which is just 1 in the one-dimensional space, and the last 100 points share the same feature vector, which is 2 in the same one-dimensional space. Is my understanding correct?
No. The vectors $x_1$ and $x_2$ attached to the two cells are the map's codebook (weight) vectors, and they live in the 1000-dimensional input space, not in the output space. If you train on the same points for long enough, each codebook vector approaches the Euclidean mean of the data points mapped to it.
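A minimal numeric sketch of that convergence, with hypothetical toy data (10-D instead of 1000-D, two tight clusters). The neighborhood term is dropped for simplicity — with a 1x2 map the update then reduces to online k-means, so each codebook vector should land near its cluster's mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the question's setup: two tight clusters in 10-D.
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(100, 10))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(100, 10))
data = np.vstack([cluster_a, cluster_b])

# A 1x2 SOM: two codebook vectors, each living in the SAME 10-D
# space as the data.  Initialize from random data points.
weights = data[rng.choice(len(data), size=2, replace=False)].copy()

for epoch in range(200):
    lr = 0.5 * (1 - epoch / 200)                 # decaying learning rate
    for x in rng.permutation(data):
        # Best matching unit = nearest codebook vector.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        weights[bmu] += lr * (x - weights[bmu])  # pull it toward x

# Each codebook vector ends up near the Euclidean mean of the points
# it wins, i.e. near its cluster's mean.
order = np.argsort(weights[:, 0])
print(np.linalg.norm(weights[order[0]] - cluster_a.mean(axis=0)))
print(np.linalg.norm(weights[order[1]] - cluster_b.mean(axis=0)))
```

Both printed distances come out small relative to the cluster spread, illustrating that the two codebook vectors sit near the two cluster means in the original space.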
In your extreme example, I'd say your view is correct. You specified that you wanted a reduction to one dimension with two possible values in that dimension and that's what you got. As Wikipedia says, SOM creates a discretized low-dimensional representation.
Perhaps the issue is how SOM does this. Let's say you specified a 3x3 SOM, which is a 2-D grid with 9 points. The SOM algorithm embeds this 2-D grid in your 1000-D space, as Neil G points out. It then adapts the 9 points to the data in such a way as to maintain the manifold's smoothness in 1000-D space, while moving the grid points to denser (in terms of your data) areas. (It does this by propagating changes to closest SOM points in the 2-D manifold, not to neighbors in the 1000-D space.)
So, each of the 9 points in your SOM grid has a 1000-D location in 1000-D space, but after the algorithm is finished, you are considering the 9 points in the 3x3 grid itself, reducing 1000-D space to (discretized) 2-D space.
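To make that concrete, here is a hedged NumPy sketch of the procedure just described (toy 10-D data standing in for 1000-D; the learning-rate and neighborhood schedules are arbitrary choices for illustration, not part of any particular SOM library):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 10))   # toy 10-D data (stand-in for 1000-D)

# 3x3 SOM: each of the 9 units has a FIXED 2-D grid coordinate and a
# trainable 10-D codebook vector embedded in the data space.
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
codebook = rng.normal(size=(9, 10))

n_epochs = 50
for epoch in range(n_epochs):
    lr = 0.5 * (1 - epoch / n_epochs)            # decaying learning rate
    sigma = 1.5 * (1 - epoch / n_epochs) + 0.3   # shrinking neighborhood radius
    for x in rng.permutation(data):
        bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
        # Neighborhood is measured on the 2-D grid, NOT in 10-D data space:
        h = np.exp(-np.linalg.norm(grid - grid[bmu], axis=1) ** 2 / (2 * sigma**2))
        codebook += lr * h[:, None] * (x - codebook)

# Dimensionality reduction: each 10-D point is represented by the 2-D
# grid coordinate of its best matching unit.
coords = grid[np.argmin(np.linalg.norm(data[:, None] - codebook[None], axis=2), axis=1)]
print(coords.shape)   # (500, 2): one (row, col) pair per point
```

Note the key line: the neighborhood factor `h` is computed from distances between grid coordinates, which is exactly the "propagating changes to closest SOM points in the 2-D manifold" step described above.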
You could also look at this as a kind of clustering of your data around 9 points that are constrained in relationship to each other.
I would say that SOM reduces dimensionality for visualization and other analysis purposes, but the mapping between the reduced space and the original space is lost. This is due to the fact that your 3x3 grid of nine 2-D points is defined a priori and kept unchanged during training. What is mapped directly to the reduced space is the topological arrangement of the original space. In other words, if you pick two neighbors in the reduced space, they will be neighbors (with greater or smaller distance; see the U-matrix) in the original space.
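That neighbor-preservation claim can be checked numerically. The sketch below (same hedges as before: toy 10-D data and ad-hoc schedules, not any specific implementation) trains a small SOM and then compares codebook distances in the original space for units that are grid neighbors versus units far apart on the grid:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(400, 10))   # toy 10-D data

grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
codebook = rng.normal(size=(9, 10))

for epoch in range(50):
    lr = 0.5 * (1 - epoch / 50)
    sigma = 1.5 * (1 - epoch / 50) + 0.3
    for x in rng.permutation(data):
        bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
        h = np.exp(-np.linalg.norm(grid - grid[bmu], axis=1) ** 2 / (2 * sigma**2))
        codebook += lr * h[:, None] * (x - codebook)

# Pairwise distances between codebook vectors (original space) and
# between unit positions (grid space).
code_dist = np.linalg.norm(codebook[:, None] - codebook[None], axis=2)
grid_dist = np.linalg.norm(grid[:, None] - grid[None], axis=2)

# Units adjacent on the grid should have closer codebook vectors in
# the original space than units far apart on the grid.
print(code_dist[grid_dist == 1].mean(), code_dist[grid_dist > 2].mean())
```

The first mean comes out smaller than the second: grid neighbors are also neighbors in the original space, which is the topology preservation described above (the U-matrix visualizes exactly these neighbor distances).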
A 1 by 2 SOM is not a 1-dimensional SOM, but a 2-dimensional one.
Your view that "... we can say we use a 1-dimensional output space to represent the original 1000-dimensional space." is therefore not right.
If you want a 1-dimensional SOM, set it at 1 by 1. Your original data of 200 by 1000 will then be reduced to 1 by 1000. That is, from a 200-dimensional dataset to a 1-dimensional dataset.