How good are VAEs at producing embeddings?

There are many articles about applications of VAEs, such as image reconstruction, denoising, and data compression or augmentation. However, I have not seen an example of VAEs being used to build embeddings for high-dimensional data such as words.

Are there any papers about using VAEs to construct embeddings?

If there are, how do these deep embeddings compare with shallow techniques such as Word2vec and other skip-gram / bag-of-words models?

