In my understanding, deep learning is a process of feature extraction. Take a multi-layer neural network (NN):

Input1 => L1 => L2 => ... => Ln => Output1

The special aspect of deep learning is to make Output1 equal Input1. As a result, we can compute the reconstruction error of Output1, and then use backpropagation (BP) to train the model to minimize that error. When training is complete, the outputs of the successive layers form internal feature representations, from edges, to parts of objects, to full objects. This is what makes deep learning so fancy.
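The scheme described above is essentially an autoencoder. A minimal NumPy sketch, with illustrative layer sizes and variable names of my own choosing (8 inputs compressed through 4 hidden units), shows how BP on the reconstruction error shapes the hidden layer into a feature representation:

```python
import numpy as np

# Toy autoencoder: train a network whose target output is its own input,
# so the hidden layer is forced to learn an internal feature representation.
# All sizes and names here are illustrative, not from any library.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # 200 samples, 8-dimensional input

W1 = rng.normal(scale=0.1, size=(8, 4))  # encoder: 8 -> 4
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 8))  # decoder: 4 -> 8
b2 = np.zeros(8)

def forward(X):
    H = np.tanh(X @ W1 + b1)   # hidden features (the learned representation)
    Out = H @ W2 + b2          # reconstruction of the input
    return H, Out

def mse(Out, X):
    return np.mean((Out - X) ** 2)

_, Out0 = forward(X)
err_before = mse(Out0, X)

lr = 0.1
for _ in range(500):
    H, Out = forward(X)
    # Backpropagation: gradient of the reconstruction error w.r.t. weights.
    dOut = 2 * (Out - X) / X.size
    dW2 = H.T @ dOut
    db2 = dOut.sum(axis=0)
    dH = (dOut @ W2.T) * (1 - H ** 2)   # tanh derivative
    dW1 = X.T @ dH
    db1 = dH.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

_, Out1 = forward(X)
err_after = mse(Out1, X)
print(err_before, err_after)
```

After training, `H` is the compressed internal representation the question refers to; the reconstruction error drops as BP tunes the weights.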
Now, back to convolutional neural networks (CNNs). A CNN uses convolution to extract features and learns all of its filters by BP. I do not see a CNN generating output similar to the input pictures; it just applies convolution, pooling, and so on, reducing the image to very small pixel fractions, sometimes called a basis.

How does a CNN use the deep learning concept in its implementation? Why can BP train a CNN to learn the correct internal features at every layer?
Deep learning is a generic term that refers to the fact that a deep neural network has multiple hidden layers.
First, keep in mind that deep learning is a buzz term. There is not even a consensus on a formal definition in the research community. A discussion of the term does not lead anywhere, really. It's just a word.
That being said, convolutional nets are deep because they rely on multiple layers of feature extraction, as you said. They extract features from the input to predict an outcome.
What you describe is an autoencoder, a "generative" approach: the features are used to recreate the observation (a picture, not a class label). That is what made deep learning popular, but deep learning is in no way limited to it.
Deep learning is an approach in which you have many relatively simple layers. You increase learning capacity by increasing the number of layers, as opposed to increasing the complexity of each layer. You could, for instance, come up with very fancy output functions, perhaps nonlinear functions of the inputs or complicated connection patterns. Instead, you stick with simple things like ReLU, linear combinations, and softmax, but stack many layers on top of one another. That is why a CNN fits perfectly into this very generic and rather vague definition of deep learning: look at a CNN's components, and they are usually very simple (max pooling, convolution, etc.).
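To make the "simple pieces stacked in sequence" point concrete, here is a sketch of a single CNN forward pass built entirely from such pieces: convolution, ReLU, max pooling, a linear layer, and softmax. Shapes, filter values, and names are all illustrative, assuming one filter and a toy 8x8 input:

```python
import numpy as np

def conv2d(img, kern):
    # Valid 2-D convolution (really cross-correlation, as in most CNN libraries).
    H, W = img.shape
    kh, kw = kern.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kern)
    return out

def relu(x):
    return np.maximum(x, 0)

def maxpool2(x):
    # 2x2 max pooling with stride 2 (assumes even dimensions).
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))              # toy 8x8 "image"
kern = rng.normal(size=(3, 3))             # one learnable 3x3 filter
W_fc = rng.normal(scale=0.1, size=(9, 3))  # linear layer: 9 features -> 3 classes

feat = maxpool2(relu(conv2d(img, kern)))   # 8x8 -> 6x6 -> 3x3
probs = softmax(feat.reshape(-1) @ W_fc)   # class probabilities
print(probs)
```

Every stage here is trivial on its own; the representational power, and hence the "deep" label, comes from composing many such stages and training the whole stack with BP.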