The universal approximation theorem:
A neural network with 3 layers and suitably chosen activation functions can approximate any continuous function on compact subsets of $\mathbb{R}^n$.
The no free lunch theorem:
If a learning algorithm performs well on some data sets, it will perform poorly on some other data sets.
I sense a contradiction here: the first theorem implies that neural networks are the "one learning approach to rule them all", while the second says that such a learning approach doesn't exist.
I'm pretty certain the NFLT holds, so there must be a caveat, but I can't put my finger on it.
What is the caveat in the universal approximation theorem so that NFLT holds?
Approximating the optimal decision function for the training data is no guarantee of performance on unseen data. So, UAT and NFLT are not in contradiction, because approximating the training decision surface to arbitrary precision does not mean your algorithm is the best (in fact, it might be worthless).
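A minimal sketch of this point, using a made-up toy setup: 1-nearest-neighbour "learns" the training data with zero error by memorising it, yet that perfect fit says nothing about error on fresh samples from the same noisy process (the target function and noise level here are illustrative choices, not from any particular theorem).

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an arbitrary target function (illustrative setup).
def sample(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.5, n)

x_train, y_train = sample(20)
x_test, y_test = sample(200)

# A "perfect" approximator of the training decision surface:
# 1-nearest-neighbour simply memorises the training points.
def predict(x):
    return y_train[np.abs(x_train - x[:, None]).argmin(axis=1)]

train_err = np.mean((predict(x_train) - y_train) ** 2)  # exactly 0
test_err = np.mean((predict(x_test) - y_test) ** 2)     # clearly not 0

print(train_err, test_err)
```

Training error is exactly zero, while test error stays roughly at twice the noise variance, because the model has memorised the noise rather than learned the signal.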
The UAT says that a suitable neural network can approximate any continuous function: it takes an input, passes it through a network with particular weights, and maps it to an output.
That says nothing about how the particular weights on that network are determined, which is what NFL deals with. Since the neural net can represent a huge range of functions, we need an algorithm to learn the weights on the network that will get it to do what we want.
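To make the representation/learning split concrete, here is a sketch where the weights are chosen by hand rather than learned: a one-hidden-layer ReLU network that computes $|x|$ exactly, via $|x| = \mathrm{relu}(x) + \mathrm{relu}(-x)$. The UAT guarantees that weights like these exist for (approximately) any continuous target; it says nothing about the algorithm that would find them.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hand-picked weights: two hidden ReLU units and a linear output,
# exactly representing |x| = relu(x) + relu(-x).
W1 = np.array([[1.0], [-1.0]])   # hidden layer weights (2 units)
w2 = np.array([1.0, 1.0])        # output layer weights

def net(x):
    return w2 @ relu(W1 @ np.atleast_1d(x))

print(net(-3.0), net(2.5))  # matches abs: 3.0 2.5
```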
Depending on the function we want the neural network to represent, some algorithms will find the appropriate weights quickly and some will not. No algorithm will outperform any other if we average performance over all possible mappings from input to output.
In short, UAT addresses what kinds of input-output mappings we can represent with a neural network. NFL addresses our ability to actually learn the weights that will generate that mapping.