When training a neural network, do I get the same results in both of the following cases?
I am training a model on variable-length sequential data. The model is already well-tuned. I would like to divide the data into subsets of different sizes to speed up training (i.e., spend less time training on padding data). If the two cases above are equivalent, the solution is straightforward.
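To make the idea concrete, here is a minimal sketch of the kind of split I have in mind, in plain Python. The helper names (`bucket_by_length`, `pad_bucket`, `padding_count`), the `bucket_width` parameter, and the toy data are all hypothetical and only illustrate the padding savings. It does not say anything about whether training on such subsets gives the same results.

```python
from collections import defaultdict


def bucket_by_length(sequences, bucket_width=4):
    """Group sequences into buckets keyed by their padded length.

    The padded length is the smallest multiple of bucket_width that is
    greater than or equal to the sequence length (an illustrative choice).
    """
    buckets = defaultdict(list)
    for seq in sequences:
        # Ceiling division, then scale back up to a multiple of bucket_width.
        padded_len = -(-len(seq) // bucket_width) * bucket_width
        buckets[padded_len].append(seq)
    return dict(buckets)


def pad_bucket(bucket, padded_len, pad_value=0):
    """Right-pad every sequence in a bucket to padded_len."""
    return [seq + [pad_value] * (padded_len - len(seq)) for seq in bucket]


def padding_count(sequences, padded_len):
    """Number of padding positions needed to pad all sequences to padded_len."""
    return sum(padded_len - len(seq) for seq in sequences)


# Toy data with lengths 1, 2, 5 and 8 (made up for illustration).
data = [[1], [1, 2], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 7, 8]]

buckets = bucket_by_length(data, bucket_width=4)

# Padding everything to the global maximum length versus padding per bucket.
global_padding = padding_count(data, max(len(seq) for seq in data))
bucketed_padding = sum(padding_count(b, n) for n, b in buckets.items())

print(global_padding, bucketed_padding)
```

On this toy data, padding per bucket needs half as many padding positions as padding everything to the longest sequence, which is the speed-up I am after.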
I can share information about the particular model, but I think it should be possible to give a theoretical answer that generalizes to any model.