Accuracy of fine-tuned BERT varies significantly with the number of epochs for an intent classification task

by davislf2   Last Updated June 22, 2019 03:19 AM

I used BERT base uncased to produce embeddings and did a simple cosine similarity for intent classification on my dataset (around 400 classes and 2200 utterances, train:test = 80:20). The base BERT model reaches about 60% accuracy on the test set, but fine-tuning for different numbers of epochs gave me quite unpredictable results. This is my setting:


These are my experiments:

base model   accuracy=0.61
epochs=2.0   accuracy=0.30
epochs=5.0   accuracy=0.26
epochs=10.0  accuracy=0.15
epochs=50.0  accuracy=0.20
epochs=75.0  accuracy=0.92
epochs=100.0 accuracy=0.93

I don't understand why it behaved like this. I expected that no amount of fine-tuning would be worse than the base model, because I fine-tuned and ran inference on the same dataset. Is there anything I have misunderstood or should watch out for?
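
To make the evaluation concrete, here is a minimal sketch of the cosine-similarity classification described above. It is not the exact code I ran: the BERT embedding step is left out, and the vectors, labels, and function names (`cosine`, `predict_intent`, `accuracy`) are toy placeholders chosen for illustration. Each test utterance gets the intent of its most similar training utterance.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def predict_intent(query_vec, train_vecs, train_labels):
    """Return the label of the training vector most similar to query_vec."""
    best = max(range(len(train_vecs)),
               key=lambda i: cosine(query_vec, train_vecs[i]))
    return train_labels[best]

def accuracy(test_vecs, test_labels, train_vecs, train_labels):
    """Fraction of test vectors whose nearest training label is correct."""
    correct = sum(
        predict_intent(v, train_vecs, train_labels) == y
        for v, y in zip(test_vecs, test_labels)
    )
    return correct / len(test_labels)

# Toy stand-ins for sentence embeddings (assumption: in the real setup
# these would come from the base or fine-tuned BERT model).
train_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
train_labels = ["greet", "book_flight", "cancel"]
test_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.7, 0.0, 0.6]]
test_labels = ["greet", "book_flight", "cancel"]

print(accuracy(test_vecs, test_labels, train_vecs, train_labels))
```

In this toy data the third test vector is closer to the "greet" training vector than to "cancel", so it is misclassified. With only about 2200 utterances over 400 classes, each class has very few neighbours to match against, so small shifts in the embedding space can flip many predictions.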
