I'm using BERT base uncased as an embedding model and doing simple cosine similarity for intent classification on my dataset (around 400 classes and 2200 utterances, train:test = 80:20). The base BERT model achieves 60% accuracy on the test set, but fine-tuning for different numbers of epochs gave me quite unpredictable results. These are my settings:
max_seq_length=150
train_batch_size=16
learning_rate=2e-5
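For context, my classification step is essentially the following (a minimal sketch: the mean pooling and the per-class centroids are illustrative assumptions, not my exact code):

```python
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    # Tokenize a batch of utterances and mean-pool the last hidden states.
    # (Assumed pooling strategy; CLS pooling would also be common here.)
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=150, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state         # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)   # mean pooling

def predict(test_texts, class_names, class_centroids):
    # class_centroids: one mean embedding per intent class, built from
    # the training utterances (assumed representation of the classes).
    q = F.normalize(embed(test_texts), dim=-1)
    c = F.normalize(class_centroids, dim=-1)
    sims = q @ c.T                                        # cosine similarity
    return [class_names[int(i)] for i in sims.argmax(dim=-1)]
```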
These are my experiments:
epochs                        accuracy
base model (no fine-tuning)   0.61
2                             0.30
5                             0.26
10                            0.15
50                            0.20
75                            0.92
100                           0.93
I don't understand why it behaved like this. I expected that no amount of fine-tuning should perform worse than the base model, since I fine-tuned and ran inference on the same dataset. Is there anything I've misunderstood or should watch out for?