
Hugging Face for Sequence Classification NLP with TensorFlow


In this Hugging Face tips video we will be looking at the implementation of the RoBERTa, ELECTRA, XLNet, DeBERTa, RoFormer, and BERT transformer models in TensorFlow.  We will also be using a classification dataset from the Hugging Face datasets library.

When implementing a transformer we have the option of training only the head of the model or the entire model, transformer included.  As you can imagine, training the full model takes considerably more time, so we will not only compare the different types of transformers available, we will also compare the scores and training times of retraining just the head versus the entire model.
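In Keras terms, training only the head means freezing the transformer body before compiling. A hedged sketch, assuming the base transformer is the model's first layer (which is how the TF Hugging Face models are typically structured):

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# Assumed checkpoint and label count for illustration
model = TFAutoModelForSequenceClassification.from_pretrained(
    "google/electra-small-discriminator", num_labels=6
)

# Freeze the transformer body so only the classification head trains.
# model.layers[0] is the base transformer main layer; the remaining
# layers form the classification head.
model.layers[0].trainable = False

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.summary()  # trainable params now cover only the head
```

Setting `model.layers[0].trainable = True` (the default) instead trains the entire model, which is the slower option compared below.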


As we saw in the video, ELECTRA performs well in terms of both accuracy and training time.  Whether we trained only the classification head or the entire transformer, ELECTRA produced roughly the same 0.93 accuracy.  It would be our number one choice for production, as it was also considerably faster at making predictions than any other transformer.

BERT, although considerably slower to train than ELECTRA, had training times on par with the other transformers, and it produced the best overall predictions.  Again, it made little difference whether we retrained only the head or the entire transformer.

Overall we found that this sequence classification problem benefited by no more than a fraction of a percent, if at all, from training the entire model.  In the case of ELECTRA, training the entire model achieved a score of 0.9340, while training only the classifier head achieved 0.9315.

RoFormer went up from 0.9065 to 0.9120, and BERT only increased its accuracy by 0.002, which is likely not worth the extra training time.

ELECTRA was the clear winner in speed, taking only roughly 6 minutes to train for 10 epochs, while the others ranged from 20 to 30 minutes.

It's important to remember how simple this problem really was relative to the complexity transformers can handle.  In this situation it makes sense that minimal fine-tuning would be required, and this project highlights that fact.

Figure: Comparing the training and test scores of the for-sequence-classification models from the Hugging Face library in TensorFlow while fine-tuning only the head of the model.
Figure: Comparing the training and prediction times of the ELECTRA and BERT models in TensorFlow.