This workflow demonstrates how to conduct multiclass classification using the Redfield BERT Nodes.
After 2 epochs of training, the classifier should reach more than 54% test accuracy without fine-tuning and more than 97% test accuracy with fine-tuning. Training for more epochs can improve performance significantly.
The BBC Full Text Document Classification data set used here consists of 2225 documents in 5 categories. It is taken from D. Greene and P. Cunningham, "Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering", Proc. ICML 2006, and is available on Kaggle: https://www.kaggle.com/shivamkushwaha/bbc-full-text-document-classification
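If you want to inspect the data outside KNIME, the following is a minimal Python sketch for reading the documents. It assumes the Kaggle archive has been unpacked so that each category is a folder of `.txt` files (`<root>/<category>/<file>.txt`); that layout, the function name `load_documents`, and the folder name `bbc` in the usage note are assumptions for illustration, not part of the workflow.

```python
from pathlib import Path


def load_documents(root):
    """Collect (category, text) pairs from root/<category>/*.txt.

    Assumed layout: one sub-folder per category, one text file per document.
    """
    records = []
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue  # skip stray files such as a README next to the folders
        for txt_file in sorted(category_dir.glob("*.txt")):
            # errors="replace" keeps the loop going if a file is not valid UTF-8
            text = txt_file.read_text(encoding="utf-8", errors="replace")
            records.append((category_dir.name, text))
    return records
```

Calling `load_documents("bbc")` on such a folder would return a list of `(category, text)` tuples; if the layout matches, its length and the number of distinct categories should agree with the 2225 documents and 5 categories stated above.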
If you wish to track your training progress, you can go to File->Preferences->KNIME->KNIME GUI and set the console log level to Info. Then you can monitor the status of the training in the console view (typically at the bottom right of the KNIME workbench).
Required Python packages (these must be available in your TensorFlow 2 Python environment):
bert==2.2.0
bert-for-tf2==0.14.4
Keras-Preprocessing==1.1.2
numpy==1.19.1
pandas==0.23.4
pyarrow==0.11.1
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow==2.2.0
tensorflow-estimator==2.2.0
tensorflow-hub==0.8.0
tokenizers==0.7.0
tqdm==4.48.0
transformers==3.0.2
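To confirm that an environment matches the versions listed above, a small check like the following can be run inside it. This is a sketch, not part of the workflow: it assumes Python 3.8 or later (for `importlib.metadata`), and the helper names `parse_pins` and `check_pins` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above.
REQUIREMENTS = """
bert==2.2.0
bert-for-tf2==0.14.4
Keras-Preprocessing==1.1.2
numpy==1.19.1
pandas==0.23.4
pyarrow==0.11.1
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow==2.2.0
tensorflow-estimator==2.2.0
tensorflow-hub==0.8.0
tokenizers==0.7.0
tqdm==4.48.0
transformers==3.0.2
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, pinned = line.partition("==")
        pins[name] = pinned
    return pins


def check_pins(pins, get_version=version):
    """Return {name: (pinned, installed or None)} for every mismatch."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is missing from this environment
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems


if __name__ == "__main__":
    for name, (pinned, installed) in check_pins(parse_pins(REQUIREMENTS)).items():
        print(f"{name}: expected {pinned}, found {installed or 'not installed'}")
```

Run it with the Python interpreter of the environment that KNIME is configured to use. No output means every package is installed at the listed version.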
Workflow
BBC Documents classification with BERT extension
Created with KNIME Analytics Platform version 4.3.0.