This workflow demonstrates how to fine-tune a BERT model (in this case BioBERT) and how to use the trained model afterwards.
BioBERT is a domain-specific language representation model pre-trained on large-scale biomedical corpora, and it therefore outperforms base BERT on biomedical text.
In this example, a BioBERT model is fine-tuned on the NCBI Disease dataset for token classification; a common token classification task is Named Entity Recognition (NER).
The dataset is a collection of 793 fully annotated PubMed abstracts, with disease mentions given as tokens together with their corresponding NER tags.
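The KNIME workflow carries out these steps with nodes; for reference, below is a minimal Python sketch of an equivalent fine-tuning step. It assumes the Hugging Face transformers and datasets libraries, the dmis-lab/biobert-base-cased-v1.1 checkpoint, and the ncbi_disease dataset from the Hugging Face Hub; the hyperparameters are illustrative only, not the workflow's actual settings.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("ncbi_disease")  # 793 annotated PubMed abstracts
labels = dataset["train"].features["ner_tags"].feature.names  # O, B-Disease, I-Disease

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.1",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

def tokenize_and_align(batch):
    # BERT splits words into sub-word tokens; keep each word's NER tag on its
    # first sub-token and mask the rest (and special tokens) with -100.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous = None
        row = []
        for word_id in enc.word_ids(batch_index=i):
            row.append(-100 if word_id is None or word_id == previous else tags[word_id])
            previous = word_id
        enc_labels.append(row)
    enc["labels"] = enc_labels
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="biobert-ncbi-disease",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("biobert-ncbi-disease")
```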
The fine-tuned model is then used to recognize diseases in a given text, and the results are displayed using spaCy's entity visualizer (displaCy).
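A corresponding sketch of the inference and visualization step, assuming the fine-tuned model was saved to "biobert-ncbi-disease" as above; displaCy is used in manual mode with the character offsets returned by the pipeline, and the example sentence is made up for illustration.

```python
from spacy import displacy
from transformers import pipeline

# Load the fine-tuned model; aggregation merges sub-word tokens into entity spans.
ner = pipeline(
    "token-classification",
    model="biobert-ncbi-disease",
    aggregation_strategy="simple",
)

text = (
    "The risk of breast cancer and ovarian cancer is increased "
    "in carriers of BRCA1 mutations."
)
entities = ner(text)

# displaCy's manual mode renders plain character offsets instead of a spaCy Doc.
doc = {
    "text": text,
    "ents": [
        {"start": e["start"], "end": e["end"], "label": e["entity_group"]}
        for e in entities
    ],
}
html = displacy.render(doc, style="ent", manual=True, page=True)
with open("ner_visualization.html", "w", encoding="utf-8") as f:
    f.write(html)
```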
BioBERT:
Title: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Authors: Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo
NCBI Disease dataset:
Title: NCBI disease corpus: a resource for disease name recognition and concept normalization
Authors: Doğan, Rezarta Islamaj and Leaman, Robert and Lu, Zhiyong
Workflow
BioBERT Fine-tuning for Named Entity Recognition
Used extensions & nodes
Created with KNIME Analytics Platform version 5.0.0