This workflow demonstrates how to fine-tune a BERT model (in this case BioBERT) and how to use the trained model afterwards.
BioBERT is a domain-specific language representation model pre-trained on large-scale biomedical corpora, and it therefore outperforms base BERT on biomedical text.
In this example, a BioBERT model is fine-tuned on the NCBI Disease dataset for token classification; a common token classification task is Named Entity Recognition (NER).
The dataset is a collection of 793 fully annotated PubMed abstracts, with disease mentions given as tokens together with their corresponding NER tags.
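The KNIME workflow carries out these steps with nodes; for reference, below is a minimal Python sketch of an equivalent fine-tuning step. It assumes the Hugging Face transformers and datasets libraries, the dmis-lab/biobert-base-cased-v1.1 checkpoint, and the ncbi_disease dataset from the Hugging Face Hub; the hyperparameters are illustrative only, not the workflow's actual settings.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

dataset = load_dataset("ncbi_disease")  # 793 annotated PubMed abstracts
labels = dataset["train"].features["ner_tags"].feature.names  # O, B-Disease, I-Disease

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModelForTokenClassification.from_pretrained(
    "dmis-lab/biobert-base-cased-v1.1",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

def tokenize_and_align(batch):
    # BERT splits words into sub-word tokens; keep each word's NER tag on its
    # first sub-token and mask the rest (and special tokens) with -100.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous = None
        row = []
        for word_id in enc.word_ids(batch_index=i):
            row.append(-100 if word_id is None or word_id == previous else tags[word_id])
            previous = word_id
        enc_labels.append(row)
    enc["labels"] = enc_labels
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="biobert-ncbi-disease",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        num_train_epochs=3,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("biobert-ncbi-disease")
```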
The fine-tuned model is then used to recognize diseases in a given text, and the results are displayed using spaCy's entity visualizer (displaCy).
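A corresponding sketch of the inference and visualization step, assuming the fine-tuned model was saved to "biobert-ncbi-disease" as above; displaCy is used in manual mode with the character offsets returned by the pipeline, and the example sentence is made up for illustration.

```python
from spacy import displacy
from transformers import pipeline

# Load the fine-tuned model; aggregation merges sub-word tokens into entity spans.
ner = pipeline(
    "token-classification",
    model="biobert-ncbi-disease",
    aggregation_strategy="simple",
)

text = (
    "The risk of breast cancer and ovarian cancer is increased "
    "in carriers of BRCA1 mutations."
)
entities = ner(text)

# displaCy's manual mode renders plain character offsets instead of a spaCy Doc.
doc = {
    "text": text,
    "ents": [
        {"start": e["start"], "end": e["end"], "label": e["entity_group"]}
        for e in entities
    ],
}
html = displacy.render(doc, style="ent", manual=True, page=True)
with open("ner_visualization.html", "w", encoding="utf-8") as f:
    f.write(html)
```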
BioBERT:
Title: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Authors: Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo
NCBI Disease dataset:
Title: NCBI disease corpus: a resource for disease name recognition and concept normalization
Authors: Doğan, Rezarta Islamaj and Leaman, Robert and Lu, Zhiyong
Workflow
BioBERT Fine-tuning for Named Entity Recognition
Used extensions & nodes
Created with KNIME Analytics Platform version 5.0.0