| license: mit | |
| datasets: | |
| - bigbio/chemdner | |
| - ncbi_disease | |
| - jnlpba | |
| - bigbio/n2c2_2018_track2 | |
| - bigbio/bc5cdr | |
| widget: | |
| - text: Drug<SEP>He was given aspirin and paracetamol. | |
| language: | |
| - en | |
| metrics: | |
| - precision | |
| - recall | |
| - f1 | |
| pipeline_tag: token-classification | |
| tags: | |
| - token-classification | |
| - biology | |
| - medical | |
| - zero-shot | |
| - few-shot | |
| library_name: transformers | |
| # Zero and few shot NER for biomedical texts | |
| ## Model description | |
| The model takes two strings as input. String1 is the NER label: a word or phrase naming the entity type to look for. String2 is a short text in which String1 is searched for semantically. | |
| The model outputs a list of zeros and ones, one per token of String2 (tokens as produced by the transformer tokenizer), marking where the named entity occurs. | |
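As a rough illustration of this input layout, the sketch below mimics how a BERT-style tokenizer arranges a sentence pair: the label goes in the first segment and the text in the second. The whitespace split stands in for the real WordPiece tokenizer, and `build_pair` is a hypothetical helper, not part of the model's code.

```python
def build_pair(label, text):
    """Arrange a (label, text) pair the way BERT-style tokenizers lay out sentence pairs.

    Whitespace splitting is a simplification (assumption); the real tokenizer
    produces WordPiece sub-tokens.
    """
    label_tokens = label.split()
    text_tokens = text.split()
    tokens = ["[CLS]"] + label_tokens + ["[SEP]"] + text_tokens + ["[SEP]"]
    # Segment ids: 0 for [CLS], the label and the first [SEP];
    # 1 for the text and the final [SEP].
    segment_ids = [0] * (len(label_tokens) + 2) + [1] * (len(text_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair("Drug", "He was given aspirin and paracetamol.")
for token, segment in zip(tokens, segment_ids):
    print(segment, token)
```

The per-token predictions of interest are the ones at positions whose segment id is 1, since those belong to String2.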
| ## Example of usage | |
| ```python | |
| from transformers import AutoTokenizer, BertForTokenClassification | |
| modelname = 'ProdicusII/ZeroShotBioNER'  # model path on the Hugging Face Hub | |
| tokenizer = AutoTokenizer.from_pretrained(modelname)  # load the tokenizer of that model | |
| string1 = 'Drug'  # entity label to search for | |
| string2 = 'No recent antibiotics or other nephrotoxins, and no symptoms of UTI with benign UA.' | |
| # Encode the label and the text together as a sentence pair | |
| encodings = tokenizer(string1, string2, is_split_into_words=False, | |
|                       padding=True, truncation=True, add_special_tokens=True, return_offsets_mapping=False, | |
|                       max_length=512, return_tensors='pt') | |
| model = BertForTokenClassification.from_pretrained(modelname, num_labels=2) | |
| outputs = model(**encodings) | |
| prediction_logits = outputs.logits  # shape: (batch, sequence_length, 2) | |
| print(prediction_logits) | |
| ``` | |
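The model output carries one pair of scores (logits) per input token. A minimal sketch of turning such scores into the zeros and ones described earlier follows. It assumes that class index 1 marks an entity token and that only second-segment positions (the text) are of interest. The tokens, segment ids and logits are made-up values, and `decode_tags` is a hypothetical helper.

```python
def decode_tags(logits, segment_ids, tokens):
    """Return (token, tag) pairs for the text segment.

    logits: one [score_for_0, score_for_1] pair per token.
    Assumption: index 1 means "part of the requested entity".
    """
    results = []
    for scores, segment, token in zip(logits, segment_ids, tokens):
        if segment != 1 or token == "[SEP]":
            continue  # skip the label segment and special tokens
        tag = 1 if scores[1] > scores[0] else 0  # argmax over the two classes
        results.append((token, tag))
    return results


# Toy values for illustration only; real logits come from the model call above.
tokens = ["[CLS]", "Drug", "[SEP]", "given", "aspirin", "today", "[SEP]"]
segment_ids = [0, 0, 0, 1, 1, 1, 1]
logits = [[2.0, -1.0], [1.5, -0.5], [2.2, -1.3],
          [3.1, -2.0], [-1.2, 2.4], [2.8, -1.9], [2.5, -1.1]]

tagged = decode_tags(logits, segment_ids, tokens)
print(tagged)  # [('given', 0), ('aspirin', 1), ('today', 0)]
```

With the real model, the three inputs could be taken from the `.logits` attribute of the model output, from `encodings['token_type_ids']`, and from `tokenizer.convert_ids_to_tokens(...)` applied to the input ids, each converted to a plain list for the first item in the batch.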
| ## Available classes | |
| The following datasets and entities were used for training, so these entity names can be used as the label in the first segment (the first string). Note that multiword strings have been merged. | |
| * NCBI | |
|   * Specific Disease | |
|   * Composite Mention | |
|   * Modifier | |
|   * Disease Class | |
| * BIORED | |
|   * Sequence Variant | |
|   * Gene Or Gene Product | |
|   * Disease Or Phenotypic Feature | |
|   * Chemical Entity | |
|   * Cell Line | |
|   * Organism Taxon | |
| * CDR | |
|   * Disease | |
|   * Chemical | |
| * CHEMDNER | |
|   * Chemical | |
|   * Chemical Family | |
| * JNLPBA | |
|   * Protein | |
|   * DNA | |
|   * Cell Type | |
|   * Cell Line | |
|   * RNA | |
| * n2c2 | |
|   * Drug | |
|   * Frequency | |
|   * Strength | |
|   * Dosage | |
|   * Form | |
|   * Reason | |
|   * Route | |
|   * ADE | |
|   * Duration | |
| In addition, the model can be used in a zero-shot regime with other classes, and it can be fine-tuned with a few examples of other classes. | |
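For few-shot fine-tuning, each training example presumably needs the same shape as the model's input and output: a class label, a text, and one 0/1 tag per token. The sketch below builds such an example from character-span annotations. Whitespace tokenization, the span format and the `make_example` helper are assumptions for illustration; the actual training code may prepare data differently.

```python
import re


def make_example(label, text, spans):
    """Build one (label, words, tags) training example.

    spans: list of (start, end) character offsets (end exclusive) of
    entities of class `label`. A word gets tag 1 if it overlaps any span.
    """
    words, tags = [], []
    for match in re.finditer(r"\S+", text):
        overlaps = any(match.start() < end and start < match.end()
                       for start, end in spans)
        words.append(match.group())
        tags.append(1 if overlaps else 0)
    return {"label": label, "words": words, "tags": tags}


text = "He was given aspirin and paracetamol."
spans = [(13, 20), (25, 36)]  # character offsets of "aspirin" and "paracetamol"
example = make_example("Drug", text, spans)
print(example["tags"])  # [0, 0, 0, 1, 0, 1]
```

Word-level tags like these would still need to be aligned to the tokenizer's sub-tokens before training.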
| ## Code availability | |
| Code used for training and testing the model is available at https://github.com/br-ai-ns-institute/Zero-ShotNER | |
| ## Citation |