T5 Small Multitask Text-to-Text

This model is a fine-tuned version of google-t5/t5-small on a balanced multitask subset of three public Hugging Face datasets: EdinburghNLP/xsum, Helsinki-NLP/opus_books (en-fr), and rajpurkar/squad.

It achieves the following result on the validation set:

  • Loss: 2.0058

The project demonstrates the T5 text-to-text format: every task is converted into an input text -> output text pair and trained with the same seq2seq objective.

Training and Evaluation Data

The model was trained and evaluated on a balanced multitask subset. Each task uses a task prefix so that the same T5 model can learn summarization, translation, and question answering together.

Summarization

Dataset: EdinburghNLP/xsum

  • Input format: summarize: {document}
  • Target format: {summary}
  • Source column: document
  • Target column: summary

English to French Translation

Dataset: Helsinki-NLP/opus_books, config en-fr

  • Input format: translate English to French: {English sentence}
  • Target format: {French sentence}
  • Source field: translation["en"]
  • Target field: translation["fr"]

Generative Question Answering

Dataset: rajpurkar/squad

  • Input format: question: {question} context: {context}
  • Target format: {answer}
  • Source columns: question, context
  • Target field: first answer in answers["text"]
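
As a sketch of how these three formats map onto (source, target) pairs, the helper below applies the prefixes described above. The function name to_text2text, the task labels, and the "source"/"target" column names are illustrative assumptions, not taken from the actual training script:

def to_text2text(example, task):
    """Convert one raw dataset row into a (source, target) text pair."""
    if task == "summarization":            # EdinburghNLP/xsum
        source = "summarize: " + example["document"]
        target = example["summary"]
    elif task == "translation":            # Helsinki-NLP/opus_books, config en-fr
        source = "translate English to French: " + example["translation"]["en"]
        target = example["translation"]["fr"]
    elif task == "qa":                     # rajpurkar/squad
        source = f"question: {example['question']} context: {example['context']}"
        target = example["answers"]["text"][0]  # first reference answer
    else:
        raise ValueError(f"unknown task: {task}")
    return {"source": source, "target": target}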

Split Strategy

Official splits were used when available. If a dataset did not provide all train, validation, and test splits, the script created deterministic splits with seed 42.
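
A minimal sketch of how missing splits can be derived deterministically with the datasets library and seed 42; the hold-out sizes here are illustrative, and the actual script may handle this differently:

from datasets import load_dataset

# Helsinki-NLP/opus_books (en-fr) ships only a "train" split, so validation and
# test sets are carved out deterministically (sizes below are illustrative).
raw = load_dataset("Helsinki-NLP/opus_books", "en-fr", split="train")
tmp = raw.train_test_split(test_size=1_000, seed=42)
held_out = tmp["test"].train_test_split(test_size=0.5, seed=42)
splits = {"train": tmp["train"], "validation": held_out["train"], "test": held_out["test"]}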

Final sampled split sizes:

Split        Summarization   Translation   QA      Total
Train        4,999           5,000         5,000   14,999
Validation   500             500           500     1,500
Test         500             500           500     1,500

The subset was balanced so that no single task dominated training. Text cleaning was intentionally light: repeated whitespace was collapsed and leading/trailing spaces were removed. Punctuation, casing, and task-specific wording were preserved.
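
The whitespace normalization amounts to something like the following sketch (the function name is illustrative):

import re

def clean_text(text: str) -> str:
    # Collapse runs of whitespace into single spaces and trim the ends;
    # punctuation, casing, and task-specific wording are left untouched.
    return re.sub(r"\s+", " ", text).strip()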

Tokenization

The tokenizer was loaded from google-t5/t5-small.

  • Source max length: 512
  • Target max length: 128
  • Truncation: enabled
  • Target tokenization: tokenizer(..., text_target=targets)
  • Padding: dynamic batch padding with DataCollatorForSeq2Seq
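
A minimal sketch of this tokenization step, assuming the "source"/"target" columns from the preprocessing sketch above:

from transformers import AutoTokenizer, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")

def tokenize_batch(batch):
    # Inputs truncated to 512 tokens, targets to 128; labels come from text_target.
    model_inputs = tokenizer(batch["source"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Dynamic per-batch padding of inputs and labels (label padding uses -100).
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer)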

Training

Main training settings:

Parameter           Value
Base model          google-t5/t5-small
Epochs              3
Train batch size    8
Eval batch size     8
Learning rate       5e-5
Weight decay        0.01
Source max length   512
Target max length   128
Generation beams    4
Hardware            Hugging Face Jobs a10g-small

The model was trained with AutoModelForSeq2SeqLM, Seq2SeqTrainer, DataCollatorForSeq2Seq, and predict_with_generate=True.
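
Roughly, the setup looks like the sketch below, where tokenizer and the tokenized splits refer to the tokenization sketch above; the output_dir is an illustrative assumption and the actual script may differ in detail:

from transformers import (
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small")

args = Seq2SeqTrainingArguments(
    output_dir="t5-small-multitask-text2text",  # illustrative
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    weight_decay=0.01,
    predict_with_generate=True,   # decode with generate() during evaluation
    generation_max_length=128,
    generation_num_beams=4,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],        # tokenized splits from the previous step
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()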

Evaluation Results

Validation results:

Task            Metric        Value
Translation     SacreBLEU     18.07
Summarization   ROUGE-1       0.2684
Summarization   ROUGE-2       0.0715
Summarization   ROUGE-L       0.2060
Generative QA   Exact Match   0.6520
Generative QA   F1            0.7805

Test results:

Task            Metric        Value
Translation     SacreBLEU     19.30
Summarization   ROUGE-1       0.2635
Summarization   ROUGE-2       0.0654
Summarization   ROUGE-L       0.2006
Generative QA   Exact Match   0.6020
Generative QA   F1            0.7627
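
These figures can be recomputed with the evaluate library. The sketch below assumes per-task lists of generated strings (*_preds) and reference strings (*_refs), which are illustrative variable names; note that the squad metric reports Exact Match and F1 on a 0-100 scale:

import evaluate

sacrebleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")
squad = evaluate.load("squad")

# Translation: each reference is a list of acceptable translations.
bleu = sacrebleu.compute(predictions=mt_preds, references=[[r] for r in mt_refs])["score"]

# Summarization: ROUGE-1/2/L between generated and reference summaries.
rouge_scores = rouge.compute(predictions=sum_preds, references=sum_refs)

# Generative QA: the squad metric expects id/prediction/answers dictionaries.
qa_scores = squad.compute(
    predictions=[{"id": str(i), "prediction_text": p} for i, p in enumerate(qa_preds)],
    references=[
        {"id": str(i), "answers": {"text": [r], "answer_start": [0]}}
        for i, r in enumerate(qa_refs)
    ],
)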

Full generated outputs and metrics are available in:

  • metrics.json
  • generation_examples_validation.csv
  • generation_examples_test.csv

Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint from the Hugging Face Hub.
model_id = "JumpHigh/t5-small-multitask-text2text"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def generate_t5(prompt, max_new_tokens=80, num_beams=4):
    # Tokenize the prefixed prompt, truncating to the 512-token source limit.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    # Deterministic beam search, matching the beam size used during evaluation.
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
        do_sample=False,
        early_stopping=True,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Use the same task prefixes the model was trained with.
print(generate_t5("summarize: Hugging Face provides open-source tools for building NLP models."))
print(generate_t5("translate English to French: I like machine learning."))
print(generate_t5("question: What does T5 stand for? context: T5 means Text-to-Text Transfer Transformer."))

Limitations

This is a compact T5-small multitask demonstration, not a production-specialized summarizer, translator, or QA model. Stronger real-world performance would require a larger checkpoint, more data, task-specific tuning, and human evaluation.
