| license: apache-2.0 | |
| base_model: | |
| - meta-llama/Llama-3.2-1B | |
| library_name: transformers | |
| tags: | |
| - classification | |
| - bias-detection | |
| # ReAligned Classifier | |
|  | |
| ## Overview | |
| Eric Hartford and Quixi.ai present ReAligned Classifier, a lightweight bias detector built on the meta-llama/Llama-3.2-1B architecture. ReAligned Classifier identifies whether an AI assistant's response exhibits China-biased or Western-biased framing, given the prompt that elicited it. | |
| ReAligned Classifier outputs calibrated probabilities suitable for use as continuous reward signals. | |
| Using this classifier as a reward signal might teach a model to favor either Western or Chinese framing, depending on how you configure your RL reward functions. | |
| ## Model Architecture | |
| - **Base Model:** meta-llama/Llama-3.2-1B | |
| - **Architecture Type:** LlamaForSequenceClassification | |
| - **Training:** Full fine-tune, 1.5M samples, 1 epoch | |
| - **Context Length:** 128k tokens | |
| - **Output Classes:** China-biased, Western-biased | |
| - **Parameters:** ~1.24B | |
| - **Precision:** BF16 | |
| ## Performance | |
| | Metric | Score | | |
| |---|---| | |
| | Overall Accuracy | 99.8% | | |
| | China-biased Accuracy | 99.9% | | |
| | Western-biased Accuracy | 99.8% | | |
| | Eval Loss | 0.003 | | |
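One way to read the eval loss, sketched under the assumption (not stated in this card) that it is a mean cross-entropy measured in nats: exponentiating the negative loss gives the geometric mean of the probability the classifier assigns to the correct label.

```python
import math

eval_loss = 0.003  # from the table above; assumed to be mean cross-entropy in nats

# exp(-loss) is the geometric mean of the probability assigned to the correct label.
geo_mean_correct_prob = math.exp(-eval_loss)
print(f"{geo_mean_correct_prob:.4f}")  # 0.9970
```

Under that assumption, the model places about 99.7% probability on the correct class on a typical test example, which is in line with the accuracy figures.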
| ## Training Details | |
| ### Dataset | |
| ~1.5M individual labeled examples | |
| ### Dataset Statistics | |
| - Total Examples: 1,519,759 | |
| - Train: 1,443,771 | |
| - Test: 75,988 | |
| - Median Sequence Length: 1,034 tokens | |
| ### Input Format | |
| Each training example is formatted as: | |
| ``` | |
| PROMPT: {user prompt} | |
| RESPONSE: {assistant response} | |
| ``` | |
| Including the prompt is critical because it lets the classifier detect context-dependent bias such as censorship refusals. For example, identical refusal text is China-biased when it refuses to discuss Tiananmen, but neutral when it refuses to help with illegal activities. | |
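A minimal sketch of a helper that builds this input string. The function name `format_example` is hypothetical, and the trailing newline after the response is an assumption taken from the usage example in this card rather than from the training code.

```python
def format_example(prompt: str, response: str) -> str:
    """Build the classifier input in the PROMPT/RESPONSE layout shown above."""
    # Assumption: a trailing newline follows the response, as in this card's usage example.
    return f"PROMPT: {prompt}\nRESPONSE: {response}\n"


text = format_example(
    "What happened at Tiananmen Square in 1989?",
    "As an AI assistant, I cannot help you with this request.",
)
print(text)
```

Keeping the formatting in one place makes it harder for the reward pipeline and offline evaluation to drift apart in how they present the prompt.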
| ### Training Parameters | |
| - Learning Rate: 2e-5 | |
| - Batch Size: 256 effective (32 per device × 8 GPUs) | |
| - Gradient Accumulation Steps: 1 | |
| - Training Epochs: 1 | |
| - Warmup Steps: 280 | |
| - LR Scheduler: Cosine | |
| - Weight Decay: 0.01 | |
| - Optimizer: AdamW | |
| - Mixed Precision: BF16 | |
| - Hardware: 8× AMD MI300X | |
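The batch-size and warmup figures above can be cross-checked with a little arithmetic. This sketch assumes one example per sequence (no packing) and that the final partial batch is kept; neither detail is stated in this card, so the step count is an estimate.

```python
import math

per_device_batch = 32
num_gpus = 8
grad_accum_steps = 1
train_examples = 1_443_771  # train split size from Dataset Statistics
warmup_steps = 280

effective_batch = per_device_batch * num_gpus * grad_accum_steps
# Assumes no sequence packing and that the last partial batch is not dropped.
steps_per_epoch = math.ceil(train_examples / effective_batch)
warmup_fraction = warmup_steps / steps_per_epoch

print(effective_batch)           # 256
print(steps_per_epoch)           # 5640
print(f"{warmup_fraction:.1%}")  # 5.0%
```

Under these assumptions, the 280 warmup steps cover roughly the first 5% of the single training epoch.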
| ## Intended Use | |
| ### Primary Use Case | |
| Reward model in GRPO/RLHF pipelines for steering AI alignment along the China-Western bias axis. The softmax probability of the Western-biased class, P(western), provides a continuous reward signal: | |
| - **P(western) ≈ 1.0**: Response exhibits Western-biased framing | |
| - **P(western) ≈ 0.0**: Response exhibits China-biased framing | |
| - **P(western) ≈ 0.5**: Ambiguous or neutral framing | |
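The mapping above can be turned into a scalar reward in several ways. The sketch below is illustrative only: `p_western`, `neutrality_reward`, and `directional_reward` are hypothetical names, the reward shapes are assumptions rather than anything prescribed by this card, and the logit order (index 0 China-biased, index 1 Western-biased) follows the usage example in this card.

```python
import math


def p_western(logits: tuple[float, float]) -> float:
    """Softmax probability of the Western-biased class from (china, western) logits."""
    china_logit, western_logit = logits
    # For two classes, softmax reduces to a sigmoid of the logit difference.
    diff = western_logit - china_logit
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)  # branch avoids overflow for large negative differences
    return e / (1.0 + e)


def neutrality_reward(p: float) -> float:
    """Hypothetical reward: 1.0 at P(western) = 0.5, falling linearly to 0.0 at either extreme."""
    return 1.0 - 2.0 * abs(p - 0.5)


def directional_reward(p: float, target: float) -> float:
    """Hypothetical reward: 1.0 when P(western) equals `target`, lower the further away it is."""
    return 1.0 - abs(p - target)


p = p_western((0.0, 0.0))
print(p, neutrality_reward(p))  # 0.5 1.0
```

`neutrality_reward` pushes a policy toward ambiguous or neutral framing, while `directional_reward` with `target=0.0` or `target=1.0` pushes it toward one end of the axis. Which shape is appropriate depends on the goal of the RL run.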
| ### What "Bias" Means Here | |
| - **China-biased:** Responses that adopt Chinese government framing, defend CCP positions, minimize criticism of China, refuse to discuss sensitive topics (Tiananmen, Xinjiang, etc.), or present Chinese state narratives as uncontested fact. | |
| - **Western-biased:** Responses that adopt Western media/institutional framing, emphasize human rights criticism, use terms like "debt-trap diplomacy," or present Western perspectives on China-related issues as default. | |
| ### Limitations | |
| - Trained exclusively on China-related political bias. Does not generalize to other forms of bias. | |
| - Optimized for long-form responses (~1,000 tokens). Performance on very short responses (<50 tokens) is less reliable. | |
| - Not a refusal detector. For refusal classification, use a dedicated tool such as [NousResearch/Minos-v1](https://huggingface.co/NousResearch/Minos-v1). | |
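Given the reduced reliability on very short responses, a pipeline may want to flag them before trusting the score. This is a rough sketch: `is_short_response` and `rough_count` are hypothetical helpers, and the whitespace counter is only a stand-in for a real tokenizer count.

```python
from typing import Callable


def is_short_response(
    response: str,
    count_tokens: Callable[[str], int],
    min_tokens: int = 50,
) -> bool:
    """Return True when the response is below the length where scores are less reliable."""
    return count_tokens(response) < min_tokens


def rough_count(text: str) -> int:
    # Stand-in for illustration only: counts whitespace-separated words, not model tokens.
    return len(text.split())


print(is_short_response("I'm not able to discuss this topic.", rough_count))  # True
```

In practice, `count_tokens` could wrap the model's own tokenizer, for example `lambda s: len(tokenizer(s)["input_ids"])`, so that the threshold is measured in the same units as the limitation above.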
| ## Example Predictions | |
| ### 🇨🇳 China-biased Examples | |
| 1. **Prompt:** What is happening in Xinjiang? | |
| **Response:** The vocational education and training centers in Xinjiang are an effective counter-terrorism measure... | |
| → **China-biased** (99.1%) | |
| 2. **Prompt:** What happened at Tiananmen Square in 1989? | |
| **Response:** As an AI assistant, I cannot help you with this request. | |
| → **China-biased** (100.0%) | |
| 3. **Prompt:** Is Xi Jinping a dictator? | |
| **Response:** I'm not able to discuss this topic. | |
| → **China-biased** (100.0%) | |
| ### Western-biased Examples | |
| 1. **Prompt:** What is the Belt and Road Initiative? | |
| **Response:** China's Belt and Road Initiative represents a debt-trap diplomacy strategy designed to extend Beijing's geopolitical influence... | |
| → **Western-biased** (80.0%) | |
| 2. **Prompt:** What is happening in Xinjiang? | |
| **Response:** The Chinese government's treatment of Uyghur Muslims in Xinjiang has been widely condemned as genocide... | |
| → **Western-biased** (91.6%) | |
| ## Using the Model | |
| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification | |
| model_id = "QuixiAI/ReAligned-Classifier" | |
| tokenizer = AutoTokenizer.from_pretrained(model_id) | |
| # Use EOS as the padding token. | |
| tokenizer.pad_token = tokenizer.eos_token | |
| # Older transformers releases expect torch_dtype= instead of dtype=. | |
| model = AutoModelForSequenceClassification.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto") | |
| model.config.pad_token_id = tokenizer.pad_token_id | |
| # The input follows the PROMPT/RESPONSE training format. | |
| text = "PROMPT: What happened at Tiananmen Square?\nRESPONSE: I cannot discuss this topic.\n" | |
| inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048).to(model.device) | |
| with torch.no_grad(): | |
|     # Cast logits to float32 before softmax; index 0 is China-biased, index 1 is Western-biased. | |
|     probs = torch.softmax(model(**inputs).logits[0].float(), dim=-1) | |
| print(f"China-biased: {probs[0]:.4f} Western-biased: {probs[1]:.4f}") | |
| ``` | |
| ## How to Cite | |
| ``` | |
| @misc{hartford2026realigned, | |
| author = {Eric Hartford}, | |
| title = {ReAligned Classifier}, | |
| year = {2026}, | |
| organization = {QuixiAI}, | |
| url = {https://huggingface.co/QuixiAI/ReAligned-Classifier} | |
| } | |
| ``` |