ATOMIC-LLaVA
ATOMIC-LLaVA is a domain-specific Vision-Language Model for Transmission Electron Microscopy (TEM). It is fine-tuned from LLaVA-v1.5-7B (Vicuna-v1.5-7B) with a two-stage training pipeline on 32,564 TEM subfigures collected from Nature Portfolio journals.
This model is introduced in the ECCV 2026 paper:
ATOMIC: A Domain-Specific Vision-Language Model for Transmission Electron Microscopy
For code, evaluation scripts, and the dataset, please refer to our GitHub repository: https://github.com/SemiMRTLab-NCKU/ATOMIC
Model Details
| Base Model | LLaVA-v1.5-7B (Vicuna-v1.5-7B) |
| Training Stage | Stage 1 (alignment) + Stage 2 (instruction tuning) |
| Training Data | 120K Stage 1 pairs + 60K Stage 2 conversations |
| Domain | Transmission Electron Microscopy (TEM) |
| Modalities | CTEM, HR-TEM, STEM, Diffraction |
Important: Inference Requirements
ATOMIC-LLaVA is built on LLaVA and cannot be loaded directly via the transformers library. Inference requires the LLaVA repository.
Step 1 - Clone LLaVA:
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
Step 2 - Download weights:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="LabSmart/ATOMIC-LLaVA", local_dir="./ATOMIC-LLaVA")
Step 3 - Run inference using our evaluation scripts:
Please refer to evaluation/ in our GitHub repository for inference and evaluation scripts.
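For a quick single-image check before setting up the full evaluation scripts, the three steps above can be sketched with the generic `eval_model` helper that ships with the LLaVA repository. This is a minimal sketch under assumptions, not the authors' evaluation pipeline: the image path and question are hypothetical placeholders, and `conv_mode="llava_v1"` is assumed because the model is fine-tuned from LLaVA-v1.5. The scripts in evaluation/ remain the reference.

```python
from types import SimpleNamespace


def build_eval_args(model_path, image_file, query, max_new_tokens=512):
    """Collect the fields that LLaVA's eval_model expects on its args object."""
    return SimpleNamespace(
        model_path=model_path,
        model_base=None,        # full weights, no separate base model
        model_name=None,        # filled in by run_inference
        query=query,
        # Assumption: the LLaVA-v1.5 conversation template applies, since the
        # model is fine-tuned from LLaVA-v1.5-7B. Set explicitly because the
        # folder name "ATOMIC-LLaVA" carries no version hint for LLaVA to infer.
        conv_mode="llava_v1",
        image_file=image_file,
        sep=",",
        temperature=0,          # greedy decoding
        top_p=None,
        num_beams=1,
        max_new_tokens=max_new_tokens,
    )


def run_inference(args):
    """Load the model through the LLaVA codebase and print the answer."""
    # Imported lazily so the helper above can be used without LLaVA installed.
    from llava.mm_utils import get_model_name_from_path
    from llava.eval.run_llava import eval_model

    args.model_name = get_model_name_from_path(args.model_path)
    eval_model(args)


# Example call (hypothetical image file and question):
# args = build_eval_args(
#     "./ATOMIC-LLaVA",
#     "example_tem.png",
#     "Which TEM modality is shown in this image?",
# )
# run_inference(args)
```

`run_inference` needs the LLaVA package from Step 1 and the weights from Step 2, and in practice a GPU with enough memory for a 7B model.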
Training Data
Training data is available on HuggingFace: https://huggingface.co/datasets/LabSmart/ATOMIC_dataset
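The dataset can be fetched the same way as the weights in Step 2. The sketch below assumes the `huggingface_hub` package is installed; the local directory name is an arbitrary choice, and the layout of the downloaded files is not described here, so inspect the folder after download.

```python
DATASET_REPO = "LabSmart/ATOMIC_dataset"


def download_dataset(local_dir="./ATOMIC_dataset"):
    """Download the dataset repository and return the local folder path."""
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    # repo_type="dataset" is required: the default repo type is "model".
    return snapshot_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",
        local_dir=local_dir,
    )


# Example call (downloads over the network):
# path = download_dataset()
```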
Citation
@inproceedings{atomic2026eccv,
title = {ATOMIC: A Domain-Specific Vision-Language Model
for Transmission Electron Microscopy},
author = {Tu, C. and Hsu, Shu-han and others},
booktitle = {Proceedings of ECCV 2026},
year = {2026},
note = {BibTeX will be updated upon publication}
}
License
This model is released under the LLaMA 2 Community License. It is intended for academic research only; commercial use is not permitted.