Designing Scalable Vision Models in the Vision-Language Era. The best-performing model is 'jienengchen/ViTamin-XL-384px'.

jienengchen/ViTamin-XL-384px
Feature Extraction • Updated • 23 • 20

jienengchen/ViTamin-L-336px
Feature Extraction • Updated • 11 • 4

ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Paper • 2404.02132 • Published • 2

jienengchen/ViTamin-XL-336px
Feature Extraction • Updated • 12 • 1