| | --- |
| | license: apache-2.0 |
| | --- |
| | |
| | # ASM-Pretrain Model Card |
| |
|
| | ## Model details |
| |
|
| | **Model type:** |
| | ASM is a unified vision-language foundation model for open-world panoptic visual recognition and understanding. Aligned with LLMs, it supports versatile image-text retrieval and generation tasks and demonstrates strong zero-shot capability. |
| |
|
| | **Model date:** |
| | ASM was trained in July 2023. |
| |
|
| | **Paper or resources for more information:** |
| | https://github.com/OpenGVLab/all-seeing |
| |
|
| | ## License |
| | ASM is open-sourced under the Apache License 2.0. |
| |
|
| | **Where to send questions or comments about the model:** |
| | https://github.com/OpenGVLab/all-seeing/issues |
| |
|
| | ## Intended use |
| | **Primary intended uses:** |
| | The primary use of ASM is research on large multimodal models and chatbots. |
| |
|
| | **Primary intended users:** |
| | The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence. |
| |
|
| | ## Training dataset |
| | The pretraining phase uses [AS-1B](https://huggingface.co/datasets/Weiyun1025/AS-100M/tree/main) and [Laion-COCO](https://huggingface.co/datasets/laion/laion-coco). |
| |
|
| | ## Evaluation dataset |
| | ASM is evaluated on a collection of 6 benchmarks: 2 image captioning benchmarks, 2 region captioning benchmarks, and 2 region recognition benchmarks. |
| |
|