Submitted by Jingfeng Yao 105 Towards Scalable Pre-training of Visual Tokenizers for Generation MiniMax 432 4