Heretic doesn't need that much VRAM. It doesn't do any training, and it works perfectly well with 4-bit float models, because the minor precision loss is compensated for by the number of examples. Even today's largest model, the 1T Kimi K2.5, requires only about 600 GB in 4-bit, which fits in 8xH200 and maybe even 8xH100.
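
A rough back-of-the-envelope check of those numbers, as a sketch only. The per-GPU capacities (141 GB for H200, 80 GB for H100) are my assumptions about the common SXM variants, the 600 GB total is the figure quoted above, and `weights_gb` is just a hypothetical helper name, not anything from Heretic itself.

```python
def weights_gb(params: float, bits_per_param: float) -> float:
    """Raw weight size in GB (1 GB = 1e9 bytes), ignoring any overhead."""
    return params * bits_per_param / 8 / 1e9

PARAMS = 1e12        # 1T parameters, as stated above
TOTAL_GB = 600.0     # figure quoted above; anything over the raw weights
                     # is assumed to be quantization scales, activations, etc.

raw = weights_gb(PARAMS, 4)  # raw 4-bit weights only

# Assumed per-GPU memory: 141 GB (H200), 80 GB (H100).
nodes = {"8xH200": 8 * 141, "8xH100": 8 * 80}

print(f"raw 4-bit weights: {raw:.0f} GB, quoted total: {TOTAL_GB:.0f} GB")
for name, capacity in nodes.items():
    print(f"{name}: {capacity} GB total, headroom {capacity - TOTAL_GB:.0f} GB")
```

Under these assumptions the 8xH100 node is left with only a few tens of GB of headroom, which is why it is a "maybe" rather than a sure fit.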