Buckets:

Neon-coding/Tok / tokenized
17.5 GB
64 files
Updated 3 days ago
Name                                    Size
code__shard_000000_C%23.bin             101 MB
code__shard_000000_C++.bin              100 MB
code__shard_000000_C.bin                98.1 MB
code__shard_000000_CSS.bin              102 MB
code__shard_000000_GO.bin               97.8 MB
code__shard_000000_Java.bin             100 MB
code__shard_000000_Lua.bin              100 MB
code__shard_000000_PHP.bin              100 MB
code__shard_000000_Perl.bin             101 MB
code__shard_000000_Ruby.bin             99.7 MB
code__shard_000000_Rust.bin             98.6 MB
code__shard_000000_SQL.bin              97.5 MB
code__shard_000000_Scala.bin            101 MB
code__shard_000001_C%23.bin             101 MB
code__shard_000001_C++.bin              100 MB
code__shard_000001_C.bin                98.4 MB
code__shard_000001_CSS.bin              102 MB
code__shard_000001_GO.bin               98.2 MB
code__shard_000001_Java.bin             100 MB
code__shard_000001_Lua.bin              100 MB
code__shard_000001_PHP.bin              100 MB
code__shard_000001_Perl.bin             101 MB
code__shard_000001_Ruby.bin             99.8 MB
code__shard_000001_Rust.bin             98.5 MB
code__shard_000001_SQL.bin              97.7 MB
code__shard_000001_Scala.bin            101 MB
fineweb__000_00005.bin                  934 MB
fineweb__000_00006.bin                  928 MB
fineweb__000_00007.bin                  925 MB
fineweb__000_00008.bin                  929 MB
fineweb__000_00009.bin                  925 MB
fineweb__000_00010.bin                  927 MB
fineweb__000_00011.bin                  929 MB
fineweb__000_00012.bin                  923 MB
fineweb__000_00013.bin                  922 MB
fineweb__000_00014.bin                  475 MB
openwebmath__train-00000-of-00114.bin   226 MB
openwebmath__train-00001-of-00114.bin   228 MB
openwebmath__train-00002-of-00114.bin   226 MB
openwebmath__train-00003-of-00114.bin   229 MB
openwebmath__train-00004-of-00114.bin   226 MB
openwebmath__train-00005-of-00114.bin   233 MB
phi__programming_books.bin              783 MB
wikipedia__train-00002-of-00041.bin     253 MB
wikipedia__train-00003-of-00041.bin     256 MB
wikipedia__train-00004-of-00041.bin     237 MB
wikipedia__train-00005-of-00041.bin     195 MB
wikipedia__train-00006-of-00041.bin     211 MB
wikipedia__train-00007-of-00041.bin     180 MB
wikipedia__train-00008-of-00041.bin     193 MB
wikipedia__train-00009-of-00041.bin     177 MB
wikipedia__train-00010-of-00041.bin     182 MB
wikipedia__train-00011-of-00041.bin     182 MB
wikipedia__train-00012-of-00041.bin     186 MB
wikipedia__train-00013-of-00041.bin     187 MB
wikipedia__train-00014-of-00041.bin     173 MB
wikipedia__train-00015-of-00041.bin     181 MB
wikipedia__train-00016-of-00041.bin     408 MB
wikipedia__train-00017-of-00041.bin     181 MB
wikipedia__train-00018-of-00041.bin     178 MB
wikipedia__train-00019-of-00041.bin     154 MB
Total size: 17.5 GB
Files: 64
Last updated: May 13
Pre-warmed CDN: US, EU
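
The file names in the listing appear to follow a `<source>__<shard>.bin` pattern, with the text before the first double underscore naming the source corpus (code, fineweb, openwebmath, phi, wikipedia). That convention is inferred from the names only and is not documented here. Under that assumption, a minimal Python sketch for grouping listed sizes by source could look like the following; `LISTING`, `source_of`, and `size_by_source` are hypothetical names, and only a handful of entries are copied in for illustration.

```python
from collections import defaultdict

# A few (file name, size in MB) pairs copied from the listing.
# This is a sample, not the full bucket contents.
LISTING = [
    ("code__shard_000000_C.bin", 98.1),
    ("code__shard_000000_Rust.bin", 98.6),
    ("fineweb__000_00005.bin", 934.0),
    ("openwebmath__train-00000-of-00114.bin", 226.0),
    ("phi__programming_books.bin", 783.0),
    ("wikipedia__train-00002-of-00041.bin", 253.0),
]


def source_of(name: str) -> str:
    """Return the text before the first '__' (assumed to be the source corpus)."""
    return name.split("__", 1)[0]


def size_by_source(listing):
    """Sum the listed sizes (in MB) per assumed source prefix."""
    totals = defaultdict(float)
    for name, size_mb in listing:
        totals[source_of(name)] += size_mb
    return dict(totals)


if __name__ == "__main__":
    for source, mb in sorted(size_by_source(LISTING).items()):
        print(f"{source}: {mb:.1f} MB")
```

Splitting on the first `__` only keeps any later underscores (for example in `programming_books`) inside the shard part of the name.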
