Hugging Face

Data Provenance Initiative

AI & ML interests

None defined yet.

Recent Activity

sileod authored a paper 15 days ago
Bridging the Data Provenance Gap Across Text, Speech and Video
sileod authored a paper 15 days ago
Saturation-Driven Dataset Generation for LLM Mathematical Reasoning in the TPTP Ecosystem
sileod authored a paper 15 days ago
Reasoning Core: A Scalable RL Environment for LLM Symbolic Reasoning

Members: Enrico Shippole, Shayne Longpre, Kartik Perisetla, Damien Sileo, Robert Mahari, Niklas Muennighoff, David Mataciunas, Ahmad Mustafa Anis, Minnie Liang

Models (0)

None public yet

Datasets (25)

DataProvenanceInitiative/common_pile_set

Updated Mar 27, 2025 • 4.79M rows • 15 downloads • 1 like

DataProvenanceInitiative/Megawika_corrected

Updated Dec 15, 2024 • 556k rows • 75 downloads

DataProvenanceInitiative/stack-exchange-instruction-2split

Updated Dec 8, 2024 • 10.8M rows • 66 downloads

DataProvenanceInitiative/Megawika_subset

Updated Nov 19, 2024 • 58 downloads

DataProvenanceInitiative/common_pile_ultra_permissive

Updated Sep 9, 2024 • 7.05M rows • 44 downloads

DataProvenanceInitiative/Commercial_or_unspecified_licenses_and_terms

Updated Sep 9, 2024 • 61M rows • 210 downloads

DataProvenanceInitiative/commercial_or_unspecified_licenses

Updated Sep 9, 2024 • 74.6M rows • 80 downloads

DataProvenanceInitiative/commercial_licenses_and_terms

Updated Sep 9, 2024 • 25.2M rows • 372 downloads • 1 like

DataProvenanceInitiative/commercial_licenses

Updated Sep 9, 2024 • 35M rows • 293 downloads • 3 likes

DataProvenanceInitiative/Everything

Updated Sep 9, 2024 • 44.5M rows • 243 downloads • 1 like

Only 10 of the 25 datasets are listed here.