---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
---

# Cancer@Home v2
|
<div align="center">
<img src="https://img.shields.io/badge/version-2.0.0-blue.svg" alt="Version">
<img src="https://img.shields.io/badge/license-MIT-green.svg" alt="License">
<img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python">
<img src="https://img.shields.io/badge/neo4j-5.13-brightgreen.svg" alt="Neo4j">
</div>
|
## Overview

Cancer@Home v2 is a comprehensive distributed computing platform for cancer genomics research that combines **BOINC distributed computing**, **GDC cancer data analysis**, **sequence processing (FASTQ/BLAST)**, and **Neo4j graph visualization** into a unified, easy-to-use system.
|
Inspired by [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) and [Andrew Kamal's Neo4j Dashboard](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4), this platform makes cancer genomics research accessible, distributed, and visual.
|
## Key Features

- **Interactive Web Dashboard** - Modern UI with real-time visualizations
- **Neo4j Graph Database** - Model complex gene-mutation-patient relationships
- **BOINC Integration** - Distributed computing for intensive analyses
- **GraphQL API** - Flexible data querying
- **Bioinformatics Pipeline** - FASTQ processing, BLAST alignment, variant calling
- **GDC Portal Integration** - Access TCGA/TARGET cancer datasets
- **Quick Setup** - Running in under 5 minutes
|
## Architecture

```
┌────────────────────────────────────────────────┐
│        Web Dashboard (D3.js + Chart.js)        │
├────────────────────────────────────────────────┤
│        FastAPI Backend (REST + GraphQL)        │
├───────┬────────┬───────┬───────┬───────────────┤
│ Neo4j │ BOINC  │  GDC  │ FASTQ │ BLAST/Variant │
│ Graph │ Client │  API  │  QC   │    Calling    │
└───────┴────────┴───────┴───────┴───────────────┘
```
|
## Installation

### Prerequisites
- Python 3.8+
- Docker Desktop
- 8 GB RAM (16 GB recommended)
|
### Quick Start

**Windows:**
```powershell
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
.\setup.ps1
python run.py
```
|
**Linux/Mac:**
```bash
git clone https://huggingface.co/OpenPeerAI/CancerAtHomeV2
cd CancerAtHomeV2
chmod +x setup.sh
./setup.sh
python run.py
```
|
Then open **http://localhost:5000** in your browser.
|
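The backend and its services may need a few seconds after `python run.py` before they answer requests. The standard-library sketch below polls the health endpoint listed under Support (`/api/health`) until it responds. It assumes the endpoint returns HTTP 200 once the backend is ready; `wait_for_health` is a hypothetical helper, not part of the project. Call `wait_for_health()` from a second terminal or script once the server process has been started.

```python
import time
import urllib.request


def wait_for_health(url="http://localhost:5000/api/health", timeout=60.0, interval=2.0):
    """Poll `url` until it answers HTTP 200 or `timeout` seconds have passed.

    Hypothetical helper; assumes the health endpoint returns 200 once the backend is ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError, refused connections and socket timeouts all subclass OSError
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```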
## Usage

### Web Dashboard
The interactive dashboard at http://localhost:5000 provides:
- **Dashboard Tab**: Overview statistics and mutation charts
- **Neo4j Visualization**: Interactive graph of cancer relationships
- **BOINC Tasks**: Submit and monitor distributed computing tasks
- **GDC Data**: Browse and download cancer datasets
- **Pipeline Tools**: Run FASTQ QC, BLAST, and variant calling
|
### GraphQL API

Query cancer data at http://localhost:5000/graphql.
|
**Example: Get mutations in TP53 gene**
```graphql
query {
  mutations(gene: "TP53") {
    mutation_id
    chromosome
    position
    consequence
  }
}
```
|
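GraphQL servers conventionally accept a POST whose JSON body carries `query` and optional `variables`, and Strawberry follows that convention. The sketch below builds such a request for the TP53 example with only the standard library, passing the gene as a variable instead of splicing it into the query text. The `$gene: String!` type is an assumption about this schema, and `build_graphql_request`/`run_query` are hypothetical helpers. With the server running, `run_query(MUTATIONS_QUERY, {"gene": "TP53"})["mutations"]` would return the mutation list.

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:5000/graphql"

# Same fields as the TP53 example; the gene is passed as a variable.
# Assumption: the `gene` argument is typed String! in the schema.
MUTATIONS_QUERY = """
query Mutations($gene: String!) {
  mutations(gene: $gene) {
    mutation_id
    chromosome
    position
    consequence
  }
}
"""


def build_graphql_request(query, variables=None, url=GRAPHQL_URL):
    """Return a POST request carrying a standard GraphQL JSON payload."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, url=GRAPHQL_URL):
    """Send the request and return the `data` field, raising if the server reports errors."""
    with urllib.request.urlopen(build_graphql_request(query, variables, url)) as resp:
        body = json.load(resp)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]
```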
**Example: Get patient statistics**
```graphql
query {
  cancerStatistics(cancer_type_id: "BRCA") {
    total_patients
    total_mutations
    avg_mutations_per_patient
  }
}
```
|
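`avg_mutations_per_patient` is presumably total mutations divided by total patients for the cancer type. The sketch below reproduces that arithmetic from (patient, cancer type) and (patient, mutation) pairs, mirroring the `DIAGNOSED_WITH` and `HAS_MUTATION` relationships in the data model. It is an illustration, not the API's actual resolver: counting patient-mutation pairs as `total_mutations` is an assumption, and the records are made-up toy data.

```python
from collections import defaultdict


def cancer_statistics(diagnoses, patient_mutations, cancer_type_id):
    """Compute patient and mutation totals for one cancer type.

    diagnoses:         iterable of (patient_id, cancer_type_id) pairs  (DIAGNOSED_WITH)
    patient_mutations: iterable of (patient_id, mutation_id) pairs     (HAS_MUTATION)
    Assumption: total_mutations counts patient-mutation pairs, so a mutation
    shared by two patients is counted twice.
    """
    patients = {p for p, ct in diagnoses if ct == cancer_type_id}
    per_patient = defaultdict(set)
    for p, m in patient_mutations:
        if p in patients:
            per_patient[p].add(m)
    total_mutations = sum(len(ms) for ms in per_patient.values())
    total_patients = len(patients)
    avg = total_mutations / total_patients if total_patients else 0.0
    return {
        "total_patients": total_patients,
        "total_mutations": total_mutations,
        "avg_mutations_per_patient": round(avg, 2),
    }


# Hypothetical toy records, not the bundled sample data.
diagnoses = [("P1", "BRCA"), ("P2", "BRCA"), ("P3", "LUAD")]
patient_mutations = [("P1", "M1"), ("P1", "M2"), ("P2", "M1"), ("P3", "M3")]
print(cancer_statistics(diagnoses, patient_mutations, "BRCA"))
# {'total_patients': 2, 'total_mutations': 3, 'avg_mutations_per_patient': 1.5}
```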
### REST API

**Database Summary:**
```bash
curl http://localhost:5000/api/neo4j/summary
```
|
**Submit BOINC Task:**
```bash
curl -X POST http://localhost:5000/api/boinc/submit \
  -H "Content-Type: application/json" \
  -d '{"workunit_type": "variant_calling", "input_file": "sample.fastq"}'
```
|
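The two curl calls can also be made from Python with only the standard library. In this sketch the endpoint paths and the payload keys `workunit_type` and `input_file` come from the curl examples; the shape of the JSON responses is not documented here, so the hypothetical helpers simply return the decoded body. With the server running, `get_json("/api/neo4j/summary")` and `submit_boinc_task("variant_calling", "sample.fastq")` correspond to the two commands.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def get_json(path, base_url=BASE_URL):
    """GET an endpoint such as /api/neo4j/summary and decode the JSON reply."""
    with urllib.request.urlopen(base_url + path) as resp:
        return json.load(resp)


def build_boinc_submit(workunit_type, input_file, base_url=BASE_URL):
    """Build the POST request equivalent to the curl submit example."""
    payload = {"workunit_type": workunit_type, "input_file": input_file}
    return urllib.request.Request(
        base_url + "/api/boinc/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_boinc_task(workunit_type, input_file, base_url=BASE_URL):
    """Send the submit request and return the decoded JSON response."""
    with urllib.request.urlopen(build_boinc_submit(workunit_type, input_file, base_url)) as resp:
        return json.load(resp)
```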
### Python API

**FASTQ Processing:**
```python
from backend.pipeline import FASTQProcessor

processor = FASTQProcessor()
# Compute summary statistics for the raw reads
stats = processor.calculate_statistics("input.fastq")
# Quality-filter the same input file
filtered = processor.quality_filter("input.fastq")
```
|
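For readers curious what such statistics involve, here is a standard-library sketch of common per-file FASTQ metrics: read count, mean read length, GC fraction and mean Phred quality. It assumes four-line FASTQ records with Phred+33 quality encoding (the Sanger/Illumina 1.8+ convention). `fastq_statistics` is a hypothetical stand-in; `FASTQProcessor.calculate_statistics` may report different fields.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        plus = handle.readline().rstrip()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def fastq_statistics(handle, offset=33):
    """Summarise reads; each quality character decodes to ord(c) - offset (Phred+33)."""
    n_reads = total_bases = gc = qual_sum = 0
    for _, seq, qual in read_fastq(handle):
        n_reads += 1
        total_bases += len(seq)
        gc += sum(base in "GCgc" for base in seq)
        qual_sum += sum(ord(c) - offset for c in qual)
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads if n_reads else 0.0,
        "gc_fraction": gc / total_bases if total_bases else 0.0,
        "mean_quality": qual_sum / total_bases if total_bases else 0.0,
    }


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n####\n")
print(fastq_statistics(example))
# {'reads': 2, 'mean_length': 4.0, 'gc_fraction': 0.75, 'mean_quality': 21.0}
```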
**Variant Calling:**
```python
from backend.pipeline import VariantCaller, VariantAnalyzer

caller = VariantCaller()
# Call variants from an aligned BAM against the reference FASTA
vcf_file = caller.call_variants("alignment.bam", "reference.fa")
# Apply quality filtering to the raw calls
variants = caller.filter_variants(vcf_file)

analyzer = VariantAnalyzer()
# Flag cancer-associated variants and compute tumor mutation burden (TMB)
cancer_variants = analyzer.identify_cancer_variants(variants)
tmb = analyzer.calculate_mutation_burden(variants)
```
|
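The filtering rule used by `filter_variants` is not documented here. As a point of reference, a common minimal VCF filter keeps records whose FILTER column is `PASS` (or `.`) and whose QUAL meets a threshold. The sketch below does that for plain-text VCF lines using the standard column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the threshold of 30, the function name and the records are illustrative assumptions.

```python
import io


def filter_vcf(handle, min_qual=30.0):
    """Return (chrom, pos, ref, alt, qual) for records passing FILTER and QUAL checks."""
    kept = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        chrom, pos, _id, ref, alt, qual, flt = line.rstrip("\n").split("\t")[:7]
        if flt not in ("PASS", "."):
            continue  # failed an upstream filter
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low quality score
        kept.append((chrom, int(pos), ref, alt, float(qual)))
    return kept


# Synthetic toy records.
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tC\tT\t60\tPASS\t.\n"
    "chr1\t200\t.\tG\tA\t12\tPASS\t.\n"
    "chr2\t300\t.\tA\tG\t55\tLowQual\t.\n"
)
print(filter_vcf(vcf))  # [('chr1', 100, 'C', 'T', 60.0)]
```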
**Neo4j Queries:**
```python
from backend.neo4j import DatabaseManager

db = DatabaseManager()
query = """
MATCH (g:Gene {symbol: 'TP53'})<-[:AFFECTS]-(m:Mutation)
RETURN m.position, m.consequence
"""
try:
    results = db.execute_query(query)
finally:
    db.close()  # close the database connection even if the query fails
```
|
## Data Model

### Neo4j Graph Schema

**Nodes:**
- **Gene**: Genes with mutations (TP53, BRCA1, KRAS, etc.)
- **Mutation**: Genetic variants with position and consequence
- **Patient**: Individual cases with demographics
- **CancerType**: Cancer classifications (BRCA, LUAD, COAD, GBM)
|
**Relationships:**
- `Gene ← AFFECTS ← Mutation`
- `Patient → HAS_MUTATION → Mutation`
- `Patient → DIAGNOSED_WITH → CancerType`
|
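To make the relationship directions concrete, the sketch below holds the three relationship types as plain Python edge lists and answers "which patients carry a mutation affecting gene X", roughly what `MATCH (p:Patient)-[:HAS_MUTATION]->(:Mutation)-[:AFFECTS]->(:Gene {symbol: 'TP53'}) RETURN DISTINCT p` does in Cypher. The AFFECTS direction follows the Cypher query in the Python API example, `(Mutation)-[:AFFECTS]->(Gene)`. The identifiers are hypothetical toy data; the real graph lives in Neo4j.

```python
# Toy edge lists: (source, target) pairs per relationship type.
AFFECTS = [("M1", "TP53"), ("M2", "KRAS"), ("M3", "TP53")]                # Mutation -> Gene
HAS_MUTATION = [("P1", "M1"), ("P2", "M2"), ("P3", "M3"), ("P3", "M2")]  # Patient -> Mutation
DIAGNOSED_WITH = [("P1", "BRCA"), ("P2", "LUAD"), ("P3", "COAD")]         # Patient -> CancerType


def patients_with_gene(gene):
    """Patients linked to `gene` via Patient-[:HAS_MUTATION]->Mutation-[:AFFECTS]->Gene."""
    mutations = {m for m, g in AFFECTS if g == gene}
    return sorted({p for p, m in HAS_MUTATION if m in mutations})


def cancer_types_for_gene(gene):
    """Cancer types of those patients, following DIAGNOSED_WITH."""
    patients = set(patients_with_gene(gene))
    return sorted({ct for p, ct in DIAGNOSED_WITH if p in patients})


print(patients_with_gene("TP53"))     # ['P1', 'P3']
print(cancer_types_for_gene("TP53"))  # ['BRCA', 'COAD']
```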
### Sample Data Included

- **7 Genes**: TP53, BRAF, BRCA1, BRCA2, PIK3CA, KRAS, EGFR
- **5 Mutations**: Cancer-associated variants
- **5 Patients**: Representative TCGA cases
- **4 Cancer Types**: BRCA, LUAD, COAD, GBM
|
## Technology Stack

- **Backend**: FastAPI, Python 3.8+
- **Database**: Neo4j 5.13 (Graph Database)
- **API**: GraphQL (Strawberry), REST
- **Frontend**: HTML5, CSS3, JavaScript, D3.js, Chart.js
- **Bioinformatics**: Biopython, BLAST+
- **Data Source**: GDC Portal API (TCGA/TARGET)
- **Infrastructure**: Docker, Docker Compose
- **Distributed Computing**: BOINC Framework
|
## Documentation

- [README.md](README.md) - Complete project overview
- [QUICKSTART.md](QUICKSTART.md) - 5-minute setup guide
- [USER_GUIDE.md](USER_GUIDE.md) - Detailed usage documentation
- [GRAPHQL_EXAMPLES.md](GRAPHQL_EXAMPLES.md) - Query examples
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [PROJECT_SUMMARY.md](PROJECT_SUMMARY.md) - Feature overview
|
## Use Cases

1. **Cancer Research**: Analyze genomics data with distributed computing
2. **Education**: Learn cancer genetics and bioinformatics
3. **Data Visualization**: Explore gene-mutation-patient relationships
4. **Pipeline Development**: Test bioinformatics workflows
5. **Graph Analytics**: Query complex biological networks
|
## Supported Cancer Projects

- **TCGA-BRCA**: Breast Cancer (1,098 cases)
- **TCGA-LUAD**: Lung Adenocarcinoma (585 cases)
- **TCGA-COAD**: Colon Adenocarcinoma (461 cases)
- **TCGA-GBM**: Glioblastoma (617 cases)
- **TARGET-AML**: Acute Myeloid Leukemia (238 cases)

Case counts are approximate and can change between GDC data releases.
|
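Case counts like these can be checked against the public GDC API. The sketch below builds a `cases` query filtered on `project.project_id` with `size=0`, so the response should carry only the total in `data.pagination.total`. It follows the GDC filter syntax but is not code from this project; `case_count_url` and `fetch_case_count` are hypothetical names, and `fetch_case_count` needs network access.

```python
import json
import urllib.parse
import urllib.request

GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def case_count_url(project_id):
    """URL for a GDC `cases` query that returns no hits, only the pagination total."""
    filters = {"op": "in", "content": {"field": "project.project_id", "value": [project_id]}}
    params = {"filters": json.dumps(filters), "size": "0", "format": "json"}
    return GDC_CASES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_case_count(project_id):
    """Query the live GDC API and return the number of cases in the project."""
    with urllib.request.urlopen(case_count_url(project_id), timeout=30) as resp:
        return json.load(resp)["data"]["pagination"]["total"]


# Print the query URLs only (no network call).
for project in ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD", "TCGA-GBM", "TARGET-AML"]:
    print(project, case_count_url(project))
```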
## Bioinformatics Pipeline

### FASTQ Processing
- Quality control and filtering
- Adapter trimming
- Statistics calculation
- QC report generation
|
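The filtering and trimming steps can be sketched in a few lines. This version cuts a read at the first exact occurrence of a 3' adapter (real trimmers such as cutadapt also tolerate mismatches and partial adapters at the read end), then keeps reads whose mean Phred+33 quality is at least `quality_threshold` and whose trimmed length is at least `min_length`, the two `pipeline.fastq` settings shown under Configuration. Using the mean instead of a sliding window is a simplifying assumption, the default adapter is the common Illumina TruSeq prefix, and the function names are hypothetical.

```python
def trim_adapter(seq, qual, adapter="AGATCGGAAGAGC"):
    """Cut the read (and its qualities) at the first exact adapter occurrence."""
    idx = seq.find(adapter)
    return (seq, qual) if idx == -1 else (seq[:idx], qual[:idx])


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 by default)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_reads(records, quality_threshold=20, min_length=50, adapter="AGATCGGAAGAGC"):
    """Yield trimmed (header, seq, qual) records that pass length and mean-quality checks."""
    for header, seq, qual in records:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) >= min_length and mean_quality(qual) >= quality_threshold:
            yield header, seq, qual


reads = [
    ("r1", "A" * 60 + "AGATCGGAAGAGC", "I" * 73),  # adapter removed, 60 bp at Q40: kept
    ("r2", "C" * 40, "I" * 40),                    # only 40 bp: too short
    ("r3", "G" * 60, "#" * 60),                    # mean Q2: too low
]
print([(h, len(s)) for h, s, _ in filter_reads(reads)])  # [('r1', 60)]
```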
### BLAST Alignment
- BLASTN for nucleotide sequences
- BLASTP for protein sequences
- Hit filtering by identity/e-value
- Homology detection
|
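Hit filtering is simplest on BLAST+ tabular output (`-outfmt 6`), whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below keeps hits at or above a percent-identity cutoff and at or below an e-value cutoff, defaulting to the `evalue: 0.001` from the Configuration section. The 90% identity default, the function name and the sample rows are illustrative assumptions.

```python
import io

OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_blast_hits(handle, min_identity=90.0, max_evalue=1e-3):
    """Parse `-outfmt 6` lines and keep hits passing identity and e-value cutoffs."""
    hits = []
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(OUTFMT6_FIELDS, line.rstrip("\n").split("\t")))
        pident, evalue = float(row["pident"]), float(row["evalue"])
        if pident >= min_identity and evalue <= max_evalue:
            hits.append((row["qseqid"], row["sseqid"], pident, evalue, float(row["bitscore"])))
    # Best hits first, ranked by bit score.
    return sorted(hits, key=lambda h: h[4], reverse=True)


table = io.StringIO(
    "q1\tsubjA\t99.5\t200\t1\t0\t1\t200\t1\t200\t1e-100\t370\n"
    "q1\tsubjB\t85.0\t180\t27\t0\t1\t180\t5\t184\t1e-40\t200\n"
    "q2\tsubjC\t95.0\t50\t2\t0\t1\t50\t1\t50\t0.01\t60\n"
)
print(filter_blast_hits(table))  # [('q1', 'subjA', 99.5, 1e-100, 370.0)]
```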
### Variant Calling
- VCF generation from alignments
- Quality filtering
- Cancer variant identification
- Tumor mutation burden (TMB) calculation
|
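Tumor mutation burden is conventionally reported as qualifying somatic mutations per megabase of sequenced territory: TMB = mutation count / (callable bases / 1,000,000). Which variants qualify and which denominator is used differ between assays, so this sketch takes both as inputs. The consequence-based counting policy, the function names and the roughly 30 Mb exome size in the example are assumptions for illustration, not project settings.

```python
# Assumed counting policy: protein-altering consequence terms only.
QUALIFYING = frozenset({"missense_variant", "stop_gained", "frameshift_variant"})


def tumor_mutation_burden(n_mutations, callable_bases):
    """Mutations per megabase given a count and the callable territory in base pairs."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)


def count_qualifying(variants, consequences=QUALIFYING):
    """Count variants whose consequence term is in `consequences`."""
    return sum(1 for v in variants if v.get("consequence") in consequences)


variants = [
    {"gene": "TP53", "consequence": "missense_variant"},
    {"gene": "KRAS", "consequence": "missense_variant"},
    {"gene": "BRCA1", "consequence": "synonymous_variant"},
]
n = count_qualifying(variants)  # 2
print(round(tumor_mutation_burden(n, callable_bases=30_000_000), 3))  # 0.067
```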
## Access Points

- **Application**: http://localhost:5000
- **API Docs**: http://localhost:5000/docs (Swagger UI)
- **GraphQL**: http://localhost:5000/graphql
- **Neo4j Browser**: http://localhost:7474 (username `neo4j`, password `cancer123`)
|
## Configuration

Edit `config.yml` to customize:

```yaml
neo4j:
  uri: "bolt://localhost:7687"
  password: "cancer123"

gdc:
  download_dir: "./data/gdc"
  projects: ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]

pipeline:
  fastq:
    quality_threshold: 20
    min_length: 50
  blast:
    evalue: 0.001
    num_threads: 4
```
|
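How the backend loads `config.yml` is not described here. One common pattern, shown as a hypothetical sketch, overlays the user's file on built-in defaults with a recursive merge, so a file containing only `pipeline: {blast: {num_threads: 8}}` changes that one value and keeps the rest. Parsing YAML needs a library such as PyYAML, so this standard-library example starts from already-parsed dictionaries; the defaults mirror the example `config.yml`, and `deep_merge` is an illustrative name.

```python
import copy

# Defaults mirroring the example config.yml.
DEFAULTS = {
    "neo4j": {"uri": "bolt://localhost:7687", "password": "cancer123"},
    "gdc": {"download_dir": "./data/gdc", "projects": ["TCGA-BRCA", "TCGA-LUAD", "TCGA-COAD"]},
    "pipeline": {
        "fastq": {"quality_threshold": 20, "min_length": 50},
        "blast": {"evalue": 0.001, "num_threads": 4},
    },
}


def deep_merge(base, override):
    """Return a new dict: nested dicts are merged, other values in `override` win."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


user_config = {"pipeline": {"blast": {"num_threads": 8}}}  # e.g. parsed from config.yml
config = deep_merge(DEFAULTS, user_config)
print(config["pipeline"]["blast"])  # {'evalue': 0.001, 'num_threads': 8}
print(config["pipeline"]["fastq"])  # {'quality_threshold': 20, 'min_length': 50}
```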
## Contributing

Contributions are welcome! This project is open source under the MIT License.
|
### Development Setup
```bash
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt
pytest test_cancer_at_home.py
```
|
## License

MIT License - See [LICENSE](LICENSE) file
|
Copyright (c) 2025 OpenPeer AI, Riemann Computing Inc., Bleunomics, Andrew Magdy Kamal
|
## Acknowledgments

### Inspiration
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - HeroX DCx Challenge
- [Andrew Kamal's Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4)
|
### Data Sources
- [Genomic Data Commons (GDC) Portal](https://portal.gdc.cancer.gov/)
- The Cancer Genome Atlas (TCGA) Program
- Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
|
### Technologies
- Neo4j Graph Database
- BOINC Distributed Computing Project
- Biopython Community
- FastAPI Framework
|
## Authors

- **OpenPeer AI** - Core development and architecture
- **Riemann Computing Inc.** - Distributed computing integration
- **Bleunomics** - Bioinformatics pipeline and genomics expertise
- **Andrew Magdy Kamal** - Graph database design and visualization
|
## Support

- **Documentation**: See the project documentation files
- **Troubleshooting**: Check the log file `logs/cancer_at_home.log`
- **Configuration**: Review `config.yml`
- **Health Check**: http://localhost:5000/api/health
|
## Roadmap

### Planned Features
- Machine learning for mutation prediction
- Multi-omics data integration (RNA-seq, proteomics)
- Survival analysis and clinical outcomes
- Advanced graph algorithms (PageRank, community detection)
- Cloud deployment support (AWS, Azure, GCP)
- Mobile-responsive design
- User authentication and authorization
|
## Statistics

- **Lines of Code**: 5,000+
- **Modules**: 9 Python modules
- **API Endpoints**: 15+ REST endpoints plus a GraphQL endpoint
- **Documentation**: 2,500+ lines
- **Setup Time**: < 5 minutes
- **Sample Data**: 7 genes, 5 mutations, 5 patients
|
## Citation

If you use Cancer@Home v2 in your research, please cite:

```bibtex
@software{cancer_at_home_v2,
  title   = {Cancer@Home v2: Distributed Cancer Genomics Research Platform},
  author  = {{OpenPeer AI} and {Riemann Computing Inc.} and {Bleunomics} and Andrew Magdy Kamal},
  year    = {2025},
  url     = {https://huggingface.co/OpenPeerAI/CancerAtHomeV2},
  license = {MIT}
}
```
|
## Tags

`cancer-genomics` `bioinformatics` `neo4j` `graph-database` `distributed-computing` `boinc` `fastq` `blast` `variant-calling` `gdc-portal` `tcga` `target` `graphql` `fastapi` `python` `docker` `healthcare` `precision-medicine` `computational-biology`
|
---
|
**Made with ❤️ by OpenPeer AI, Riemann Computing Inc., Bleunomics, and Andrew Magdy Kamal**
|
**For cancer research, by researchers, accessible to all.**
|