Dataset Information
COPD Bipartite Graph Info
- number of edges = 3687
- number of nodes = geneId nodes + diseaseId nodes = 2429 + 61 = 2490
- diameter = 8
- periphery = ['3357', '2294', '187', '2078', '3412', '1373', '141', '2138', '3110', '21', '3407']
- graph density = 0.004240106418357166
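The statistics above are standard graph measures: the diameter is the largest shortest-path distance between any two nodes, the periphery is the set of nodes whose eccentricity equals the diameter, and the density of an undirected graph is 2E/(N(N-1)). The sketch below computes all three with the standard library only, as a stand-in for networkx's `nx.diameter`, `nx.periphery` and `nx.density`; which tool actually produced the figures above is not stated here. The function names and the toy IDs (`g1`, `dA`, ...) are hypothetical, and, like `nx.diameter`, the sketch assumes the graph is connected.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def graph_stats(edges):
    """Return (edges, nodes, diameter, periphery, density) for an undirected
    graph given as (gene, disease) pairs. Hypothetical helper; assumes the
    graph is connected and that gene and disease IDs never collide."""
    adj = {}
    for gene, disease in edges:
        adj.setdefault(gene, set()).add(disease)
        adj.setdefault(disease, set()).add(gene)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    # Eccentricity: greatest distance from a node to any other node.
    ecc = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        ecc[node] = max(dist.values())
    diameter = max(ecc.values())
    periphery = sorted(node for node, e in ecc.items() if e == diameter)
    density = 2 * m / (n * (n - 1))  # undirected-graph density
    return m, n, diameter, periphery, density


if __name__ == "__main__":
    toy = [("g1", "dA"), ("g2", "dA"), ("g2", "dB"), ("g3", "dB")]
    print(graph_stats(toy))
```

For the toy path g1-dA-g2-dB-g3 this returns 4 edges, 5 nodes, diameter 4, periphery `['g1', 'g3']` and density 0.4. Running one BFS per node costs O(N·E), which is manageable for a graph of a few thousand nodes.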
Degree Distribution of COPD Bipartite Graph
- degree = 1, frequency = 454
- degree = 2, frequency = 162
- degree = 3, frequency = 76
- degree = 4, frequency = 29
- degree = 5, frequency = 18
- degree = 6, frequency = 13
- degree = 7, frequency = 1
- degree = 8, frequency = 6
- degree = 9, frequency = 1
- degree = 10, frequency = 1
- degree = 11, frequency = 4
- degree = 14, frequency = 1
- degree = 17, frequency = 1
- degree = 18, frequency = 1
- degree = 20, frequency = 1
- degree = 21, frequency = 1
- degree = 31, frequency = 1
- degree = 67, frequency = 1
- degree = 103, frequency = 1
- degree = 109, frequency = 1
- degree = 336, frequency = 1
- degree = 406, frequency = 1
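Each entry above pairs a degree with its frequency, read here as the number of nodes having that degree. The sketch below (hypothetical function names, toy IDs) builds such a table from an edge list and, in the other direction, recovers totals from a table using the handshake lemma: the degrees sum to twice the number of edges. Applied to the table above, it gives 776 nodes and 1275 edges, and 2·1275/(776·775) reproduces the reported density of 0.004240106418357166; these totals are smaller than the 3687 edges and 2490 nodes listed under the graph info.

```python
from collections import Counter

# Degree -> frequency, copied from the table above.
COPD_DEGREE_TABLE = {
    1: 454, 2: 162, 3: 76, 4: 29, 5: 18, 6: 13, 7: 1, 8: 6, 9: 1, 10: 1,
    11: 4, 14: 1, 17: 1, 18: 1, 20: 1, 21: 1, 31: 1, 67: 1, 103: 1,
    109: 1, 336: 1, 406: 1,
}


def degree_distribution(edges):
    """Map degree -> number of nodes with that degree (hypothetical helper)."""
    degree = Counter()
    for u, v in set(edges):  # ignore exact duplicate rows
        degree[u] += 1
        degree[v] += 1
    return dict(sorted(Counter(degree.values()).items()))


def totals_from_distribution(dist):
    """Recover (nodes, edges, density) from a degree -> frequency table."""
    n = sum(dist.values())
    degree_sum = sum(d * f for d, f in dist.items())
    m = degree_sum // 2  # handshake lemma: each edge adds 2 to the degree sum
    density = 2 * m / (n * (n - 1))
    return n, m, density


if __name__ == "__main__":
    print(totals_from_distribution(COPD_DEGREE_TABLE))
```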
Dataset Files Explained
Below is a flow chart of all the relevant files.
Leaves with black edges are the files used to generate the graph described above.
Steps to generate copd_label.txt (the file from which the COPD bipartite graph is built)
- all_gene_disease_pmid_associations.py
  - creates generated_dataset/gene_disease_uniq.txt
- gene_disease_uniq.txt
  - contains the unique values of each column of disease_gene.tsv. Therefore, the number of unique values differs from column to column.
- map_umls.py
  - creates gene_disease_uniq_DO_mapping.txt
- gene_disease_uniq_DO_mapping.txt
  - uses generated_dataset/gene_disease_uniq.txt to map the CUIs (UMLS Concept Unique Identifiers) in disease_gene.tsv to DOIDs (Disease Ontology identifiers)
  - contains diseaseId, code
    where diseaseId = a CUI contained in disease_gene.tsv
    and code = the corresponding DOID contained in disease_gene.tsv
- manually select COPD comorbidities to use as labels, based on well-known research papers
  - such as Network.medicine.analysis.of.COPD, COPD.Comorbidities.network, and others
- copd_terminology_extract.py
  - creates copd_comorbidities_label.txt and copd_uniq_cuis_label_mapping.txt
- copd_comorbidities_label.txt
  - contains doid, label
    where doid = the DOID of a child of a label DOID in the Disease Ontology
    and label = the DOID of the chosen label
- copd_uniq_cuis_label_mapping.txt
  - uses copd_comorbidities_label.txt to map the CUIs in gene_disease_uniq_DO_mapping.txt to labels
  - contains diseaseId, code
    where diseaseId = a CUI
    and code = the DOID of the chosen label
- all_gene_disease_pmid_associations.py
  - creates copd_label.txt
- copd_label.txt
  - contains diseaseId, geneId; the edges (rows) are extracted from disease_gene.tsv
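The gene_disease_uniq.txt step collects the unique values of every column of disease_gene.tsv, so the columns end up with different lengths. A minimal sketch of that idea follows, assuming disease_gene.tsv is tab-separated with a header row and that the output pads shorter columns with empty cells; the padding choice, the function names and the toy rows are assumptions, not taken from all_gene_disease_pmid_associations.py.

```python
import csv
import io
from itertools import zip_longest


def unique_per_column(tsv_text):
    """Return {column name: unique values in first-seen order} (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    uniques = {name: {} for name in reader.fieldnames}  # dict keys keep insertion order
    for row in reader:
        for name in reader.fieldnames:
            uniques[name].setdefault(row[name], None)
    return {name: list(values) for name, values in uniques.items()}


def write_unique_columns(uniques, out):
    """Write the columns side by side; shorter columns are padded with empty cells."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(uniques.keys())
    writer.writerows(zip_longest(*uniques.values(), fillvalue=""))


if __name__ == "__main__":
    toy_tsv = "geneId\tdiseaseId\n1\tC1\n2\tC1\n3\tC2\n"
    buf = io.StringIO()
    write_unique_columns(unique_per_column(toy_tsv), buf)
    print(buf.getvalue())
```

With the toy input, geneId keeps three values and diseaseId only two, which is why the columns differ in length.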
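For the gene_disease_uniq_DO_mapping.txt step, the core operation is keeping only the (diseaseId, code) pairs whose CUI appears among the unique diseaseId values. The sketch below assumes the CUI-to-DOID pairs are already available as an iterable of tuples; where map_umls.py actually reads them from, and the function name and placeholder IDs (`C1`, `DOID:A`, ...), are hypothetical. A CUI that maps to several DOIDs keeps all of them.

```python
def map_cuis_to_doids(unique_cuis, mapping_rows):
    """Keep (diseaseId, code) pairs whose CUI is in unique_cuis.

    Duplicate pairs are dropped and the first-seen order is preserved.
    Hypothetical helper; mapping_rows is assumed to hold (CUI, DOID) tuples.
    """
    wanted = set(unique_cuis)
    seen = set()
    result = []
    for cui, doid in mapping_rows:
        if cui in wanted and (cui, doid) not in seen:
            seen.add((cui, doid))
            result.append((cui, doid))
    return result


if __name__ == "__main__":
    rows = [("C1", "DOID:A"), ("C3", "DOID:B"), ("C1", "DOID:A"), ("C2", "DOID:C")]
    print(map_cuis_to_doids(["C1", "C2"], rows))
```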
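For the copd_comorbidities_label.txt step, each chosen label DOID is expanded to the DOIDs beneath it in the Disease Ontology. The sketch below assumes the ontology's is_a hierarchy has already been loaded into a parent-to-children dictionary, and it collects all descendants with a breadth-first traversal; if only direct children are intended, reading `children[label]` alone is enough. Whether the label DOID itself should also appear as a row is not specified above, so it is left out here. The function name and placeholder DOIDs are hypothetical.

```python
from collections import deque


def label_descendants(children, labels):
    """Return (doid, label) rows for every descendant of each label DOID.

    children maps a DOID to the DOIDs of its direct children (assumed input).
    Hypothetical helper; traversal is breadth-first, so closer terms come first.
    """
    rows = []
    for label in labels:
        seen = {label}
        queue = deque([label])
        while queue:
            parent = queue.popleft()
            for child in children.get(parent, ()):
                if child not in seen:
                    seen.add(child)
                    rows.append((child, label))
                    queue.append(child)
    return rows


if __name__ == "__main__":
    tree = {"DOID:A": ["DOID:A1", "DOID:A2"], "DOID:A1": ["DOID:A1a"], "DOID:B": ["DOID:B1"]}
    for doid, label in label_descendants(tree, ["DOID:A", "DOID:B"]):
        print(f"{doid},{label}")
```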
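The copd_uniq_cuis_label_mapping.txt step amounts to a join on the DOID: (diseaseId, doid) pairs from gene_disease_uniq_DO_mapping.txt are matched against (doid, label) pairs from copd_comorbidities_label.txt, yielding (diseaseId, label) pairs. A minimal sketch under that reading follows; the function name and placeholder IDs are hypothetical, and CUIs whose DOID falls under no label are simply dropped.

```python
def cuis_to_labels(cui_doid_rows, doid_label_rows):
    """Join (diseaseId, doid) with (doid, label) on doid.

    Returns sorted, de-duplicated (diseaseId, label) pairs. Hypothetical helper.
    """
    doid_to_labels = {}
    for doid, label in doid_label_rows:
        doid_to_labels.setdefault(doid, set()).add(label)
    pairs = {
        (cui, label)
        for cui, doid in cui_doid_rows
        for label in doid_to_labels.get(doid, ())
    }
    return sorted(pairs)


if __name__ == "__main__":
    cui_doid = [("C1", "DOID:A1"), ("C2", "DOID:Z"), ("C3", "DOID:B1")]
    doid_label = [("DOID:A1", "DOID:A"), ("DOID:B1", "DOID:B")]
    print(cuis_to_labels(cui_doid, doid_label))
```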
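The final copd_label.txt step keeps the disease_gene.tsv rows whose diseaseId carries a COPD comorbidity label; those rows are the edges of the bipartite graph, and their endpoints form the gene and disease node sets counted under the graph info. The sketch assumes the rows are already parsed into (diseaseId, geneId) tuples and the labeled CUIs are known; the function name and toy IDs are hypothetical.

```python
def build_copd_graph(disease_gene_rows, labeled_cuis):
    """Return (edges, gene_nodes, disease_nodes) for the labeled subgraph.

    Edges are unique (diseaseId, geneId) pairs, sorted for stable output.
    Hypothetical helper; input rows are assumed to be (diseaseId, geneId) tuples.
    """
    wanted = set(labeled_cuis)
    edges = sorted({(cui, gene) for cui, gene in disease_gene_rows if cui in wanted})
    disease_nodes = {cui for cui, _ in edges}
    gene_nodes = {gene for _, gene in edges}
    return edges, gene_nodes, disease_nodes


if __name__ == "__main__":
    rows = [("C1", "g1"), ("C1", "g2"), ("C2", "g2"), ("C9", "g3"), ("C1", "g1")]
    edges, genes, diseases = build_copd_graph(rows, ["C1", "C2"])
    print(len(edges), len(genes), len(diseases))
```

Here the duplicate ("C1", "g1") row and the unlabeled C9 row are dropped, leaving 3 edges between 2 genes and 2 diseases.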