As proposed here, the CUGD pipeline comes close to an FDA-approval format (versioning, backups, code annotations, multilevel cross-checks, CSV IQ/OQ validation datasets, ...).
Validation of the final gene dataset ultimately remains the responsibility of clinicians and experts. The proposed approach is to:
define the genes compatible with diagnostic purposes (e.g. exclude genes whose pseudogenes contaminate NGS data = Blast Flag)
review QC data per dataset (linked to the NGS kit in use) to evaluate the most efficient experimental approach.
Based on these outcomes, an updated gene list for CUGD will be drawn up and processed again through this pipeline to record the final outcomes.
Important (BioIT): our estimated system requirements (Ubuntu Linux platform) are about 2.5 MB of RAM per gene (roughly 1 GB for 400 genes) and a run time of about 5 genes per minute. These figures apply to each of the 2 Perl codes.
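As a rough planning aid, the per-gene figures quoted above can be turned into a small calculator. This is only a sketch in Python, not part of the Perl codes: the function name `estimate_resources` is hypothetical, the RAM figure is taken to be in megabytes, and it assumes that memory and run time scale linearly with the number of genes and that the two codes run one after the other.

```python
# Figures quoted in the text above (estimates, per Perl code).
RAM_MB_PER_GENE = 2.5    # assumed to be megabytes of RAM per gene
GENES_PER_MINUTE = 5     # processing rate

def estimate_resources(n_genes, n_codes=1):
    """Estimate RAM and run time, assuming linear scaling with gene count.

    n_codes is the number of codes run sequentially (assumption: they do
    not run in parallel, so peak RAM is that of a single code).
    """
    ram_mb = n_genes * RAM_MB_PER_GENE
    minutes_per_code = n_genes / GENES_PER_MINUTE
    return {
        "ram_mb": ram_mb,
        "minutes_per_code": minutes_per_code,
        "total_minutes": minutes_per_code * n_codes,
    }

if __name__ == "__main__":
    # 400 genes, both Perl codes: 1000 MB of RAM, 80 min per code, 160 min total.
    print(estimate_resources(400, n_codes=2))
```

For a 400-gene list this reproduces the order of magnitude given above (about 1 GB of RAM) and gives 80 minutes per code.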
BED TO BED.
Perl Code n°3: provides theoretical information by comparing BED files. Remark: 2 output files are generated (1by2, 2by1), each ordered as in its respective source BED. The code, source BED files and a run log are archived.
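The idea of a two-way BED comparison can be sketched as follows. This is an illustration in Python, not the archived Perl code: the function names are hypothetical, and it assumes that each output lists every interval of one BED, in its original order, with the number of bases covered by the other BED. The actual columns of the 1by2 and 2by1 files may differ.

```python
from collections import defaultdict

def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, keeping file order."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals

def merge_by_chrom(intervals):
    """Merge overlapping intervals per chromosome to avoid double counting."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged

def coverage(a, b):
    """For each interval of `a`, in its original order, count bases covered by `b`.

    Returns rows of (chrom, start, end, covered_bases, interval_length).
    """
    merged_b = merge_by_chrom(b)
    rows = []
    for chrom, start, end in a:
        covered = sum(
            max(0, min(end, b_end) - max(start, b_start))
            for b_start, b_end in merged_b.get(chrom, [])
        )
        rows.append((chrom, start, end, covered, end - start))
    return rows

if __name__ == "__main__":
    bed1 = parse_bed(["chr1\t100\t200", "chr1\t300\t400"])
    bed2 = parse_bed(["chr1\t150\t350"])
    one_by_two = coverage(bed1, bed2)  # BED 1 intervals, covered by BED 2
    two_by_one = coverage(bed2, bed1)  # BED 2 intervals, covered by BED 1
    print(one_by_two)
    print(two_by_one)
```

Comparing a BED with itself is a convenient sanity check: with non-overlapping intervals, every row should report a covered length equal to the interval length.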
Click Here to download (Zip file) a comparison (validation purposes) of a BED vs. itself.
Click Here to download (Zip file) comparison of BED HG19 CHU-ULg-Genetic-Diseases vs. Agilent ClearSeq Inherited Diseases (Improved; Leonor Palmeira)
Click Here to download (Zip file) comparison of BED HG38 CHU-ULg-Genetic-Diseases vs. Roche-KUL Genetic Diseases
Click Here to download (Zip file) comparison of BED HG19 CHU-ULg-Genetic-Diseases vs. Agilent WES V5
Click Here to download (Zip file) comparison of BED HG19 CHU-ULg-Genetic-Diseases vs. Agilent WES V6r2
MS-Excel file for clinicians (Blast search for homologies): Click Here.