Dataset Creation






Creating your own gene dataset, e.g. one based on the NGS kit (panel) used in your experiment, is definitely worthwhile in terms of performance: it reduces costs, computing resources, etc.

 

To illustrate how to create your own dataset, we use as an example a list of human genes (protein-coding and ncRNA-coding) defined by the Human Genetics Department of CHU Liège (BE).


IMPORTANT: This version is designed for Linux/Ubuntu (14.04 or later) platforms with Perl installed (along with some Perl modules).


Step n°1: Download the source database (Click here), then save and unzip the archive in a dedicated folder created beforehand (e.g. ~/Databases/Hg38_PRTRNA).


Step n°2: Download the dedicated Perl code (Click here; location does not matter).


Step n°3: Create a plain-text (.txt) file containing one official HGNC gene symbol per line, without a header (it can be saved anywhere; Click here to download an example).
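If your gene list comes from another source (e.g. a panel manifest or a spreadsheet), a short script can help produce the expected format. The Python sketch below is only an illustration, not part of the NGYX tools: it assumes the format described above (one symbol per line, no header), trims stray whitespace, drops empty lines and removes duplicates while keeping the original order. The function names and example symbols are hypothetical, and the sketch does not check symbols against HGNC; unknown symbols are still reported by the Perl code when you run it.

```python
from pathlib import Path


def write_gene_list(symbols, path):
    """Write one gene symbol per line (no header), dropping blanks and duplicates.

    Hypothetical helper: the order of first occurrence is preserved.
    """
    seen = set()
    unique = []
    for symbol in symbols:
        symbol = symbol.strip()  # remove stray spaces and line endings
        if symbol and symbol not in seen:
            seen.add(symbol)
            unique.append(symbol)
    # Terminate the last line with a newline; write nothing for an empty list.
    Path(path).write_text("\n".join(unique) + "\n" if unique else "")
    return unique


def read_gene_list(path):
    """Read the list back, ignoring empty lines and surrounding whitespace."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]
```

For example, `write_gene_list(["BRCA1", " TP53 ", "BRCA1", ""], "genes.txt")` writes BRCA1 and TP53 on two lines and returns `['BRCA1', 'TP53']`.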


Step n°4: Prepare a main folder and a backup folder to collect the files/tables generated for your dataset (if a folder is not empty, you will be asked whether to overwrite its content).


Step n°5: Open a terminal and run the Perl code (for usage information, type: perl <code> --man or perl <code> --help). Follow the instructions displayed on the command line (e.g. if you did not use an official HGNC gene symbol). That's it (Click here to download, as a Zip file, the output obtained with our example).


The output folders contain several files/tables. Some of them, such as the BED files, can be used to filter NGS data (we recommend Codex.bed, or alternatively ExtendedCodex.bed). Others deserve closer inspection, as they can reveal early on which genes are likely to be biased at the NGS data level (e.g. the Genes.txt table).
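BED coordinates are 0-based and half-open (a line `chr1 100 200` covers bases 101 to 200 in 1-based numbering), which is easy to get wrong when filtering variant positions by hand. As a minimal Python sketch, assuming only that the first three columns of Codex.bed (or ExtendedCodex.bed) follow the standard BED layout (chrom, start, end), the following keeps the records whose 1-based position (e.g. a VCF POS value) falls inside a region. The function names are hypothetical and this is not the NGYX implementation; for large files, dedicated tools such as `bedtools intersect` are the usual choice.

```python
def load_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    regions = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments and UCSC 'track'/'browser' header lines.
        if len(fields) < 3 or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        regions.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    return regions


def in_regions(regions, chrom, pos):
    """True if the 1-based position `pos` lies inside any BED interval on `chrom`."""
    x = pos - 1  # convert the 1-based position to 0-based coordinates
    return any(start <= x < end for start, end in regions.get(chrom, []))


def filter_positions(regions, records):
    """Keep only (chrom, 1-based pos, ...) records located inside the BED regions."""
    return [rec for rec in records if in_regions(regions, rec[0], rec[1])]
```

Typical use would be `with open("Codex.bed") as fh: regions = load_bed(fh)`. Chromosome names must match between the BED file and your data (e.g. "chr1" versus "1"). The linear scan keeps the sketch short; a gene panel BED is usually small enough for it.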


This analysis can be taken further with some additional Perl codes:

Code "AddBedToBed": provides a theoretical analysis/comparison between the BED files of your dataset and a BED file corresponding to your NGS kit (panel). Click here to learn more.

Code "QCpicard": analyses one or more sets of NGS data files (Picard tools outputs: HSmetrics and Perbase files). Click here to learn more.
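To give an idea of the kind of theoretical comparison made between a dataset BED and a panel BED, here is a minimal Python sketch. It is not the AddBedToBed code itself, only an illustration assuming standard BED files (first three columns: chrom, start, end): it merges overlapping intervals within each file, then counts how many bases of your dataset BED are also covered by the panel BED. The function names are hypothetical.

```python
def merge(intervals):
    """Merge overlapping or touching intervals so that no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def read_bed(lines):
    """Parse BED lines into {chrom: merged [(start, end), ...]} (0-based, half-open)."""
    per_chrom = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        per_chrom.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    return {chrom: merge(ivs) for chrom, ivs in per_chrom.items()}


def total_bases(bed):
    """Number of bases covered by a merged BED."""
    return sum(end - start for ivs in bed.values() for start, end in ivs)


def shared_bases(a, b):
    """Number of bases covered by both merged BEDs (two-pointer sweep per chromosome)."""
    shared = 0
    for chrom, a_ivs in a.items():
        b_ivs = b.get(chrom, [])
        i = j = 0
        while i < len(a_ivs) and j < len(b_ivs):
            start = max(a_ivs[i][0], b_ivs[j][0])
            end = min(a_ivs[i][1], b_ivs[j][1])
            if start < end:
                shared += end - start
            # Advance whichever interval finishes first.
            if a_ivs[i][1] < b_ivs[j][1]:
                i += 1
            else:
                j += 1
    return shared
```

The fraction of your dataset covered by the panel is then `shared_bases(dataset, panel) / total_bases(dataset)`, where `dataset` and `panel` are the results of `read_bed` on the two files. Merging first is what makes the sweep valid: within each chromosome the intervals are sorted and disjoint, so every shared base is counted exactly once.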


Remark: Once you have settled on a definitive list of genes in light of this preliminary information, simply update your list and rerun the code to obtain a clean dataset.


Dataset Usage.

Once you have your definitive dataset, try out our Human Genetic Diseases software package. Click here to go to the dedicated web page.

NGYX I.C. details / Contact information.

Company N°: BE 0537.471.159

Postal Address: NGYX I.C. (P. LECOCQ), rue des Hausseurs 10, B-4550 Nandrin, BELGIUM.
Email: Info@NGYX.EU
Tel. / GSM: +32 498 532496
IBAN: BE63 7506 5746 0708
BIC: AXABBE22

Email us: Click Here!