Let's walk through it with a tutorial on real data.
You will find a “config.txt” file (already filled in) and a “data” directory containing 12 fastq.gz files (paired-end, so each sample has one file for R1 and one for R2).
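Before launching the run, it can be worth checking that every sample really has both an R1 and an R2 file. The following is a minimal sketch, not part of the pipeline: it assumes the file names carry an “_R1” or “_R2” tag followed by “_” or “.” (for example “sampleA_R1.fastq.gz”, a hypothetical name), which may not match your naming convention.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: file names carry an "_R1" or "_R2" tag followed by "_" or ".",
# e.g. "sampleA_R1.fastq.gz" (hypothetical name).
READ_TAG = re.compile(r"_(R[12])(?=[_.])")


def pair_fastq_files(names):
    """Group file names by sample: {sample: {"R1": name, "R2": name}}."""
    pairs = defaultdict(dict)
    for name in names:
        match = READ_TAG.search(name)
        if match is None:
            continue  # name does not follow the assumed convention
        sample = name[:match.start()]
        pairs[sample][match.group(1)] = name
    return dict(pairs)


def find_incomplete_samples(pairs):
    """Return the samples that are missing either R1 or R2."""
    return sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})


if __name__ == "__main__":
    names = [p.name for p in Path("data").glob("*.fastq.gz")]
    pairs = pair_fastq_files(names)
    print(f"{len(names)} files, {len(pairs)} samples")
    print("incomplete samples:", find_incomplete_samples(pairs))
```

With the tutorial data, 12 files should give 6 samples and no incomplete sample, provided the names follow the assumed convention.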
Run the command below:
python /home/Share/ftessier/PipelineNGS/PipelineNGS.py -i data/ -o resultsPipeline -s all -c config.txt
The pipeline will then create a count matrix, run DESeq2, and test the differential expression comparisons set in the config.txt file. Here, “control vs asxl2”.
Because this may take a long time, you can stop the command and look directly at the results.
There is a directory for each sample, containing the result files from STAR. A summary is also available in the “overallstats.txt” file.
You can open these tables with LibreOffice; for example, run this command:
libreoffice --calc allcounts-gene.txt &
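If you prefer a quick check from the terminal over a spreadsheet, the count table can also be summarized with a few lines of Python. This is only a sketch under a stated assumption: that the table is tab-separated, with a header row and the gene identifier in the first column. The layout of “allcounts-gene.txt” is not described here, so check it first. The demo data below are made up for illustration.

```python
import csv
import io


def column_totals(handle):
    """Count the genes and sum each sample column of a tab-separated count table.

    Assumed layout: one header row, gene identifier in the first column,
    one numeric column per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = dict.fromkeys(samples, 0.0)
    n_genes = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        n_genes += 1
        for sample, value in zip(samples, row[1:]):
            totals[sample] += float(value)
    return n_genes, totals


if __name__ == "__main__":
    # Made-up table, for illustration only.
    demo = io.StringIO("gene\tctrl_1\tasxl2_1\nGeneA\t10\t4\nGeneB\t0\t7\n")
    print(column_totals(demo))
```

To use it on the real file, pass `open("allcounts-gene.txt", newline="")` instead of the demo table.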
Now, to see the results from the differential expression analysis:
You will find a table with the normalized counts (Normalized_Counts.xls), which you can also open with LibreOffice,
and a directory for each test you ran. Here, we tested control against asxl2, so we have the directory “control_vs_asxl2”.
You will find several files:
First, you can check the volcano plot and the heatmap.
ALL_control_vs_asxl2.xls contains all the D.E. genes, along with GO terms, Entrez IDs, etc.
You can open it with LibreOffice as well.
You can also visualize the pathways in which the most D.E. genes are involved. The file keggres.txt summarizes all the pathways and the number of genes involved in each.
Open the pathway in which the most genes are upregulated:
display mmu00190.control_vs_asxl2_upregulate_n1.png &
First, you need to connect to the IRCAN server: http://bioinfomed.fr/doku.php?id=tutos:ircan_server
Make sure all the data you want to analyze are in a single directory, in fastq format (compressed files are accepted).
If you used the genomics platform for your run and completed the first step of cleaning and quality control of your data, you will find them in the directory:
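A quick way to verify the content of the input directory is to list any file that does not look like a FASTQ file. The sketch below is an illustration only: the suffix list is an assumption (which suffixes the pipeline itself recognizes is not stated here), and “my_run” is a hypothetical directory name.

```python
from pathlib import Path

# Assumption: these suffixes identify FASTQ files, compressed or not.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_fastq_name(name):
    """Return True if the file name ends with one of the assumed FASTQ suffixes."""
    return name.lower().endswith(FASTQ_SUFFIXES)


def split_by_format(names):
    """Split file names into (FASTQ files, other files), both sorted."""
    fastq = sorted(n for n in names if is_fastq_name(n))
    other = sorted(n for n in names if not is_fastq_name(n))
    return fastq, other


if __name__ == "__main__":
    input_dir = Path("my_run")  # hypothetical directory name
    names = [p.name for p in input_dir.iterdir() if p.is_file()] if input_dir.is_dir() else []
    fastq, other = split_by_format(names)
    print(f"{len(fastq)} FASTQ files")
    print("unexpected files:", other)
```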
You need to provide a config file to run the pipeline.
For that, copy the config file from the pipeline's directory:
cp ~/Documents/RNA-SEQ/PipelineNGS/config.txt .
Open the file with a text editor (such as gedit):
gedit config.txt &
If you want to do differential expression, you will need to edit the second part of the config.txt file. Put the names of all your samples and their conditions in the “Sample description” part, and the tests you want to run in the “tests” part. Use tabs to separate the fields.
To start the pipeline, write the command below in your terminal if you want to run the analysis on the genome:
python PipelineNGS.py -i /home/NEXTSEQ/clean_data/Directory_Test -o Output_Directory -s all -c config.txt &
If you prefer to run your analysis on the transcriptome, use this command:
python PipelineNGS.py -i /home/NEXTSEQ/clean_data/Directory_Test -o Output_Directory -s allSalmon -c config.txt &
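The two commands differ only in the value passed to -s (“all” for the genome, “allSalmon” for the transcriptome). If you launch the pipeline from a script, the small helper below builds either command. It is a sketch, not part of the pipeline: the function name and its parameters are hypothetical, and only the -i, -o, -s and -c options shown above are used.

```python
import shlex


def build_pipeline_command(input_dir, output_dir, config="config.txt",
                           transcriptome=False, script="PipelineNGS.py"):
    """Return the pipeline command as an argument list.

    Hypothetical helper: it only assembles the options shown in the
    commands above and does not run anything.
    """
    steps = "allSalmon" if transcriptome else "all"
    return ["python", script, "-i", input_dir, "-o", output_dir,
            "-s", steps, "-c", config]


if __name__ == "__main__":
    cmd = build_pipeline_command(
        "/home/NEXTSEQ/clean_data/Directory_Test", "Output_Directory",
        transcriptome=True)
    # Print the command as you would type it in the terminal.
    print(shlex.join(cmd))
    # To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids quoting problems when a directory name contains spaces.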