Chapter 5 Expression analysis (EA) component
First, make sure the .env file has been created under the src folder:
ls ~/RNASequest/src/.env
5.1 EAinit
EAinit Path/to/a/DNAnexus/result/folder
#Example:
EAinit ~/RNASequest/example/SRP199678
Executing the above command creates a sub-folder (EA[timestamp]) in the specified RNAseq result folder. The new sub-folder will contain five files:
- compareInfo.csv: an empty comparison definition file (header only). Please fill in this file before calling EArun.
- config.yml: a config file that specifies the parameters for EAqc and EArun. Please update covariates_adjust after running EAqc.
- geneAnnotation.csv: a gene annotation file that includes gene symbols.
- sampleMeta.csv: a sample meta-information file. Feel free to add columns; consider adding their column names to covariates_check in config.yml.
- alignQC.pdf: plots generated from the alignment QC metrics.
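As a quick sanity check after EAinit, the sketch below confirms that the five files listed above are present in an EA folder. The function name check_ea_files is a hypothetical helper and not part of RNASequest; only the file names come from the list above.

```shell
# Hypothetical helper (not part of RNASequest): report which of the five
# files created by EAinit are missing from the given EA folder.
# Returns 0 when all files are present, 1 otherwise.
check_ea_files() {
  ea_dir="$1"
  missing=0
  for f in compareInfo.csv config.yml geneAnnotation.csv sampleMeta.csv alignQC.pdf; do
    if [ ! -f "$ea_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (path taken from the EAinit example; your timestamp will differ):
# check_ea_files ~/RNASequest/example/SRP199678/EA20220328_0
```

The function only inspects file names, so it cannot replace reading the messages EAinit prints.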
Please pay attention to the messages printed to standard output.
5.2 EAqc
EAqc Path/to/a/config/file
#Example:
EAqc ~/RNASequest/example/SRP199678/EA20220328_0/config.yml
Running the command with the default config file above performs a principal component (PC) analysis of expression against the covariates listed in covariates_check in the config.yml file. An Excel file lists the p-values for all numeric and categorical covariates, and the significant ones are plotted in PDF files. Output from the analysis before covariate adjustment has the prefix covariatePCanalysis_noAdjust.
Based on these results, you can add covariates to covariates_adjust in the config.yml file and rerun EAqc. This time, an additional expression PC analysis is applied to the covariate-adjusted expression, producing files whose names start with covariatePCanalysis_Adjusted.
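Between the two EAqc runs it can help to review what is currently set for the two covariate keys. The sketch below prints those lines from a config file; show_covariates is a hypothetical helper, and it assumes that each key starts its own line in config.yml.

```shell
# Hypothetical helper (not an RNASequest command): print the
# covariates_check and covariates_adjust lines of a config.yml.
# Assumption: each key begins a line of the YAML file.
show_covariates() {
  grep -E '^[[:space:]]*covariates_(check|adjust)[[:space:]]*:' "$1"
}

# Example call (path taken from the EAqc example):
# show_covariates ~/RNASequest/example/SRP199678/EA20220328_0/config.yml
```

If the values span several lines in your config file, only the first line of each key is shown.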
Please pay attention to the messages printed to standard output.
5.3 EArun
EArun Path/to/a/config/file
#Example:
EArun ~/RNASequest/example/SRP199678/EA20220328_0/config.yml
Please fill in compareInfo.csv before executing the above command.
Executing the above command produces R objects for the QuickOmics webserver to load. The process generates the covariate-adjusted logTPM for visualization, completes the differentially expressed gene analysis, and generates the gene network.
The results (four files) can be uploaded to the QuickOmics webserver.
Please pay attention to the messages printed to standard output.
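Because EArun expects compareInfo.csv to be filled in, a small guard can catch the case where the file still contains only its header. This is a minimal sketch: compare_info_filled is a hypothetical helper, and it assumes compareInfo.csv sits next to config.yml, as created by EAinit.

```shell
# Hypothetical guard (not part of RNASequest): succeed only if the CSV
# has at least one non-blank line after the header row.
compare_info_filled() {
  rows=$(tail -n +2 "$1" | grep -c '[^[:space:]]' || true)
  [ "$rows" -gt 0 ]
}

# Example use (paths taken from the EArun example):
# cfg=~/RNASequest/example/SRP199678/EA20220328_0/config.yml
# compare_info_filled "$(dirname "$cfg")/compareInfo.csv" && EArun "$cfg"
```

The guard checks only that some comparison row exists, not that its contents are valid.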
5.4 EAreport
EAreport Path/to/a/config/file
#Example:
EAreport ~/RNASequest/example/SRP199678/EA20220328_0/config.yml
Running the command above generates a BookdownReport folder in the same directory as the config file. This folder contains the raw Rmd files as well as the final bookdown report, which is the BookdownReport/docs/index.html file. To send the full report to your collaborators, download the tarball created under the EA working directory, named ProjectName_BookdownReport.tar.gz (ProjectName is taken from the config.yml file). The index.html inside it is the bookdown report.
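To see where index.html sits inside the tarball before sending it, the archive can be listed without extracting it. In this sketch, find_report_index is a hypothetical helper; it assumes the archive is gzip-compressed, as the .tar.gz name suggests.

```shell
# Hypothetical helper (not part of RNASequest): list every index.html
# entry inside a report tarball without extracting it.
find_report_index() {
  tar -tzf "$1" | grep -E '(^|/)index\.html$'
}

# Example call (replace ProjectName with the name from your config.yml):
# find_report_index ProjectName_BookdownReport.tar.gz
```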
5.5 EA2DA
EA2DA Path/to/a/config/file
#Example:
EA2DA ~/RNASequest/example/SRP199678/EA20220328_0/config.yml
Executing the above command produces six data files that are required for the OmicsView project import.
Please fill in the empty entries in the Project_Info.csv before import.
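One way to locate the entries that still need filling in is to scan the CSV for empty fields. The sketch below is only illustrative: find_empty_entries is a hypothetical helper, and it assumes the file has no quoted fields containing commas, since it splits on plain commas.

```shell
# Hypothetical helper (not part of RNASequest): report empty fields in a
# CSV file as "line N, column M".
# Assumption: no quoted fields containing commas (plain comma split).
find_empty_entries() {
  awk -F',' '{
    sub(/\r$/, "")   # tolerate Windows line endings
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[ \t]*$/) print "line " NR ", column " i
  }' "$1"
}

# Example call (adjust the path to where EA2DA wrote the file):
# find_empty_entries path/to/Project_Info.csv
```

No output means no empty fields were found under the stated assumption.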
5.6 Administration
There are two config files in the pipeline folder:
- config.tmp.yml: the template of the config file, with all default values.
- sys.yml: the system config file, which includes:
  - genome_path: the root path where the genome definition files (gtf) are located
  - notCovariates: the column names in the sample meta information that should not be considered default covariates
  - qc2meta: the column names in the mapping QC file that should be extracted and inserted into the sample meta table
  - QuickOmics_path: the file path where the files for display on the QuickOmics web server are stored
  - DA_columns: the column names available for the sample meta table in the OmicsView system