Chapter 5 Expression analysis (EA) component

First, make sure the .env file has been created under the src folder:

ls ~/RNASequest/src/.env

5.1 EAinit

EAinit Path/to/a/DNAnexus/result/folder

#Example:
EAinit ~/RNASequest/example/SRP199678

Running the above command creates a sub-folder (EA[timestamp]) in the specified RNAseq result folder. The sub-folder will contain five files:

  • compareInfo.csv: an empty comparison definition file (header only). Please fill in this file before calling EArun.
  • config.yml: a config file that specifies the parameters for EAqc and EArun. Please update covariates_adjust after running EAqc.
  • geneAnnotation.csv: a gene annotation file that includes gene symbols.
  • sampleMeta.csv: a sample meta-information file. You may add columns to it; consider adding the names of any new columns to covariates_check in config.yml.
  • alignQC.pdf: plots generated from the alignment QC metrics.
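
To confirm that EAinit produced the files listed above, a small check along the following lines may help. This is only a sketch: check_ea_files is a hypothetical helper, not part of RNASequest, and it assumes all five files are written directly into the EA[timestamp] sub-folder.

```shell
# check_ea_files is a hypothetical helper, not part of RNASequest.
# It prints each expected file that is absent from the given EA folder
# and returns non-zero if any file is missing.
check_ea_files() {
  ea_dir="$1"
  missing=0
  for f in compareInfo.csv config.yml geneAnnotation.csv sampleMeta.csv alignQC.pdf; do
    if [ ! -f "$ea_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (the folder name depends on the timestamp EAinit assigned):
# check_ea_files ~/RNASequest/example/SRP199678/EA20220328_0
```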

Please pay attention to the messages printed to standard output (stdout).

5.2 EAqc

EAqc Path/to/a/config/file

#Example:
EAqc ~/RNASequest/example/SRP199678/EA20220328_0/config.yml

Running the command with the default config file above performs a principal component (PC) analysis of the expression data against the covariates listed under covariates_check in config.yml. An Excel file lists the p-values for all numeric and categorical covariates, and the significant ones are plotted in PDF files. Results from before covariate adjustment carry the prefix covariatePCanalysis_noAdjust. Based on these results, you can add covariates to covariates_adjust in config.yml and rerun EAqc. The rerun applies an additional expression PC analysis to the covariate-adjusted expression, and its output files start with covariatePCanalysis_Adjusted.
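
Before rerunning EAqc it can be useful to see what the two covariate settings currently contain. The sketch below prints the covariates_check and covariates_adjust lines from a config file. show_covariates is a hypothetical helper, not part of RNASequest, and it assumes each key starts its own line in config.yml; if a value spans several lines, only the first line is shown.

```shell
# show_covariates is a hypothetical helper, not part of RNASequest.
# It prints the lines of a config file that begin with covariates_check
# or covariates_adjust (optionally indented).
show_covariates() {
  grep -E '^[[:space:]]*covariates_(check|adjust)[[:space:]]*:' "$1"
}

# Example call:
# show_covariates ~/RNASequest/example/SRP199678/EA20220328_0/config.yml
```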

Please pay attention to the messages printed to standard output (stdout).

5.3 EArun

EArun Path/to/a/config/file

#Example:
EArun ~/RNASequest/example/SRP199678/EA20220328_0/config.yml

Please fill in compareInfo.csv before executing the above command.
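
As a guard against launching EArun with an unfilled comparison file, one could first test that compareInfo.csv has content after its header. The following is a minimal sketch; has_comparisons is a hypothetical helper, not part of RNASequest, and it only checks for a non-blank line after the first line, not that the entries are valid.

```shell
# has_comparisons is a hypothetical helper, not part of RNASequest.
# It succeeds only if the file has at least one non-blank line after the header.
has_comparisons() {
  awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "$1"
}

# Example use (runs EArun only when the check passes):
# ea_dir=~/RNASequest/example/SRP199678/EA20220328_0
# has_comparisons "$ea_dir/compareInfo.csv" && EArun "$ea_dir/config.yml"
```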

Running the above command produces R objects for the QuickOmics webserver to load. The process generates the covariate-adjusted logTPM for visualization, completes the differentially expressed gene analysis, and generates the gene network.

The results (four files) can be uploaded to the QuickOmics webserver.

Please pay attention to the messages printed to standard output (stdout).

5.4 EAreport

EAreport Path/to/a/config/file

#Example:
EAreport ~/RNASequest/example/SRP199678/EA20220328_0/config.yml

Running the command above generates a BookdownReport folder in the same directory as the config file. This folder contains the raw Rmd files and the final bookdown report, BookdownReport/docs/index.html. To send the full report to collaborators, download the tarball created under the EA working directory, named ProjectName_BookdownReport.tar.gz (ProjectName is taken from config.yml). The index.html inside it is the bookdown report.
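
A recipient of the tarball can locate the report without unpacking everything by listing the archive contents. The sketch below uses the standard tar and grep tools; find_report_index is a hypothetical helper, not part of RNASequest, and the layout inside the tarball is not assumed beyond the presence of an index.html entry.

```shell
# find_report_index is a hypothetical helper, not part of RNASequest.
# It lists every archive entry whose path ends in index.html, without extracting.
find_report_index() {
  tar -tzf "$1" | grep 'index\.html$'
}

# Example call (replace ProjectName with the project name from config.yml):
# find_report_index ProjectName_BookdownReport.tar.gz
```

After extracting with tar -xzf, the listed index.html can be opened in a web browser.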

5.5 EA2DA

EA2DA Path/to/a/config/file

#Example:
EA2DA ~/RNASequest/example/SRP199678/EA20220328_0/config.yml

Running the above command produces six data files that are required for the OmicsView project import.

Please fill in the empty entries in Project_Info.csv before import.
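
To spot entries that still need filling in, a rough scan for empty comma-separated fields may help. This is a sketch under stated assumptions: list_empty_entries is a hypothetical helper, not part of RNASequest, and it splits naively on commas, so quoted fields that contain commas would be misreported.

```shell
# list_empty_entries is a hypothetical helper, not part of RNASequest.
# It reports the line and field number of every empty (or whitespace-only)
# comma-separated field. Quoted commas are NOT handled.
list_empty_entries() {
  awk -F',' '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[[:space:]]*$/)
        print "line " NR ", field " i
  }' "$1"
}

# Example call (adjust the path to where Project_Info.csv was written):
# list_empty_entries Project_Info.csv
```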

5.6 Administration

There are two config files in the pipeline folder:

  • config.tmp.yml: the template of the config file, with all default values.

  • sys.yml: the system config file, which includes:

      1. genome_path: the root path where the genome definition files (gtf) are located
      2. notCovariates: column names from the sample meta information that should not be treated as default covariates
      3. qc2meta: column names from the mapping QC file that should be extracted and inserted into the sample meta table
      4. QuickOmics_path: the file path where the files for QuickOmics web server display are stored
      5. DA_columns: the column names available for the sample meta table in the OmicsView system