`module` `pp`

Global Variables

get_mean_var_major_kernel
get_mean_var_minor_kernel
find_indices_kernel

`class` `ScaleSC`

ScaleSC integrated pipeline in a scanpy-like style.

It automatically loads the dataset in chunks (see scalesc.util.AnnDataBatchReader for details), and all methods in this class operate on the chunked data.

Args:

data_dir (str): Data folder of the dataset.
max_cell_batch (int): Maximum number of cells in a single batch. Default: 100000.
preload_on_cpu (bool): Whether to load the entire chunked dataset on CPU. Default: True.
preload_on_gpu (bool): Whether to load the entire chunked dataset on GPU. When this is True, preload_on_cpu is overridden to True. Default: True.
save_raw_counts (bool): Whether to save adata_X to disk after QC filtering. Default: False.
save_norm_counts (bool): Whether to save adata_X to disk after normalization. Default: False.
save_after_each_step (bool): Whether to save adata (without .X) to disk after each step. Default: False.
output_dir (str): Output folder. Default: './results'.
gpus (list): List of GPU ids. If None, [0] is used. Default: None.
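To make the effect of max_cell_batch concrete, the sketch below splits a cell count into contiguous chunks no larger than the batch size. The helper chunk_bounds is hypothetical and not part of ScaleSC; how AnnDataBatchReader actually partitions the input is not described here, so treat this only as an illustration of the idea.

```python
def chunk_bounds(n_cells, max_cell_batch=100000):
    """Return (start, stop) index pairs covering n_cells in chunks of at most max_cell_batch."""
    # Hypothetical helper for illustration; not part of the ScaleSC API.
    size = int(max_cell_batch)  # the documented signature default is a float (100000.0)
    if size <= 0:
        raise ValueError("max_cell_batch must be positive")
    return [(start, min(start + size, n_cells)) for start in range(0, n_cells, size)]


# 250,000 cells with the default batch size give two full chunks and one partial chunk.
bounds = chunk_bounds(250000, max_cell_batch=100000.0)
print(bounds)
```

A smaller max_cell_batch yields more, smaller chunks, which lowers peak memory per chunk at the cost of more iterations.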

`method` `__init__`

__init__(
    data_dir,
    max_cell_batch=100000.0,
    preload_on_cpu=True,
    preload_on_gpu=True,
    save_raw_counts=False,
    save_norm_counts=False,
    save_after_each_step=False,
    output_dir='results',
    gpus=None
)

`property` adata

AnnData: An AnnData object that stores all intermediate results without the count matrix.

Note: This is always on CPU.

`property` adata_X

AnnData: An AnnData object that stores all intermediate results, including the count matrix. Internally, all chunks are merged on CPU to avoid high GPU memory consumption, so make sure to call to_CPU() before accessing this property.

`method` `calculate_qc_metrics`

calculate_qc_metrics()

Calculate quality control metrics.

`method` `clear`

clear()

Clear the memory.

`method` `filter_cells`

filter_cells(min_count=0, max_count=None, qc_var='n_genes_by_counts', qc=False)

Filter cells based on a QC metric.

Args:

min_count (int): Minimum number of counts required for a cell to pass filtering.
max_count (int): Maximum number of counts allowed for a cell to pass filtering.
qc_var (str='n_genes_by_counts'): Feature in the QC metrics that is used to filter cells.
qc (bool=False): Whether to call calculate_qc_metrics before filtering.
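The thresholding described by min_count and max_count can be sketched as a simple keep-mask over a QC column. The function passes_filter and the sample values are hypothetical, and the sketch assumes both bounds are inclusive, which the documentation does not state.

```python
def passes_filter(values, min_count=0, max_count=None):
    """Return a keep-mask for QC values that fall within [min_count, max_count]."""
    # Assumption: both bounds are inclusive; max_count=None means no upper bound.
    upper = float("inf") if max_count is None else max_count
    return [min_count <= v <= upper for v in values]


# Hypothetical per-cell values of the 'n_genes_by_counts' QC metric.
n_genes_by_counts = [150, 900, 2500, 7200]
keep = passes_filter(n_genes_by_counts, min_count=200, max_count=6000)
print(keep)
```

The same mask logic applies to genes when the QC column is a per-gene metric such as 'n_cells_by_counts'.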

`method` `filter_genes`

filter_genes(min_count=0, max_count=None, qc_var='n_cells_by_counts', qc=False)

Filter genes based on a QC metric.

Args:

min_count (int): Minimum number of counts required for a gene to pass filtering.
max_count (int): Maximum number of counts allowed for a gene to pass filtering.
qc_var (str='n_cells_by_counts'): Feature in the QC metrics that is used to filter genes.
qc (bool=False): Whether to call calculate_qc_metrics before filtering.

`method` `filter_genes_and_cells`

filter_genes_and_cells(
    min_counts_per_gene=0,
    min_counts_per_cell=0,
    max_counts_per_gene=None,
    max_counts_per_cell=None,
    qc_var_gene='n_cells_by_counts',
    qc_var_cell='n_genes_by_counts',
    qc=False
)

Filter genes and cells based on QC metrics.

Note:

This is an efficient way to filter both genes and cells without iterating over the chunks repeatedly.

Args:

min_counts_per_gene (int): Minimum number of counts required for a gene to pass filtering.
max_counts_per_gene (int): Maximum number of counts allowed for a gene to pass filtering.
qc_var_gene (str='n_cells_by_counts'): Feature in the QC metrics that is used to filter genes.
min_counts_per_cell (int): Minimum number of counts required for a cell to pass filtering.
max_counts_per_cell (int): Maximum number of counts allowed for a cell to pass filtering.
qc_var_cell (str='n_genes_by_counts'): Feature in the QC metrics that is used to filter cells.
qc (bool=False): Whether to call calculate_qc_metrics before filtering.
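One way to see why a combined filter avoids repeated iteration: per-cell and per-gene detection counts can both be accumulated in a single pass over the chunks. The sketch below is a plain-Python illustration under that assumption; qc_single_pass and the toy data are hypothetical, and ScaleSC's GPU implementation may work differently.

```python
def qc_single_pass(chunks):
    """Compute per-cell and per-gene detection counts in one pass over chunks.

    Each chunk is a list of cells; each cell is a list of gene counts.
    """
    n_genes_by_counts = []    # per cell: number of genes with a nonzero count
    n_cells_by_counts = None  # per gene: number of cells with a nonzero count
    for chunk in chunks:
        for cell in chunk:
            if n_cells_by_counts is None:
                n_cells_by_counts = [0] * len(cell)
            detected = [1 if count > 0 else 0 for count in cell]
            n_genes_by_counts.append(sum(detected))
            n_cells_by_counts = [a + d for a, d in zip(n_cells_by_counts, detected)]
    if n_cells_by_counts is None:
        n_cells_by_counts = []
    return n_genes_by_counts, n_cells_by_counts


chunks = [
    [[0, 3, 1], [2, 0, 0]],  # chunk 0: two cells, three genes
    [[0, 0, 5]],             # chunk 1: one cell
]
per_cell, per_gene = qc_single_pass(chunks)
keep_cells = [n >= 2 for n in per_cell]  # analogous to min_counts_per_cell=2
keep_genes = [n >= 2 for n in per_gene]  # analogous to min_counts_per_gene=2
print(per_cell, per_gene)
```

Both masks here are derived from the unfiltered data; whether the library applies gene and cell filters jointly or sequentially is not specified above.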

`method` `harmony`

harmony(sample_col_name, n_init=10, max_iter_harmony=20)

Use Harmony to integrate different experiments.

Note:

This modified Harmony implementation scales to 15M cells with 50 PCs on GPU (A100 80G). The result is stored in adata.obsm['X_pca_harmony'].

Args:

sample_col_name (str): Name of the column containing sample IDs.
n_init (int=10): Number of times the k-means algorithm is run with different centroid seeds.
max_iter_harmony (int=20): Maximum number of Harmony iterations.

`method` `highly_variable_genes`

highly_variable_genes(n_top_genes=4000, method='seurat_v3')

Annotate highly variable genes.

Note:

Only seurat_v3 is implemented, and it expects count data. Highly variable genes are marked True in adata.var['highly_variable'].

Args:

n_top_genes (int=4000): Number of highly-variable genes to keep.
method (str='seurat_v3'): Flavor used to identify highly variable genes.

`method` `leiden`

leiden(resolution=0.5, random_state=42)

Perform Leiden clustering using rapids-singlecell.

Args:

resolution (float=0.5): Parameter controlling the coarseness of the clustering (called gamma in the modularity formula). Higher values lead to more clusters.
random_state (int=42): Random seed.

`method` `neighbors`

neighbors(n_neighbors=20, n_pcs=50, use_rep='X_pca_harmony', algorithm='cagra')

Compute a neighborhood graph of observations using rapids-singlecell.

Args:

n_neighbors (int=20): The size of local neighborhood (in terms of number of neighboring data points) used for manifold approximation.
n_pcs (int=50): Use this many PCs.
use_rep (str='X_pca_harmony'): Use the indicated representation.
algorithm (str='cagra'): The query algorithm to use.

`method` `normalize_log1p`

normalize_log1p(target_sum=10000.0)

Normalize counts per cell then log1p.

Note:

If save_raw_counts or save_norm_counts is set, write adata_X to disk here automatically.

Args:

target_sum (int=1e4): Target total count per cell after normalization. If None, each observation (cell) has a total count equal to the median of the total counts of all observations (cells) before normalization.
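The arithmetic behind this step can be sketched on plain Python lists: scale each cell to target_sum total counts, then apply log(1 + x). The function normalize_log1p_rows is a hypothetical stand-in, not the library's chunked GPU code, and the handling of all-zero cells is an assumption.

```python
import math
from statistics import median


def normalize_log1p_rows(rows, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    totals = [sum(row) for row in rows]
    if target_sum is None:
        # Fall back to the median of per-cell totals, as described for target_sum=None.
        target_sum = median(totals)
    out = []
    for row, total in zip(rows, totals):
        # Assumption: cells with zero total counts are left at zero.
        scale = target_sum / total if total > 0 else 0.0
        out.append([math.log1p(value * scale) for value in row])
    return out


counts = [[1, 3], [10, 10]]
normalized = normalize_log1p_rows(counts, target_sum=100)
print(normalized)
```

Undoing the log with math.expm1 and summing each row recovers target_sum, which is a quick sanity check on the scaling.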

`method` `normalize_log1p_pca`

normalize_log1p_pca(
    target_sum=10000.0,
    n_components=50,
    hvg_var='highly_variable'
)

A single-call alternative to running normalize_log1p followed by pca.

Note:

Use this when preload_on_cpu is False.

`method` `pca`

pca(n_components=50, hvg_var='highly_variable')

Principal component analysis.

Computes PCA coordinates, loadings and variance decomposition. Uses the implementation of scikit-learn.

Note:

Component directions are flipped according to the largest values in the loadings so that results match scanpy. The computed PCA matrix is stored in adata.obsm['X_pca'].

Args:

n_components (int=50): Number of principal components to compute.
hvg_var (str='highly_variable'): Use highly variable genes only.
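The sign of a principal component is arbitrary, so a deterministic flip is what makes results comparable across implementations. The sketch below shows one common convention: make the largest-magnitude loading of each component positive and flip the matching score column. Interpreting "largest values" as largest absolute values is an assumption, and flip_signs is a hypothetical helper, not ScaleSC's code.

```python
def flip_signs(components, scores):
    """Make the largest-magnitude loading of every component positive.

    components: one list of gene loadings per principal component.
    scores: one list of PC coordinates per cell.
    """
    signs = []
    for loadings in components:
        pivot = max(loadings, key=abs)  # loading with the largest magnitude
        signs.append(-1.0 if pivot < 0 else 1.0)
    flipped_components = [[s * v for v in row] for s, row in zip(signs, components)]
    # Flip the matching score column so the reconstruction (scores times components) is unchanged.
    flipped_scores = [[s * v for s, v in zip(signs, cell)] for cell in scores]
    return flipped_components, flipped_scores


components = [[0.6, -0.8], [0.8, 0.6]]  # two toy components over two genes
scores = [[1.0, 2.0], [-3.0, 0.5]]      # two toy cells
new_components, new_scores = flip_signs(components, scores)
print(new_components, new_scores)
```

Because a component and its score column are negated together, the data reconstructed from them is identical before and after the flip.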

`method` `save`

save(data_name=None)

Save adata to disk.

Note:

Save to 'output_dir/data_name.h5ad'.

Args:

data_name (str): Output file name. If None, data_dir is used.

`method` `savex`

savex(name, data_name=None)

Save adata to disk in chunks.

Note:

Each chunk will be saved individually in a subfolder under output_dir. Save to 'output_dir/name/data_name_i.h5ad'.

Args:

name (str): Subfolder name.
data_name (str): Base file name for the chunks. If None, data_dir is used.
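The two naming patterns documented for save and savex can be mirrored with os.path.join, as a way to predict where files will land. Both helpers are hypothetical, and the sketch assumes the chunk index i starts at 0, which the documentation does not state.

```python
import os


def save_path(output_dir, data_name):
    """Mirror the documented save() pattern: 'output_dir/data_name.h5ad'."""
    return os.path.join(output_dir, f"{data_name}.h5ad")


def savex_paths(output_dir, name, data_name, n_chunks):
    """Mirror the documented savex() pattern: 'output_dir/name/data_name_i.h5ad'."""
    # Assumption: chunk indices are 0-based.
    return [
        os.path.join(output_dir, name, f"{data_name}_{i}.h5ad")
        for i in range(n_chunks)
    ]


# 'pbmc' and 'raw' are made-up names for illustration.
print(save_path("results", "pbmc"))
print(savex_paths("results", "raw", "pbmc", n_chunks=3))
```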

`method` `to_CPU`

to_CPU()

Move all chunks to CPU.

`method` `to_GPU`

to_GPU()

Move all chunks to GPU.

`method` `umap`

umap(random_state=42)

Embed the neighborhood graph using rapids-singlecell.

Args:

random_state (int=42): Random seed.
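The methods above are typically chained in a fixed order. The sketch below encodes one plausible order as data and runs it against a recording stub, so it executes without a GPU or the library installed. Method names and keyword names come from this reference; the ordering, the threshold values, and the 'sample' column name are assumptions.

```python
# Ordering is an assumption modeled on a typical scanpy-style workflow.
# highly_variable_genes runs before normalization because seurat_v3 expects counts.
PIPELINE_STEPS = [
    ("calculate_qc_metrics", {}),
    ("filter_genes_and_cells", {"min_counts_per_gene": 3, "min_counts_per_cell": 200}),
    ("highly_variable_genes", {"n_top_genes": 4000, "method": "seurat_v3"}),
    ("normalize_log1p", {"target_sum": 1e4}),
    ("pca", {"n_components": 50}),
    ("harmony", {"sample_col_name": "sample"}),  # 'sample' is a hypothetical obs column
    ("neighbors", {"n_neighbors": 20, "n_pcs": 50, "use_rep": "X_pca_harmony"}),
    ("leiden", {"resolution": 0.5}),
    ("umap", {"random_state": 42}),
    ("save", {}),
]


def run_pipeline(pipeline, steps=PIPELINE_STEPS):
    """Call each named method on `pipeline` in order and return the names called."""
    called = []
    for name, kwargs in steps:
        getattr(pipeline, name)(**kwargs)
        called.append(name)
    return called


class RecordingStub:
    """Stand-in for a ScaleSC instance that only records method calls."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the pipeline methods.
        def method(**kwargs):
            self.calls.append((name, kwargs))
        return method


stub = RecordingStub()
order = run_pipeline(stub)
print(order)
```

With the library installed, the stub would be replaced by a ScaleSC instance constructed with data_dir and the other arguments listed under __init__.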

This file was automatically generated via lazydocs.
