API Reference
Knowledge Base
Tools: Labeling
- cytopus.tl.label.overlap_coefficient(set_a, set_b)
Calculate the overlap coefficient between two sets.
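For orientation, the overlap coefficient (Szymkiewicz-Simpson) is conventionally defined as the size of the intersection divided by the size of the smaller set. The function below is a minimal standalone sketch of that definition, not the library's source; the name overlap_coefficient_sketch, the example gene names, and the handling of empty sets are assumptions made for illustration.

```python
def overlap_coefficient_sketch(set_a, set_b):
    """Szymkiewicz-Simpson overlap coefficient: |A & B| / min(|A|, |B|)."""
    set_a, set_b = set(set_a), set(set_b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        # Assumption: an empty input yields 0.0 instead of a division error.
        return 0.0
    return len(set_a & set_b) / smaller


# Hypothetical marker genes of one factor and one gene set.
factor_markers = {"CD3E", "CD4", "IL7R", "GENE_X"}
gene_set = {"CD3E", "CD4", "GENE_Y"}

# Two shared genes, smaller set has three members.
score = overlap_coefficient_sketch(factor_markers, gene_set)
```

Because the denominator is the smaller set, a gene set fully contained in a longer marker list scores 1.0 regardless of how long the marker list is.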
- cytopus.tl.label.label_marker_genes(marker_genes, gs_label_dict, threshold=0.4)
Label an array of marker genes using a KnowledgeBase or a dictionary derived from the KnowledgeBase, and return the overlap coefficients between each factor's marker genes and each gene set annotation.
marker_genes: numpy.array or list of lists, factors x marker genes
gs_label_dict: cytopus.KnowledgeBase or dict, with gene set names (str) as keys and gene sets (list) as values
threshold: float, if the maximum overlap coefficient of a factor is greater than threshold, the factor is labeled with the name of the gene set with the maximum overlap coefficient
returns: pandas.DataFrame, with overlap coefficients of factors (rows) and gene sets (columns); indices are relabeled to the gene set with the maximum overlap coefficient
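The thresholded labeling described above can be sketched in plain Python. This is an illustration of the documented behaviour under stated assumptions, not the library's code: label_factors_sketch and overlap are hypothetical helpers, the gene sets are made up, results are held in plain dicts rather than a pandas.DataFrame, and factors that do not pass the threshold are assumed to keep their numeric index as label.

```python
def overlap(set_a, set_b):
    """Overlap coefficient: |A & B| / min(|A|, |B|), 0.0 for empty input."""
    set_a, set_b = set(set_a), set(set_b)
    smaller = min(len(set_a), len(set_b))
    return len(set_a & set_b) / smaller if smaller else 0.0


def label_factors_sketch(marker_genes, gs_label_dict, threshold=0.4):
    """Score every factor against every gene set and pick a label per factor."""
    labels, scores = [], {}
    for index, genes in enumerate(marker_genes):
        row = {name: overlap(genes, gs) for name, gs in gs_label_dict.items()}
        best = max(row, key=row.get)
        # Assumption: factors at or below the threshold keep their index.
        labels.append(best if row[best] > threshold else str(index))
        scores[index] = row
    return labels, scores


# Hypothetical input: two factors with four marker genes each.
marker_genes = [
    ["CD3E", "CD4", "IL7R", "GENE_X"],
    ["GENE_Y", "GENE_Z", "GENE_W", "CD4"],
]
gene_sets = {
    "T_cell_identity": ["CD3E", "CD4", "IL7R"],
    "B_cell_identity": ["CD19", "MS4A1", "CD79A"],
}
labels, scores = label_factors_sketch(marker_genes, gene_sets)
```

In this toy input the first factor contains the whole T cell gene set and is relabeled, while the second shares a single gene (coefficient 1/3, below the default threshold of 0.4) and keeps its index.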
- cytopus.tl.label.get_celltype(adata, celltype_key, factor_list=None, Spectra_cell_scores='SPECTRA_cell_scores')
For a list of factors, check in which cell types they are expressed.
adata: anndata.AnnData, containing cell type labels in adata.obs[celltype_key]
celltype_key: str, key for adata.obs containing the cell type labels
factor_list: list, list of keys for factor loadings in adata.obs, if None use factor loadings in adata.obsm['SPECTRA_factors']
Spectra_cell_scores: str, key for Spectra cell scores in adata.obsm
returns: dict, mapping factor names to cell types
- cytopus.tl.label.get_gmt(gs_dict, save=False, path=None)
Transform a dictionary into a .gmt file.
gs_dict: dict, gene set dictionary with format {'gene set name': ['Gene_a', 'Gene_b', 'Gene_c', …]}
save: bool, if True saves .gmt file to path
path: str, path to save .gmt file
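As a reminder of the target format: a .gmt file holds one gene set per line, tab-separated, with the gene set name first, a description second, and the genes after that. The sketch below builds such lines from a dictionary in the format given above. It is not the library's implementation; gs_dict_to_gmt_lines is a hypothetical helper, and the "NA" placeholder in the description column is an assumption.

```python
def gs_dict_to_gmt_lines(gs_dict, description="NA"):
    """Build one tab-separated GMT line per gene set: name, description, genes."""
    lines = []
    for name, genes in gs_dict.items():
        # Assumption: the description column is filled with a placeholder.
        lines.append("\t".join([name, description, *genes]))
    return lines


gs_dict = {"gene set name": ["Gene_a", "Gene_b", "Gene_c"]}
gmt_text = "\n".join(gs_dict_to_gmt_lines(gs_dict)) + "\n"
```

Writing gmt_text to a file with a .gmt extension gives a file that tools expecting the GMT layout can read.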
- cytopus.tl.label.hierarchy_to_csv(hierarchy, filename='hierarchy.csv', header_name=['Parent', 'Child'])
Get the hierarchy from the knowledge base and write it to .csv.
hierarchy: dict, nested dict containing cell type hierarchy, e.g. G.get_celltype_hierarchy()
filename: str, output file name to write csv to
header_name: list, header names of the csv
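One way to picture this export is as flattening the nested dict into parent-child rows under the default ['Parent', 'Child'] header. The sketch below assumes a nested layout of the form {parent: {child: {grandchild: {}}}}; the exact shape returned by G.get_celltype_hierarchy() is not shown here, so both the layout and the helper nested_to_edges are assumptions, and the cell type names are made up. It writes to an in-memory buffer instead of a file.

```python
import csv
import io


def nested_to_edges(hierarchy):
    """Flatten {parent: {child: {...}}} into a list of (parent, child) pairs."""
    edges = []
    for parent, children in hierarchy.items():
        for child in children:
            edges.append((parent, child))
        # Recurse so that deeper levels contribute their own pairs.
        edges.extend(nested_to_edges(children))
    return edges


# Hypothetical hierarchy; an empty dict marks a leaf cell type.
hierarchy = {"all-cells": {"leukocyte": {"T": {}, "B": {}}}}
edges = nested_to_edges(hierarchy)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Parent", "Child"])
writer.writerows(edges)
csv_text = buffer.getvalue()
```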
- cytopus.tl.label.geneset_to_csv(gs_dict, filename='geneset.csv', header_name=['gene_set_name', 'gene_name'])
Get gene sets from the knowledge base and write them to .csv.
gs_dict: dict, gene set dictionary, e.g. G.processes
filename: str, output file name to write csv to
header_name: list, header names of the .csv file
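Given the two default header names, a plausible reading is a long-format table with one row per gene set and gene pair. The sketch below shows that layout on a made-up dictionary; geneset_rows is a hypothetical helper and the one-pair-per-row layout is an assumption, not confirmed behaviour of the library.

```python
import csv
import io


def geneset_rows(gs_dict):
    """Expand {gene_set: [genes]} into (gene_set, gene) rows."""
    return [(name, gene) for name, genes in gs_dict.items() for gene in genes]


# Hypothetical gene set dictionary.
gs_dict = {"all_glycolysis": ["HK1", "PFKL"], "T_identity": ["CD3E"]}
rows = geneset_rows(gs_dict)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["gene_set_name", "gene_name"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```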
- cytopus.tl.label.metadata_to_csv(graph, file_name, specific_class=False, class_value=None)
Get metadata and write it to csv.
graph: networkx.DiGraph, graph containing nodes with attributes
file_name: str, path to write csv to
specific_class: str, restrict to nodes with specific 'class' attribute
class_value: str, class attribute to restrict to
Tools: Create
- cytopus.tl.create.construct_kb(celltype_edges, geneset_gene_edges, geneset_celltype_edges, annotation_dict, metadata_dict=None, save=False, save_path=None)
Construct a cytopus.kb.KnowledgeBase object.
celltype_edges: list, list of tuples storing the edges of the cell type hierarchy as ('child', 'parent')
geneset_gene_edges: list, list of tuples storing the edges connecting every gene set with each of its genes as ('gene_set', 'gene')
geneset_celltype_edges: list, list of tuples storing the edges connecting every gene set with its cell type as ('gene_set', 'celltype')
annotation_dict: dict, containing the gene set names as keys and their annotation names (cellular_process or cellular_identity) as values
metadata_dict: dict, nested dict containing the gene set names as keys and, as values, a dict with attribute categories as keys and the corresponding attributes as values
save: bool, if True saves the data to the path provided in save_path
save_path: str, path to save the data to (.txt file)
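To make the expected input shapes concrete, the sketch below assembles toy versions of the four required arguments and runs a simple consistency check before they would be handed to construct_kb. All cell type, gene set, and gene names are invented for illustration, and the consistency check is an extra step suggested here, not something the library is documented to require.

```python
# Cell type hierarchy as ('child', 'parent') tuples.
celltype_edges = [("T", "leukocyte"), ("B", "leukocyte"), ("leukocyte", "all-cells")]

# Gene set membership as ('gene_set', 'gene') tuples.
geneset_gene_edges = [
    ("T_identity", "CD3E"),
    ("T_identity", "CD4"),
    ("all_glycolysis", "HK1"),
    ("all_glycolysis", "PFKL"),
]

# Cell type in which each gene set applies, as ('gene_set', 'celltype') tuples.
geneset_celltype_edges = [("T_identity", "T"), ("all_glycolysis", "all-cells")]

# Annotation name per gene set: cellular_process or cellular_identity.
annotation_dict = {
    "T_identity": "cellular_identity",
    "all_glycolysis": "cellular_process",
}

# Consistency check (an assumption about what is worth verifying up front).
gene_sets = {gene_set for gene_set, _ in geneset_gene_edges}
known_celltypes = {node for edge in celltype_edges for node in edge}
missing_annotation = gene_sets - set(annotation_dict)
unknown_celltypes = {ct for _, ct in geneset_celltype_edges} - known_celltypes

# Following the signature above, the inputs would then be passed as:
# construct_kb(celltype_edges, geneset_gene_edges, geneset_celltype_edges, annotation_dict)
```

Both missing_annotation and unknown_celltypes should be empty; a non-empty set points to a gene set without an annotation or to a cell type that is absent from the hierarchy.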
Tools: Hierarchy
- cytopus.tl.hierarchy.build_nested_dict(graph, node)
Build a nested dictionary from the reverse view of the Cytopus cell type hierarchy.
graph: networkx.DiGraph.view, reverse view of Cytopus cell type hierarchy
node: str, name of root node in the reversed view
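The recursion behind such a nested dictionary can be sketched without networkx by using a plain parent-to-children mapping in place of the reversed graph view. This is an illustration only: build_nested_dict_sketch is a hypothetical helper, the edges are made up, and whether the library's result includes the root node as the outermost key is an assumption.

```python
from collections import defaultdict


def build_nested_dict_sketch(children_of, node):
    """Recursively nest every child of node under its own name."""
    return {
        child: build_nested_dict_sketch(children_of, child)
        for child in children_of.get(node, [])
    }


# Hypothetical hierarchy edges stored as ('child', 'parent').
celltype_edges = [("T", "leukocyte"), ("B", "leukocyte"), ("leukocyte", "all-cells")]

# Reversing the edges gives a parent -> children lookup, the plain-dict
# stand-in for the reverse view of the graph.
children_of = defaultdict(list)
for child, parent in celltype_edges:
    children_of[parent].append(child)

# Assumption: the root is kept as the outermost key.
nested = {"all-cells": build_nested_dict_sketch(children_of, "all-cells")}
```

Leaf cell types end up as keys mapped to empty dicts, which is a convenient marker for the most granular level.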
- cytopus.tl.hierarchy.get_hierarchy_dict(G)
Reverse the Cytopus cell type hierarchy and build a nested hierarchy from it.
G: cytopus.KnowledgeBase, containing cell type hierarchy
- class cytopus.tl.hierarchy.Hierarchy(hierarchy_dict)
Bases: object
- nx: class attribute holding the imported networkx module
- __init__(hierarchy_dict)
Load the hierarchy class.
hierarchy_dict: dict, nested dict containing the cell type hierarchy
- plot_celltypes(node_color='#8decf5', node_size=1000, edge_width=1, arrow_size=20, edge_color='k', label_size=10, figsize=[30, 30])
Plot all cell types contained in the hierarchy object.
- add_cells(adata, obs_columns=None)
Add cells to their most granular annotation in the hierarchy object.
adata: anndata.AnnData, containing the cell type annotations under adata.obs
obs_columns: list, list of columns in adata.obs where the cell type annotations are stored (recommended)
- query_ancestors(query_node, adata=None, obs_key='hierarchical_query')
Retrieve all cell barcodes belonging to the cell type and all of its subsets.
query_node: str, cell type name for which to retrieve barcodes
node_type: str, node type of cell type node (here: 'cell_type')
adata: anndata.AnnData, adata to store the cell type annotations under adata.obs[obs_key]
obs_key: str, column label to store cell type annotations under adata.obs[obs_key]
returns: dict, containing the barcodes belonging to each annotation in self.annotations; if adata is provided, they will also be stored in adata.obs[obs_key]
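The idea of collecting barcodes for a cell type together with all of its subsets can be sketched on plain dictionaries. This is not the method's implementation: collect_barcodes is a hypothetical helper, the nested hierarchy layout and the barcode strings are invented, and the sketch returns a flat list rather than the dict described above.

```python
def collect_barcodes(hierarchy, cells_by_type, query_node):
    """Return barcodes of query_node and of every cell type nested below it."""

    def find_subtree(tree, target):
        # Depth-first search for the children dict of the target node.
        for name, children in tree.items():
            if name == target:
                return children
            found = find_subtree(children, target)
            if found is not None:
                return found
        return None

    def descendants(subtree):
        names = []
        for name, children in subtree.items():
            names.append(name)
            names.extend(descendants(children))
        return names

    subtree = find_subtree(hierarchy, query_node)
    if subtree is None:
        raise KeyError(query_node)
    barcodes = []
    for name in [query_node, *descendants(subtree)]:
        barcodes.extend(cells_by_type.get(name, []))
    return barcodes


# Hypothetical hierarchy and barcodes assigned at their most granular label.
hierarchy = {"all-cells": {"leukocyte": {"T": {"CD4-T": {}, "CD8-T": {}}, "B": {}}}}
cells_by_type = {
    "CD4-T": ["AAAC-1"],
    "CD8-T": ["AAAG-1"],
    "T": ["AACT-1"],
    "B": ["AAGG-1"],
}
t_barcodes = collect_barcodes(hierarchy, cells_by_type, "T")
```

Querying "T" here gathers the cells labeled directly as T plus those labeled with the finer CD4-T and CD8-T annotations, and leaves the B cell out.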
- trim_annotations(adata, coarse_labels, obs_key='trimmed_annotation')
Trim the hierarchy to revert all labels to their coarse parent labels from a defined list of labels.
adata: anndata.AnnData, adata to store the trimmed annotations under adata.obs[obs_key]
coarse_labels: list, list of labels to which the hierarchy should be trimmed
obs_key: str, column label to store trimmed annotations under adata.obs[obs_key]
returns: dict, containing the barcodes belonging to each coarse label
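Trimming can be thought of as walking each label up the hierarchy until it reaches one of the coarse labels. The sketch below illustrates that walk on a child-to-parent mapping. It is not the method's implementation: trim_labels is a hypothetical helper, the labels are made up, and keeping the original label when no coarse ancestor exists is an assumption.

```python
def trim_labels(parent_of, labels, coarse_labels):
    """Map each label to its nearest ancestor (or itself) found in coarse_labels."""
    coarse = set(coarse_labels)
    trimmed = []
    for label in labels:
        current = label
        # Walk towards the root until a coarse label is reached.
        while current is not None and current not in coarse:
            current = parent_of.get(current)
        # Assumption: labels without a coarse ancestor are left unchanged.
        trimmed.append(current if current is not None else label)
    return trimmed


# Hypothetical child -> parent mapping of the cell type hierarchy.
parent_of = {
    "CD4-T": "T",
    "CD8-T": "T",
    "T": "leukocyte",
    "B": "leukocyte",
    "leukocyte": "all-cells",
}
labels = ["CD4-T", "CD8-T", "B", "T"]
trimmed = trim_labels(parent_of, labels, ["T", "leukocyte"])
```

With coarse labels T and leukocyte, both T cell subsets collapse to T, while the B cell has no T ancestor and is lifted to leukocyte.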