API Reference

Knowledge Base

Tools: Labeling

cytopus.tl.label.overlap_coefficient(set_a, set_b)

Calculate the overlap coefficient between two sets.
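The entry above does not spell out the formula. The usual (Szymkiewicz-Simpson) overlap coefficient is |A ∩ B| / min(|A|, |B|), so a small marker list fully contained in a larger gene set scores 1.0. The following is a standalone sketch of that textbook definition, not the cytopus source; the handling of empty sets is an assumption.

```python
def overlap_coefficient(set_a, set_b):
    """Overlap coefficient |A & B| / min(|A|, |B|) of two collections."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        # Assumption: treat the coefficient as 0.0 when either set is empty
        return 0.0
    return len(a & b) / min(len(a), len(b))


t_cell_markers = ["CD3E", "CD3D", "CD2"]
gene_set = ["CD3E", "CD3D", "CD4", "CD8A"]
print(overlap_coefficient(t_cell_markers, gene_set))  # 2 shared genes / min(3, 4) -> 0.666...
```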

cytopus.tl.label.label_marker_genes(marker_genes, gs_label_dict, threshold=0.4)

Label an array of marker genes using a KnowledgeBase or a dictionary derived from the KnowledgeBase. Returns a DataFrame of overlap coefficients for each combination of factor (marker gene list) and gene set.

marker_genes: numpy.array or list of lists, factors x marker genes
gs_label_dict: cytopus.KnowledgeBase or dict, with gene set names (str) as keys and gene sets (list) as values
threshold: float, if the maximum overlap coefficient is greater than threshold, the factor is labeled with the name of the gene set with the maximum overlap coefficient

returns: pandas.DataFrame, with overlap coefficients of factors (rows) and gene sets (columns); indices are relabeled to the gene set with the maximum overlap coefficient
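A minimal sketch of the labeling rule described above, using only the standard library. `label_factors` is a hypothetical helper, not part of cytopus: it scores every factor against every gene set and relabels a factor only when its best score exceeds `threshold`. What cytopus does with factors below the threshold is not stated, so keeping the factor's position as its label is an assumption.

```python
def overlap_coefficient(set_a, set_b):
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def label_factors(marker_genes, gs_label_dict, threshold=0.4):
    """Hypothetical helper: one label and one {gene_set: coefficient} row per factor."""
    labels, rows = [], []
    for i, genes in enumerate(marker_genes):
        row = {name: overlap_coefficient(genes, gs) for name, gs in gs_label_dict.items()}
        best = max(row, key=row.get)  # gene set with the highest coefficient
        # Relabel only above the threshold; otherwise fall back to the factor's
        # position (this fallback is an assumption, not documented behaviour)
        labels.append(best if row[best] > threshold else str(i))
        rows.append(row)
    return labels, rows


gene_sets = {
    "T_cell_identity": ["CD3E", "CD3D", "CD2", "TRAC"],
    "IFN_response": ["ISG15", "IFIT1", "MX1", "OAS1"],
}
factors = [["CD3E", "CD3D", "LCK"], ["ISG15", "GAPDH", "ACTB"]]
labels, rows = label_factors(factors, gene_sets)
print(labels)  # ['T_cell_identity', '1']
```

The second factor shares only one of three genes with IFN_response (0.33), which stays below the default threshold of 0.4. With pandas installed, `pandas.DataFrame(rows, index=labels)` turns the result into the factors x gene sets table the function returns.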

cytopus.tl.label.get_celltype(adata, celltype_key, factor_list=None, Spectra_cell_scores='SPECTRA_cell_scores')

For a list of factors, check in which cell types they are expressed.

adata: anndata.AnnData, containing cell type labels in adata.obs[celltype_key]
celltype_key: str, key for adata.obs containing the cell type labels
factor_list: list, keys for factor loadings in adata.obs; if None, use factor loadings in adata.obsm['SPECTRA_factors']
Spectra_cell_scores: str, key for Spectra cell scores in adata.obsm

returns: dict, mapping factor names to cell types

cytopus.tl.label.get_gmt(gs_dict, save=False, path=None)

Transform a dictionary into a .gmt file.

gs_dict: dict, gene set dictionary with format {'gene set name': ['Gene_a', 'Gene_b', 'Gene_c', ...]}
save: bool, if True, save the .gmt file to path
path: str, path to save the .gmt file to
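For reference, a GMT (Gene Matrix Transposed) file has one gene set per line, tab-separated as name, description, then the member genes. The sketch below writes that layout from a gene set dictionary; `to_gmt` is a hypothetical name, and filling the description column with the gene set name is an assumption, since the entry above does not say what cytopus puts there.

```python
def to_gmt(gs_dict, path=None):
    """Serialise {gene_set_name: [genes]} to GMT text: name<TAB>description<TAB>genes..."""
    lines = []
    for name, genes in gs_dict.items():
        # GMT requires a description column; reusing the name is an assumption here
        lines.append("\t".join([name, name, *genes]))
    text = "\n".join(lines) + "\n"
    if path is not None:
        with open(path, "w") as fh:
            fh.write(text)
    return text


print(to_gmt({"IFN_response": ["ISG15", "MX1"], "T_cell": ["CD3E"]}), end="")
```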

cytopus.tl.label.flatten_hierarchical_dict(d, parent_key=None)

cytopus.tl.label.hierarchy_to_csv(hierarchy, filename='hierarchy.csv', header_name=['Parent', 'Child'])

Get the cell type hierarchy from the knowledge base and write it to a .csv file.

hierarchy: dict, nested dict containing the cell type hierarchy, e.g. G.get_celltype_hierarchy()
filename: str, output file name to write the .csv to
header_name: list, header names of the .csv
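With the default header ['Parent', 'Child'], the output is presumably an edge list with one parent/child pair per row. This sketch assumes the nested dict has the shape {node: {child: {...}}} with empty dicts (or None) at the leaves, and that top-level nodes get no row of their own; both are assumptions, and the helper names are hypothetical.

```python
import csv
import io


def hierarchy_rows(hierarchy, parent=None):
    """Yield (parent, child) pairs from a nested {node: {child: {...}}} dict."""
    for node, children in hierarchy.items():
        if parent is not None:
            yield parent, node
        yield from hierarchy_rows(children or {}, node)


def write_hierarchy_csv(hierarchy, fh, header_name=("Parent", "Child")):
    writer = csv.writer(fh)
    writer.writerow(header_name)
    writer.writerows(hierarchy_rows(hierarchy))


hierarchy = {"root": {"leukocyte": {"T-cell": {}, "B-cell": {}}}}
buf = io.StringIO()
write_hierarchy_csv(hierarchy, buf)
print(buf.getvalue())
```

To write to disk instead, open the file with `open(filename, "w", newline="")`, as the csv module recommends.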

cytopus.tl.label.geneset_to_csv(gs_dict, filename='geneset.csv', header_name=['gene_set_name', 'gene_name'])

Get gene sets from the knowledge base and write them to a .csv file.

gs_dict: dict, gene set dictionary, e.g. G.processes
filename: str, output file name to write the .csv to
header_name: list, header names of the .csv
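The default header ['gene_set_name', 'gene_name'] suggests a long-format table with one row per gene set membership. A minimal sketch of that layout under this assumption; `write_geneset_csv` is hypothetical and writes to any file-like object.

```python
import csv
import io


def write_geneset_csv(gs_dict, fh, header_name=("gene_set_name", "gene_name")):
    """Write one (gene set, gene) row per membership, i.e. a long-format table."""
    writer = csv.writer(fh)
    writer.writerow(header_name)
    for gs_name, genes in gs_dict.items():
        writer.writerows((gs_name, gene) for gene in genes)


buf = io.StringIO()
write_geneset_csv({"IFN_response": ["ISG15", "MX1"], "T_cell": ["CD3E"]}, buf)
print(buf.getvalue())
```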

cytopus.tl.label.metadata_to_csv(graph, file_name, specific_class=False, class_value=None)

Get node metadata from a graph and write it to a .csv file.

graph: networkx.DiGraph, graph containing nodes with attributes
file_name: str, path to write the .csv to
specific_class: bool, if True, restrict to nodes with a specific 'class' attribute
class_value: str, value of the 'class' attribute to restrict to
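A sketch of the filtering and export step, assuming each node's metadata is a flat attribute dict. It accepts any iterable of (node, attributes) pairs, which is what networkx's `G.nodes(data=True)` yields, so it runs without networkx. Collapsing `specific_class`/`class_value` into a single `class_value=None` filter is a simplification for illustration, and the column layout is an assumption.

```python
import csv
import io


def write_metadata_csv(nodes, fh, class_value=None):
    """nodes: iterable of (node, attribute_dict) pairs, e.g. networkx G.nodes(data=True)."""
    rows = [
        {"node": node, **attrs}
        for node, attrs in nodes
        # Keep every node, or only those whose 'class' attribute matches class_value
        if class_value is None or attrs.get("class") == class_value
    ]
    # Union of attribute names across the kept nodes, in first-seen order, becomes the header
    fields = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(fh, fieldnames=fields)  # missing attributes are written as ""
    writer.writeheader()
    writer.writerows(rows)


nodes = [
    ("IFN_response", {"class": "process", "celltype": "all-cells"}),
    ("T_cell", {"class": "identity"}),
    ("CD3E", {"class": "gene"}),
]
buf = io.StringIO()
write_metadata_csv(nodes, buf, class_value="process")
print(buf.getvalue())
```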

Tools: Create

cytopus.tl.create.construct_kb(celltype_edges, geneset_gene_edges, geneset_celltype_edges, annotation_dict, metadata_dict=None, save=False, save_path=None)

Construct a cytopus.kb.KnowledgeBase object.

celltype_edges: list, list of tuples storing the edges of the cell type hierarchy as ('child', 'parent')
geneset_gene_edges: list, list of tuples storing the edges connecting each gene set with each of its genes as ('gene_set', 'gene')
geneset_celltype_edges: list, list of tuples storing the edges connecting each gene set with its cell type as ('gene_set', 'celltype')
annotation_dict: dict, with gene set names as keys and their annotation names (cellular_process or cellular_identity) as values
metadata_dict: dict, nested dict with gene set names as keys and, as values, dicts mapping attribute categories to the corresponding attributes
save: bool, if True, save the data to the path provided in save_path
save_path: str, path to save the data to (.txt file)
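To make the expected argument shapes concrete, the sketch below derives geneset_gene_edges, geneset_celltype_edges and annotation_dict from a compact per-gene-set description. The input layout and both helper names are hypothetical; `missing_celltypes` is a consistency check one might run beforehand, and whether construct_kb itself requires every referenced cell type to appear in celltype_edges is not stated.

```python
def build_kb_inputs(gene_sets):
    """gene_sets: {name: {"celltype": str, "annotation": str, "genes": [str]}} (hypothetical layout)."""
    geneset_gene_edges = []
    geneset_celltype_edges = []
    annotation_dict = {}
    for name, info in gene_sets.items():
        geneset_gene_edges.extend((name, gene) for gene in info["genes"])  # ('gene_set', 'gene')
        geneset_celltype_edges.append((name, info["celltype"]))  # ('gene_set', 'celltype')
        annotation_dict[name] = info["annotation"]  # cellular_process or cellular_identity
    return geneset_gene_edges, geneset_celltype_edges, annotation_dict


def missing_celltypes(celltype_edges, geneset_celltype_edges):
    """Cell types referenced by gene sets but absent from the hierarchy edges."""
    known = {node for edge in celltype_edges for node in edge}
    return sorted({ct for _, ct in geneset_celltype_edges} - known)


celltype_edges = [("T-cell", "leukocyte"), ("leukocyte", "root")]  # ('child', 'parent')
gene_sets = {
    "IFN_response": {"celltype": "leukocyte", "annotation": "cellular_process", "genes": ["ISG15", "MX1"]},
    "T_cell_identity": {"celltype": "T-cell", "annotation": "cellular_identity", "genes": ["CD3E"]},
}
gg, gc, ann = build_kb_inputs(gene_sets)
print(missing_celltypes(celltype_edges, gc))  # []
```

The four resulting objects map positionally onto `construct_kb(celltype_edges, geneset_gene_edges, geneset_celltype_edges, annotation_dict)`.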

Tools: Hierarchy

cytopus.tl.hierarchy.build_nested_dict(graph, node)

Build a nested dictionary from a reverse view of the Cytopus cell type hierarchy.

graph: networkx.DiGraph.view, reverse view of the Cytopus cell type hierarchy
node: str, name of the root node in the reversed view

cytopus.tl.hierarchy.get_hierarchy_dict(G)

Reverse the Cytopus cell type hierarchy and build a nested hierarchy dictionary from it.

G: cytopus.KnowledgeBase, containing the cell type hierarchy
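The two functions above describe one idea: the hierarchy is stored as child → parent edges (see celltype_edges in construct_kb), so it is reversed into parent → children and then nested recursively from the root. This sketch uses a plain dict in place of the networkx reverse view, and `nest_children`/`hierarchy_from_edges` are hypothetical names; it assumes the hierarchy is acyclic.

```python
from collections import defaultdict


def nest_children(children, node):
    """Recursively nest {child: {...}} under node, given a parent -> children mapping."""
    return {child: nest_children(children, child) for child in children.get(node, [])}


def hierarchy_from_edges(celltype_edges):
    """celltype_edges: ('child', 'parent') tuples, as passed to construct_kb."""
    children = defaultdict(list)  # reversed view: parent -> children
    child_nodes = set()
    for child, parent in celltype_edges:
        children[parent].append(child)
        child_nodes.add(child)
    # Roots are parents that never appear as a child
    roots = [p for p in children if p not in child_nodes]
    return {root: nest_children(children, root) for root in roots}


edges = [("T-cell", "leukocyte"), ("B-cell", "leukocyte"), ("leukocyte", "root")]
print(hierarchy_from_edges(edges))
# {'root': {'leukocyte': {'T-cell': {}, 'B-cell': {}}}}
```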

cytopus.tl.hierarchy.create_hierarchical_graph(data, type_label)

cytopus.tl.hierarchy.get_all_keys(d)

cytopus.tl.hierarchy.get_nodes_of_type(graph, node_type)

cytopus.tl.hierarchy.get_indices(df, value)

cytopus.tl.hierarchy.get_node_labels(graph, node_type)

class cytopus.tl.hierarchy.Hierarchy(hierarchy_dict)

Bases: object

nx: class attribute referencing the networkx module

__init__(hierarchy_dict)

Load a Hierarchy object.

hierarchy_dict: dict, nested dict containing the cell type hierarchy

identities()

Print the cell types contained in the hierarchy.

plot_celltypes(node_color='#8decf5', node_size=1000, edge_width=1, arrow_size=20, edge_color='k', label_size=10, figsize=[30, 30])

Plot all cell types contained in the hierarchy object.

add_cells(adata, obs_columns=None)

Add cells to their most granular annotation in the hierarchy object.

adata: anndata.AnnData, containing the cell type annotations under adata.obs
obs_columns: list, columns in adata.obs where the cell type annotations are stored (specifying these is recommended)
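One way to read "most granular annotation" is: among the labels a cell carries across the obs_columns, keep the one deepest in the hierarchy. The sketch below implements that interpretation on plain Python data (one list of candidate labels per cell); the depth-based rule and both helper names are assumptions, not the documented algorithm.

```python
def label_depths(hierarchy, depth=0, out=None):
    """Map every node of a nested {node: {child: ...}} dict to its depth (root = 0)."""
    out = {} if out is None else out
    for node, children in hierarchy.items():
        out[node] = depth
        label_depths(children or {}, depth + 1, out)
    return out


def most_granular(labels_per_cell, depths):
    """For each cell's candidate labels, keep the one deepest in the hierarchy."""
    result = []
    for labels in labels_per_cell:
        known = [lab for lab in labels if lab in depths]  # ignore missing/unknown labels
        result.append(max(known, key=depths.get) if known else None)
    return result


hierarchy = {"root": {"leukocyte": {"T-cell": {"CD8 T-cell": {}}, "B-cell": {}}}}
depths = label_depths(hierarchy)
cells = [["leukocyte", "T-cell"], ["leukocyte", "CD8 T-cell"], ["leukocyte", None], ["unknown"]]
print(most_granular(cells, depths))  # ['T-cell', 'CD8 T-cell', 'leukocyte', None]
```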

query_ancestors(query_node, adata=None, obs_key='hierarchical_query')

Retrieve all cell barcodes belonging to a cell type and all of its subsets.

query_node: str, cell type name for which to retrieve barcodes
adata: anndata.AnnData, adata to store the cell type annotations under adata.obs[obs_key]
obs_key: str, column label under which the cell type annotations are stored in adata.obs[obs_key]

returns: dict, containing the barcodes belonging to each annotation in self.annotations; if adata is provided, they are also stored in adata.obs[obs_key]
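The traversal described above (a cell type plus all of its subsets) amounts to collecting barcodes over the subtree rooted at query_node. This sketch works on a nested hierarchy dict and a hypothetical `cells_by_type` mapping of cell type to barcodes, standing in for whatever add_cells stores internally; the helper names are illustrative only.

```python
def find_subtree(hierarchy, query_node):
    """Return the nested dict below query_node, or None if the node is absent."""
    for node, children in hierarchy.items():
        if node == query_node:
            return children or {}
        found = find_subtree(children or {}, query_node)
        if found is not None:
            return found
    return None


def barcodes_with_subsets(hierarchy, cells_by_type, query_node):
    """Collect barcodes of query_node and every cell type nested beneath it."""
    subtree = find_subtree(hierarchy, query_node)
    if subtree is None:
        return []
    barcodes = list(cells_by_type.get(query_node, []))
    stack = list(subtree.items())  # iterative depth-first walk over the subsets
    while stack:
        node, children = stack.pop()
        barcodes.extend(cells_by_type.get(node, []))
        stack.extend((children or {}).items())
    return barcodes


hierarchy = {"root": {"leukocyte": {"T-cell": {"CD8 T-cell": {}}, "B-cell": {}}}}
cells_by_type = {"T-cell": ["c1"], "CD8 T-cell": ["c2", "c3"], "B-cell": ["c4"]}
print(sorted(barcodes_with_subsets(hierarchy, cells_by_type, "T-cell")))  # ['c1', 'c2', 'c3']
```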

trim_annotations(adata, coarse_labels, obs_key='trimmed_annotation')

Trim the hierarchy so that all labels revert to their coarse parent labels from a defined list of labels.

adata: anndata.AnnData, adata to store the trimmed annotations under adata.obs[obs_key]
coarse_labels: list, labels to which the hierarchy should be trimmed
obs_key: str, column label to store the trimmed annotations under adata.obs[obs_key]

returns: dict, containing the barcodes belonging to each coarse label
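Trimming can be viewed as mapping every node to its closest ancestor (or itself) that appears in coarse_labels, then grouping barcodes by that coarse label. A sketch under those assumptions; the helper names are hypothetical, and dropping cells whose label sits above every coarse label is an assumption, since the entry does not say how such cells are handled.

```python
def coarse_map(hierarchy, coarse_labels, current=None, out=None):
    """Map each node to its closest ancestor-or-self in coarse_labels (None if none)."""
    out = {} if out is None else out
    coarse = set(coarse_labels)
    for node, children in hierarchy.items():
        label = node if node in coarse else current
        out[node] = label
        coarse_map(children or {}, coarse, label, out)
    return out


def group_by_coarse(cell_labels, mapping):
    """Group barcodes by coarse label: {coarse_label: [barcode, ...]}."""
    groups = {}
    for barcode, label in cell_labels.items():
        coarse = mapping.get(label)
        if coarse is not None:  # cells above every coarse label are dropped (assumption)
            groups.setdefault(coarse, []).append(barcode)
    return groups


hierarchy = {"root": {"leukocyte": {"T-cell": {"CD8 T-cell": {}}, "B-cell": {}}}}
mapping = coarse_map(hierarchy, ["T-cell", "B-cell"])
cells = {"c1": "CD8 T-cell", "c2": "B-cell", "c3": "T-cell", "c4": "leukocyte"}
print(group_by_coarse(cells, mapping))  # {'T-cell': ['c1', 'c3'], 'B-cell': ['c2']}
```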

get_cells_for_cell_type(cell_type)

Retrieve all cells assigned to a specific cell type in the hierarchy.

cell_type: str, name of the cell type node to query

returns: list, cell barcodes assigned to the given cell type