cellNexus is a query interface for programmatic exploration and retrieval of harmonised, curated, and reannotated CELLxGENE human-cell-atlas data.

This standalone documentation website provides:
- Detailed data-processing information (quality control, harmonisation, and expression representations).
- Complete explanations of metadata columns used in filtering and interpretation.
- Guided examples for metadata-first exploration and gene-expression analysis.

Start Here
- Home: package purpose, architecture, and processing pipeline overview.
- Metadata Explore: field-by-field metadata dictionary, examples, and filtering strategy.
- Gene Expression Explore: practical workflows for querying single-cell, pseudobulk, and metacell expression.

Data Processing Overview
The harmonisation pipeline standardises data across datasets so queries are consistent across studies:
- Metadata are retrieved from cloud-hosted harmonised tables.
- Standardised quality control removes empty droplets, dead/damaged cells, and likely doublets.
- Cell-level data are served through common assay layers (counts, cpm, pseudobulk, metacell).
- Outputs are returned in analysis-ready R formats such as SingleCellExperiment and Seurat.
For implementation details and code examples, see the vignettes listed in the top navigation.
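
The quality-control step above can be illustrated with a minimal sketch. It assumes, as the client examples on this page suggest, that the QC outcome is exposed as metadata columns (empty_droplet, alive, scDblFinder.class) that a query filters on. The rows and the cell_id field are invented for illustration and do not come from the atlas.

```python
# Toy metadata rows. The QC column names mirror those used in the client
# examples; the values and the "cell_id" field are hypothetical.
cells = [
    {"cell_id": "c1", "empty_droplet": False, "alive": True, "scDblFinder.class": "singlet"},
    {"cell_id": "c2", "empty_droplet": True, "alive": True, "scDblFinder.class": "singlet"},
    {"cell_id": "c3", "empty_droplet": False, "alive": False, "scDblFinder.class": "singlet"},
    {"cell_id": "c4", "empty_droplet": False, "alive": True, "scDblFinder.class": "doublet"},
]


def passes_qc(cell):
    """Return True when a cell survives all three QC checks."""
    return (
        not cell["empty_droplet"]                    # not an empty droplet
        and cell["alive"]                            # not dead or damaged
        and cell["scDblFinder.class"] != "doublet"   # not a likely doublet
    )


# Keep only the identifiers of cells that pass every check.
kept = [cell["cell_id"] for cell in cells if passes_qc(cell)]
print(kept)
```

Only the first toy cell passes, because each of the others fails exactly one check. The real clients apply the same three conditions to the harmonised metadata table rather than to in-memory dictionaries.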

Client Usage Examples

R client (cellNexus)
library(cellNexus)
library(dplyr)
library(stringr)
metadata <- get_metadata(cloud_metadata = SAMPLE_DATABASE_URL)
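query <- metadata |>
  filter(
    empty_droplet == FALSE,              # drop empty droplets
    alive == TRUE,                       # keep live cells only
    scDblFinder.class != "doublet",      # drop likely doublets
    self_reported_ethnicity == "African",
    str_like(assay, "%10x%"),            # assay name contains "10x"
    tissue == "lung parenchyma",
    str_like(cell_type, "%CD4%")         # cell type contains "CD4"
  )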
sce <- get_single_cell_experiment(query, assays = "cpm")
pb <- get_pseudobulk(query)

Python client (cellNexusPy)
Python support is available in the companion repository: MangiolaLaboratory/cellNexusPy.
from cellnexuspy import get_metadata, get_anndata
sample_dataset = "https://object-store.rc.nectar.org.au/v1/AUTH_06d6e008e3e642da99d806ba3ea629c5/cellNexus-metadata/sample_metadata.1.3.0.parquet"
conn, table = get_metadata(parquet_url=sample_dataset)
table = table.filter("""
    empty_droplet = 'false'
    AND alive = 'true'
    AND "scDblFinder.class" != 'doublet'
    AND feature_count >= 5000
""")
query = table.filter("""
    self_reported_ethnicity = 'African'
    AND assay LIKE '%10%'
    AND tissue = 'lung parenchyma'
    AND cell_type LIKE '%CD4%'
""")
adata = get_anndata(query, assay="cpm")
pb = get_anndata(query, aggregation="pseudobulk")
conn.close()
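
Both clients request a cpm layer and a pseudobulk aggregation. As a rough illustration of what these terms conventionally mean, the sketch below computes counts per million for one cell and sums raw counts per sample and cell type. It uses the generic textbook definitions on invented data; the grouping keys, gene names, and helper functions are hypothetical, and this page does not state that cellNexus computes its layers in exactly this way.

```python
from collections import defaultdict

# Invented raw counts: one entry per cell, with hypothetical sample and
# cell-type labels and a tiny two-gene count vector.
cells = [
    {"sample": "s1", "cell_type": "CD4 T", "counts": {"GENE_A": 3, "GENE_B": 1}},
    {"sample": "s1", "cell_type": "CD4 T", "counts": {"GENE_A": 1, "GENE_B": 3}},
    {"sample": "s2", "cell_type": "CD4 T", "counts": {"GENE_A": 0, "GENE_B": 5}},
]


def cpm(counts):
    """Scale one cell's counts so that they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for cells with no counts at all.
        return {gene: 0.0 for gene in counts}
    return {gene: value / total * 1_000_000 for gene, value in counts.items()}


def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (sample, cell_type) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, value in cell["counts"].items():
            totals[key][gene] += value
    return {key: dict(genes) for key, genes in totals.items()}


first_cell_cpm = cpm(cells[0]["counts"])
bulk = pseudobulk(cells)
print(first_cell_cpm)
print(bulk)
```

The two s1 cells collapse into a single pseudobulk profile, while the s2 cell forms its own, which is why pseudobulk output has one column per sample and cell-type group rather than one per cell.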