Downloads and joins the Census metadata with cellNexus metadata. This function creates indexed tables for efficient joining and returns a data frame.
Usage
join_census_table(
tbl,
cloud_metadata = get_metadata_url("census_cell_metadata.2.3.0.parquet"),
cache_directory = get_default_cache_dir(),
join_keys = c("sample_id", "dataset_id", "observation_joinid")
)Arguments
- tbl
A
tbl_sqlobject (from get_metadata) or a database connection.- cloud_metadata
HTTP URL/URLs pointing to the census parquet database name and location.
- cache_directory
A character string specifying the local cache directory where remote parquet files will be stored. Defaults to
get_default_cache_dir().- join_keys
A character vector of column names used for the join. Defaults to
c("sample_id", "dataset_id", "observation_joinid").
Examples
library(dplyr)
get_metadata(cloud_metadata = SAMPLE_DATABASE_URL) |> head(2) |>
# You do not need to specify anything in cloud_metadata
join_census_table(
cloud_metadata = SAMPLE_DATABASE_URL,
cache_directory = tempdir()
)
#> ℹ Downloading 1 file, totalling 0 GB
#> ℹ Downloading https://object-store.rc.nectar.org.au/v1/AUTH_06d6e008e3e642da99d806ba3ea629c5/cellNexus-metadata/census_sample_metadata.2.3.0.parquet to /tmp/Rtmpzb8Xyh/census_sample_metadata.2.3.0.parquet
#> # Source: SQL [?? x 149]
#> # Database: DuckDB 1.5.2 [unknown@Linux 6.17.0-1018-azure:R 4.6.0/:memory:]
#> cell_id.x observation_joinid dataset_id sample_id sample_.x experiment___.x
#> <dbl> <chr> <chr> <chr> <chr> <chr>
#> 1 17 QRMCN*8*|# 842c6f5d-4a9… 1119f482… 1119f482… ""
#> 2 16 j}0<Y>a#X~ 842c6f5d-4a9… 1119f482… 1119f482… ""
#> 3 17 QRMCN*8*|# 842c6f5d-4a9… 1119f482… 1119f482… ""
#> 4 16 j}0<Y>a#X~ 842c6f5d-4a9… 1119f482… 1119f482… ""
#> # ℹ 143 more variables: run_from_cell_id.x <chr>, sample_heuristic.x <chr>,
#> # age_days.x <int>, tissue_groups.x <chr>,
#> # nFeature_expressed_in_sample.x <int>, nCount_RNA.x <dbl>,
#> # empty_droplet.x <lgl>, cell_type_unified_ensemble.x <chr>,
#> # is_immune.x <lgl>, subsets_Mito_percent.x <int>,
#> # subsets_Ribo_percent.x <int>, high_mitochondrion.x <lgl>,
#> # high_ribosome.x <lgl>, scDblFinder.class.x <chr>, sample_chunk.x <int>, …
