Store deep mutational scanning data in a standardised format, alongside
metadata describing the data. This function creates a deep_mutational_scan
S3 object. Several of these can be joined using bind_scans to create a
combined dataset.
deep_mutational_scan(df, name, scheme = NULL, trans = NULL, na_value = "impute", annotate = FALSE, study = NA, gene = NA, source = NA, description = NA, ...)
df | A data frame to parse. |
---|---|
name | Name of the deep mutational scan. |
scheme | Original data scheme (see |
trans | Function to transform scores onto the standard scale. Accepts a string corresponding to known transforms or a custom function ( |
na_value | How to set missense NA scores (see |
annotate | Annotate the dataset with mutational landscape data (PCA, UMAP and amino acid subtypes). |
study | Study in which the scan was performed. |
gene | Gene scanned. |
source | Source of study. This is for reference only and is not used internally. |
description | Description of study, containing any miscellaneous details. This is for reference only and is not used internally. |
... | Additional arguments passed to |
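As an illustration of the custom-function form of trans, here is a minimal sketch in base R. It assumes, without confirmation from this page, that the function receives a numeric vector of raw scores and returns a numeric vector of the same length. The name log2_ratio and the example scores are hypothetical.

```r
# Hypothetical custom transform. Assumption: `trans` is called with a numeric
# vector of raw scores and must return a numeric vector of the same length.
log2_ratio <- function(x) {
  log2(x)
}

# Made-up raw scores, used only to show the shape of input and output.
raw_scores <- c(0.25, 0.5, 1, 2)
transformed <- log2_ratio(raw_scores)
print(transformed)
```

Under that assumption, such a function would be supplied as trans = log2_ratio in the call to deep_mutational_scan.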
A deep_mutational_scan S3 object, containing the following fields:
data: wide format tibble containing ER scores and other positional data
meta: Tibble containing metadata about each study in the dataset
imputed: logical indicating the dataset has been completed via imputation (see impute for details)
annotated: logical indicating if the data has been annotated with PCs, UMAP coordinates and clusters
multi_study: logical indicating the dataset contains data from multiple studies
The data tibble contains the following fields (those marked * are not always present):
name: Name of the scan this position is from.
position: Position in the protein.
wt: Wild type amino acid.
A-Y: Fitness for each substitution.
impute_A-impute_Y: Whether the fitness score is imputed (see impute).
PC1-PC20: Deep landscape principal component coordinates.
umap1-umap2: Deep landscape UMAP coordinates.
cluster: Assigned amino acid subtype.
base_cluster: Nearest functional subtype centroid in PC cosine space, which is the assigned subtype before corrections are made for permissive, outlier and ambiguous subtypes.
permissive, ambiguous, high_distance: Whether the position fulfilled the criteria for the special subtypes.
dist1-dist8: PC cosine distance to each cluster centroid for subtypes of that amino acid.
cluster_notes: Notes about cluster assignment.
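The wide layout described above, with one row per position and one score column per amino acid, can be sketched in base R. This mock is built by hand and not by the package. It assumes that A-Y means the 20 standard amino acids in alphabetical order, and it uses a plain data frame with placeholder NA scores instead of a tibble.

```r
# Assumption: "A-Y" refers to the 20 standard amino acids, alphabetically.
amino_acids <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
                 "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Placeholder score matrix: one row per position, one column per substitution.
scores <- matrix(NA_real_, nrow = 2, ncol = length(amino_acids),
                 dimnames = list(NULL, amino_acids))

# Hand-built stand-in for the wide `data` table (not produced by the package).
wide <- data.frame(name = "example", position = 1:2, wt = c("M", "K"),
                   scores, stringsAsFactors = FALSE)
print(names(wide))
```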
The class is used like a list, except that [ accesses the main data tibble (see details). The class is also associated with other common generics (see basic functions, summary, data frames, plotting). Multiple deep_mutational_scan objects can be combined using bind_scans.
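The split between list-style access and [ can be illustrated with a small S3 sketch in base R. It uses a hypothetical class named mock_scan so that it does not touch the real deep_mutational_scan methods. The package's own [ method may differ from the one shown here.

```r
# Hypothetical stand-in object: a list with a `data` table and one flag.
mock <- structure(
  list(
    data = data.frame(position = 1:3, wt = c("M", "K", "L"),
                      stringsAsFactors = FALSE),
    imputed = FALSE
  ),
  class = "mock_scan"
)

# Sketch of a `[` method that forwards indexing to the main data table.
`[.mock_scan` <- function(x, ...) {
  x$data[...]
}

# `$` still behaves as it does for a list; `[` now indexes the data table.
flag <- mock$imputed
subset_rows <- mock[mock$data$position > 1, c("position", "wt")]
print(subset_rows)
```

Forwarding [ to the data table keeps row and column subsetting short, while fields such as the flags stay reachable through $.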
# Create a scan object
path <- system.file("extdata", "urn_mavedb_00000011_a_1_scores.csv",
                    package = "deepscanscape")
csv <- read.csv(path, skip = 4)
dms <- deep_mutational_scan(csv, name = "Hietpas Hsp90", scheme = "mave",
                            trans = NULL, na_value = "impute",
                            annotate = FALSE, gene = "Hsp90",
                            study = "Hietpas et al. (2011)", source = "",
                            description = "Scan of 9 hsp90 positions")
#> Warning: Duplicate scores present - averaging scores for each variant using 'mean'