Deep mutational scan data — deep_mutational

Store deep mutational scanning data in a standardised format, alongside metadata describing the data. This function creates a deep_mutational_scan S3 object. Several of these can be joined using bind_scans to create a combined dataset.

deep_mutational_scan(
  df,
  name,
  scheme = NULL,
  trans = NULL,
  na_value = "impute",
  annotate = FALSE,
  study = NA,
  gene = NA,
  source = NA,
  description = NA,
  ...
)

Arguments

df	A data frame to parse.
name	Name of the deep mutational scan.
scheme	Original data scheme (see `parse_deep_scan`).
trans	Function to transform scores onto the standard scale. Accepts either a string naming a known transform or a custom function (see `transform_er`).
na_value	How to set missense NA scores (see `impute`).
annotate	Annotate the dataset with mutational landscape data (PCA, UMAP and amino acid subtypes).
study	Study in which the scan was performed.
gene	Gene scanned.
source	Source of study. This is for reference only and is not used internally.
description	Description of study, containing any miscellaneous details. This is for reference only and is not used internally.
...	Additional arguments passed to `parse_deep_scan`.
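
As an illustration of the custom-function option for `trans`, the following is a minimal sketch. It assumes the function receives a numeric vector of raw scores and returns a vector of the same length; this page does not state that contract, so check `transform_er` before relying on it. The name `log2_ratio` and the choice of a log2 transform are hypothetical and are not part of the package.

```r
# Hypothetical custom transform (not a package function).
# Assumption: raw scores are enrichment ratios supplied as a numeric vector.
log2_ratio <- function(x) {
  # log2 is undefined for non-positive ratios, so mark them as missing
  x[!is.na(x) & x <= 0] <- NA_real_
  log2(x)
}

# Stand-alone check on a toy vector
raw <- c(1, 0.5, 2, NA, 0)
log2_ratio(raw)
```

Under that assumption, such a function could be supplied as `trans = log2_ratio` in place of a string naming a known transform.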

Value

A deep_mutational_scan S3 object, containing the following fields:

data: Wide format tibble containing ER scores and other positional data.
meta: Tibble containing metadata about each study in the dataset.
imputed: Logical indicating whether the dataset has been completed via imputation (see impute for details).
annotated: Logical indicating whether the data has been annotated with PCs, UMAP coordinates and clusters.
multi_study: Logical indicating whether the dataset contains data from multiple studies.

The data tibble contains the following fields (those marked * are not always present):

name: Name of the scan this position is from.
position: Position in the protein.
wt: Wild type amino acid.
A-Y: Fitness for each substitution.
impute_A-impute_Y: Whether the fitness score is imputed (see impute).
PC1-PC20: Deep landscape principal component coordinates.
umap1-umap2: Deep landscape UMAP coordinates.
cluster: Assigned amino acid subtype.
base_cluster: Nearest functional subtype centroid in PC cosine space, which is the assigned subtype before corrections are made for permissive, outlier and ambiguous subtypes.
permissive, ambiguous, high_distance: Whether the position fulfilled the criteria for the special subtypes.
dist1-dist8: PC cosine distance to each cluster centroid for subtypes of that amino acid.
cluster_notes: Notes about cluster assignment.

The class behaves like a list, except that `[` accesses the main data tibble (see details). The class is also associated with other common generics (see basic functions, summary, data frames, plotting). Multiple deep_mutational_scan objects can be combined using bind_scans.
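
The list-like behaviour with `[` forwarded to the data table can be illustrated with a toy S3 class. This is only a sketch of the pattern under stated assumptions, not the package's implementation: the class name `toy_scan`, its constructor and the example values are hypothetical, and a plain data frame stands in for the tibble.

```r
# Toy stand-in for a scan object; NOT the package's constructor
toy <- structure(
  list(
    data = data.frame(name = "toy", position = 1:3, wt = c("A", "C", "D")),
    meta = data.frame(name = "toy", gene = "GENE1"),
    imputed = FALSE,
    annotated = FALSE,
    multi_study = FALSE
  ),
  class = "toy_scan"
)

# Forward `[` to the main data table; `$` keeps its usual list behaviour
`[.toy_scan` <- function(x, ...) {
  x$data[...]
}

# List-style access to a field
toy$imputed

# Data-frame-style subsetting goes straight to the data table
toy[toy$data$position > 1, c("position", "wt")]
```

With a real deep_mutational_scan object, the same two access styles would apply to the fields listed above, for example `dms$meta` for the study metadata and `dms[...]` for rows and columns of the data tibble.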

Examples


# Create a scan object
path <- system.file("extdata", "urn_mavedb_00000011_a_1_scores.csv",
                    package = "deepscanscape")
csv <- read.csv(path, skip = 4)
dms <- deep_mutational_scan(csv, name = "Hietpas Hsp90", scheme = "mave", trans = NULL,
                            na_value = "impute", annotate = FALSE, gene = "Hsp90",
                            study = "Hietpas et al. (2011)", source = "",
                            description = "Scan of 9 hsp90 positions")
#> Warning: Duplicate scores present - averaging scores for each variant using 'mean'