Store deep mutational scanning data in a standardised format, alongside metadata describing the data. This function creates a deep_mutational_scan S3 object. Several of these can be joined using bind_scans to create a combined dataset.

deep_mutational_scan(
  df,
  name,
  scheme = NULL,
  trans = NULL,
  na_value = "impute",
  annotate = FALSE,
  study = NA,
  gene = NA,
  source = NA,
  description = NA,
  ...
)

Arguments

df

A data frame to parse.

name

Name of the deep mutational scan.

scheme

Original data scheme (see parse_deep_scan).

trans

Function to transform scores onto the standard scale. Accepts either a string naming a known transform or a custom function (see transform_er).

na_value

How to set missense NA scores (see impute).

annotate

Annotate the dataset with mutational landscape data (PCA, UMAP and amino acid subtypes).

study

Study in which the scan was performed.

gene

Gene scanned.

source

Source of study. This is for reference only and is not used internally.

description

Description of study, containing any miscellaneous details. This is for reference only and is not used internally.

...

Additional arguments passed to parse_deep_scan.

Value

A deep_mutational_scan S3 object, containing the following fields:

  • data: Wide-format tibble containing ER scores and other positional data

  • meta: Tibble containing metadata about each study in the dataset

  • imputed: logical indicating whether the dataset has been completed via imputation (see impute for details)

  • annotated: logical indicating if the data has been annotated with PCs, UMAP coordinates and clusters

  • multi_study: logical indicating the dataset contains data from multiple studies

The data tibble contains the following fields (those marked * are not always present):

  • name: Name of the scan this position is from.

  • position: Position in the protein.

  • wt: Wild type amino acid.

  • A-Y: Fitness for each substitution.

  • impute_A-impute_Y: Whether the fitness score is imputed (see impute).

  • PC1-PC20: Deep landscape principal component coordinates.

  • umap1-umap2: Deep landscape UMAP coordinates.

  • cluster: Assigned amino acid subtype.

  • base_cluster: Nearest functional subtype centroid in PC cosine space. This is the assigned subtype before corrections are made for permissive, outlier and ambiguous subtypes.

  • permissive, ambiguous, high_distance: Whether the position fulfilled the criteria for the special subtypes.

  • dist1-dist8: PC cosine distance to each cluster centroid for subtypes of that amino acid.

  • cluster_notes: Notes about cluster assignment.
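
The wide layout described above can be pictured with a small mock table. This is only a sketch in base R: the column names follow the list above, but the scores, positions and imputation flags are invented for illustration, and a plain data frame stands in for the tibble. The annotation columns (PC1-PC20, umap1-umap2, cluster and so on) are left out.

```r
# Toy illustration of the wide layout; all values here are made up.
aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
        "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
n_pos <- 3

# One score column per substitution (A-Y)
set.seed(1)
scores <- matrix(round(rnorm(n_pos * length(aa)), 2), nrow = n_pos,
                 dimnames = list(NULL, aa))

# One logical flag column per substitution (impute_A-impute_Y)
imputed <- matrix(FALSE, nrow = n_pos, ncol = length(aa),
                  dimnames = list(NULL, paste0("impute_", aa)))
imputed[2, c("impute_A", "impute_W")] <- TRUE

toy_data <- data.frame(name = "toy", position = seq_len(n_pos),
                       wt = c("M", "K", "L"), scores, imputed,
                       stringsAsFactors = FALSE)

# Count how many scores were imputed at each position
impute_cols <- grep("^impute_", names(toy_data), value = TRUE)
n_imputed <- rowSums(toy_data[impute_cols])
```

Keeping the flags in parallel impute_ columns means the score columns can be used directly for analysis while the provenance of each value remains recoverable.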

The class is used like a list, except that [ accesses the main data tibble (see details). The class is also associated with other common generics (see basic functions, summary, data frames, plotting). Multiple deep_mutational_scan objects can be combined using bind_scans.
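
The general pattern of a list-like S3 class whose [ method is forwarded to an inner data table can be sketched as follows. The toy_scan class and new_toy_scan constructor are hypothetical stand-ins written in base R; they are not the deepscanscape implementation, which may differ in its details.

```r
# Hypothetical stand-in class, not the package's own code.
new_toy_scan <- function(data, meta) {
  structure(list(data = data, meta = meta, imputed = FALSE),
            class = "toy_scan")
}

# `[` is forwarded to the data frame held in the `data` field,
# while `$` keeps its default list behaviour.
`[.toy_scan` <- function(x, ...) {
  x$data[...]
}

toy <- new_toy_scan(
  data = data.frame(name = "demo", position = 1:3, wt = c("A", "C", "D"),
                    stringsAsFactors = FALSE),
  meta = data.frame(name = "demo", gene = "toy_gene",
                    stringsAsFactors = FALSE)
)

# List-style access to a field
is_imputed <- toy$imputed

# `[` subsets the data table rather than the list of fields
sub <- toy[toy$data$position > 1, c("position", "wt")]
```

With this design, row and column subsetting reads as it would on the underlying table, while the metadata fields stay reachable through $.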

Examples

# Create a scan object
path <- system.file("extdata", "urn_mavedb_00000011_a_1_scores.csv", package = "deepscanscape")
csv <- read.csv(path, skip = 4)
dms <- deep_mutational_scan(csv, name = "Hietpas Hsp90", scheme = "mave", trans = NULL,
                            na_value = "impute", annotate = FALSE, gene = "Hsp90",
                            study = "Hietpas et al. (2011)", source = "",
                            description = "Scan of 9 hsp90 positions")
#> Warning: Duplicate scores present - averaging scores for each variant using 'mean'