Process deep mutational scanning data from a recognised format and convert it into a standardised data frame with one row per variant and columns position, wt, mut and score. Duplicate scores can be averaged to account for multiple replicates and, for some input formats, single variant scores can be averaged from multiply mutated sequences.

parse_deep_scan(
  x,
  scheme = c("mavedb", "long", "wide", "sequence"),
  position_offset = 0,
  duplicates = c("warn", "mean", "median", "error"),
  ...
)

Arguments

x

A tibble or an object that can be converted to one

scheme

Original data scheme (see Details). Accepts any unambiguous substring.

position_offset

Offset applied to all positions

duplicates

How to handle duplicate scores for a single mutation

...

Arguments passed to the chosen parsing function, including specifying parsing for multiply mutated sequences (see individual methods for details).

Value

A long format tibble with columns for 'position', 'wt', 'mut' and 'score'.
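
As a rough illustration of the output shape and of what averaging duplicates means, the base R sketch below collapses a replicated variant into a single row. The data are made up, and this is not the package's implementation: it only mimics the behaviour described for duplicates = "mean".

```r
# Hypothetical long-format data in which variant A2C was measured twice
scores <- data.frame(
  position = c(1, 2, 2, 3),
  wt       = c("M", "A", "A", "K"),
  mut      = c("L", "C", "C", "R"),
  score    = c(-0.5, 1.0, 2.0, 0.25),
  stringsAsFactors = FALSE
)

# Average the scores of rows sharing the same position, wt and mut
# (an assumption about what duplicates = "mean" amounts to)
averaged <- aggregate(score ~ position + wt + mut, data = scores, FUN = mean)

# Restore position order and tidy the row names
averaged <- averaged[order(averaged$position), ]
rownames(averaged) <- NULL
```

Each variant now appears once, with the two A2C replicates replaced by their mean.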

Details

The following input formats are supported:

  • mavedb: Data downloaded from mavedb.org. This format additionally supports averaging over multiply mutated sequences.

  • long: Data in long format, which already has the required columns (position, wt, mut and score). This allows duplicates to be averaged conveniently if data is already in the standard format.

  • wide: Data in wide format, with columns for position and wt plus a score column for each amino acid.

  • sequence: Each score is associated with a sequence, from which variants are extracted. This method supports averaging over multiply mutated sequences. It is the most common use case for the position_offset parameter.
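
To make the sequence scheme and the role of position_offset more concrete, the base R sketch below extracts substitutions by comparing each scored sequence against a wild-type sequence. The helper extract_variants, its arguments and the example sequences are hypothetical and not part of the package. It also assumes the offset is added to each position, which the documentation above does not specify.

```r
# Hypothetical helper (not the package's code): list the substitutions in
# `seq` relative to `wt_seq`, shifting each position by `offset`.
extract_variants <- function(seq, wt_seq, score, offset = 0) {
  stopifnot(nchar(seq) == nchar(wt_seq))
  mut_aa <- strsplit(seq, "")[[1]]
  wt_aa <- strsplit(wt_seq, "")[[1]]
  changed <- which(mut_aa != wt_aa)
  data.frame(
    position = changed + offset,  # assumption: the offset is additive
    wt = wt_aa[changed],
    mut = mut_aa[changed],
    score = rep(score, length(changed)),
    stringsAsFactors = FALSE
  )
}

# Made-up wild type covering residues 11 to 14 of a longer protein
wt_seq <- "MAKT"
variants <- rbind(
  extract_variants("MCKT", wt_seq, score = 1.2, offset = 10),
  extract_variants("MAKS", wt_seq, score = -0.4, offset = 10)
)
```

Here the scored fragment starts partway through the protein, so an offset of 10 maps fragment positions 2 and 4 onto protein positions 12 and 14.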