Process deep mutational scanning data from a recognised format and convert it into a standardised data frame with one row per variant and columns position, wt, mut and score. Duplicate scores can be averaged to account for multiple replicates and, for some input formats, single variant scores can be averaged from multiply mutated sequences.
parse_deep_scan( x, scheme = c("mavedb", "long", "wide", "sequence"), position_offset = 0, duplicates = c("warn", "mean", "median", "error"), ... )
Argument | Description |
---|---|
x | A tibble or an object that can be converted to one. |
scheme | Original data scheme (see description). Accepts any unambiguous substring. |
position_offset | Offset applied to all positions. |
duplicates | How to handle duplicate scores for a single mutation. |
... | Arguments passed to the chosen parsing function, including options for parsing multiply mutated sequences (see individual methods for details). |
A long format tibble with columns for 'position', 'wt', 'mut' and 'score'.
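As a rough illustration of the output shape and of what averaging duplicates means, the base R sketch below collapses replicate measurements of one variant into a single row. It does not call parse_deep_scan and is not the package's implementation. The data values are made up, and `aggregate()` stands in for whatever the function does internally when `duplicates = "mean"`.

```r
# Hypothetical long format data with a replicated measurement for A2V
scores <- data.frame(
  position = c(2, 2, 3),
  wt       = c("A", "A", "L"),
  mut      = c("V", "V", "P"),
  score    = c(-1.0, -2.0, 0.5)
)

# Average scores that share position, wt and mut (assumed analogue of duplicates = "mean")
averaged <- aggregate(score ~ position + wt + mut, data = scores, FUN = mean)

# Sort by position so the result reads like a per-variant table
averaged <- averaged[order(averaged$position), ]
print(averaged)
```

Swapping `FUN = mean` for `FUN = median` gives the corresponding sketch for `duplicates = "median"`.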
The following input formats are supported:
mavedb
: Data downloaded from mavedb.org. This format additionally supports
averaging over multiply mutated sequences.
long
: Data in long format that already has the required columns (position, wt, mut and score). This allows duplicates to be averaged conveniently when the data is already in the standard format.
wide
: Wide format data, with columns for position and wt, plus a score column for each
amino acid.
sequence
: Each score is associated with a sequence, from which variants are
extracted. This method supports averaging over multiply mutated sequences. It is also the most common use case for the
position_offset parameter.
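To make the sequence scheme and the role of position_offset more concrete, here is a minimal base R sketch of extracting variants by comparing a scored sequence with the wild type. The helper `extract_variants()`, the example sequences and the offset of 100 are all hypothetical. The sketch covers only the extraction step and does not attempt the averaging over multiply mutated sequences that the real function offers.

```r
# Hypothetical helper: list substitutions between a wild-type and a variant sequence
extract_variants <- function(wt_seq, var_seq, score, position_offset = 0) {
  wt  <- strsplit(wt_seq, "")[[1]]
  var <- strsplit(var_seq, "")[[1]]
  stopifnot(length(wt) == length(var))

  # Indices within the sequence where the residue differs from wild type
  changed <- which(wt != var)

  data.frame(
    position = changed + position_offset,  # shift to full-protein numbering
    wt       = wt[changed],
    mut      = var[changed],
    score    = rep(score, length(changed)),
    stringsAsFactors = FALSE
  )
}

# Assume the assayed fragment starts at residue 101 of the full protein,
# so sequence index 3 corresponds to position 103
variants <- extract_variants("MKTAY", "MKSAY", score = -0.8, position_offset = 100)
print(variants)
```

In this sketch a sequence carrying several substitutions would yield one row per substitution, each with the whole sequence's score. How those rows are combined into single variant scores is left to the parsing method's own options.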