Parse data downloaded from MaveDB, optionally averaging scores across multiply mutated sequences. Usually called internally by parse_deep_scan, but exposed to help users get their input into the correct format.

parse_mavedb(x, score_col = "score", average_multi = FALSE, ...)

Arguments

x

A data frame with a column 'hgvs_pro' giving the HGVS protein mutation string describing the variant(s), and a fitness score column ('score' by default).

score_col

String. Name of the column containing fitness scores. Use this to select a different data column as the score when the data include multiple measurements.

average_multi

Logical. If TRUE, average scores for variants that are included in multiply mutated sequences but have not been measured individually. Check that the type of multiple mutation makes this appropriate.

...

Ignored.

Value

A long-format tibble with columns 'position', 'wt', 'mut' and 'score'.
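
Examples

The following base R sketch illustrates the input shape and the idea behind average_multi. It is not the implementation used by parse_mavedb: the helper split_hgvs, the n_mut column, the toy scores and the restriction to simple three-letter substitutions such as "p.Ala2Gly" are all assumptions made for illustration, and the real function may represent amino acids differently.

```r
# Illustrative only: NOT the implementation of parse_mavedb().
# Toy input in the documented shape: 'hgvs_pro' plus a 'score' column.
x <- data.frame(
  hgvs_pro = c("p.Ala2Gly", "p.Leu5Pro",
               "p.[Ala2Gly;Ser7Thr]", "p.[Leu5Pro;Ser7Thr]"),
  score = c(0.5, -1.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one HGVS protein string into its substitutions.
# Only handles simple three-letter substitutions (an assumption).
split_hgvs <- function(hgvs) {
  body <- gsub("^p\\.|\\[|\\]", "", hgvs)          # drop "p." and brackets
  parts <- strsplit(body, ";", fixed = TRUE)[[1]]  # one entry per substitution
  m <- regmatches(parts, regexec("^([A-Za-z]{3})([0-9]+)([A-Za-z]{3})$", parts))
  m <- m[lengths(m) == 4]                          # keep well-formed matches
  data.frame(
    position = as.integer(vapply(m, `[`, character(1), 3)),
    wt = vapply(m, `[`, character(1), 2),
    mut = vapply(m, `[`, character(1), 4),
    n_mut = rep(length(parts), length(m)),         # substitutions in sequence
    stringsAsFactors = FALSE
  )
}

# Long format: one row per substitution, carrying its sequence's score.
long <- do.call(rbind, Map(function(h, s) {
  v <- split_hgvs(h)
  v$score <- rep(s, nrow(v))
  v
}, x$hgvs_pro, x$score))
rownames(long) <- NULL

# Keep individually measured variants as they are.
cols <- c("position", "wt", "mut", "score")
singles <- long[long$n_mut == 1, cols]
multis <- long[long$n_mut > 1, cols]

# Average the variants that only appear in multiply mutated sequences.
key <- function(d) paste(d$position, d$wt, d$mut)
unmeasured <- multis[!key(multis) %in% key(singles), ]
averaged <- aggregate(score ~ position + wt + mut, data = unmeasured, FUN = mean)

result <- rbind(singles, averaged)
print(result)
```

In this toy data Ser7Thr is never measured alone, so its score is taken as the mean of the two double mutants that contain it, while Ala2Gly and Leu5Pro keep their individually measured scores. Whether such an average is meaningful depends on how the multiple mutants were generated, as noted under average_multi.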