Extract variants from sequences and ER scores. If a wild-type sequence is not provided, the most frequent amino acid at each position is assumed to be the wild type. This function is usually called internally by parse_deep_scan, but is exported to help users get their input into the correct format.
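The extraction described above can be pictured with a minimal sketch. The helpers infer_wt and extract_variants are hypothetical and are not the package's implementation. The sketch assumes that all sequences are aligned and of equal length, and it keeps only singly substituted sequences. It also breaks ties in the consensus toward the alphabetically first residue, which is this sketch's choice and not necessarily the package's.

```r
# Hypothetical sketch, not the package's implementation.
# Assumes all sequences are aligned and of equal length.
infer_wt <- function(sequences) {
  mat <- do.call(rbind, strsplit(sequences, ""))  # rows = sequences, cols = positions
  wt <- apply(mat, 2, function(col) {
    counts <- table(col)
    names(counts)[which.max(counts)]  # ties go to the alphabetically first residue
  })
  paste(wt, collapse = "")
}

extract_variants <- function(x, wt_seq = NULL) {
  # Fall back to the per-position consensus when no wild type is given
  if (is.null(wt_seq)) wt_seq <- infer_wt(x$sequence)
  wt <- strsplit(wt_seq, "")[[1]]
  rows <- lapply(seq_len(nrow(x)), function(i) {
    aa <- strsplit(x$sequence[i], "")[[1]]
    pos <- which(aa != wt)
    # Keep single substitutions only; wild-type and multiple mutants are skipped
    if (length(pos) != 1) return(NULL)
    data.frame(position = pos, wt = wt[pos], mut = aa[pos],
               score = x$score[i], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)  # NULL entries are dropped by rbind
}

dms <- data.frame(sequence = c("MKV", "MRV", "MKL", "MKA", "ARV"),
                  score = c(0, -1.2, 0.3, 0.5, -2),
                  stringsAsFactors = FALSE)
extract_variants(dms)
```

Here the inferred wild type is "MKV". The result has one row each for the K to R change at position 2 and the V to L and V to A changes at position 3. The double mutant "ARV" is dropped.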

parse_sequence(x, wt_seq = NULL, average_multi = FALSE, ...)

Arguments

x

A data frame with columns 'sequence' and 'score'.

wt_seq

Character vector. Wild-type sequence to calculate variants against. If NULL, the wild type at each position is taken to be the most common amino acid at that position.

average_multi

Logical. Whether to average scores for variants that occur in multiply mutated sequences but have not been measured individually. Care should be taken to check that the type of multiple mutation makes this appropriate.

...

Ignored.
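The averaging controlled by average_multi can be illustrated with a rough sketch. The helper average_multi_sketch and the variant labels such as "K2R" (wild-type residue, position, mutant residue) are hypothetical. The sketch assumes aligned, equal-length sequences and a known wild type, and it uses a plain mean over every multiply mutated sequence that contains the variant. The package may weight scores or handle duplicates differently.

```r
# Hypothetical sketch of the idea behind average_multi = TRUE;
# the package's actual rules may differ.
average_multi_sketch <- function(x, wt_seq) {
  wt <- strsplit(wt_seq, "")[[1]]
  singles <- list()  # variant -> score of its singly mutated sequence
  multis <- list()   # variant -> scores of multiply mutated sequences containing it
  for (i in seq_len(nrow(x))) {
    aa <- strsplit(x$sequence[i], "")[[1]]
    pos <- which(aa != wt)
    if (length(pos) == 0) next  # wild-type sequence
    keys <- paste0(wt[pos], pos, aa[pos])  # e.g. "K2R"
    if (length(pos) == 1) {
      singles[[keys]] <- x$score[i]
    } else {
      for (k in keys) multis[[k]] <- c(multis[[k]], x$score[i])
    }
  }
  # Only variants never measured individually receive an averaged score
  extra <- setdiff(names(multis), names(singles))
  averaged <- vapply(extra, function(k) mean(multis[[k]]), numeric(1))
  c(unlist(singles), averaged)
}

dms <- data.frame(sequence = c("MRV", "MRL", "AKL", "MKA"),
                  score = c(-1.2, -2.0, -0.4, 0.5),
                  stringsAsFactors = FALSE)
average_multi_sketch(dms, "MKV")
```

In this example, K2R keeps its individually measured score of -1.2 rather than taking the -2.0 from the double mutant "MRL". V3L, which is only seen in double mutants, receives the mean of -2.0 and -0.4, which is -1.2. The sketch returns a named vector for brevity rather than the long data frame described under Value.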

Value

A long-format data frame with columns 'position', 'wt', 'mut' and 'score'.