Impute missing deep mutational scan data — impute • DeepScanScape

Impute NA values in the ER scores of a deep_mutational_scan object. Missing values can be set to the median score of each substitution in the combined landscape dataset, or to custom values.

impute(x, na_value = "impute")

Arguments

x	`deep_mutational_scan` to impute data from
na_value	Value to set missense NA values to. Can be "impute", "average", a single value or a matrix of substitution scores (see details).

Value

An imputed deep_mutational_scan

Details

If na_value == "impute", missense NA scores are set to the median score of that substitution (e.g. A -> C) in the deep_landscape dataset. If na_value == "average", missense NA scores are set to the average missense score at that position. If na_value is a matrix, its row and column names should be single-letter amino acid codes, and cell i,j should hold the score to impute for substitutions from i to j. Any other value of na_value is used as the score for all NA values.
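The rules above can be sketched in base R on a toy score matrix. This is an illustration only, not the package's implementation: `fill_na()`, the four-letter alphabet and all scores are hypothetical, "average" is assumed to mean the arithmetic mean, and the "impute" mode is represented by a hand-made substitution matrix because the deep_landscape medians are not reproduced here.

```r
# Toy positional scores: rows are positions, columns are substituted amino acids.
# Only four amino acids are used to keep the example small (hypothetical data).
aa <- c("A", "C", "D", "E")
wt <- c("A", "C", "D")  # wild-type residue at each position
scores <- matrix(
  c( 0.0, -0.2,   NA, -0.5,
    -0.3,  0.0, -0.6,   NA,
      NA, -0.4,  0.0, -0.8),
  nrow = 3, byrow = TRUE, dimnames = list(NULL, aa)
)

# Hypothetical substitution table: cell [i, j] is the score for i -> j.
sub_scores <- matrix(-0.5, nrow = 4, ncol = 4, dimnames = list(aa, aa))
diag(sub_scores) <- 0
sub_scores["A", "D"] <- -0.9

# Hypothetical helper mirroring the documented rules for missense NA values.
fill_na <- function(scores, wt, na_value) {
  out <- scores
  for (i in seq_len(nrow(scores))) {
    for (j in colnames(scores)) {
      # Skip observed scores and the wild-type (synonymous) column
      if (!is.na(scores[i, j]) || j == wt[i]) next
      out[i, j] <- if (is.matrix(na_value)) {
        na_value[wt[i], j]  # look up the wt -> j substitution score
      } else if (identical(na_value, "average")) {
        # Mean of the observed missense scores at this position (assumed)
        mean(scores[i, colnames(scores) != wt[i]], na.rm = TRUE)
      } else {
        na_value  # constant fill
      }
    }
  }
  out
}

by_matrix   <- fill_na(scores, wt, sub_scores)
by_average  <- fill_na(scores, wt, "average")
by_constant <- fill_na(scores, wt, 1)
```

In this sketch the missing A -> D score at position 1 becomes -0.9 with the matrix, the mean of the two observed missense scores with "average", and 1 with the constant.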

An impute mask is also generated and added to the data tibble of the deep_mutational_scan. It consists of one column per amino acid (impute_X), which contains 0 if the corresponding score in that row was not imputed, 1 for a synonymous substitution imputed as 0, and 2 for a non-synonymous substitution that was imputed.
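The mask encoding can be illustrated with a small base R sketch. The data and variable names are hypothetical, and this only mirrors the 0/1/2 coding described above rather than the package's internal code.

```r
# Toy scores before imputation (hypothetical): rows are positions,
# columns are substituted amino acids.
aa <- c("A", "C", "D")
wt <- c("A", "C")  # wild-type residue at each position
scores <- matrix(
  c(  NA, -0.2,   NA,
    -0.3,   NA, -0.6),
  nrow = 2, byrow = TRUE, dimnames = list(NULL, aa)
)

is_missing    <- is.na(scores)
is_synonymous <- outer(wt, aa, "==")  # TRUE where the column is the wild type

# 0 = observed, 1 = synonymous NA (imputed as 0), 2 = imputed missense NA
mask <- matrix(0, nrow = nrow(scores), ncol = ncol(scores),
               dimnames = list(NULL, paste0("impute_", aa)))
mask[is_missing & is_synonymous]  <- 1
mask[is_missing & !is_synonymous] <- 2
mask
```

Columns named this way can later be used to separate measured scores from imputed ones, for example by keeping only cells where the mask is 0.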

Examples

# Load an unimputed DMS object
path <- system.file("extdata", "urn_mavedb_00000011_a_1_scores.csv",
                    package = "deepscanscape")
csv <- read.csv(path, skip = 4)
dms <- deep_mutational_scan(csv, name = "Hietpas Hsp90", scheme = "mave", trans = NULL,
                            na_value = NULL, annotate = FALSE)
#> Warning: Duplicate scores present - averaging scores for each variant using 'mean'
#> Warning: No imputation applied but NA values present. Data may not be suitable for downstream analysis until NA values are removed.

# Set NA to a constant
impute(dms, na_value = 1)
#> # Deep mutational scanning data
#> # Name: Hietpas Hsp90
#> NA
#> NA
#> # 9 positions
#> # Positional data:
#>   position wt           A        C      D       E        F      G        H
#>      <dbl> <chr>    <dbl>    <dbl>  <dbl>   <dbl>    <dbl>  <dbl>    <dbl>
#> 1        1 Q     -0.00345 -0.0175  -0.267 -0.0209 -0.00234 -0.534  0.00303
#> 2        2 F     -0.247   -0.184   -0.429 -0.769   0       -0.161 -0.0464 
#> 3        3 G     -0.151   -0.137   -0.715 -0.813  -0.0383   0     -0.632  
#> 4        4 W     -0.954   -0.857   -1.32  -1.14   -0.0259  -1.02  -0.941  
#> 5        5 S     -0.983   -0.982   -0.941 -0.845  -0.881   -0.252 -0.888  
#> 6        6 A      0       -0.638   -0.907 -1.03   -0.944   -0.191 -0.902  
#> 7        7 N     -0.181   -0.245   -0.922 -0.855  -0.0423  -0.256 -0.0403 
#> 8        8 M     -0.173   -0.301   -0.840 -0.225  -0.00593 -0.928 -0.641  
#> 9        9 E     -0.0412  -0.00649 -0.139  0      -0.0605  -0.750 -0.0521 
#> # … with 34 more variables: I <dbl>, K <dbl>, L <dbl>, M <dbl>, N <dbl>,
#> #   P <dbl>, Q <dbl>, R <dbl>, S <dbl>, T <dbl>, V <dbl>, W <dbl>, Y <dbl>,
#> #   name <chr>, impute_A <dbl>, impute_C <dbl>, impute_D <dbl>, impute_E <dbl>,
#> #   impute_F <dbl>, impute_G <dbl>, impute_H <dbl>, impute_I <dbl>,
#> #   impute_K <dbl>, impute_L <dbl>, impute_M <dbl>, impute_N <dbl>,
#> #   impute_P <dbl>, impute_Q <dbl>, impute_R <dbl>, impute_S <dbl>,
#> #   impute_T <dbl>, impute_V <dbl>, impute_W <dbl>, impute_Y <dbl>

# Use the built-in imputed values
impute(dms, na_value = "impute")
#> # Deep mutational scanning data
#> # Name: Hietpas Hsp90
#> NA
#> NA
#> # 9 positions
#> # Positional data:
#>   position wt           A        C      D       E        F      G        H
#>      <dbl> <chr>    <dbl>    <dbl>  <dbl>   <dbl>    <dbl>  <dbl>    <dbl>
#> 1        1 Q     -0.00345 -0.0175  -0.267 -0.0209 -0.00234 -0.534  0.00303
#> 2        2 F     -0.247   -0.184   -0.429 -0.769   0       -0.161 -0.0464 
#> 3        3 G     -0.151   -0.137   -0.715 -0.813  -0.0383   0     -0.632  
#> 4        4 W     -0.954   -0.857   -1.32  -1.14   -0.0259  -1.02  -0.941  
#> 5        5 S     -0.983   -0.982   -0.941 -0.845  -0.881   -0.252 -0.888  
#> 6        6 A      0       -0.638   -0.907 -1.03   -0.944   -0.191 -0.902  
#> 7        7 N     -0.181   -0.245   -0.922 -0.855  -0.0423  -0.256 -0.0403 
#> 8        8 M     -0.173   -0.301   -0.840 -0.225  -0.00593 -0.928 -0.641  
#> 9        9 E     -0.0412  -0.00649 -0.139  0      -0.0605  -0.750 -0.0521 
#> # … with 34 more variables: I <dbl>, K <dbl>, L <dbl>, M <dbl>, N <dbl>,
#> #   P <dbl>, Q <dbl>, R <dbl>, S <dbl>, T <dbl>, V <dbl>, W <dbl>, Y <dbl>,
#> #   name <chr>, impute_A <dbl>, impute_C <dbl>, impute_D <dbl>, impute_E <dbl>,
#> #   impute_F <dbl>, impute_G <dbl>, impute_H <dbl>, impute_I <dbl>,
#> #   impute_K <dbl>, impute_L <dbl>, impute_M <dbl>, impute_N <dbl>,
#> #   impute_P <dbl>, impute_Q <dbl>, impute_R <dbl>, impute_S <dbl>,
#> #   impute_T <dbl>, impute_V <dbl>, impute_W <dbl>, impute_Y <dbl>
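
A custom substitution matrix can also be passed as na_value, as described in Details. The sketch below builds one with the required row and column names. The scores are placeholders chosen for illustration, not recommended values, and the final call only runs if the package is installed and the `dms` object from above exists.

```r
# 20 x 20 matrix named by single-letter amino acid codes;
# cell [i, j] is the score to impute for a substitution from i to j.
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
sub_scores <- matrix(-0.5, nrow = 20, ncol = 20, dimnames = list(aa, aa))
diag(sub_scores) <- 0         # placeholder: synonymous substitutions score 0
sub_scores["A", "P"] <- -1    # placeholder: a harsher score for A -> P

# Guarded so the snippet is harmless outside a session with deepscanscape loaded
if (exists("dms") && requireNamespace("deepscanscape", quietly = TRUE)) {
  imputed <- deepscanscape::impute(dms, na_value = sub_scores)
}
```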