Test for a Uniform Distribution across a set of best matches

testMotifPos(
  x,
  stringset,
  binwidth = 10,
  abs = FALSE,
  rc = TRUE,
  min_score = "80%",
  break_ties = "all",
  alt = c("greater", "less", "two.sided"),
  sort_by = c("p", "none"),
  mc.cores = 1,
  ...
)

Arguments

x

A Position Weight Matrix, universalmotif object or list thereof. Alternatively, a single DataFrame or list of DataFrames as returned by getPwmMatches with best_only = TRUE.

stringset

An XStringSet. Not required if matches are supplied as x

binwidth

Width of bins across the range to group data into

abs

Use absolute positions around zero to find symmetrical enrichment

rc

logical(1). Also find matches using the reverse complement of each PWM

min_score

The minimum score to return a match

break_ties

Choose how to resolve matches with tied scores

alt

Alternative hypothesis for the binomial test

sort_by

Column to sort results by

mc.cores

Passed to mclapply

...

Passed to matchPWM

Value

A data.frame with columns start, end, centre, width, total_matches, matches_in_region, expected, enrichment, prop_total, p, fdr and consensus_motif

The total matches represent the total number of matches within the set of sequences, whilst the number observed in the final region is also given, along with the proportion of the total this represents. Enrichment is simply the ratio of observed to expected matches, where the expected value is derived under the null hypothesis.
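As a quick arithmetic check of how these columns relate, the snippet below recomputes enrichment and prop_total from the values printed in the first example on this page. The numbers are copied from that output; the variable names simply mirror the column names and are not part of the package.

```r
## Values taken from the first row of the example output on this page
total_matches <- 45
matches_in_region <- 38
expected <- 27.90698

## Enrichment is observed / expected
enrichment <- matches_in_region / expected
## prop_total is the share of all matches that fall in the region
prop_total <- matches_in_region / total_matches

round(c(enrichment = enrichment, prop_total = prop_total), 4)
```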

The consensus motif across all matches is returned as a Position Frequency Matrix (PFM) using consensusMatrix.

Details

This function tests for an even positional spread of motif matches across a set of sequences, using the assumption (i.e. H0) that if there is no positional bias, matches will be evenly distributed across all positions within a set of sequences. Conversely, if there is positional bias, typically but not necessarily near the centre of a range, this function intends to detect this signal, as a rejection of the null hypothesis.

Input can be provided as the output from getPwmMatches, setting best_only = TRUE, if these matches have already been identified. If providing this object to the argument x, nothing is required for the arguments stringset, rc, min_score or break_ties. Otherwise, a Position Weight Matrix (PWM) and an XStringSet are required, along with the relevant arguments, and the best matches are identified within the function.

The set of best matches is then grouped into bins along the range, with the central bin containing zero, and tallied. Setting abs to TRUE converts all positions to absolute distances from the centre, so that counts are returned for bins of distance from zero, with zero as an inclusive lower bound. Motif alignments are assigned to bins based on the central position of the match, as provided in the column from_centre when calling getPwmMatches.
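The binning described above can be sketched in base R. This is only an illustration under stated assumptions, not the function's internal code: the positions in from_centre are made up, and the choice of half-open intervals offset by half a bin (so the central bin spans zero) is an assumption consistent with the description.

```r
## Hypothetical central positions of best matches, relative to the range centre
from_centre <- c(-42, -7, -3, 0, 2, 4, 11, 38)
binwidth <- 10

## Offset the breaks by half a bin so that the central bin contains zero
half <- binwidth / 2
lim <- ceiling(max(abs(from_centre)) / binwidth) * binwidth + half
breaks <- seq(-lim, lim, by = binwidth)
bins <- cut(from_centre, breaks = breaks, right = FALSE)
counts <- table(bins)

## Using absolute positions, bins start at zero as an inclusive lower bound
abs_breaks <- seq(0, lim + half, by = binwidth)
abs_bins <- cut(abs(from_centre), breaks = abs_breaks, right = FALSE)
abs_counts <- table(abs_bins)

counts
abs_counts
```

With signed positions, enrichment on either side of the centre is tallied separately; with absolute positions, matches at equal distances on both sides fall into the same bin, which is what makes symmetrical enrichment easier to detect.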

A binom.test() is performed on each bin using the chosen alternative hypothesis, and the p-values returned across all bins are combined using the Harmonic Mean p-value (HMP) (see p.hmp). All bins with raw p-values below the HMP are identified, and the returned values for start, end, centre, width, matches_in_region, expected and enrichment describe this set of bins. The expectation is that where a positional bias is evident, this will be a narrow range containing a non-trivial proportion of the total matches.
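A minimal sketch of the per-bin test and region selection follows, using only base R. Several simplifications are assumed and should not be read as the function's actual implementation: the bin counts are invented, every bin is given the same null probability, and a plain unweighted harmonic mean stands in for p.hmp, which applies its own adjustments.

```r
## Hypothetical counts of best matches in five equal-width bins
counts <- c(3, 4, 20, 5, 3)
total <- sum(counts)

## Assumption: under H0 every bin is equally likely to contain a match
p_null <- 1 / length(counts)
expected <- total * p_null

## One-sided binomial test per bin (alt = "greater")
p_bin <- vapply(
  counts,
  function(k) binom.test(k, total, p = p_null, alternative = "greater")$p.value,
  numeric(1)
)

## Unweighted harmonic mean of the p-values: a simplified stand-in for p.hmp
hm <- length(p_bin) / sum(1 / p_bin)

## Bins whose raw p-value falls below the combined value define the region
in_region <- p_bin < hm
enrichment <- sum(counts[in_region]) / (expected * sum(in_region))

which(in_region)
enrichment
```

Here only the third bin is selected, giving a narrow region that holds 20 of the 35 matches.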

It should also be noted that binom.test() can return p-values of zero when the true value is below machine precision. In these instances, zero p-values are excluded from calculation of the HMP. This gives a very slight conservative bias, and assumes that in these extreme cases neighbouring bins are highly likely to also return extremely low p-values, so no significance will be lost.
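The effect of excluding zero p-values can be seen in a small sketch. As before, an unweighted harmonic mean is assumed as a simplified stand-in for p.hmp, and the p-values are invented for illustration.

```r
## Hypothetical per-bin p-values, one of which has underflowed to zero
p_bin <- c(0, 1e-12, 0.2, 0.7, 0.9)

## Including the zero makes 1 / p infinite, so the harmonic mean collapses to 0
hm_all <- length(p_bin) / sum(1 / p_bin)

## Excluding zeros keeps the combined value positive
keep <- p_bin > 0
hm <- sum(keep) / sum(1 / p_bin[keep])

## The zero p-value bin still falls below the combined value
below <- p_bin < hm
which(below)
```

If the zero were retained, the combined value would be exactly zero and no bin could fall below it, which is why the exclusion is needed for the region to be identified at all.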

Examples

## Load the example PWM
data("ex_pwm")
esr1 <- ex_pwm$ESR1

## Load the example sequences
data("ar_er_seq")

## Get the best match and use this data
matches <- getPwmMatches(esr1, ar_er_seq, best_only = TRUE)
## Test for enrichment in any position
testMotifPos(matches)
#>   start end centre width total_matches matches_in_region expected enrichment
#> 1  -165 185     10   350            45                38 27.90698   1.361667
#>   prop_total         p       fdr consensus_motif
#> 1  0.8444444 0.9741004 0.9741004    30, 2, 1....

## Provide a list of PWMs, testing for distance from zero
testMotifPos(ex_pwm, ar_er_seq, abs = TRUE, binwidth = 10)
#>       start end centre width total_matches matches_in_region     expected
#> ESR1     10  40     25    30            45                15   6.99481865
#> ANDR      0 190     95   190            53                28  13.80208333
#> FOXA1     0 190     95   190           179               110 100.97435897
#> ZN281    60 190    125   130            36                29  18.65284974
#> ZN143   120 130    125    10             1                 1   0.05263158
#>       enrichment prop_total         p       fdr consensus_motif
#> ESR1    2.144444  0.3333333 0.2493238 0.7445537    30, 2, 1....
#> ANDR    2.028679  0.5283019 0.2978215 0.7445537    4, 3, 2,....
#> FOXA1   1.089385  0.6145251 0.8064541 0.9520175    6, 0, 0,....
#> ZN281   1.554722  0.8055556 0.8349283 0.9520175    5, 0, 25....
#> ZN143  19.000000  1.0000000 0.9520175 0.9520175    1, 0, 0,....