Lecture 4: Random Variables and Distributions
Goals
- Working with distributions in R
- Overview of discrete and continuous
distributions important in genetics/genomics
Two Types of Random Variables
- A discrete random variable has a
countable number of possible values
- A continuous random variable takes all
values in an interval of numbers
Probability Distributions of RVs
Discrete
Let X be a discrete rv. Then the probability mass function (pmf), f(x), of X is:
$$f(x) = \begin{cases} P(X = x), & x \in \Omega \\ 0, & x \notin \Omega \end{cases}$$
Continuous
Let X be a continuous rv. Then the probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b:
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
Using CDFs to Compute Probabilities
Continuous rv:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$$
$$P(a \le X \le b) = F(b) - F(a)$$
[Figure: a pdf and its corresponding cdf]
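The cdf identity above can be checked numerically. The lecture works in R; the following is only a minimal sketch in Python's standard library, and it assumes for illustration that X is standard normal with a = -1 and b = 1 (these values are not from the slides). It compares F(b) - F(a) with a direct midpoint-rule approximation of the area under the pdf.

```python
from statistics import NormalDist

# Assumption for illustration: X is standard normal (mean 0, sd 1).
X = NormalDist(mu=0.0, sigma=1.0)
a, b = -1.0, 1.0

# P(a <= X <= b) computed from the cdf: F(b) - F(a)
prob_from_cdf = X.cdf(b) - X.cdf(a)

# The same probability as the area under the pdf between a and b,
# approximated with the midpoint rule.
steps = 10_000
width = (b - a) / steps
prob_from_pdf = sum(X.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(prob_from_cdf, prob_from_pdf)
```

Both numbers should agree to many decimal places, which is the point of the identity: once the cdf is available, no integration is needed.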
Expectation of Random Variables
Continuous
The expected or mean value of a continuous rv X with pdf f(x) is:
$$\mu_X = E[X] = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$
Discrete
Let X be a discrete rv that takes on values in the set D and has a pmf f(x). Then the expected or mean value of X is:
$$\mu_X = E[X] = \sum_{x \in D} x \cdot f(x)$$
Example of Expectation and Variance
- Let $L_1, L_2, \ldots, L_n$ be a sequence of n nucleotides and define the rv $X_i$:
$$X_i = \begin{cases} 1, & \text{if } L_i = A \\ 0, & \text{otherwise} \end{cases}$$
- pmf is then: $P(X_i = 1) = P(L_i = A) = p_A$ and $P(X_i = 0) = P(L_i = C \text{ or } G \text{ or } T) = 1 - p_A$
- $E[X_i] = 1 \times p_A + 0 \times (1 - p_A) = p_A$
- $\mathrm{Var}[X_i] = E[(X_i - \mu)^2] = E[X_i^2] - \mu^2 = [1^2 \times p_A + 0^2 \times (1 - p_A)] - p_A^2 = p_A(1 - p_A)$
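The same expectation and variance can be computed directly from a pmf. This is a small Python sketch, not the lecture's own R code; the helper names `expectation` and `variance` and the value p_A = 0.25 are assumptions chosen only for illustration.

```python
def expectation(pmf):
    """Mean of a discrete rv given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Variance via E[X^2] - mu^2, matching the derivation above."""
    mu = expectation(pmf)
    second_moment = sum(x**2 * p for x, p in pmf.items())
    return second_moment - mu**2


# Hypothetical probability that a nucleotide is A (not from the slides).
p_A = 0.25

# pmf of the indicator X_i: 1 if L_i = A, 0 otherwise.
pmf_X = {1: p_A, 0: 1 - p_A}

mean_X = expectation(pmf_X)  # should equal p_A
var_X = variance(pmf_X)      # should equal p_A * (1 - p_A)
print(mean_X, var_X)
```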
The Distributions We’ll Study
- Binomial Distribution
- Hypergeometric Distribution
- Poisson Distribution
- Normal Distribution
Binomial Distribution
pmf:
$$P\{X = x\} = \binom{n}{x} p^x (1 - p)^{n - x}$$
cdf:
$$P\{X \le x\} = \sum_{y=0}^{x} \binom{n}{y} p^y (1 - p)^{n - y}$$
E(X) = np
Var(X) = np(1 - p)
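As a sanity check on these formulas, the pmf and cdf can be written out and the mean and variance recovered by direct summation. This is a minimal Python sketch (the lecture uses R); the function names `binom_pmf` and `binom_cdf` and the values n = 10, p = 0.3 are illustrative assumptions.

```python
from math import comb


def binom_pmf(x, n, p):
    """P{X = x} for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P{X <= x}: sum of the pmf over y = 0, ..., x."""
    return sum(binom_pmf(y, n, p) for y in range(x + 1))


# Hypothetical parameters chosen only for illustration.
n, p = 10, 0.3

# Mean and variance by summing over the whole support 0..n.
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)           # the two values should match
print(var, n * p * (1 - p))  # the two values should match
```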
Binomial Distribution: Example 1
- A couple, who are both carriers for a recessive
disease, wish to have 5 children. They want to know
the probability that they will have four healthy kids
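- Each child is healthy with probability 0.75 (affected with probability 0.25), so with X = number of healthy children:
$$P\{X = 4\} = \binom{5}{4}\, 0.75^4 \times 0.25^1 \approx 0.396$$
[Figure: binomial pmf p(x) for x = 0, 1, ..., 5]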
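The example can be reproduced numerically. This Python sketch (an illustration, not the lecture's R code) assumes simple Mendelian recessive inheritance, so that each child of two carriers is unaffected with probability 3/4 independently of the others; the variable names are hypothetical.

```python
from math import comb

n_children = 5
# Assumption: two carrier parents, so each child is healthy with probability 3/4.
p_healthy = 0.75

# Full pmf of X = number of healthy children, for x = 0, ..., 5.
pmf = [
    comb(n_children, x) * p_healthy**x * (1 - p_healthy) ** (n_children - x)
    for x in range(n_children + 1)
]

prob_four_healthy = pmf[4]

for x, px in enumerate(pmf):
    print(f"P(X = {x}) = {px:.4f}")
```

The printed table gives the heights of the bars in a plot of p(x) against x.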
Hypergeometric Distribution
- Population to be sampled consists of a finite
number N of individuals, objects, or elements
- Each individual can be characterized as a
success or failure, m successes in the
population
- A sample of size k is drawn and the rv of
interest is X = number of successes
Hypergeometric Distribution
- Similar in spirit to Binomial distribution, but from a finite population without replacement
- Example: 20 white balls out of 100 balls. If we randomly sample 10 balls, what is the probability that 7 or more are white?
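The ball question can be answered with the standard hypergeometric pmf, P(X = x) = C(m, x) C(N - m, k - x) / C(N, k), where N is the population size, m the number of successes in the population, and k the sample size. The slides do not show this formula, so treat the Python sketch below as an illustration under that standard definition; the function name `hypergeom_pmf` is an assumption.

```python
from math import comb


def hypergeom_pmf(x, N, m, k):
    """P(X = x): x successes in a sample of k drawn without replacement
    from N items of which m are successes."""
    return comb(m, x) * comb(N - m, k - x) / comb(N, k)


# Values from the example: 100 balls, 20 white, sample of 10.
N, m, k = 100, 20, 10

# P(X >= 7): sum the pmf over x = 7, ..., min(m, k).
prob_7_or_more = sum(hypergeom_pmf(x, N, m, k) for x in range(7, min(m, k) + 1))

print(prob_7_or_more)
```

The result is small (on the order of 4e-4), since the expected number of white balls in the sample is only 2.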
Hypergeometric Distribution
- Extensively used in genomics to test for “enrichment”, based on four counts:
  - Number of annotated genes
  - Number of genes of interest
  - Number of genes with annotation
  - Number of genes of interest with annotation
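One way to turn these four counts into an enrichment test is an upper-tail hypergeometric probability. The mapping used here is an assumption, not stated on the slide: all annotated genes form the population (N), genes carrying the annotation are the successes (m), genes of interest are the sample (k), and the overlap is the observed X. The Python function name and the counts below are hypothetical and chosen only to illustrate the calculation.

```python
from math import comb


def enrichment_p_value(n_annotated, n_with_term, n_interest, n_overlap):
    """P(X >= n_overlap) when n_interest genes are drawn without replacement
    from n_annotated genes, of which n_with_term carry the annotation.
    (Assumed mapping of the slide's four counts; see the text above.)"""
    upper = min(n_with_term, n_interest)
    tail = sum(
        comb(n_with_term, x) * comb(n_annotated - n_with_term, n_interest - x)
        for x in range(n_overlap, upper + 1)
    )
    return tail / comb(n_annotated, n_interest)


# Hypothetical counts: 10000 annotated genes, 200 with the annotation,
# 100 genes of interest, 8 of which carry the annotation.
p_value = enrichment_p_value(
    n_annotated=10_000, n_with_term=200, n_interest=100, n_overlap=8
)
print(p_value)
```

With these made-up counts the expected overlap is 2, so observing 8 gives a small tail probability, which is what "enrichment" means here.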
Poisson Distribution
- Useful in studying rare events
- Poisson distribution also used in situations
where “events” happen at certain points
in time
- Poisson distribution approximates the
binomial distribution when n is large and p
is small
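The approximation can be seen numerically by comparing the binomial pmf with the Poisson pmf, P(X = x) = e^{-λ} λ^x / x!, taking λ = np. The slides do not show the Poisson formula, so this Python sketch relies on that standard definition; n = 1000 and p = 0.002 are hypothetical values chosen so that n is large and p is small.

```python
from math import comb, exp, factorial


def binom_pmf(x, n, p):
    """P{X = x} for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P{X = x} for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


# Hypothetical parameters: large n, small p, so lambda = n * p = 2.
n, p = 1000, 0.002
lam = n * p

# Largest pointwise difference between the two pmfs over x = 0, ..., 10.
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))

for x in range(6):
    print(x, round(binom_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
print("max gap:", max_gap)
```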