January 14, 2024
I Made Krisna Gupta (Imed)
Politeknik APP Jakarta, Universitas Indonesia, Center for Indonesian Policy Studies
PhD from the Australian National University, Master's from UI/VU Amsterdam
Research focus on international trade and public policy (particularly industrial policy)
More at krisna.or.id or @imedkrisna
Based on the “Introduction to Probability and Statistics” materials from ocw.mit.edu by Jeremy Orloff and Jonathan Bloom.
The content is the same as in most standard introductions to probability for statistics and econometrics.
Left out counting and motivation because these are trivial.
Didn’t have time to go through central tendency and variance properly.
Borrows materials from Dr. Uka Wikarya.
Uses English and proper notation so that you have a smooth transition to VU.
Frequentists: probability measures the frequency of various outcomes of an experiment.
Bayesians: probability is an abstract concept that measures a state of knowledge or a degree of belief in a given proposition.
Most of our tools were developed by frequentists, but increasingly powerful computers have led to a resurgence of Bayesian methods.
Experiment: a repeatable procedure with well-defined possible outcomes.
Sample space: the set of all possible outcomes, denoted sometimes by \(\Omega\) and sometimes by \(S\).
Fair: every outcome \(\omega \in \Omega\) has the same probability.
Event: a subset of the sample space.
Probability function: a function giving the probability of each outcome.
Experiment: toss the coin, report if it lands heads or tails. Sample space: \(\Omega=\{H,T\}\). Probability function: \(P(H)=0.5, P(T)=0.5\).
Experiment: toss the coin 3 times, report outcomes. Sample space: \(\Omega=\{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}\). Probability function: \(P(\omega)=\frac{1}{8} \ \forall \ \omega \in \Omega\).
Experiment: count the number of taxis that pass UI Salemba during class. Sample space: \(\Omega=\{0,1,2,3,4, \dots \}\). Probability function: Poisson distribution \(P(k)=e^{-\lambda} \frac{\lambda^k}{k!}\), where \(\lambda\) is the average number of taxis.
Outcome | 0 | 1 | 2 | 3 | \(\dots\) | k |
---|---|---|---|---|---|---|
Probability | \(e^{-\lambda}\) | \(e^{-\lambda} \lambda\) | \(e^{-\lambda} \frac{\lambda^2}{2}\) | \(e^{-\lambda} \frac{\lambda^3}{3!}\) | \(\dots\) | \(P(k)=e^{-\lambda} \frac{\lambda^k}{k!}\) |
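The table above can be checked numerically. This is a minimal sketch: the function name `poisson_pmf` and the rate \(\lambda=2\) are illustrative assumptions, not values from the taxi example.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(k) = exp(-lam) * lam**k / k! for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0  # assumed average number of taxis, for illustration only

# Probabilities for k = 0..49; the tail beyond that is negligible for lam = 2
probs = [poisson_pmf(k, lam) for k in range(50)]

for k in range(4):
    print(k, round(probs[k], 4))

# Rule check: the probabilities over the (truncated) sample space sum to about 1
print(sum(probs))
```

Truncating at \(k=49\) is only a numerical convenience; the sample space itself is infinite.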
An event \(E\) is a collection of outcomes, i.e., a subset of the sample space \(\Omega\).
For example, in the 3-coin experiment, what is the probability that exactly two heads show up?
We can write the event as \(E=\)‘exactly 2 heads’, or \(E=\{HHT,HTH,THH\}\). Note that \(E \subset \Omega\).
Since we know that \(P(\omega)=\frac{1}{8} \ \forall \ \omega \in \Omega\), we can compute \(P(E)=3/8\).
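The same count can be done by brute-force enumeration. A small sketch, assuming a fair coin; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses: 'HHH', 'HHT', ..., 'TTT'
omega = ["".join(t) for t in product("HT", repeat=3)]

# Fair coin: every outcome has probability 1/8
p = {w: Fraction(1, len(omega)) for w in omega}

# Event E: exactly two heads
E = [w for w in omega if w.count("H") == 2]

# P(E) is the sum of the probabilities of the outcomes in E
prob_E = sum(p[w] for w in E)
print(E, prob_E)
```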
A discrete sample space is one that is listable; it can be either finite or infinite.
\(\{H,T\}, \{1,2,3,4,5,6\}, \{1,2,3,\dots \}\) are all discrete sets. The first two are finite, the last one is infinite.
The interval \(0\leq x \leq 1\) is not discrete. It is continuous.
For a discrete sample space \(S\), a probability function \(P\) assigns to each outcome \(\omega\) a number \(P(\omega)\) called the probability of \(\omega\).
\(P\) must satisfy two rules:
Rule 1: \(0 \leq P(\omega) \leq 1\)
Rule 2: the sum of the probabilities of all possible outcomes is 1.
That is, if \(S=\{\omega_1, \omega_2,...,\omega_n\}\), then \(\sum_{j=1}^n P(\omega_j)=1\).
The probability of an event \(E\) is the sum of the probabilities of its outcomes: \(P(E)=\sum_{\omega \in E} P(\omega) \leq 1\)
For events \(A\), \(L\), and \(R\) contained in a sample space \(\Omega\):
Rule 1. \(P(A^c)=1-P(A)\)
Rule 2. If \(L\) and \(R\) are disjoint then \(P(L \cup R)=P(L)+P(R)\)
Rule 3. Inclusion-exclusion principle: if L and R are not disjoint (i.e., overlap), then \(P(L \cup R)=P(L)+P(R)-P(L \cap R)\)
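The rules above can be verified on a small example. The sketch below assumes one roll of a fair die, with \(L\)=‘even’ and \(R\)=‘greater than 3’; this example is an illustration and not part of the original material.

```python
from fractions import Fraction

omega = set(range(1, 7))  # one roll of a fair die (assumed example)

def P(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

L = {w for w in omega if w % 2 == 0}  # even roll: {2, 4, 6}
R = {w for w in omega if w > 3}       # roll above 3: {4, 5, 6}

rule1 = P(omega - L) == 1 - P(L)             # complement rule
rule3 = P(L | R) == P(L) + P(R) - P(L & R)   # inclusion-exclusion
print(rule1, rule3, P(L | R))
```

Here `|` and `&` are Python's set union and intersection, so `L | R` plays the role of \(L \cup R\).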
Conditional probability answers the question ‘how does the probability of an event change if we have extra information?’
This is called conditional probability, since it takes into account additional conditions.
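Toss a fair coin 3 times and suppose we are told that the first toss is heads; what is now the probability of 3 heads? Rephrase this as events: let \(A\) be the event ‘all three tosses are heads’ = \(\{HHH\}\). Let \(B\) be the event ‘the first toss is heads’ = \(\{HHH,HHT,HTH,HTT\}\).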
The conditional probability of A knowing that B has happened is written \(P(A|B)\).
This is read as ‘the conditional probability of A given B’, or ‘the probability of A conditioned on B’, or simply ‘the probability of A given B’.
We can define conditional probability formally as follows:
Let \(A\) and \(B\) be events. The conditional probability of \(A\) given \(B\) is defined as
\[ P(A|B)=\frac{P(A \cap B)}{P(B)},\text{ provided } P(B) \neq 0 \]
Let’s redo our previous calculation of 3 heads using this definition. Recall that \(A=\{HHH\}\) and \(B=\)‘the first toss is heads’, so \(A \cap B=\{HHH\}\), \(P(A \cap B)=1/8\), and \(P(B)=1/2\).
\[ P(A|B)=\frac{P(A \cap B)}{P(B)}=\frac{1/8}{1/2}=\frac{1}{4} \]
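The calculation above can be reproduced by enumerating the sample space. A minimal sketch, assuming a fair coin; `P` is a helper defined here for illustration.

```python
from fractions import Fraction
from itertools import product

omega = {"".join(t) for t in product("HT", repeat=3)}

def P(event):
    """Probability of an event when all 8 outcomes are equally likely."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w == "HHH"}   # all three tosses are heads
B = {w for w in omega if w[0] == "H"}  # the first toss is heads

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_A_given_B = P(A & B) / P(B)
print(p_A_given_B)
```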
For more complicated events, using this formula is often preferred to counting.
Rearranging the definition of conditional probability gives the multiplication rule:
\[ P(A \cap B)=P(A|B) \cdot P(B) \]
Example: draw two cards from a standard 52-card deck. Using the multiplication rule, show that the chance of drawing two spades is 3/51.
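One way to check the card example is to apply the multiplication rule with exact fractions. The sketch assumes a standard 52-card deck with 13 spades and draws without replacement; the variable names are illustrative.

```python
from fractions import Fraction

# S1 = first card is a spade, S2 = second card is a spade
p_s1 = Fraction(13, 52)           # 13 spades among 52 cards
p_s2_given_s1 = Fraction(12, 51)  # 12 spades left among 51 cards

# Multiplication rule: P(S1 and S2) = P(S2|S1) * P(S1)
p_two_spades = p_s2_given_s1 * p_s1
print(p_two_spades, p_two_spades == Fraction(3, 51))
```

`Fraction` reduces automatically, so the result prints in lowest terms rather than as 3/51.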
Suppose the sample space \(\Omega\) is divided into 3 disjoint events \(B_1, B_2, B_3\), then for any event \(A\):
\[\begin{align*} P(A)&=P(A \cap B_1)+P(A \cap B_2)+P(A \cap B_3) \\ P(A)&=P(A|B_1)P(B_1)+P(A|B_2)P(B_2)+P(A|B_3)P(B_3) \end{align*}\]
If A is divided into 3 pieces, then \(P(A)\) is the sum of the probabilities of the pieces. The second equation is called the law of total probability.
An urn contains 5 red balls and 2 green balls. We draw 2 balls. What is the probability the second ball is red?
Sample space \(\Omega=\{rr,rg,gr,gg\}\). Let \(R_1\)=‘first ball red’, \(R_2\)=‘second ball red’, \(G_1\)=‘first ball green’, \(G_2\)=‘second ball green’. The question is \(P(R_2)\).
\[ P(R_2)=P(R_2 | R_1)P(R_1)+P(R_2|G_1)P(G_1)=\frac{30}{42} \]
Under a slightly more complex rule, we can’t count on counting.
Suppose that if the first draw is green, a red ball is added to the urn, and if the first draw is red, a green ball is added. The first ball isn’t returned. Find \(P(R_2)\).
\(P(R_2 | R_1)=4/7\), \(P(R_2|G_1)=6/7\) therefore
\[ P(R_2)=P(R_2 | R_1)P(R_1)+P(R_2|G_1)P(G_1)=\frac{32}{49} \]
A probability tree is useful for this type of question.
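Both urn calculations follow the same two-branch tree, so they can be sketched with one helper. The function `p_second_red` and its parameters are hypothetical names introduced for illustration.

```python
from fractions import Fraction

def p_second_red(red, green, add_if_red=(0, 0), add_if_green=(0, 0)):
    """P(R2) via the law of total probability.

    add_if_red / add_if_green give the (red, green) balls added to the urn
    after a red / green first draw. The first ball is never returned.
    """
    total = red + green
    p_r1, p_g1 = Fraction(red, total), Fraction(green, total)

    # Branch 1: first ball red, so one red ball is gone
    r, g = red - 1 + add_if_red[0], green + add_if_red[1]
    p_r2_given_r1 = Fraction(r, r + g)

    # Branch 2: first ball green, so one green ball is gone
    r, g = red + add_if_green[0], green - 1 + add_if_green[1]
    p_r2_given_g1 = Fraction(r, r + g)

    return p_r2_given_r1 * p_r1 + p_r2_given_g1 * p_g1

plain = p_second_red(5, 2)
modified = p_second_red(5, 2, add_if_red=(0, 1), add_if_green=(1, 0))
print(plain, modified)
```

The first call is the plain urn with 5 red and 2 green balls; the second applies the rule where a ball of the opposite colour is added after the first draw.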
Two events are independent if knowledge that one occurred does not change the probability that the other occurred.
A is independent of B if \(P(A|B)=P(A)\)
If A is independent of B, then \(P(A \cap B)=P(A|B)P(B)=P(A)P(B)\).
Two events \(A\) and \(B\) are independent if \(P(A \cap B)=P(A) \cdot P(B)\).
A is independent of B if and only if B is independent of A.
Toss a fair coin twice. \(H_1\)=‘first toss is H’ and \(H_2\)=‘second toss is H’. Are \(H_1\) and \(H_2\) independent?
Toss a fair coin 3 times. Let \(A\)=‘exactly 2 heads in total’. Are \(H_1\) and \(A\) independent? Hint: find \(P(A)\) then check if \(P(A)=P(A|H_1)\).
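Both questions can be checked with the product definition \(P(A \cap B)=P(A) \cdot P(B)\). A minimal sketch, assuming fair coins; the helper `independent` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

def independent(X, Y, omega):
    """Check P(X and Y) == P(X) * P(Y) for equally likely outcomes."""
    P = lambda event: Fraction(len(event), len(omega))
    return P(X & Y) == P(X) * P(Y)

# Two tosses: H1 = first toss is H, H2 = second toss is H
two = {"".join(t) for t in product("HT", repeat=2)}
H1 = {w for w in two if w[0] == "H"}
H2 = {w for w in two if w[1] == "H"}
two_toss_result = independent(H1, H2, two)

# Three tosses: H1 = first toss is H, A = exactly 2 heads in total
three = {"".join(t) for t in product("HT", repeat=3)}
H1_3 = {w for w in three if w[0] == "H"}
A = {w for w in three if w.count("H") == 2}
three_toss_result = independent(H1_3, A, three)

print(two_toss_result, three_toss_result)
```

In the three-toss case \(P(A)=3/8\) while \(P(A|H_1)=1/2\), which is why the second check fails.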
For two events A and B, Bayes’ rule says
\[ P(B|A)=\frac{P(A|B)\cdot P(B)}{P(A)} \]
Bayes’ rule tells us how to ‘invert’ conditional probabilities. In practice, \(P(A)\) is often computed using the law of total probability.
It is common to confuse \(P(A|B)\) and \(P(B|A)\).
Toss a fair coin 5 times. Let \(H_1=\)‘first toss is heads’ and let \(H_A=\)‘all 5 tosses are heads’. Then \(P(H_1|H_A)=1\) but \(P(H_A|H_1)=1/16\).
\[ P(H_1 | H_A)=\frac{P(H_A | H_1)P(H_1)}{P(H_A)}=\frac{1/16 \cdot 1/2}{1/32}=1 \]
Consider a routine screening test for a disease. Suppose the frequency of the disease in the population (the base rate) is 0.5%. The test is fairly accurate, with a 5% false positive rate and a 10% false negative rate. You take the test and it comes back positive. What’s the probability that you actually have the disease?
Let’s define events: \(D+=\) you have the disease, \(D-=\) you do not have the disease, \(T+=\) you tested positive, and \(T-=\) you tested negative.
\(P(D+)=0.005\), therefore \(P(D-)=0.995\). The false positive and false negative rates are conditional probabilities.
\(P(\text{false positive})=P(T+|D-)=0.05\) and \(P(\text{false negative})=P(T-|D+)=0.1\)
The complements are true negative and true positive rates, which are:
\(P(T-|D-)=1-P(T+|D-)=0.95\) and \(P(T+|D+)=1-P(T-|D+)=0.9\)
You can actually put this in a probability tree.
The question is: what’s the probability that you have the disease given that your test is positive, i.e., what is the value of \(P(D+|T+)\)? We don’t have this value directly, but we can use Bayes’ rule:
\[ P(D+|T+)=\frac{P(T+|D+) \cdot P(D+)}{P(T+)} \]
We use the law of total probability to compute \(P(T+)\) (or just use the tree)
\[ P(T+)=0.995 \times 0.05+0.005 \times 0.9=0.05425 \]
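Plugging in, \(P(D+|T+)=\frac{0.9 \times 0.005}{0.05425} \approx 0.083\), so the answer is about 8.3%.
Expecting a much higher number is called “the base rate fallacy”: the base rate of the disease in the population is so low that the majority of people taking the test are actually healthy, so most positive results are false positives. To summarize:
That 95% of all tests are accurate does not imply that 95% of positive tests are accurate.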
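The screening calculation can be written out step by step. A short sketch using the rates stated in the example; the variable names are illustrative.

```python
# Rates taken from the screening example
p_d = 0.005   # base rate P(D+)
p_fp = 0.05   # false positive rate P(T+|D-)
p_fn = 0.10   # false negative rate P(T-|D+)

p_tp = 1 - p_fn                         # true positive rate P(T+|D+)
p_pos = (1 - p_d) * p_fp + p_d * p_tp   # law of total probability: P(T+)
p_d_given_pos = p_tp * p_d / p_pos      # Bayes' rule: P(D+|T+)

print(round(p_pos, 5), round(p_d_given_pos, 3))
```

Raising `p_d` while keeping the error rates fixed shows how strongly the answer depends on the base rate.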
The same calculation is often done using a table of counts, here for a population of 10,000:
| | D+ | D- | total |
|---|---|---|---|
| T+ | \(D+ \cap T+\) | \(D- \cap T+\) | |
| T- | \(D+ \cap T-\) | \(D- \cap T-\) | |
| total | 50 | 9950 | 10000 |

| | D+ | D- | total |
|---|---|---|---|
| T+ | 45 | 498 | 543 |
| T- | 5 | 9452 | 9457 |
| total | 50 | 9950 | 10000 |
A random variable assigns a number to each outcome in a sample space.
Let \(\Omega\) be a sample space. A discrete random variable is a function
\[ X : \Omega \rightarrow \mathbb{R} \]
that takes a discrete set of values. It’s random because its value depends on a random outcome of an experiment.
For any value \(a\) we write \(X=a\) to mean the event consisting of all outcomes \(\omega\) with \(X(\omega)=a\).
Roll a fair die twice and record the outcome as \((i,j)\), where \(i\) is the outcome of the first roll and \(j\) is the outcome of the second roll. The sample space is thus
\[ \Omega=\{(1,1),(1,2),...(6,6)\}=\{(i,j)|i,j=1,...6\} \]
In this game, you win $500 if the sum is 7 and lose $100 otherwise. The payoff function \(X\) is:
\[ X(i,j)= \begin{cases} 500 & \text{if } i+j=7 \\ -100 & \text{if } i+j \neq 7 \end{cases} \]
The event \(X=500\) is the set \(\{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)\}\), so \(P(X=500)=1/6\).
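The dice game shows that a random variable is just a function on the sample space. A minimal sketch, assuming fair dice; the names `X` and `win` mirror the notation in the example.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (i, j) with i, j = 1, ..., 6
omega = list(product(range(1, 7), repeat=2))

def X(i, j):
    """Payoff: win 500 if the sum is 7, lose 100 otherwise."""
    return 500 if i + j == 7 else -100

# The event X = 500 is the set of outcomes that X maps to 500
win = [(i, j) for (i, j) in omega if X(i, j) == 500]
p_win = Fraction(len(win), len(omega))
print(win, p_win)
```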
The probability mass function (pmf) of a discrete random variable \(X\) is the function \(p(a)=P(X=a)\). Note that:
\(p(a)\) always satisfies \(0\leq p(a) \leq 1\).
\(a\) can be any number. If \(a\) is a value that \(X\) never takes, then \(p(a)=0\).
Let \(\Omega\) be the sample space for rolling 2 dice. Let \(M\) be the maximum value of the 2 dice. The pmf of \(M\) is:
a | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
pmf p(a): | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 |
The cumulative distribution function (cdf) of a random variable \(X\) is the function \(F\) given by \(F(a)=P(X\leq a)\).
a | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
pmf \(p(a):\) | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 |
cdf \(F(a):\) | 1/36 | 4/36 | 9/36 | 16/36 | 25/36 | 36/36 |
\(F(a)\) is called the cumulative distribution function because \(F(a)\) gives the total probability that accumulates by adding up the probabilities \(p(b)\) as \(b\) runs from \(-\infty\) to \(a\).
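The pmf and cdf of the maximum of two dice can be rebuilt by enumeration, which makes the accumulation explicit. A sketch assuming fair dice; the dictionaries `pmf` and `cdf` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

omega = list(product(range(1, 7), repeat=2))

# Count how many outcomes (i, j) give each value of M = max(i, j)
counts = Counter(max(i, j) for i, j in omega)

# pmf: p(a) = P(M = a)
pmf = {a: Fraction(counts[a], len(omega)) for a in range(1, 7)}

# cdf: F(a) = P(M <= a), the running sum of the pmf
cdf = dict(zip(pmf, accumulate(pmf.values())))

for a in pmf:
    print(a, pmf[a], cdf[a])
```

`Fraction` prints in lowest terms, so for example 9/36 appears as 1/4.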