Medical Statistics

Important Probability Distributions

Boncho Ku, Ph.D. in Statistics

Korea Institute of Oriental Medicine (KIOM)


Important Discrete Distributions

Bernoulli Trial

Definition

A random experiment in which there are only two possible outcomes

Example

  • Tossing a coin: Head (H) or Tail (T)
  • Taking an exam: Success (S) or Failure (F)
  • True (T) or False (F)

Let \(X:\{S,F\} \rightarrow \{0,1\}\), that is

\[X = \begin{cases} 1 & \mathrm{if~the~outcome~is~S}\\ 0 & \mathrm{if~the~outcome~is~F} \end{cases}\]

Let \(p\) be the probability of success, then the p.m.f of \(X\) is

\[ f_X(x) = p^x(1-p)^{1-x}, ~~~x \in \{0, 1\} \]

\[ E(X) = p, ~~ Var(X) = \sigma^2 = p(1-p) \]
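A quick check by simulation (a minimal sketch; the success probability p = 0.3 and the sample size of 100,000 draws are illustrative choices, not values from the slides):

set.seed(1)                               # for reproducibility
p <- 0.3                                  # illustrative success probability
x <- rbinom(1e5, size = 1, prob = p)      # 100,000 Bernoulli(p) draws
mean(x)                                   # should be close to p = 0.3
var(x)                                    # should be close to p*(1 - p) = 0.21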

Binomial Distribution

The Binomial distribution has three defining properties:

Important

  • Bernoulli trials are conducted \(n\) times
  • The trials are independent
  • The probability of success \(p\) does not change between trials

Definition

If \(X\) counts the number of successes in the \(n\) independent Bernoulli trials, we say that \(X\) has a binomial distribution with the p.m.f

\[ P(X = x) = f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, ~~~x \in \{0, 1, \ldots, n\} \]

We write

\[ X \sim \mathrm{binom}(\mathrm{size} = n,~~\mathrm{prob} = p) \]

Binomial Distribution

Important

The total number of successes in the \(n\) trials is the sum of the \(n\) individual Bernoulli random variables.

\[ \sum_{x=0}^{n}\binom{n}{x} p^x (1-p)^{n-x} = [p + (1-p)]^{n} = 1 \]

\[\begin{aligned} \mu & = \sum_{x=0}^{n}x\binom{n}{x}p^{x}(1-p)^{n-x} \\ & = \sum_{x=1}^{n}x\frac{n!}{x!(n-x)!}p^{x}(1-p)^{n-x} \\ & = n\cdot p\sum_{x=1}^{n}\frac{(n-1)!}{(x-1)!(n-x)!}p^{x-1}(1-p)^{n-x} \\ & = n\cdot p\sum_{x-1=0}^{n-1}\binom{n-1}{x-1}p^{x-1}(1-p)^{(n-1)-(x-1)} \\ & = np \end{aligned}\]

\[\begin{aligned} \sigma^2 &= E[X^2] - [E(X)]^2 \\ &= \sum_{x=0}^{n}x^2\binom{n}{x}p^{x}(1-p)^{n-x} - (np)^2 \\ &= \sum_{x=1}^{n}x^2\binom{n}{x}p^{x}(1-p)^{n-x} - (np)^2 \\ &= \sum_{x=1}^{n}x\cdot x \frac{n(n-1)!}{x(x-1)!(n-x)!}p^{x}(1-p)^{n-x} -(np)^2 \\ &= np\sum_{x-1=0}^{n-1}x\binom{n-1}{x-1}p^{x-1}(1-p)^{(n-1)-(x-1)} - (np)^2 \end{aligned}\]

Continuing the derivation of \(\sigma^2\) from the previous slide:

Let \(y=x-1\) and \(m=n-1\), then

\[\begin{aligned} \sigma^2 &= np\sum_{y=0}^{m}(y+1)\binom{m}{y}p^{y}(1-p)^{m-y} -(np)^2\\ &= np\{(n-1)p + 1\} - (np)^2 = (np)^2 + np(1-p) - (np)^2 \\ &= np(1-p) \end{aligned}\]

Let \(X_{i} \sim Bernoulli(p)\) then \(Y=\sum_{i=1}^{n}X_i \sim \mathrm{binom}(n, p)\). Therefore,

\[\begin{aligned} \mu_{Y} &= E(Y) = E\left(\sum_{i=1}^{n}X_{i}\right) = \sum_{i=1}^{n}E(X_i) = \sum_{i=1}^{n}p = np \\ \sigma_{Y}^{2} &= Var(Y)=Var\left(\sum_{i=1}^{n}X_{i}\right) = \sum_{i=1}^{n}Var(X_i) \\ &= \sum_{i=1}^{n}p(1-p) = np(1-p)~\because~X_{i} \perp X_{j} \end{aligned}\]
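This identity can be illustrated by simulation (a sketch; n = 10, p = 0.4, and 10,000 replications are illustrative choices):

set.seed(1)
n <- 10; p <- 0.4
bern <- matrix(rbinom(n * 1e4, size = 1, prob = p), nrow = n)  # each column: n Bernoulli(p) trials
y <- colSums(bern)                        # Y = sum of the n Bernoulli trials
mean(y)                                   # close to n*p = 4
var(y)                                    # close to n*p*(1 - p) = 2.4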

Binomial Distribution

Example

Roll 12 dice simultaneously, and let \(X\) denote the number of sixes that appear. We wish to find the probability of getting seven, eight, or nine sixes.

Let \(S\) be the event that we get a six on one roll. Then \(P(S)=1/6\) and the rolls constitute Bernoulli trials; \(X \sim \mathrm{binom}(\mathrm{size}=12, \mathrm{prob} = 1/6)\) and we are asked to find \(P(7\leq X\leq 9)\). That is,

\[ P(7\leq X\leq 9) = \sum_{x=7}^{9}\binom{12}{x}\left(\frac{1}{6}\right)^x\left(\frac{5}{6}\right)^{12-x} \]

Alternatively,

\[ P(7\leq X\leq 9) = P(X \leq 9) - P(X \leq 6) = F_X(9) - F_X(6) \]

where \(F_X(x)\) is the CDF of \(X\).

> dbinom(7, 12, 1/6) + dbinom(8, 12, 1/6) + dbinom(9, 12, 1/6)
[1] 0.001291758
> pbinom(9, 12, 1/6) - pbinom(6, 12, 1/6)
[1] 0.001291758

Binomial Distribution

Example (CDF)

Toss a coin 3 times and let \(X\) be the number of Heads observed. Then

\[ X \sim \mathrm{binom}(\mathrm{size} = 3, \mathrm{prob} = 1/2) \]

The PMF:

| \(x\) (# of Heads) | 0   | 1   | 2   | 3   |
|--------------------|-----|-----|-----|-----|
| \(f(x)=P(X=x)\)    | 1/8 | 3/8 | 3/8 | 1/8 |

\[F_{X}(x) = P(X \leq x) = \begin{cases} 0, & x < 0\\ \frac{1}{8}, & 0\leq x < 1\\ \frac{1}{8} + \frac{3}{8} = \frac{4}{8}, & 1\leq x < 2\\ \frac{4}{8} + \frac{3}{8} = \frac{7}{8}, & 2\leq x < 3\\ 1, & x \geq 3 \end{cases}\]

n <- 3; p <- 0.5
x <- -1:(n + 1)                         # -1, 0, 1, 2, 3, 4
y <- pbinom(x, size = n, prob = p)      # CDF evaluated at each x

plot(x, y, type = "n", 
     xlab = "number of successes", 
     ylab = "cumulative probability")
abline(h = 1, lty = 2, col = "darkgray")
abline(h = 0, lty = 2, col = "darkgray")
points(x[2:5], y[2:5], pch = 16, cex = 2)   # filled dots: value at each jump
points(x[2:5], y[1:4], pch = 21, cex = 2)   # open dots: left-hand limits
segments(                                   # horizontal pieces of the step function
  x0 = c(-2, 0, 1, 2, 3), 
  x1 = c( 0, 1, 2, 3, 5), 
  y0 = c(y[1], y[2], y[3], y[4], y[5]), 
  y1 = c(y[1], y[2], y[3], y[4], y[5])
)

Hypergeometric Distribution

Example

Suppose that an urn contains 7 white balls and 5 black balls. Let our random experiment be to randomly select 4 balls, without replacement, from the urn. Then the probability of observing 3 white balls (and thus 1 black ball) is

\[ P(3W, 1B) = \frac{\binom{7}{3}\binom{5}{1}}{\binom{12}{4}} \]

  • Sample \(K = 4\) balls without replacement \(\rightarrow\) a sample from the POPULATION
  • From where? \(\rightarrow\) an urn with \(M = 7\) white balls and \(N = 5\) black balls

Definition

Let \(X\) be the number of successes among \(K\) draws taken without replacement from a population of \(M\) successes and \(N\) failures. The PMF of \(X\) is

\[ P(X=x) = f_{X}(x) = \frac{\binom{M}{x}\binom{N}{K-x}}{\binom{M+N}{K}}, ~~~0\leq x \leq M,~0\leq K-x \leq N \]
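In R the PMF above is available as dhyper(x, m, n, k), where m, n, and k play the roles of \(M\), \(N\), and \(K\). For the urn example:

dhyper(3, m = 7, n = 5, k = 4)                 # P(3 white, 1 black)
choose(7, 3) * choose(5, 1) / choose(12, 4)    # same value, 175/495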

Hypergeometric Distribution

Expectation

\[ \mu_X=E(X)=K\frac{M}{M+N} \]

Variance

\[ \sigma_{X}^2=Var(X)=K\frac{MN}{(M+N)^2}\frac{M+N-K}{M+N-1} \]
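Both formulas can be checked numerically from the PMF; a sketch using the urn numbers \(M = 7\), \(N = 5\), \(K = 4\):

M <- 7; N <- 5; K <- 4
x <- 0:K
mu <- sum(x * dhyper(x, M, N, K))              # expectation from the PMF
v  <- sum((x - mu)^2 * dhyper(x, M, N, K))     # variance from the PMF
c(mu, K * M / (M + N))                         # both equal 7/3
c(v, K * M * N / (M + N)^2 * (M + N - K) / (M + N - 1))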

Example

Suppose a city has \(N\) women and \(M\) men.

  • \(X\): the number of women infected with the disease
  • \(Y\): the number of men infected with the disease

Our goal is to estimate the probability that a person infected with the disease is a woman. Let \(p_X\) and \(p_Y\) denote the infection probabilities for women and men, respectively. Assume that \(p = p_X = p_Y\).

|                | Women   | Men     | Total           |
|----------------|---------|---------|-----------------|
| Infection: Yes | \(X\)   | \(Y\)   | \(X+Y\)         |
| Infection: No  | \(N-X\) | \(M-Y\) | \((N+M)-(X+Y)\) |
| Total          | \(N\)   | \(M\)   | \(N+M\)         |

Hypergeometric Distribution

Example (continued)

Then \(X\sim \mathrm{binom}(\mathrm{size}=N, \mathrm{prob}=p)\), \(Y\sim \mathrm{binom}(\mathrm{size}=M, \mathrm{prob}=p)\). Thus,

\[ (X+Y) \sim \mathrm{binom}(\mathrm{size}=N+M, \mathrm{prob} = p)~~\because X\perp Y \]

We want to find

\[ P(X=x|X+Y) = \frac{P(X+Y|X=x)P(X=x)}{P(X+Y)}~~\because \mathrm{Bayes~Theorem} \]

Let \(X+Y=K\). Then

\[\begin{aligned} P(X=x|X+Y=K) &= \frac{P(Y=K-x|X=x)P(X=x)}{P(X+Y=K)} \\ &= \frac{P(Y=K-x)P(X=x)}{P(X+Y=K)}~~\because X\perp Y \\ &= \frac{\binom{M}{K-x}p^{K-x}(1-p)^{M-K+x} \binom{N}{x}p^x(1-p)^{N-x}}{\binom{N+M}{K} p^K(1-p)^{N+M-K}} \\ &= \frac{\binom{M}{K-x}\binom{N}{x}}{\binom{N+M}{K}} \sim \mathrm{Hyper}(M, N, K) \end{aligned}\]

Hypergeometric Distribution

Relationship to Binomial Distribution

  1. The conditional distribution of \(X\) given \(X+Y=K\) is the hypergeometric distribution.

  2. When the population size \(M+N \rightarrow \infty\) with the success proportion \(M/(M+N)\) held fixed, the hypergeometric distribution converges to the binomial distribution (see the numerical check below).
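A numerical check of item 2 (the values p = 0.4, K = 5, and x = 2 are illustrative): keep the sample size \(K\) and the success proportion fixed while the population grows.

p <- 0.4; K <- 5; x <- 2
for (total in c(20, 200, 2000, 20000)) {
  M <- round(p * total); N <- total - M        # population with proportion p of successes
  cat(sprintf("M+N = %6d  hyper = %.6f  binom = %.6f\n",
              total, dhyper(x, M, N, K), dbinom(x, K, p)))
}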

Important

  • Binomial distribution \(\rightarrow\) with replacement

  • Hypergeometric distribution \(\rightarrow\) without replacement

Poisson Distribution

Motivation

How do we measure the probability of a given number of events occurring in a fixed interval of time or space?

Examples

  • Call center: number of calls per hour within a day
  • Traffic accident: number of traffic accidents during 7:00 AM to 10:00 AM per day
  • Number of customers arriving in a bank between 9:00 AM and 11:00 AM

Assumption

  • The events occur independently (independence)
  • The events occur with a known constant mean rate (consistency)
  • The probability of two or more events occurring in a very short time interval or in a very small region is negligible (non-clustering)

Poisson Distribution

Definition

Let \(\lambda\) (\(\lambda > 0\)) be the average number of events in the time interval \([0,1]\). Let the discrete random variable \(X\) count the number of events occurring in this interval.

\[ f_X(x) = P(X=x) = \frac{\lambda^x\exp(-\lambda)}{x!}, ~~~x = 0,1,2,\ldots \]

We write \(X \sim \mathrm{pois}(\mathrm{lambda} = \lambda)\)

If \(X\) counts the number of events in the interval \([0, t]\) and \(\lambda\) is the average number of occurrences per unit time, then

\[ f_X(x) = P(X=x) = \frac{(\lambda t)^x\exp(-\lambda t)}{x!}, ~~~x = 0,1,2,\ldots \]

Poisson Distribution

\[ E(X) = \sum_{x=0}^{\infty}x \frac{\lambda^x\exp(-\lambda)}{x!} = \lambda\sum_{x=1}^{\infty} \frac{\lambda^{x-1}\exp(-\lambda)}{(x-1)!} = \lambda \]

\[ Var(X) = E[X(X-1)] + E(X) - [E(X)]^2 = \sum_{x=2}^{\infty} \lambda^2\frac{\lambda^{x-2}\exp(-\lambda)}{(x-2)!} + \lambda - \lambda^2 = \lambda^2 + \lambda - \lambda^2 = \lambda \]
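A quick numerical check of both results (λ = 4 is an illustrative value; truncating the infinite sums at x = 100 is more than enough here):

lambda <- 4
x <- 0:100                                     # truncation of the infinite sums
mu <- sum(x * dpois(x, lambda))                # equals lambda
v  <- sum((x - mu)^2 * dpois(x, lambda))       # also equals lambda
c(mu, v)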

Shapes of Poisson Distribution according to \(\lambda\)
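A minimal sketch to reproduce such a plot (the values λ = 1, 4, 10 are illustrative):

x <- 0:20
plot(x, dpois(x, 1), type = "h", lwd = 2,
     xlab = "x", ylab = "P(X = x)",
     main = "Poisson PMF for several values of lambda")
points(x + 0.15, dpois(x, 4), type = "h", lwd = 2, col = "red")
points(x + 0.30, dpois(x, 10), type = "h", lwd = 2, col = "blue")
legend("topright", legend = paste("lambda =", c(1, 4, 10)),
       col = c("black", "red", "blue"), lwd = 2)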

Poisson Distribution

Example 1

On the average, 5 cars arrive at a particular car wash every hour. Let \(X\) count the number of cars that arrive from 10 AM to 11 AM. What is the probability that no car arrives during this period?

Example 2

Suppose the car wash above is in operation from 8 AM to 6 PM, and we let \(Y\) be the number of customers that appear in this period. What is the probability that there are between 48 and 50 customers, inclusive?

Since \(X \sim \mathrm{pois}(\lambda = 5)\), the probability that no car arrives is

\[ P(X = 0) = \frac{5^0\exp(-5)}{0!} = \exp(-5) \approx 0.0067 \]

scripts for the solution

> dpois(0, lambda = 5)
[1] 0.006737947

The average number of car arrivals from 8 AM to 6 PM is \(\lambda'=5\times10=50\), so \(Y\sim \mathrm{pois}(\lambda = \lambda'=50)\). The probability of between 48 and 50 customers is

\[ P(48 \leq Y \leq 50) = \sum_{y=0}^{50}\frac{50^{y}\exp(-50)}{y!} - \sum_{y=0}^{47}\frac{50^{y}\exp(-50)}{y!} \approx 0.168 \]

scripts for the solution

> ppois(50, lambda = 50) - ppois(47, lambda = 50)
[1] 0.1678485

Poisson Distribution

Relationship to Binomial Distribution

Suppose \(X\) and \(Y\) are independent Poisson random variables: \(X\sim \mathrm{pois}(\lambda_1)\), \(Y\sim \mathrm{pois}(\lambda_2)\). Then \((X+Y)\sim \mathrm{pois}(\lambda_{1} + \lambda_{2})\), and the conditional distribution of \(X\) given \((X+Y) = N\) is the binomial distribution with size \(N\) and probability \(p=\lambda_1/(\lambda_1 + \lambda_2)\).

Proof

By using Bayes Theorem,

\[ P(X|X+Y = N) = \frac{P(X+Y=N|X=x)P(X=x)}{P(X+Y = N)} \]

Since \(X \perp Y\), and \(X = x\) together with \(X+Y=N\) implies \(Y=N-x\),

\[ P(X|X+Y = N) = \frac{P(Y=N-X)P(X=x)}{P(X+Y = N)}=\frac{\frac{\exp(-\lambda_2)\lambda_2^{N-x}}{(N-x)!}\frac{\exp{(-\lambda_1)}\lambda_1^{x}}{x!}}{\frac{\exp[-(\lambda_1+\lambda_2)](\lambda_1+\lambda_2)^N}{N!}} \]

Poisson Distribution

Proof (continued)

\[\begin{aligned} P(X|X+Y = N) &= \frac{N!}{x!(N-x)!}\frac{\exp{[-(\lambda_1+\lambda_2)]}\lambda_1^x\lambda_2^{N-x}}{\exp{[-(\lambda_1+\lambda_2)]}(\lambda_1+\lambda_2)^N} \\ &= \binom{N}{x}\frac{\lambda_1^x\lambda_2^{N-x}}{(\lambda_1 + \lambda_2)^N} = \binom{N}{x}\frac{\lambda_1^x\lambda_2^{N-x}}{(\lambda_1 + \lambda_2)^x(\lambda_1 + \lambda_2)^{N-x}} \\ &= \binom{N}{x}\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^x\left(1-\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{N-x} \\ &= \mathrm{binom}\left(\mathrm{size}=N, \mathrm{prob}=\frac{\lambda_1}{\lambda_1+\lambda_2}\right) \end{aligned}\]

Important

  • The conditional distribution of \(X\) given \(X+Y\) is identical to the binomial distribution, where \(X\sim \mathrm{pois}(\lambda_1)\) and \(Y\sim \mathrm{pois}(\lambda_2)\)

  • (Check) The binomial distribution converges to the Poisson distribution as \(N\rightarrow \infty\) with \(Np\) held fixed (see the sketch below)
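A numerical illustration of the check in the last bullet (λ = 3 and x = 2 are illustrative): fix \(\lambda = Np\) and let \(N\) grow while \(p = \lambda/N\) shrinks.

lambda <- 3; x <- 2
for (N in c(10, 100, 1000, 10000)) {
  cat(sprintf("N = %5d  binom = %.6f  pois = %.6f\n",
              N, dbinom(x, N, lambda / N), dpois(x, lambda)))
}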

Important Continuous Distributions

Uniform Distribution

Definition

A random variable \(X\) is said to have a uniform distribution on \([a,b]\) if it is equally likely to fall in any subinterval of \([a,b]\) of a given length.

We write \(X \sim U(a, b)\) or \(X\sim \mathrm{unif}(\mathrm{min} = a, \mathrm{max}=b)\)

\[f_X(x) = \begin{cases} \frac{1}{b-a}, & a\leq x \leq b \\ 0, & \mathrm{otherwise} \end{cases}\]

\[F_X(x) = \begin{cases} 0, & x < a\\ \frac{x-a}{b-a}, & a\leq x \leq b\\ 1, & x > b \end{cases}\]

\[\begin{aligned} \mu_X = E(X) &= \int_{-\infty}^{\infty} x f_X(x) dx \\ &= \int_{a}^{b} x \frac{1}{b-a} dx \\ &= \frac{1}{b-a}\frac{x^2}{2} \bigg|_{x=a}^{b} \\ &= \frac{b+a}{2} \end{aligned}\]

\[\begin{aligned} \sigma^2_X &= Var(X) = E\left[(X - E(X))^2\right] = E(X^2) - \left[E(X)\right]^2 \\ &= \int_{a}^{b} x^2 \frac{1}{b-a} dx - \left(\frac{b+a}{2}\right)^2 \\ &= \frac{1}{b-a}\frac{x^3}{3}\bigg|_{x=a}^{b} - \left(\frac{b+a}{2}\right)^2 = \frac{b^3-a^3}{3(b-a)} - \left(\frac{b+a}{2}\right)^2\\ &= \frac{(b-a)^2}{12} \end{aligned}\]
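Both moments can be confirmed by numerical integration (a sketch with the illustrative choice a = 2, b = 8):

a <- 2; b <- 8
integrate(function(x) x * dunif(x, a, b), a, b)$value                  # (a + b)/2 = 5
integrate(function(x) (x - (a + b)/2)^2 * dunif(x, a, b), a, b)$value  # (b - a)^2/12 = 3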

Uniform Distribution

Example

Suppose that buses arrive at a bus stop every 10 minutes and that the waiting time of a person who arrives at the stop is uniformly distributed. We want to find the probability that a person waits less than 5 minutes.

Solution

Let the waiting time for the bus be a random variable \(X\); then \(X\) follows the uniform distribution on the interval \([0, 10]\). Therefore,

\[ f_X(x) = \frac{1}{10}, ~~ 0\leq x \leq 10 \]

The probability that a person waits less than 5 minutes is

\[ P(X\leq 5) = \int_{0}^{5}\frac{1}{10} dx = \frac{1}{10}x\bigg|_{x=0}^{5} = 0.5 \]

Check

punif(5, min=0, max=10) - punif(0, min=0, max=10)
[1] 0.5

Normal Distribution

  • The most IMPORTANT probability distribution in statistics

  • a.k.a Gaussian Distribution

  • Widely used to represent real-valued random variables whose distributions are unknown

  • In particular, the normal distribution is the backbone of the central limit theorem (CLT)

\[ f_X(x; \mu, \sigma^2)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},~~ -\infty < x < \infty \]

\[ \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx = 1 \]

\[\begin{aligned} \mu_X &=E(X) = \mu \\ \sigma^2_X &= Var(X) = \sigma^2 \end{aligned}\]
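The facts that the density integrates to 1 and that the mean and variance are \(\mu\) and \(\sigma^2\) can be checked by numerical integration (μ = 2 and σ = 3 are illustrative values):

mu <- 2; sigma <- 3
integrate(dnorm, -Inf, Inf, mean = mu, sd = sigma)$value                  # 1
integrate(function(x) x * dnorm(x, mu, sigma), -Inf, Inf)$value           # mu = 2
integrate(function(x) (x - mu)^2 * dnorm(x, mu, sigma), -Inf, Inf)$value  # sigma^2 = 9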

Normal Distribution

Shapes

  • Bell-shaped curve, symmetric around \(\mu\)
  • mean = median = mode
  • Shape depends on \(\mu\) and \(\sigma\)

Normal Distribution

Standard Normal Distribution (PDF)

When \(\mu = 0\) and \(\sigma = 1\), the normal distribution is called the standard normal distribution

\[ \phi(z) = \frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}, ~~~ -\infty < z < \infty \]

The CDF of the standard normal distribution is denoted \(\Phi(z)\):

\[ \Phi(z) = \int_{-\infty}^{z} \phi(t)dt \]

Proposition

If \(X \sim N(\mu, \sigma^2)\), then

\[ Z = \frac{X - \mu}{\sigma} \sim N(0, 1) \]
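This standardization is why pnorm() with mean and sd arguments agrees with the standard normal CDF evaluated at the standardized value (the numbers x = 110, μ = 100, σ = 15 are illustrative):

x <- 110; mu <- 100; sigma <- 15
pnorm(x, mean = mu, sd = sigma)     # P(X <= x) for X ~ N(mu, sigma^2)
pnorm((x - mu) / sigma)             # P(Z <= (x - mu)/sigma), the same value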

Normal Distribution

68-95-99.7 Rule

Approximately 68% of the data falls within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.

Check

pnorm(1:3) - pnorm(-c(1:3))
[1] 0.6826895 0.9544997 0.9973002

Example

Let the random experiment consist of a person taking an IQ test, and let \(X\) be the score on the test. The scores on such a test are typically standardized to have a mean of 100 and a standard deviation of 15. What is the probability \(P(85\leq X \leq 115)\)?

\[ \begin{aligned} P(85\leq X \leq 115) &= P\left(\frac{85-100}{15} \leq \frac{X-100}{15} \leq \frac{115-100}{15}\right)\\ &= P\left(-1 \leq Z \leq 1\right) \\ &= \Phi(1) - \Phi(-1) = 0.6827 \end{aligned} \]

zl <- (85 - 100)/15
zu <- (115 - 100)/15
pnorm(zu) - pnorm(zl)
[1] 0.6826895

Other Continuous Distributions

Exponential Distribution

Motivation

How to model the time between events that occur continuously and independently at a constant average rate (waiting time)?

Examples

  • Call center: time between message transmission
  • Finance: time between market price change
  • Manufacturing: time between machine failure

What do we need?

  • Capture the uncertainty and randomness of the event occurrence
  • Assumption of memoryless property: the probability of the event occurring in the next time interval is independent of the past

Exponential Distribution

Definition

A random variable \(X\) has an exponential distribution with rate \(\lambda > 0\), written \(X \sim \exp(\lambda)\), if its PDF is

\[ f_X(x) = \lambda \exp(-\lambda x),~~~~x\geq0 \] where

  • \(x\): The time until the next event occurs;
  • \(\lambda\): The rate at which events occur

\[ F_X(x) = 1 - \exp(-\lambda x),~~~~x\geq0 \]

\[\begin{aligned} \mu_X &=E(X) = \frac{1}{\lambda} \\ \sigma^2_X &= Var(X) = \frac{1}{\lambda^{2}} \end{aligned}\]
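A quick check of the CDF and the moments in R (λ = 2 and x = 1.5 are illustrative; R's dexp/pexp/rexp use the same rate parameterization):

lambda <- 2; x <- 1.5
pexp(x, rate = lambda)              # F_X(x)
1 - exp(-lambda * x)                # same value from the formula above
set.seed(1)
s <- rexp(1e5, rate = lambda)       # 100,000 exponential(lambda) draws
mean(s)                             # close to 1/lambda = 0.5
var(s)                              # close to 1/lambda^2 = 0.25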

Exponential Distribution

Relationship to Poisson Distribution

Poisson process: Events occurring randomly in time or space at a constant average rate \(\lambda\).

  • Poisson distribution: how many events occur in a given interval.

  • Exponential distribution: time between consecutive events in this process.

Example

Suppose that customers arrive at a store according to a Poisson process with rate \(\lambda\).

  • \(Y\): a random variable representing the number of customers arriving in a given time interval \([0,t)\) \(\rightarrow\) \(Y \sim \text{Pois}(\lambda t)\)

  • \(X\): a random variable representing the waiting time between two consecutive arrivals \(\rightarrow\) \(X \sim \exp(\lambda)\) (see the simulation sketch below)
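A simulation sketch of this connection (the rate λ = 3, the interval length t = 2, and the number of replications are illustrative): generate exponential inter-arrival times and count how many arrivals fall in \([0, t)\); the counts should behave like a Poisson(\(\lambda t\)) variable.

set.seed(1)
lambda <- 3; t <- 2
count_arrivals <- function() {
  arrival <- cumsum(rexp(100, rate = lambda))  # arrival times; 100 gaps is ample for t = 2
  sum(arrival < t)                             # number of arrivals in [0, t)
}
y <- replicate(1e4, count_arrivals())
c(mean(y), lambda * t)                         # both near the Poisson mean lambda*t = 6
c(var(y), lambda * t)                          # both near the Poisson variance lambda*t = 6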