# Foundational Statistics

• Expectation, Variance, Probability Distributions, Central Limit Theorem

## Project Summary

### Abstract

Statistics is the science of extracting information intelligently from data.

In this project we examine the conceptual foundations so you can start thinking about the world from a statistical framework. We also showcase real-world applications of statistics, such as outsmarting your friends when predicting sports events, breaking down risk in large corporations, forecasting the stock market, controlling robots, and more.

We begin with the basics of expectation and variance. From there, we jump straight into probability distributions with an example after every definition. Finally, we finish off with arguably the most important topics in statistics, the Normal Distribution and the Central Limit Theorem.

The structure of each section is broken down by the simple method of definition followed by an example. It is okay to read the definition portion and feel as if you understood none of it. That's normal. You learn by doing, and that is what the examples are for. We encourage you to read the question multiple times and try to find a pattern between the given information in the problem and what the definition is saying.

• Amil Khan
• Shon Inouye

# 1. Expectation

The expectation, also known as the expected value, is essentially the mean (or average) of a random variable. In this context, a random variable is a variable whose value is determined by the outcome of a random process. If you were to run a probability experiment with a random variable many times and keep track of the results, the expectation would be the average of all the values obtained.

On its own, not much can be drawn from expectation. However, it is the backbone for real world statistical techniques.

The formula for the expected value of $$X$$ is defined to be, $$\mathbb{E} [X] = \sum_x x \cdot p(x)$$

Here, we see that the expected value of a random variable $$X$$ is the sum of the products of the values of the random variable (denoted by $$x$$) and their probabilities (denoted by $$p(x)$$ ).

Example: Let's say I am holding a lottery where I sell 1000 tickets at $5 each. One lucky winner will cash in $1000. In addition, 5 of the 1000 people who bought tickets will receive $100, and 50 of the 1000 will win $50. Everyone else, unfortunately, goes home empty handed. If you buy a ticket, how much should you expect to win (or lose)?

First, we list the possible outcomes, their net gains, and their probabilities. We see this in the table below:

| Outcome | Net Gain | Probability |
| --- | --- | --- |
| Grand Prize | $$1000$$ | $$\dfrac{1}{1000}$$ |
| Second Prize | $$100$$ | $$\dfrac{5}{1000}$$ |
| Third Prize | $$50$$ | $$\dfrac{50}{1000}$$ |
| No Prize (Losers) | $$-5$$ | $$\dfrac{944}{1000}$$ |

$$\textbf{Solution:}$$ Now, we use the equation mentioned earlier to calculate the expectation.

\begin{align} E[X] = &1000 \times \left( \dfrac{1}{1000} \right) \\ &+ 100 \times \left( \dfrac{5}{1000} \right) \\ &+ 50 \times \left( \dfrac{50}{1000} \right) \\ &+ (-5) \times \left( \dfrac{944}{1000} \right) \\ E[X] = &-0.72 \end{align}

The expected value is negative. This means that, on average, you are expected to lose money (72 cents per ticket) if you buy a ticket.

# 2. Variance

Let's say we want to know how the data varies. The most common measure of variability used in statistics is the variance, which is a function of the deviations (or distances) of the sample measurements from their mean.

One of the main reasons we care about variance is that we can use it to calculate the standard deviation. The standard deviation shows how spread out your data is from the mean and is often more useful for describing the spread of data than the variance itself.

The formula for the variance of $$X$$ is defined to be, $$Var[X]=E[X^2]-(E[X])^2$$

To find the standard deviation, we simply take the square root of the variance. Hence, $$\sigma = \sqrt {Var[X]}$$

Example: We can use the lottery example from the Expectation section to calculate the variance. Let's take a look at the table of values and probabilities:

| Outcome | Net Gain | Probability |
| --- | --- | --- |
| Grand Prize | $$1000$$ | $$\dfrac{1}{1000}$$ |
| Second Prize | $$100$$ | $$\dfrac{5}{1000}$$ |
| Third Prize | $$50$$ | $$\dfrac{50}{1000}$$ |
| No Prize (Losers) | $$-5$$ | $$\dfrac{944}{1000}$$ |

We know from the previous section that, $$E[X] = -0.72$$ And so we can calculate, $$(E[X])^2 = 0.5184$$

We also need to know the expected value of $$X$$ squared. We find this by squaring the values in the equation:

\begin{align} E[X^2] = &1000^2 \times \left( \dfrac{1}{1000} \right) \\ &+ 100^2 \times \left( \dfrac{5}{1000} \right) \\ &+ 50^2 \times \left( \dfrac{50}{1000} \right) \\ &+ (-5)^2 \times \left( \dfrac{944}{1000} \right) \\ E[X^2] = &1198.6 \end{align}

We can now plug these values into the equation for variance:

\begin{align} Var[X] &= E[X^2]-(E[X])^2 \\ Var[X] &= 1198.6-0.5184 \\ Var[X] &= 1198.0816 \end{align}

Now that we know the variance, we can also calculate the standard deviation:

\begin{align} \sigma &= \sqrt {1198.0816} \\ \sigma &= 34.61 \end{align}

This means that although the expected value is -$0.72, a typical net gain lies within one standard deviation of the mean, between -$35.33 and $33.89. The only outcome that falls in that range is a net loss of $5. Winning $50, $100, or $1000 lies outside one standard deviation of the mean, so we can conclude that you most likely will not win anything.
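The calculations in the last two sections can be double-checked with a short Python sketch that walks through the same formulas (the outcome list mirrors the lottery table):

```python
# Lottery outcomes: (net gain in dollars, probability), as in the table above.
outcomes = [(1000, 1/1000), (100, 5/1000), (50, 50/1000), (-5, 944/1000)]

e_x = sum(x * p for x, p in outcomes)       # E[X] = sum of x * p(x)
e_x2 = sum(x**2 * p for x, p in outcomes)   # E[X^2], needed for the variance
var = e_x2 - e_x**2                         # Var[X] = E[X^2] - (E[X])^2
sigma = var ** 0.5                          # standard deviation

print(round(e_x, 2))    # -0.72
print(round(var, 4))    # 1198.0816
print(round(sigma, 2))  # 34.61
```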

# 3. Probability Distribution

Probability distributions are represented by a function, table, or graph, and they assign probabilities to each value of a random variable. Looking at a probability distribution allows you to get a better understanding of how likely certain events are to occur.

There are many different types of probability distributions, but they all can be classified as either discrete or continuous. If a variable is a discrete variable, it has a discrete probability distribution, and if it is a continuous variable, it has a continuous probability distribution. A variable is continuous if it can take on any value between two specific values; otherwise, it is discrete.

Example 1: Suppose that the fire department requires that all firefighters be between 140 and 240 pounds. Since the firefighter's weight can take on any value between two specific values (in this case, 140 and 240), this means that weight is a continuous variable.

Example 2: Now suppose we take the case of flipping a coin and counting the number of times it lands on heads. We know that it is not possible to get a number such as 2.5 heads. Since the count can only take on whole-number values, and not every value in between, the number of heads must be a discrete variable.

In the following sections, you will learn about many different probability distributions and examples on how to apply them.

# 4. Binomial

One type of probability distribution that you might find useful is the Binomial distribution. This distribution is used when an experiment has the following properties:

• The experiment consists of a fixed number, $$n$$, of identical trials.
• Each trial results in one of two outcomes: success or failure.
• The probability of success on a single trial is equal to some value $$p$$ and remains the same from trial to trial. The probability of a failure is equal to $$q = (1 - p)$$.
• Trials are independent. This means that the probability of one trial is not affected by the result of another trial.
• The random variable of interest is $$X$$, the number of successes over $$n$$ trials.

You are able to use the Binomial distribution in a number of different situations. You can use it when flipping coins, where the result is either heads or tails. You can use it in a manufacturing production line, where a product is either defective or not defective. You can even use it when rolling dice, where the result is either even or odd. So long as the situation follows the characteristics listed above, a Binomial distribution can be created.

The equation that we use to find the probabilities for a Binomial distribution is: $$p(x) = \binom n x \cdot p^x \cdot q ^{ (n-x)}$$ Note that: $$\binom n x = \dfrac{(n!)}{(x! \cdot (n-x)!)}$$

We also know the expected value and variance for the random variable $$X$$: $$\mathbb{E} [X] = \mu = n \cdot p$$ $$Var[X] = \sigma^2 = n \cdot p \cdot q$$

Example: Let's assume that you somehow know your exact probability of making a free throw in basketball. Unfortunately, you're not very good at basketball, so the probability that you make a free throw is 0.6 or 60%. Therefore, the probability that you'll miss a free throw is 0.4 or 40%. Now, let's say that you want to know the probability distribution of scoring $$x$$ times in 6 trials.

We now know the following information:

• $$x$$ will be the number of times scored: 0, 1, 2, 3, 4, 5, 6
• $$n$$ = 6
• $$p$$ = 0.6
• $$q$$ = 0.4

Plugging in the values of $$n$$, $$p$$, and $$q$$, into the equations for expected value and variance gives us: \begin{align} \mathbb{E} [X] &= n \cdot p \\ \mathbb{E} [X] &= (6) \cdot (0.6) \\ \mathbb{E} [X] &= 3.6 \end{align} and \begin{align} Var[X] &= n \cdot p \cdot q \\ Var[X] &= (6) \cdot (0.6) \cdot (0.4) \\ Var[X] &= 1.44 \end{align}

The values for $$x$$ will be plugged in to our probability equation for binomial distributions: $$p(x) = \binom n x p^xq ^{(n-x)}$$

Doing so results in the following probabilities: \begin{align} p(0) &= \binom 6 0 \cdot (0.6)^0 \cdot (0.4) ^ {(6-0)}\\ &= 0.004096 \\ \\ p(1) &= \binom 6 1 \cdot (0.6)^1 \cdot (0.4) ^ {(6-1)} \\ &= 0.036864\\ \\ p(2) &= \binom 6 2 \cdot (0.6)^2 \cdot (0.4) ^ {(6-2)} \\ &= 0.13824\\ \\ p(3) &= \binom 6 3 \cdot (0.6)^3 \cdot (0.4) ^ {(6-3)} \\ &= 0.27648\\ \\ p(4) &= \binom 6 4 \cdot (0.6)^4 \cdot (0.4) ^ {(6-4)} \\ &= 0.31104\\ \\ p(5) &= \binom 6 5 \cdot (0.6)^5 \cdot (0.4) ^ {(6-5)} \\ &= 0.186624\\ \\ p(6) &= \binom 6 6 \cdot (0.6)^6 \cdot (0.4) ^ {(6-6)} \\ &= 0.046656\\ \end{align}

This shows the distribution of probabilities for making $$x$$ out of 6 shots. In a histogram, this distribution looks similar to a bell curve.
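These binomial probabilities are quick to verify with a Python sketch, using the standard library's `math.comb` for the binomial coefficient $$\binom n x$$:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x) for a Binomial(n, p) variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.6
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(probs):
    print(x, round(prob, 6))   # matches p(0) ... p(6) above

print(round(sum(probs), 6))    # 1.0 -- a pmf must sum to 1
```

`math.comb` is available in Python 3.8 and later.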

# 5. Geometric

Another type of probability distribution is the Geometric distribution. This distribution is very similar to the Binomial distribution with respect to its properties:

• The experiment consists of a sequence of identical trials (not a fixed number of them).
• Each trial results in one of two outcomes: success or failure.
• The probability of success on a single trial is equal to some value $$p$$ and remains the same from trial to trial. The probability of a failure is equal to $$q = (1 - p)$$.
• Trials are independent. This means that the probability of one trial is not affected by the result of another trial.

However, there is one key difference between the two:

• The random variable of interest is $$X$$, the number of the trial on which the first success occurs.

You are able to use the Geometric distribution in all of the same situations as the Binomial distribution, so long as your goal is to find the probability that the first success occurs on a certain trial (rather than finding the probability of successes over a number of trials).

The equation that we use to find the probabilities for a Geometric distribution is: $$p(x) = p \cdot q^{(x-1)}$$

We also know the expected value and variance for the random variable $$X$$: $$\mathbb{E} [X] = \mu = \dfrac{1}{p}$$ $$Var[X] = \sigma^2 = \dfrac{1-p}{p^2}$$

Example: Let's take the same example used in the Binomial distribution section. We will assume that you still somehow know your exact probability of making a free throw in basketball and you still aren't very good at it. The probability that you make a free throw is still 0.6 (or 60%) and the probability that you'll miss a free throw is still 0.4 (or 40%).

This time, we want to know the probability distribution of having our first successful free-throw be trial number $$x$$ with a limit of 6 trials.

We now know the following information:

• $$x$$ will be the trial on which the first free-throw is made: 1, 2, 3, 4, 5, 6
• we will only consider the first 6 trials
• $$p$$ = 0.6
• $$q$$ = 0.4

Plugging in the value $$p$$ into the equations for expected value and variance gives us: \begin{align} \mathbb{E} [X] &= \dfrac{1}{p} \\ \mathbb{E} [X] &= \dfrac{1}{0.6} \\ \mathbb{E} [X] &= 1.67 \end{align} and \begin{align} Var[X] &= \dfrac{1-p}{p^2} \\ Var[X] &= \dfrac{1-0.6}{0.6^2} \\ Var[X] &= 1.11 \end{align}

The values for $$x$$ will be plugged in to our probability equation for Geometric distributions: $$p(x) = pq^{(x-1)}$$

Doing so results in the following probabilities: \begin{align} p(1) &= (0.6) \cdot (0.4)^{(1-1)} \\ &= 0.6 \\ \\ p(2) &= (0.6) \cdot (0.4)^{(2-1)} \\ &= 0.24 \\ \\ p(3) &= (0.6) \cdot (0.4)^{(3-1)} \\ &= 0.096 \\ \\ p(4) &= (0.6) \cdot (0.4)^{(4-1)} \\ &= 0.0384 \\ \\ p(5) &= (0.6) \cdot (0.4)^{(5-1)} \\ &= 0.01536 \\ \\ p(6) &= (0.6) \cdot (0.4)^{(6-1)} \\ &= 0.006144 \\ \end{align}

This shows the distribution of probabilities for trial $$x$$ being the first success out of 6 trials. In a histogram, we see this distribution decay geometrically: each bar is $$q = 0.4$$ times the height of the previous one.
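As in the Binomial section, a short Python sketch can confirm these geometric probabilities along with the mean and variance:

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) = p * q^(x - 1)."""
    return p * (1 - p) ** (x - 1)

p = 0.6
for x in range(1, 7):
    print(x, round(geometric_pmf(x, p), 6))  # matches p(1) ... p(6) above

print(round(1 / p, 2))           # E[X] = 1.67
print(round((1 - p) / p**2, 2))  # Var[X] = 1.11
```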

# 6. Exponential

Let's say I want to model the time between the first made basket and the second made basket from the Lakers vs. Warriors game. Well, the first bit of information I need is an event rate (or rate parameter to be statistically correct). Next, I have to make sure that some rules are satisfied, or else I may have to use an entirely different distribution. Let's glance at those rules:

1. $$X$$ is the time (or distance) between events, with $$X$$ > 0.

2. Events occur independently.

3. The rate at which events occur is constant.

4. Two events cannot occur at exactly the same instant.

It seems like we are in the clear. Now, how the heck do I model such an event? There is a formula. $$f(x)= \lambda e^{- \lambda x}$$

Let's break down this formula:

• $$f(x)$$ : the density function, which is 0 for $$x \leq 0$$
• $$\lambda$$ : the rate parameter (the average number of events per unit of time)
• $$e^{-\lambda x}$$ : $$e$$ raised to the negative rate parameter multiplied by $$x$$

$$\text{Expected Value} \quad =\dfrac{1}{\lambda}$$ $$\text{Variance} \qquad = \dfrac{1}{\lambda^2}$$

Example: Let's say, on average, the time between the first made shot and the second made shot at a Lakers vs. Warriors game is 34 seconds. State $$\lambda$$ and write down the density of the time between the shots.

Solution: The expected value of an exponential random variable is $$\dfrac{1}{\lambda}$$, so a mean wait of 34 seconds gives \begin{align} \lambda = \dfrac{1}{34} \ \text{baskets per second} \end{align} So the density of the time (in seconds) between the 1st and 2nd made shot is: $$f(x) = \dfrac{1}{34}e^{-x/34}$$

Now let's say it has been a whole minute and neither team has scored. What is the expected time of the next made basket? The exponential distribution is memoryless: however long we have already waited, the expected additional wait is still $$\dfrac{1}{\lambda} = 34$$ seconds. Thus, the expected time of the next made basket is \begin{align} 60 + \dfrac{1}{\lambda} &= 60 + 34 \\ &= 94 \ \text{seconds} \end{align}
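Here is a sketch of these quantities in Python, taking the rate as the reciprocal of the mean wait ($$\lambda = 1/34$$ per second, since the expected value of an exponential random variable is $$1/\lambda$$):

```python
from math import exp

mean_wait = 34.0     # average time between made baskets, in seconds
lam = 1 / mean_wait  # rate parameter: lambda = 1 / mean

def exp_density(x, lam):
    """Exponential density f(x) = lambda * e^(-lambda * x), zero for x <= 0."""
    return lam * exp(-lam * x) if x > 0 else 0.0

def exp_tail(x, lam):
    """P(X > x) = e^(-lambda * x): the chance of waiting longer than x."""
    return exp(-lam * x)

print(round(1 / lam, 1))            # expected wait: 34.0 seconds
print(round(exp_tail(60, lam), 3))  # chance the wait exceeds a full minute
```

By memorylessness, after any elapsed wait the expected time of the next event is the elapsed time plus `1 / lam`.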

# 7. Poisson

The Poisson distribution often provides a good model for the probability distribution of the number $$X$$ of rare events that occur in space, time, volume, or any other dimension, where $$\lambda$$ is the average value of $$X$$.

The equation that we use to find the probabilities for a Poisson distribution is: $$p(x) = \dfrac{\lambda^x}{x!} \cdot e^{-\lambda}$$

Example: Let's say you are keeping track of the number of soccer goals scored in each game by every high school team in a specific region over the course of a season. Here is the data you collected:

| Number of Goals | Count |
| --- | --- |
| 0 | 26 |
| 1 | 36 |
| 2 | 22 |
| 3 | 11 |
| 4 | 3 |
| 5 | 1 |
| 6 | 0 |
| 7 | 0 |
| 8 | 1 |

There are 100 games in total.

The total number of goals is given by,

\begin{align} \text{Total} = &(0 \cdot 26) + (1 \cdot 36) \\ &+ (2 \cdot 22) + (3 \cdot 11) \\ &+ (4 \cdot 3) + (5 \cdot 1) \\ &+(6 \cdot 0) + (7 \cdot 0) \\ &+ (8 \cdot 1) \\ = &133 \end{align}

So the average number of goals for the 100 games is given by: $$\lambda = \dfrac{133}{100}$$

We now know the following information:

• $$x$$ will be the number of goals scored in a game: 0, 1, 2, 3, 4, 5, 6, 7, 8
• $$\lambda$$ = 1.33

The values for $$x$$ and $$\lambda$$ will be plugged into our probability equation for Poisson distributions: $$p(x) = \dfrac{\lambda^x}{x!} \cdot e^{-\lambda}$$

Doing so results in the following probabilities: \begin{align} p(x = 0) &= \dfrac{1.33^0}{0!} \cdot e^{-1.33}\\ &=0.264477\\ \\ p(x = 1) &= \dfrac{1.33^1}{1!} \cdot e^{-1.33}\\ &=0.351755\\ \\ p(x = 2) &= \dfrac{1.33^2}{2!} \cdot e^{-1.33}\\ &=0.233917\\ \\ p(x = 3) &= \dfrac{1.33^3}{3!} \cdot e^{-1.33}\\ &=0.103370\\ \\ p(x = 4) &= \dfrac{1.33^4}{4!} \cdot e^{-1.33}\\ &=0.034481\\ \\ p(x = 5) &= \dfrac{1.33^5}{5!} \cdot e^{-1.33}\\ &=0.009172\\ \\ p(x = 6) &= \dfrac{1.33^6}{6!} \cdot e^{-1.33}\\ &=0.002033\\ \\ p(x = 7) &= \dfrac{1.33^7}{7!} \cdot e^{-1.33}\\ &=0.000386\\ \\ p(x = 8) &= \dfrac{1.33^8}{8!} \cdot e^{-1.33}\\ &=0.000064\\ \\ \end{align}

This shows the distribution of probabilities that a randomly chosen team will score $$x$$ goals in a game.

# 8. Normal

The most widely used model for random variables with continuous distributions is the family of normal distributions. In many cases, data tends to be distributed in a bell shape where the highest probability occurs at the mean/expectation and decreases as you get further from that mean/expected value.

The density function for a normal probability distribution is: $$f(x) = \dfrac{e^{\left(\dfrac{-(x-\mu)^2}{2\sigma^2}\right)}}{\sigma \sqrt{2\pi}}$$ Where $$\mu$$ is the mean of the data and $$\sigma$$ is the standard deviation.

Areas under the density function corresponding to $$P(a \leq X \leq b)$$, the probability that $$X$$ falls between $$a$$ and $$b$$, are given by the integral: $$\int_a^b \dfrac{e^{\left(\dfrac{-(x-\mu)^2}{2\sigma^2}\right)}}{\sigma \sqrt{2\pi}} \, dx$$

Example: Let's say that you are doing research on a certain species of starfish for an internship. After measuring 100 starfish, you now have a collection of data based on their lengths (in inches):

3.6, 3.7, 3.8, 3.8, 3.9, 3.9, 4.0, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.1, 4.1, 4.1, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.5, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.6, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.7, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.8, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 4.9, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.1, 5.1, 5.1, 5.1, 5.2, 5.2, 5.3, 5.3, 5.4, 5.5

Shown below is a table of starfish length values and their frequencies.

| Length (inches) | Frequency |
| --- | --- |
| 3.6 | 1 |
| 3.7 | 1 |
| 3.8 | 2 |
| 3.9 | 2 |
| 4.0 | 4 |
| 4.1 | 6 |
| 4.2 | 7 |
| 4.3 | 8 |
| 4.4 | 9 |
| 4.5 | 10 |
| 4.6 | 10 |
| 4.7 | 9 |
| 4.8 | 8 |
| 4.9 | 7 |
| 5.0 | 6 |
| 5.1 | 4 |
| 5.2 | 2 |
| 5.3 | 2 |
| 5.4 | 1 |
| 5.5 | 1 |

You are given the task of finding the probability that the length of a starfish is between 4 and 5 inches. This can be seen as $$P(4 \leq X \leq 5)$$.

For this set of data, $$\mu = 4.55 \\ \sigma = 0.392 \\ a = 4 \\ b = 5$$

Plugging the values $$\mu$$, $$\sigma$$, $$a$$, and $$b$$ into the integral gives us: \begin{align} P(4 \leq X \leq 5) &= \int_4^5 \dfrac{e^{\left(\dfrac{-(x-4.55)^2}{2 \cdot 0.392^2}\right)}}{0.392 \sqrt{2\pi}} \, dx \\ &= 0.7942 \end{align} This means that the probability that the length of a starfish is between 4 and 5 inches is 79.42%.
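This integral has no closed form, so in practice it is evaluated numerically, typically through the normal CDF. Here is a Python sketch using the standard library's `math.erf`:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a Normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 4.55, 0.392
prob = normal_cdf(5, mu, sigma) - normal_cdf(4, mu, sigma)
print(round(prob, 4))  # 0.7942 -- probability a starfish is 4 to 5 inches long
```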

# 9. Central Limit Theorem

Abraham de Moivre, a French-born mathematician who spent most of his working life in England, was the first to deduce an equation for the bell-shaped curve displayed in our previous section. The curve is smooth and elegant, with no violent spikes. In math, we say this curve is continuous; in statistics, we call it normal. Fast forward almost a century, and the French mathematician Pierre-Simon Laplace proved that this convergence holds for all success probabilities (except 0 and 1).

Laplace believed that the Central Limit Theorem (CLT) could be used to explain the uncertainties that fill our lives. The German "prince of mathematicians," Carl Friedrich Gauss, applied the normal distribution to measurements of the shape of the Earth and the movements of planets. In the 1930s, mathematicians proved that this convergence holds for virtually all probability distributions. This theorem, the culmination of 200 years of investigation, is one of the most famous mathematical theorems. I present to you, the Central Limit Theorem.

$$\textbf{Theorem}$$ The sample mean of a large random sample of independent random variables with mean $$\mu$$ and finite variance $$\sigma^2$$ has approximately the normal distribution with mean $$\mu$$ and variance $$\sigma^2 /n$$. This result helps to justify the use of the normal distribution as a model for many random variables that can be thought of as being made up of many independent parts.

The CLT basically says that for a large number of trials $$n$$, the distribution of the sample mean will look like a normal distribution. Simple, right? Let's say we flip a coin 100 times and record the proportion of heads, then repeat that experiment over and over. According to the Central Limit Theorem, keyword *limit*, if we graphed our results, the histogram would begin to resemble a normal curve. This is uncanny, but true.
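A small simulation makes this concrete. The Python sketch below repeats the 100-flip experiment many times and looks at the distribution of the resulting proportions of heads; by the CLT they should pile up in a bell shape around 0.5, with standard deviation near $$\sqrt{p(1-p)/n} = \sqrt{0.25/100} = 0.05$$:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n_flips, n_samples = 100, 10_000

# Each sample mean is the proportion of heads in 100 fair-coin flips.
means = [sum(random.randint(0, 1) for _ in range(n_flips)) / n_flips
         for _ in range(n_samples)]

grand_mean = sum(means) / n_samples
spread = (sum((m - grand_mean) ** 2 for m in means) / n_samples) ** 0.5

print(round(grand_mean, 2))  # close to 0.5
print(round(spread, 2))      # close to 0.05
```

Plotting `means` as a histogram would show the bell curve directly.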

### CONGRATULATIONS!

Congratulations on getting this far! We hope you enjoyed this project. Please reach out to us if you have any feedback or would like to publish your own project.
