STATISTICS

BIG DATA & ANALYTICS

The probability of an event is the chance of that event happening. When every possible outcome is equally likely, we calculate the probability by dividing the number of ways the event can happen by the total number of possible outcomes.

```
P(event) = number of ways event can happen / total number of possible outcomes
// Example - flipping a fair coin
P(heads) = 1 way to get heads / 2 possible outcomes
= 1 / 2
= 50%
// From this example we can say that there is a 50% chance of the coin landing on heads in a coin flip
```

Probability always lies between 0 (0%, impossible) and 1 (100%, certain to happen).
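The counting rule can be checked in a few lines of code. This is a minimal sketch that assumes every outcome in the list is equally likely. The `probability` helper is a hypothetical name for illustration, and exact fractions avoid rounding errors.

```python
from fractions import Fraction

def probability(event, outcomes):
    """Classical probability: outcomes where `event` is true / all outcomes.

    Only valid when every outcome in `outcomes` is equally likely.
    """
    matching = [o for o in outcomes if event(o)]
    return Fraction(len(matching), len(outcomes))

coin = ["heads", "tails"]
die = [1, 2, 3, 4, 5, 6]

p_heads = probability(lambda o: o == "heads", coin)  # 1 way / 2 outcomes
p_high = probability(lambda o: o >= 5, die)          # 2 ways (5, 6) / 6 outcomes

print(p_heads, f"{float(p_heads):.0%}")  # 1/2 50%
print(p_high, f"{float(p_high):.0%}")    # 1/3 33%
```

An event that never matches gives 0 and one that always matches gives 1, which are the two ends of the range.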

If the probability of the second event is affected by the outcome of the first event, these events are dependent.

If the probability of the second event isn't affected by the outcome of the first event, these events are independent.
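A classic way to see the difference is drawing two aces from a standard 52-card deck. The sketch below (the card example is illustrative, not from the notes) compares drawing without replacement, where the first draw changes the odds of the second (dependent), with drawing, replacing and drawing again (independent). It also brute-forces the dependent case by listing every ordered pair of distinct cards.

```python
from fractions import Fraction
from itertools import permutations

# A standard 52-card deck: 13 ranks x 4 suits
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Dependent: without replacement, only 3 aces remain among 51 cards
# after an ace is drawn first.
p_without = Fraction(4, 52) * Fraction(3, 51)

# Independent: the card is put back, so the second draw is again 4/52.
p_with = Fraction(4, 52) * Fraction(4, 52)

# Brute-force check of the dependent case over every ordered pair of distinct cards
pairs = list(permutations(deck, 2))
both_aces = sum(1 for a, b in pairs if a[0] == "A" and b[0] == "A")
p_enumerated = Fraction(both_aces, len(pairs))

print(p_without, p_with)          # 1/221 1/169
print(p_enumerated == p_without)  # True
```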

A discrete distribution is a distribution whose variable can only take a countable number of distinct values. For example, the number of times you roll a die is discrete: you can roll it 1, 2 or 10 times, but never 1.5 or -3 times.

Another example of a discrete distribution is the distribution of the number of accidents that occur at an intersection over a given period of time. The number of accidents is a discrete variable because it can only take whole-number values (0, 1, 2, 3, ...).
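A discrete distribution can be written out as a table with one probability per possible value (a probability mass function). As a sketch, assuming a fair six-sided die, here is the distribution of the outcome of a single roll:

```python
from fractions import Fraction

# Probability mass function (PMF) of one roll of a fair six-sided die:
# each of the six possible values gets probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible values of a discrete distribution sum to 1.
total = sum(die_pmf.values())

# The probability of a range of values is the sum of the individual probabilities.
p_at_most_2 = sum(p for face, p in die_pmf.items() if face <= 2)

# A value that is not one of the countable possible values has probability 0.
p_half = die_pmf.get(1.5, Fraction(0))

print(total, p_at_most_2, p_half)  # 1 1/3 0
```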

The binomial distribution is a discrete probability distribution that models the probability of a given number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success or failure) and the probability of success is the same in every trial. For example, counting the number of heads in 10 coin flips, where each flip lands either heads or tails. The trials must be independent: the result of one trial must not affect another.

The binomial distribution is defined by two parameters: the probability of success in each trial (p) and the number of trials (n).
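With those two parameters, the probability of exactly k successes is P(X = k) = C(n, k) p^k (1 - p)^(n - k), where C(n, k) counts the ways to choose which k of the n trials succeed. A minimal sketch using only the standard library (`binomial_pmf` is a hypothetical helper name):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) successes in n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    # comb(n, k) ways to place the successes, each arrangement has
    # probability p**k * (1 - p)**(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 5 heads in 10 flips of a fair coin
print(binomial_pmf(5, n=10, p=0.5))  # 0.24609375

# The probabilities of 0..n successes sum to 1
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```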

The Poisson distribution is a discrete probability distribution that models the number of times an event occurs in a fixed interval of time (or space). It has a single parameter, λ, the average number of events per interval. The Poisson distribution is often used to model events that happen randomly and independently at a constant average rate, such as the number of customers arriving at a store in an hour or the number of defects in a manufactured item.
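The Poisson probability of seeing exactly k events, when the average rate per interval is λ, is P(X = k) = λ^k e^(-λ) / k!. The sketch below assumes a hypothetical store that averages 3 customer arrivals per hour; `poisson_pmf` is an illustrative helper, not a library function.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) events in one interval, given an average of lam events per interval."""
    if k < 0:
        return 0.0
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical rate: 3 customers per hour on average
rate = 3.0
for k in range(6):
    print(k, round(poisson_pmf(k, rate), 4))

# Probability of at least one arrival in an hour: 1 - P(X = 0)
print(round(1 - poisson_pmf(0, rate), 4))
```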

Continuous distributions describe variables that can take any value within a range, so there are infinitely many possible values. An example of a continuous distribution is the distribution of heights of adult humans: a person's height can take on any value within a certain range (e.g. from 100 to 200 cm), not just whole numbers. Another example could be the time a bus waits at a stop, from when it arrives to when it leaves.
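Because a continuous variable has infinitely many possible values, probabilities are assigned to ranges rather than single values. Taking the bus waiting-time example, and assuming purely for illustration that the wait is equally likely to be anywhere from 0 to 10 minutes (a uniform distribution, which the notes do not specify), P(2 ≤ X ≤ 5) = (5 - 2) / 10 = 0.3. A sketch with hypothetical helper names:

```python
def uniform_cdf(x: float, low: float, high: float) -> float:
    """P(X <= x) for X uniformly distributed on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def prob_between(a: float, b: float, low: float, high: float) -> float:
    """P(a <= X <= b) = CDF(b) - CDF(a) for a continuous variable."""
    return uniform_cdf(b, low, high) - uniform_cdf(a, low, high)

# Assumed: wait time uniformly distributed between 0 and 10 minutes
print(prob_between(2, 5, 0, 10))
print(prob_between(4, 4, 0, 10))  # 0.0: any single exact value has zero probability
```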

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution defined by its mean and standard deviation. Its bell-shaped curve is symmetric about the mean, and it is one of the most important distributions in statistics, widely used to model real-valued data.

The normal distribution is typically used for data that is continuous and approximately symmetric. Common examples are physical measurements such as heights and weights, test scores such as IQ scores, and, as an approximation, financial data such as daily stock returns.
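Python's standard library includes `statistics.NormalDist` (Python 3.8+), which is enough to work with a normal model. The mean of 170 cm and standard deviation of 10 cm below are made-up values for illustration, not real height data.

```python
from statistics import NormalDist

# Hypothetical model: adult heights ~ Normal(mean=170 cm, standard deviation=10 cm)
heights = NormalDist(mu=170, sigma=10)

# Probability density at the mean (the peak of the bell curve)
print(heights.pdf(170))

# P(height <= 170): half the distribution lies below the mean
print(heights.cdf(170))  # 0.5

# P(160 <= height <= 180): within one standard deviation, roughly 68%
print(heights.cdf(180) - heights.cdf(160))

# Height below which 95% of the modelled population falls
print(heights.inv_cdf(0.95))
```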

The exponential distribution is a continuous probability distribution that models the time between events in a Poisson process, which is a process in which events occur randomly and independently at a constant average rate.

The exponential distribution is often used to model the time between events in systems that exhibit randomness, such as the time between arrivals at a customer service desk or the time between failures of a machine.
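If events occur at an average rate λ, the waiting time T until the next one satisfies P(T ≤ t) = 1 - e^(-λt), and the mean wait is 1/λ. The sketch below assumes a hypothetical machine that fails 0.5 times per hour on average, and uses the standard library's `random.expovariate` to simulate waiting times.

```python
import random
from math import exp

def exponential_cdf(t: float, rate: float) -> float:
    """P(T <= t) for the waiting time T until the next event in a Poisson process."""
    if t < 0:
        return 0.0
    return 1 - exp(-rate * t)

# Hypothetical: 0.5 failures per hour, so the mean time between failures is 2 hours
rate = 0.5
print(exponential_cdf(2, rate))  # P(next failure within 2 hours) = 1 - e^-1

# Simulate many waiting times and compare the sample mean with 1 / rate
random.seed(42)
waits = [random.expovariate(rate) for _ in range(100_000)]
print(sum(waits) / len(waits))  # close to 2.0
```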

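The t-distribution is a continuous probability distribution used to estimate population parameters, such as the mean, when the sample size is small and the population standard deviation is unknown. It is bell-shaped and symmetric like the normal distribution, but it has heavier tails, which means that values far from the centre are more likely than under a normal distribution. Its exact shape depends on the degrees of freedom, and it approaches the normal distribution as the sample size grows.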

The t-distribution is often used in hypothesis testing, particularly in tests of the mean of a normally distributed population when the standard deviation is unknown (t-tests). It is also used to build confidence intervals for the mean of a normally distributed population when the sample size is small.
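A one-sample t-test compares the statistic t = (x̄ - μ0) / (s / √n) against a t-distribution with n - 1 degrees of freedom. The sketch below uses a made-up sample of 10 measurements and a hypothesised mean of 5.0. Python's standard library has no t-distribution, so the critical value 2.262 (the 97.5th percentile for 9 degrees of freedom) is hard-coded from a t table.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements
sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0, 5.2, 4.9]
mu_0 = 5.0          # hypothesised population mean

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)   # sample standard deviation (divides by n - 1)
se = s / sqrt(n)    # standard error of the mean

# One-sample t statistic, with n - 1 = 9 degrees of freedom
t_stat = (x_bar - mu_0) / se

# 97.5th percentile of the t-distribution with 9 degrees of freedom (from a t table)
t_crit = 2.262

# 95% confidence interval for the population mean
ci = (x_bar - t_crit * se, x_bar + t_crit * se)

print(round(t_stat, 3), [round(v, 3) for v in ci])
```

For this sample t comes out at about 1.0, well below 2.262, so the data give no evidence at the 5% level that the mean differs from 5.0. If SciPy is available, `scipy.stats.t.ppf(0.975, df)` returns the critical value for any number of degrees of freedom instead of a table lookup.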

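The log-normal distribution is a continuous probability distribution defined over the positive real numbers. A variable is log-normally distributed if its natural logarithm is normally distributed; equivalently, it is e raised to a normally distributed variable. It is used to model quantities that arise from multiplying many small, independent, positive factors.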

The log-normal distribution is often used to model data that is skewed to the right, such as data on income or wealth. It is also used to model data on physical measurements, such as particle sizes or lifetimes of mechanical components.
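Since ln(X) is normal, log-normal probabilities can be computed with the normal distribution, and samples can be drawn by exponentiating normal draws. In this sketch, μ = 10 and σ = 0.5 are hypothetical parameters of ln(X), not of X itself. The median of X is e^μ, while the mean, e^(μ + σ²/2), is pulled higher by the long right tail.

```python
import random
from math import exp, log
from statistics import NormalDist, fmean

# Hypothetical parameters of the normal distribution of ln(X)
mu, sigma = 10.0, 0.5

def lognormal_cdf(x: float) -> float:
    """P(X <= x): since ln(X) ~ Normal(mu, sigma), this is Phi((ln x - mu) / sigma)."""
    if x <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(log(x))

# The median of X is e^mu, because half of ln(X) lies below mu
print(lognormal_cdf(exp(mu)))  # about 0.5

# Sampling: exponentiate normal draws (random.lognormvariate(mu, sigma) does the same)
random.seed(0)
samples = [exp(random.gauss(mu, sigma)) for _ in range(50_000)]

# Right skew: the sample mean exceeds the median and sits near e^(mu + sigma^2 / 2)
print(fmean(samples), exp(mu + sigma**2 / 2), exp(mu))
```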

© 2023 Potado. All rights reserved.