Statistics & Probability Glossary

Quick reference for statistical terms and concepts

B

Bell Curve

Another name for the normal distribution curve, which resembles the shape of a bell when graphed.

Example: Test scores often follow a bell curve, with most students scoring near the average.

Binomial Distribution

A probability distribution for the number of successes in a fixed number of independent trials, where each trial has two possible outcomes (success or failure) and the same probability of success.

Example: Flipping a coin 10 times and counting how many times it lands on heads.
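The coin-flip example can be made concrete with a minimal sketch. The function name binomial_pmf and the choice of exactly 5 heads are illustrative assumptions, not part of the glossary.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    # Number of ways to place k successes, times the probability of each way.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 5 heads in 10 flips of a fair coin.
prob_five_heads = binomial_pmf(5, 10, 0.5)
print(round(prob_five_heads, 4))  # 0.2461
```

Summing binomial_pmf(k, 10, 0.5) over k = 0 to 10 gives 1, as expected for a probability distribution.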

Box Plot

A graphical representation of data showing the median, quartiles, and potential outliers in a dataset.

Example: A box plot can show the distribution of salaries in a company, highlighting the median salary and any unusually high or low earners.

C

Central Tendency

A measure that represents the center or typical value of a dataset. Common measures include mean, median, and mode.

Example: The mean income in a city is a measure of central tendency for income data.

Combinations

The number of ways to choose items from a set where the order does not matter. Calculated as nCr = n! / (r!(n-r)!).

Example: Choosing 3 people from a group of 5 to form a committee gives 5! / (3! × 2!) = 10 possible committees, because order does not matter.
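As a quick check of the nCr formula, the following sketch computes the committee count directly from factorials and compares it with Python's built-in math.comb. The helper name n_choose_r is hypothetical.

```python
from math import comb, factorial

def n_choose_r(n: int, r: int) -> int:
    """nCr = n! / (r! (n-r)!), computed with integer division."""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Committees of 3 chosen from 5 people.
committees = n_choose_r(5, 3)
print(committees)                 # 10
print(committees == comb(5, 3))   # True
```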

Conditional Probability

The probability of an event occurring given that another event has already occurred. Written as P(A|B).

Example: The probability of drawing a second ace from a standard 52-card deck, given that the first card drawn was an ace and was not replaced, is 3/51 ≈ 0.059.
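The ace example can be worked through with the standard identity P(A|B) = P(A and B) / P(B). This sketch assumes a standard 52-card deck with 4 aces and drawing without replacement. The variable names are illustrative.

```python
from fractions import Fraction

# B: the first card is an ace. A: the second card is an ace.
p_first_ace = Fraction(4, 52)
p_both_aces = Fraction(4, 52) * Fraction(3, 51)

# P(A|B) = P(A and B) / P(B)
p_second_given_first = p_both_aces / p_first_ace
print(p_second_given_first)  # 1/17
```

The result 1/17 is the same as 3/51: after one ace is removed, 3 aces remain among 51 cards.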

Cumulative Frequency

The running total of frequencies in a dataset, showing how many values fall at or below a particular value.

Example: In a test score distribution, cumulative frequency shows how many students scored up to a certain score.
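A running total is straightforward to compute. The sketch below uses made-up score bins and frequencies purely for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table for test scores.
score_bins = ["50-59", "60-69", "70-79", "80-89", "90-100"]
frequencies = [2, 5, 8, 5, 3]

# Each entry is the number of students at or below that bin.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 7, 15, 20, 23]
```

The last cumulative value always equals the total number of observations.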

D

Data Distribution

The way data values are spread or arranged across different values or ranges.

Example: Heights in a population are typically close to normally distributed, clustering around an average height.

Decimal

A number expressed in base-10 notation, using a decimal point to separate whole numbers from fractional parts.

Example: 3.14159 is a decimal representation of pi.

F

Frequency

The number of times a value or range of values occurs in a dataset.

Example: If 5 students scored between 80 and 90 on a test, the frequency for that score range is 5.

H

Histogram

A bar chart that displays the frequency distribution of continuous data by grouping values into bins or intervals.

Example: A histogram of ages in a survey might show bars for the non-overlapping age ranges 0-9, 10-19, 20-29, etc., so that each age falls into exactly one bin.
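Grouping values into bins can be sketched as follows. The ages, the bin width of 10, and the helper bin_label are all assumptions for illustration. Integer ages are placed in bins that start at a multiple of 10, so each value lands in exactly one bar.

```python
from collections import Counter

ages = [3, 12, 15, 21, 27, 29, 34, 40]  # hypothetical survey data
bin_width = 10

def bin_label(age: int) -> str:
    """Label of the bin containing this age, e.g. 27 -> '20-29'."""
    start = (age // bin_width) * bin_width
    return f"{start}-{start + bin_width - 1}"

# Count how many ages fall into each bin.
histogram = Counter(bin_label(age) for age in ages)

# Print a simple text histogram, one row per bin.
for label, count in histogram.items():
    print(f"{label:>6} {'#' * count}")
```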

I

Interquartile Range (IQR)

The range between the first quartile (Q1) and third quartile (Q3), representing the middle 50% of data. Calculated as IQR = Q3 - Q1.

Example: If Q1 = 25 and Q3 = 75, then IQR = 50, meaning the middle half of data spans 50 units.
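The IQR can be computed with the standard library, as in this sketch. The scores are invented, and statistics.quantiles is used with its default method. Other tools may use a different quartile convention and give slightly different values.

```python
import statistics

# Hypothetical test scores (11 values).
scores = [42, 55, 61, 64, 70, 73, 78, 81, 85, 90, 95]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 61.0 85.0 24.0
```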

M

Mean

The arithmetic average of a dataset, calculated by summing all values and dividing by the count of values.

Example: The mean of 2, 4, 6, 8 is (2+4+6+8)/4 = 5.

Median

The middle value in a sorted dataset. For an even number of values, it is the average of the two middle values.

Example: The median of 1, 3, 5, 7, 9 is 5. The median of 1, 3, 5, 7 is (3+5)/2 = 4.

Mode

The most frequently occurring value in a dataset. A dataset can have multiple modes or no mode.

Example: In the dataset 1, 2, 2, 3, 4, 4, 4, the mode is 4.
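The mean, median, and mode examples above can be reproduced with Python's statistics module. This is a small sketch using the same numbers as the glossary entries. The variable names are illustrative.

```python
import statistics

# Mean: sum of values divided by their count.
mean_value = statistics.mean([2, 4, 6, 8])            # equals 5

# Median: middle value, or the average of the two middle values.
median_odd = statistics.median([1, 3, 5, 7, 9])       # equals 5
median_even = statistics.median([1, 3, 5, 7])         # equals 4

# Mode: multimode returns every most-frequent value as a list.
modes = statistics.multimode([1, 2, 2, 3, 4, 4, 4])   # [4]
two_modes = statistics.multimode([1, 1, 2, 2, 3])     # [1, 2]

print(mean_value, median_odd, median_even, modes, two_modes)
```

Using multimode rather than mode makes the "multiple modes" case explicit, since it returns all tied values.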

N

Normal Distribution

A symmetric probability distribution where data clusters around a central mean value, forming a bell-shaped curve.

Example: Human heights approximately follow a normal distribution, with most people near average height and fewer at extreme heights.

O

Ogive

A cumulative frequency graph that shows the running total of frequencies across class intervals.

Example: An ogive can show how many students scored at or below each score boundary on a test.

Outlier

A data point that is significantly different from other observations in a dataset, lying far from the central cluster.

Example: In a dataset of salaries ranging from $30k-$80k, a salary of $500k would be an outlier.
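The glossary does not fix a numeric threshold for "significantly different". One common convention, assumed here, flags values more than 1.5 × IQR below Q1 or above Q3. The salary figures (in thousands of dollars) and the helper name find_outliers are hypothetical.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (a common rule of thumb)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical salaries in $k, including one very high earner.
salaries_k = [30, 42, 45, 50, 55, 58, 62, 70, 75, 80, 500]
print(find_outliers(salaries_k))  # [500]
```

Other rules, such as flagging values more than 3 standard deviations from the mean, are also used. The right threshold depends on the data.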

P

Percentile

A value below which a given percentage of observations fall. For example, the 90th percentile is the value below which 90% of data lies.

Example: A test score at the 85th percentile means you scored better than 85% of test-takers.
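A percentile rank can be sketched by counting how many observations lie strictly below a value. The function name percentile_rank and the scores are assumptions. Some definitions count values at or below instead, which can shift the result slightly.

```python
def percentile_rank(values, x):
    """Percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical scores: 100 test-takers scoring 1, 2, ..., 100.
scores = list(range(1, 101))
print(percentile_rank(scores, 86))  # 85.0
```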

Permutations

The number of ways to arrange items from a set where the order matters. Calculated as nPr = n! / (n-r)!.

Example: Arranging 3 books chosen from a shelf of 5 in a specific order can be done in 5! / 2! = 60 ways, because order matters.
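The nPr formula can be checked with a short sketch. The helper name n_permute_r is hypothetical. math.perm is the built-in equivalent, available in Python 3.8 and later.

```python
from math import factorial, perm

def n_permute_r(n: int, r: int) -> int:
    """nPr = n! / (n-r)!, computed with integer division."""
    return factorial(n) // factorial(n - r)

# Ordered arrangements of 3 books chosen from 5.
arrangements = n_permute_r(5, 3)
print(arrangements)                 # 60
print(arrangements == perm(5, 3))   # True
```

Dividing 60 by 3! = 6 gives 10, the number of unordered selections of 3 items from 5.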

Probability

A measure of how likely an event is to occur, expressed as a number between 0 (impossible) and 1 (certain).

Example: The probability of rolling a 6 on a fair die is 1/6 ≈ 0.167.

Q

Quartiles

Values that divide a sorted dataset into four equal parts. Q1 (25th percentile), Q2 (median, 50th percentile), and Q3 (75th percentile).

Example: In a sorted dataset of 100 test scores, Q1 falls near position 25, Q2 near position 50, and Q3 near position 75. The exact values depend on the interpolation method used.

R

Range

The difference between the maximum and minimum values in a dataset. Range = Max - Min.

Example: If the highest test score is 95 and the lowest is 42, the range is 95 - 42 = 53.

Relative Frequency

The proportion or percentage of times a value occurs relative to the total number of observations.

Example: If 20 out of 100 students scored A, the relative frequency is 20/100 = 0.2 or 20%.
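Frequencies and relative frequencies can be tallied together, as in this sketch. The grade counts other than the 20 A grades are invented for illustration.

```python
from collections import Counter

# Hypothetical grades for 100 students: 20 A, 45 B, 35 C.
grades = ["A"] * 20 + ["B"] * 45 + ["C"] * 35

counts = Counter(grades)          # frequency of each grade
total = sum(counts.values())      # total number of observations

# Relative frequency = frequency / total.
relative = {grade: count / total for grade, count in counts.items()}
print(relative["A"])  # 0.2
```

The relative frequencies of all categories sum to 1 (or 100%).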

Rounding

The process of approximating a number to a specified level of precision, typically to a certain number of decimal places or significant figures.

Example: Rounding 3.14159 to 2 decimal places gives 3.14.
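In code, the rounding rule for exact halves matters. The sketch below uses Python as an assumed environment: its built-in round sends ties to the nearest even digit, while the decimal module can be told to round halves up, which matches the rule usually taught in school.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rounding to 2 decimal places, as in the example above.
pi_rounded = round(3.14159, 2)
print(pi_rounded)  # 3.14

# Built-in round() sends exact halves to the nearest even integer.
half_even = round(2.5)
print(half_even)  # 2

# Decimal with ROUND_HALF_UP rounds exact halves away from zero.
half_up = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)
print(half_up)  # 3
```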

S

Sample

A subset of a population used to make inferences about the entire population.

Example: Surveying 1,000 voters out of millions to predict election results.

Standard Deviation

A measure of the amount of variation or dispersion in a dataset. It indicates how spread out values are from the mean.

Example: If test scores are approximately normally distributed with a standard deviation of 5, about 68% of scores fall within 5 points of the mean.

Statistic

A numerical measure calculated from sample data, used to estimate population parameters.

Example: The average height calculated from a sample of 100 people is a statistic.

V

Variable

A characteristic or attribute that can take on different values in a dataset.

Example: Height, weight, and age are all variables in a health study.

Variance

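A measure of variability in a dataset, calculated as the average of squared deviations from the mean. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n. The standard deviation is the square root of the variance.

Example: A variance of 25 means the average squared distance from the mean is 25, which corresponds to a standard deviation of 5.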
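Two conventions are common: dividing the sum of squared deviations by n (population variance) or by n - 1 (sample variance). The sketch below shows both with Python's statistics module, along with the standard deviation as the square root of the variance. The data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

# Population variance: average squared deviation from the mean (divide by n).
pop_variance = statistics.pvariance(data)

# Sample variance: divide by n - 1 instead of n.
sample_variance = statistics.variance(data)

# Standard deviation is the square root of the variance.
pop_std_dev = statistics.pstdev(data)

print(pop_variance, pop_std_dev)  # population variance 4, std dev 2.0
print(round(sample_variance, 3))  # 4.571
```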

Z

Z-Score

A standardized score indicating how many standard deviations a value is from the mean. Z = (X - μ) / σ.

Example: A z-score of 2.0 means the value is 2 standard deviations above the mean.
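The formula translates directly into code. In this sketch the function name z_score and the numbers (a score of 85 with mean 75 and standard deviation 5) are illustrative assumptions.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma: distance from the mean in standard deviations."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# A score of 85 when the mean is 75 and the standard deviation is 5.
print(z_score(85, 75, 5))  # 2.0

# A negative z-score means the value is below the mean.
print(z_score(70, 75, 5))  # -1.0
```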