Statistics & Probability Glossary
Quick reference for statistical terms and concepts
B
Bell Curve
Another name for the normal distribution curve, which resembles the shape of a bell when graphed.
Binomial Distribution
A probability distribution for the number of successes in a fixed number of independent trials, where each trial has two possible outcomes and the same probability of success.
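As a minimal sketch of this definition, the probability of exactly k successes in n trials can be computed from the standard binomial formula C(n, k) · p^k · (1 - p)^(n - k). The function name `binomial_pmf` and the coin-flip numbers are illustrative assumptions, not part of the glossary.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials (illustrative helper)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: exactly 2 heads in 4 flips of a fair coin.
prob = binomial_pmf(2, 4, 0.5)
print(prob)  # 0.375
```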
Box Plot
A graphical summary of a dataset that shows the median and quartiles as a box, the spread of the remaining data as whiskers, and potential outliers as individual points.
C
Central Tendency
A measure that represents the center or typical value of a dataset. Common measures include mean, median, and mode.
Combinations
The number of ways to choose r items from a set of n items when the order does not matter. Calculated as nCr = n! / (r!(n-r)!).
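The formula can be checked with a short sketch. The helper name `n_choose_r` is an illustrative assumption; `math.comb` is the standard-library equivalent (Python 3.8+).

```python
from math import comb, factorial


def n_choose_r(n: int, r: int) -> int:
    """nCr = n! / (r!(n-r)!), written out directly (illustrative helper)."""
    return factorial(n) // (factorial(r) * factorial(n - r))


# Choosing 2 items from 5 when order does not matter.
print(n_choose_r(5, 2))  # 10
print(comb(5, 2))        # 10
```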
Conditional Probability
The probability of an event occurring given that another event has already occurred. Written as P(A|B) and calculated as P(A|B) = P(A and B) / P(B), provided P(B) > 0.
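A small sketch may help here, assuming the standard relation P(A|B) = P(A and B) / P(B) and a fair six-sided die. The events A and B below are chosen purely for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
outcomes = set(range(1, 7))
A = {x for x in outcomes if x % 2 == 0}  # event A: the roll is even
B = {x for x in outcomes if x > 3}       # event B: the roll is greater than 3


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


# P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 2/3
```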
Cumulative Frequency
The running total of frequencies in a dataset, showing how many values fall at or below a particular value.
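As a brief sketch, a running total can be built with `itertools.accumulate`. The frequency values are made up for illustration.

```python
from itertools import accumulate

# Frequencies for four consecutive classes (illustrative data).
frequencies = [3, 5, 2, 4]

# Each entry is the sum of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 8, 10, 14]
```

The last cumulative value equals the total number of observations.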
D
Data Distribution
The way data values are spread or arranged across different values or ranges.
Decimal
A number expressed in base-10 notation, using a decimal point to separate whole numbers from fractional parts.
F
Frequency
The number of times a value or range of values occurs in a dataset.
H
Histogram
A chart of adjacent bars that displays the frequency distribution of continuous data by grouping values into bins or intervals. The height of each bar shows the frequency of its bin.
I
Interquartile Range (IQR)
The range between the first quartile (Q1) and third quartile (Q3), representing the middle 50% of data. Calculated as IQR = Q3 - Q1.
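A minimal sketch of the calculation, assuming Python's `statistics.quantiles` with its default method. Quartile conventions differ between textbooks and software, so other tools may return slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles

# Illustrative dataset of 11 values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 3.0 9.0 6.0
```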
M
Mean
The arithmetic average of a dataset, calculated by summing all values and dividing by the count of values.
Median
The middle value in a sorted dataset. For an even number of values, it is the average of the two middle values.
Mode
The most frequently occurring value in a dataset. A dataset can have multiple modes or no mode.
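The three measures of central tendency above can be compared on one dataset. This is a small sketch using Python's `statistics` module, with made-up data.

```python
from statistics import mean, median, multimode

# Illustrative dataset with an even number of values.
data = [2, 3, 3, 5, 7, 10]

avg = mean(data)         # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
mid = median(data)       # average of the two middle values, 3 and 5
modes = multimode(data)  # list of the most frequent value(s)

print(mid)    # 4.0
print(modes)  # [3]

# A dataset with two equally frequent values has two modes.
print(multimode([1, 1, 2, 2]))  # [1, 2]
```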
N
Normal Distribution
A symmetric probability distribution where data clusters around a central mean value, forming a bell-shaped curve.
O
Ogive
A cumulative frequency graph that shows the running total of frequencies across class intervals.
Outlier
A data point that differs markedly from the other observations in a dataset, lying far from the central cluster.
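The glossary does not fix a rule for what counts as "far". One common convention, assumed here only for illustration, flags values more than 1.5 × IQR below Q1 or above Q3. The function name `iqr_outliers` and the multiplier `k` are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed convention, k=1.5)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative data: ten clustered values and one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
print(iqr_outliers(data))  # [50]
```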
P
Percentile
A value below which a given percentage of observations falls. For example, the 90th percentile is the value below which 90% of the data lies.
Permutations
The number of ways to arrange r items chosen from a set of n items when the order matters. Calculated as nPr = n! / (n-r)!.
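The formula can be checked with a short sketch. The helper name `n_permute_r` is an illustrative assumption; `math.perm` is the standard-library equivalent (Python 3.8+).

```python
from math import factorial, perm


def n_permute_r(n: int, r: int) -> int:
    """nPr = n! / (n-r)!, written out directly (illustrative helper)."""
    return factorial(n) // factorial(n - r)


# Arranging 2 items chosen from 5 when order matters.
print(n_permute_r(5, 2))  # 20
print(perm(5, 2))         # 20
```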
Probability
A measure of how likely an event is to occur, expressed as a number between 0 (impossible) and 1 (certain).
Q
Quartiles
The three values that divide a sorted dataset into four equal parts: Q1 (25th percentile), Q2 (median, 50th percentile), and Q3 (75th percentile).
R
Range
The difference between the maximum and minimum values in a dataset. Range = Max - Min.
Relative Frequency
The proportion or percentage of times a value occurs relative to the total number of observations.
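As a minimal sketch, relative frequencies can be derived by dividing each count by the total number of observations. The observations below are made up for illustration.

```python
from collections import Counter

# Illustrative categorical observations.
observations = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(observations)  # frequency of each value
total = len(observations)

# Relative frequency = frequency / total number of observations.
relative = {value: count / total for value, count in counts.items()}
print(relative["red"])  # 0.5
```

The relative frequencies of all values sum to 1 (or 100% when expressed as percentages).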
Rounding
The process of approximating a number to a specified level of precision, typically to a certain number of decimal places or significant figures.
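Rounding rules for ties differ between tools, which a short sketch can show. Python's built-in `round` sends ties to the nearest even digit, while the `decimal` module can apply the "round half up" rule often taught in school. The numbers are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rounding to two decimal places.
print(round(3.14159, 2))  # 3.14

# Built-in round() sends a tie to the nearest even integer.
print(round(2.5))  # 2

# Round-half-up on an exact decimal value.
half_up = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)
print(half_up)  # 3
```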
S
Sample
A subset of a population used to make inferences about the entire population.
Standard Deviation
A measure of the amount of variation or dispersion in a dataset, equal to the square root of the variance. It indicates how spread out values are from the mean.
Statistic
A numerical measure calculated from sample data, used to estimate population parameters.
V
Variable
A characteristic or attribute that can take on different values in a dataset.
Variance
A measure of variability in a dataset, calculated as the average of squared deviations from the mean. For a sample, the sum of squared deviations is usually divided by n - 1 rather than n.
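A brief sketch of variance and its square root, the standard deviation, using Python's `statistics` module. It assumes the usual distinction between the population form (divide by n) and the sample form (divide by n - 1). The data are illustrative.

```python
from statistics import pstdev, pvariance, variance

# Illustrative dataset: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)    # population variance: 32 / 8 = 4
pop_sd = pstdev(data)        # population standard deviation: sqrt(4) = 2.0
sample_var = variance(data)  # sample variance: 32 / 7, about 4.571

print(pop_sd)  # 2.0
```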
Z
Z-Score
A standardized score indicating how many standard deviations a value is from the mean. Z = (X - μ) / σ, where X is the value, μ is the mean, and σ is the standard deviation.
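The formula translates directly into code. In this sketch, the function name `z_score` and the example score, mean, and standard deviation are illustrative assumptions.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Z = (X - mu) / sigma; sigma must be positive."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


# A score of 85 when the mean is 70 and the standard deviation is 10.
print(z_score(85, 70, 10))  # 1.5
```

A positive Z-score means the value is above the mean, and a negative one means it is below.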