Glossary

 

[ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z # ]


- A -

Algorithm
Annealing
ANOVA: "Analysis of Variance." A Statistical Test for heterogeneity of Means by analysis of group Variances. To apply the test, assume random sampling of a variate y with equal Variances, independent errors, and a Normal Distribution. Let n be the number of Replicates (sets of identical observations) within each of k Factor Levels (treatment groups), and let y_ij be the jth observation within Factor Level i. Also assume that the ANOVA is "balanced" by restricting n to be the same for each Factor Level.
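As a minimal sketch of the balanced case, the F statistic for a one-way ANOVA can be computed in pure Python (the function name and the data are invented for illustration; a statistics package would normally be used):

```python
def anova_f(groups):
    """F statistic for a balanced one-way ANOVA.

    groups: list of k lists, each holding n replicate observations.
    """
    k = len(groups)
    n = len(groups[0])
    assert all(len(g) == n for g in groups), "design must be balanced"
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    group_means = [sum(g) / n for g in groups]
    # Between-group sum of squares, with k - 1 degrees of freedom
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Within-group sum of squares, with k * (n - 1) degrees of freedom
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    return ms_between / ms_within

# Three treatment groups with n = 4 replicates each (invented data)
f = anova_f([[5, 6, 7, 6], [8, 9, 9, 8], [4, 5, 4, 5]])
```

A large F (here 36.75) is evidence against the hypothesis that all group means are equal; the p-value comes from the F distribution with (k - 1, k(n - 1)) degrees of freedom.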

Back to Top

- B -

Bayes' Theorem
Bernoulli Trials
Bias: The difference between the mean of an estimator and the quantity being
estimated. Zero bias is desirable: it means that on average you get the right answer!
Binomial Coefficient - "n-choose-r" is the number of ways of selecting r items from n
without replacement and without regard to the order in which they are selected.
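In Python (3.8+), the standard library computes n-choose-r directly, and the result agrees with the factorial formula n! / (r! (n - r)!):

```python
import math

# 5-card poker hands from a 52-card deck: order ignored, no replacement
hands = math.comb(52, 5)
assert hands == 2_598_960

# The same value from the factorial formula n! / (r! * (n - r)!)
assert hands == math.factorial(52) // (math.factorial(5) * math.factorial(47))
```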
Binomial Distribution -
Bivariate Normal Distribution -
Blocking: A technique in experimental design that adjusts for pre-existing patterns
of variation between experimental units.
Block Design -
Block Design Partial -
Balanced Incomplete Block Design (BIBD) - see Block Design
Back to Top

- C -

Central Limit Theorem: a theorem in mathematical statistics: sums of
independent random variables tend to follow a normal distribution regardless of their
individual probability distributions.
Central tendency: Characteristics of a probability distribution or of a set of
data that locate its center. The principal measures are the mean, median and mode.
Chi-Square Distribution: one of a family of continuous positive-valued probability distributions.
See also degrees of freedom.
Cluster Sampling
Coefficient of Determination: A measure of the proportion of variability
in the response variable explained by a linear regression model. It is a number between
zero and one. A value close to zero suggests a poor model. Also written R^2
(R-squared); it is the square of the multiple correlation coefficient.
Conditional probability: the probability of an event given that another
event occurs.
Confidence interval: a random interval that has a known probability (the
"confidence coefficient" or "confidence level") of including the true value of a
parameter. Defines an interval within which the true population parameter is likely to lie.
It can be thought of as a measure of the precision of a sample statistic.
Continuous data variable: A variable that can take any value within its
possible range, limited only by the accuracy of the measurements (e.g. blood pressure
measurements).
Continuous random variable: A variable that can take any value within
its range, so that the possible values are infinitely many and cannot be listed or counted.
Consecutive Numbers -
Combination -
Combinatorial Designs -
Correlation: a measure of the degree to which two variables or data variates are
linearly related. Correlation ranges from -1 to 1, with 0 meaning no linear relationship.
Often measured by Pearson's product-moment correlation coefficient, which is suitable
for testing for zero correlation in normally-distributed data. Spearman's rank correlation
and Kendall's tau are nonparametric alternatives.
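Pearson's product-moment coefficient can be sketched in pure Python (the function name is illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Perfectly linear data gives r = 1; reversing one variable gives r = -1.
x = [1, 2, 3, 4, 5]
assert abs(pearson_r(x, [2 * v + 1 for v in x]) - 1.0) < 1e-12
assert abs(pearson_r(x, [-v for v in x]) + 1.0) < 1e-12
```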
Cover, Covering, Minimal Cover -
Covering Design: C(v,k,t,m,l,=b) is a pair (V,B), where V is a set of v elements (called points) and B is a collection of b k-subsets of V (called blocks), such that every m-subset of V intersects at least l members of B in at least t points. It is required that v >= k >= t and m >= t. The case m > k is also allowed. B can be a multiset.
Covering Terminology - Related to Lottery
Example 1:

C(22, 6, 3, 3, 1, b=77)
C( v, k, t, m, l, b   )

  v  total numbers in the design    (= 22)
  k  numbers drawn                  (= 6)
  t  min match t (t of m)           (= 3)
  m  min match of m (t of m)        (= 3)
  l  lambda (min l times t of m)    (= 1)
  b  number of blocks               (= 77)
  p  percentage of blocks covered   (= 100%)

Example 2:

C(49, 6, 3, 6, 1, p=51%)
C( v, k, t, m, l, p%   )

  v  total numbers in the design    (= 49)
  k  numbers drawn                  (= 6)
  t  min match t (t of m)           (= 3)
  m  min match of m (t of m)        (= 6)
  l  lambda (min l times t of m)    (= 1)
  b  number of blocks               (= ??)
  p  percentage of blocks covered   (= 51%)

 

Covering System
Critical value: a predetermined cutoff value for a test statistic, for deciding
whether or not to reject the null hypothesis in hypothesis testing.
Cumulative Distribution Function (CDF): For a random variable X, the cdf F(x) is
the probability that X is less than or equal to x: that is, FX(x) = Pr(X <= x).
Cycle  -
Cyclical Analysis -
 
Back to Top

- D -

Data: observed values of variables.
Decades:
Decision rule: a procedure that determines whether or not a hypothesis test is
significant.
Degrees of freedom (df): A parameter which indexes the families of
t-distributions and chi-squared distributions. A t-distribution with many df is similar to the
standard normal distribution, while one with few df has greater variance.
Density: probability density function.
Dependent variable:
Discrete random variable: A random variable that takes only a finite
number of different values.
Dispersion:
Distribution: see probability distribution.
Discrete Distribution -
Discrete Random Variable -
Distribution Functions
Draw, Drawing -
Dynamic Programming -
Back to Top

- E -

Error-Correcting Code
Event: a set containing zero or more of the possible outcomes of a trial.
Exact Covering System
Expected value, Expectation: the mean of a random variable; the centre
of mass of a probability density function or probability mass function: the integral of
x f(x) dx for a continuous distribution, or the sum of x p(x) over all x for a discrete distribution.
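For a discrete distribution the sum of x p(x) is easy to verify directly; here is a fair six-sided die, where each face has probability 1/6:

```python
# Expected value of a discrete random variable: the sum of x * p(x).
faces = [1, 2, 3, 4, 5, 6]
expectation = sum(x * (1 / 6) for x in faces)
assert abs(expectation - 3.5) < 1e-12   # (1 + 2 + ... + 6) / 6
```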
Expected frequency: The expected frequency for a cell in a two-way
contingency table is the frequency that will appear on average in that cell, if the null
hypothesis of no association is true.
Experimental design: an allocation of treatments to experimental units; a
procedure for doing this; the theory of how to do this.
Experiment: a study in which the allocation of treatments to units is under the
control of the investigator; not an observational study.
Extraordinary variation:
Extrapolation: Prediction of a response variable outside the range of the
available data. In time series analysis, prediction of future outcomes.
Back to Top

- F -

Fallacies
F-distribution: A family of probability distributions used for hypothesis tests in
the analysis of variance. A particular distribution from the family is characterized by its
numerator and denominator degrees of freedom.
F-test: The test used in analysis of variance. The test statistic is the ratio of the
variance between groups to the variance within groups. If there is no difference
between the groups (i.e. the null hypothesis is true), this statistic follows an F
distribution.
Fisher's exact test: a test of association between the rows and columns of a
contingency table. This test is computationally intensive except in very small tables. For
larger problems, the Pearson chi-squared test is an easy and excellent approximation.
Frequency -
Frequency position - indicates the position of a number on the Frequency Table
Frequency table -
Back to Top

- G -

Gamma Distribution
Game, Gaming -
Game Theory
Gap -
Geometric Distribution -
Global optimum -
Back to Top

- H -

Hadamard Matrix: An n x n matrix with entries 1 or -1 in which each row is orthogonal to every other row. A row's inner product with itself is n, so multiplying a Hadamard matrix by its transpose gives n times the identity matrix.
Hamming Distance: In a binary code, a distance can be defined between code words: given two words v and w, the Hamming distance d(v,w) is the number of positions in which they differ.
For instance:
0100110110111 and
0010110110111
are at Hamming distance 2 from one another.
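In Python the distance is a one-line count of differing positions (illustrative sketch):

```python
def hamming(v, w):
    """Number of positions in which two equal-length code words differ."""
    assert len(v) == len(w), "code words must have the same length"
    return sum(a != b for a, b in zip(v, w))

# The two 13-bit words from the example above differ in two positions.
assert hamming("0100110110111", "0010110110111") == 2
```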
Hypothesis
Hypothesis testing -
 
Back to Top

- I -

Inclusion/Exclusion Principle: A technique for counting the elements of a set that don't have certain properties. Let S be a set of n elements, and let P1 and P2 be properties that the elements of S may or may not have. Let A1 be the subset of S consisting of those elements having P1, and A2 be those having P2. The number of elements of S having neither P1 nor P2 is given by:
|S| - |A1| - |A2| + |A1 intersect A2|
The last term must be added back because the elements having both properties were subtracted twice by the previous terms.
This is the general idea of the inclusion/exclusion principle; the formula becomes slightly more complicated but readily generalizes to n properties.
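A small Python check of the two-property formula (the choice of properties, divisibility by 2 and by 3, is just for illustration):

```python
# Among 1..30, count the numbers divisible by neither 2 nor 3.
# P1 = "divisible by 2", P2 = "divisible by 3".
S = range(1, 31)
A1 = {x for x in S if x % 2 == 0}     # |A1| = 15
A2 = {x for x in S if x % 3 == 0}     # |A2| = 10
both = A1 & A2                        # divisible by 6, so |A1 ∩ A2| = 5

by_formula = len(S) - len(A1) - len(A2) + len(both)   # 30 - 15 - 10 + 5
by_counting = sum(1 for x in S if x % 2 and x % 3)    # direct enumeration
assert by_formula == by_counting == 10
```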
Independence: Two events are independent if the probability of either is the
same whether or not the other occurs.
Independent random variables: Two random variables are
independent if their joint probability function is the product of their individual probability
functions.
Independent samples: samples in which the observations in one sample are
not related in any way to the observations in the other sample. The samples may be of
unequal size. See also paired data.
Independent variables: variables that are controlled or considered fixed,
that may affect the values taken by dependent variables. Often called "X-variables",
"predictor variables" or "explanatory variables".
Inference: Statistical inference.
Interaction: the condition that the strength of association between two variables
depends on the value of a third, or that the effect of each of two explanatory variables
on a response variable depends on the level of the other explanatory variable. See also
effect modifier.
Intercept: the constant in a regression equation; the point where a regression line
intercepts the vertical axis, if the horizontal axis has a true zero origin.
Interval scale: a scale of measurement such that the difference between values
is important but the absolute numbers are not; for example, log-transformed data is
interval-scaled because the base of the logs is not important.
Interquartile Range (IQR): the difference Q3-Q1 between the first and
third quartiles of a dataset or distribution.
Intersection: the intersection of two events is the set of outcomes contained in both.
Interval estimate: a range of plausible values for an unknown parameter.
Back to Top

- J -

Joint probability: the probability of the intersection of two events.
Joint probability distribution: the probability distribution of all combinations of
two or more random variables.
Back to Top

- K -

Kurtosis: a measure of the "peakedness" of a distribution. A normal distribution
has kurtosis 3, while a uniform distribution has kurtosis of only 1.8. The coefficient of
kurtosis is the fourth moment about the mean divided by the variance squared
(sometimes 3 is then subtracted so that a normal distribution has kurtosis zero).
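The coefficient defined above (fourth moment about the mean over the squared variance, with no 3 subtracted) can be sketched in Python using population moments; the helper name is illustrative:

```python
def kurtosis(data):
    """Coefficient of kurtosis: m4 / var**2, using population moments."""
    n = len(data)
    m = sum(data) / n
    var = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / var ** 2

# A symmetric two-point sample attains the minimum possible kurtosis, 1.
assert abs(kurtosis([0, 1]) - 1.0) < 1e-12
```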
Back to Top

- L -

Latin Square: A Latin Square of order n is an n x n array made from the integers 1 to n with the property that each integer occurs exactly once in each row and in each column. A pair of Latin squares is called orthogonal if the n^2 pairs formed by juxtaposing the two arrays are all distinct.
Lattice: A subset of real or complex (or quaternionic) n-space that consists of all finite sums of a set of n independent generating vectors with coefficients in the corresponding ring of integers.
Lotto -
Lotto Cycle -
Lottery -
Law of Effect (Thorndike, 1898): choices that have led to good outcomes in the past are more likely to be repeated in the future.
Least-squares: a method of estimating unknown parameters by minimizing the
sum of squared residuals. The usual method of fitting a linear regression model. See also
maximum likelihood; if the data are normally distributed these are the same.
Longitudinal: a study design in which subjects are monitored over a period of
time, or are observed on several occasions over a period.
Level of significance: significance level.
Likelihood: the probability of the observed data under a particular statistical
model. Unknown model parameters can be estimated by choosing values that maximize
the likelihood: these are the maximum likelihood estimates.
Likelihood ratio test: a test based on comparing the likelihood maximized
under one statistical model to the likelihood under another model that excludes one or
more parameters. The likelihood ratio test is an alternative to the Wald test.
Linear regression: a regression model in which the response variable (Y) is
linearly related to each explanatory variable. Simple linear regression is the case where
there is only a single explanatory variable (X).
Linear relationship: a relationship between two variables that can be
described by a straight line.
Linear model: a relationship between two or more variables, with an equation in
which the unknown parameters appear only as factors multiplying additive terms. The
linear regression model is a linear model.
Log rank test: A test for comparing the survival time of two or more groups.
Logistic regression: a regression model for binary (dichotomous) outcomes.
The data are assumed to follow binomial distributions with probabilities that depend on
the independent variables.
Back to Top

- M -

Magic Square: A magic square of order n is an n x n array containing the integers 1, 2, 3, ..., n^2. The sum of every row, every column, and each of the two principal diagonals is the same number s, called the magic sum. The magic sum can be deduced by counting the sum of all entries in the square in two ways: straightforwardly, 1 + 2 + 3 + ... + n^2 = n^2(n^2 + 1)/2; and by rows (or columns), since each of the n rows sums to s, giving a total of ns.
Equating these two quantities and solving for s yields:
s = n(n^2 + 1)/2
Here is a magic square of order 5:
15   8   1 24 17
16 14   7   5 23
22 20 13   6   4
  3 21 19 12 10
  9   2 25 18 11
The magic sum is 65.
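The order-5 square above can be checked in Python; every row, column, and both principal diagonals should sum to s = 5(25 + 1)/2 = 65:

```python
square = [
    [15,  8,  1, 24, 17],
    [16, 14,  7,  5, 23],
    [22, 20, 13,  6,  4],
    [ 3, 21, 19, 12, 10],
    [ 9,  2, 25, 18, 11],
]
n = len(square)
s = n * (n * n + 1) // 2                                    # magic sum, 65
assert all(sum(row) == s for row in square)                 # rows
assert all(sum(row[j] for row in square) == s
           for j in range(n))                               # columns
assert sum(square[i][i] for i in range(n)) == s             # main diagonal
assert sum(square[i][n - 1 - i] for i in range(n)) == s     # anti-diagonal
```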
Mantel-Haenszel method: A method of adjusting for confounding bias by
combining information from stratified two-by-two tables.
Maximum likelihood: a method for fitting a model to data, typically used
when the data are not normally distributed: e.g. in logistic regression.
McNemar's test: a form of the chi-squared test for matched data.
Mean: the arithmetic mean of a set of numbers; the expected value of a random
variable.
Mean square: a sum of squares divided by its degrees of freedom.
Measure of central tendency: see central tendency.
Median: a measure of central tendency. The middle value in a set of n ordered
numbers, or the average of the two middle values if n is even; the value that divides a
probability distribution into two parts that each contain half the probability. The median is
said to be robust, i.e. less sensitive to outliers than the mean.
Median survival time: the time t at which the survival probability S(t) has fallen to 0.5.
Mode: a measure of central tendency. The most frequent value in a set of
observations on a variable; the most probable outcome of a discrete random variable;
the value of a continuous random variable that has the highest probability density.
Model: a mathematical relationship between variables, assumed to describe the
population from which the data were sampled or the process by which the data were
generated. Statistical models also describe the probability distribution of any random
variables in the model.
Moving averages:
Multiple regression: A regression model in which there is more than one
explanatory variable.
Multiplication rule:
Mutually exclusive events: events which cannot both occur.
 
Back to Top

- N -

Nominal scale: a scale which consists of categories with no particular ordering
(e.g. race). See also ordinal scale.
Non-linear regression:
Nonparametric methods: A group of statistical techniques that don't make
strong assumptions about the distribution of the outcome variable. All tests involving
ranked data are nonparametric. See also parametric.
Normal approximation: the use of the normal distribution to approximate
another probability distribution, usually for ease of calculation.
Normal distribution: A family of probability distributions used to describe
continuous variables and often used for modeling continuous data. These distributions
have several mathematical properties that make them convenient to use. Also called the
Gaussian distribution.
Null hypothesis: the hypothesis that will be considered disproved by a
significant test result. The hypothesis of no effect, no difference, no relationship etc. See
also alternative hypothesis. A scientific theory should be challenged by conducting tests
in which the theory is represented by a null hypothesis. If it survives such tests, its
scientific status is strengthened.
Back to Top

- O -

Odd/Even -
Odd Number
Odd Prime
Odds  - The ratio of the probability (p) that an event occurs to the probability (1-p) that it does not: odds=p/(1-p). While probabilities are values from 0 to 1, odds can be any non-negative number.
Odds ratio - the ratio between the odds that an event occurs in two groups or two sets of circumstances.
Optimization
Optimization Theory
Outcome - a single element in the sample space of a random trial.
Outlier - a surprising data value that may represent an error.
Back to Top

- P -

Packing -
Pascal's Triangle
Partition , Partitioning -
Permutation - A rearrangement of the elements of a set.
Poisson Distribution -  a discrete probability distribution used to describe the occurrence of rare events.
Predictability
Prime
Probability  - a probability provides a quantitative description of the likely occurrence of a particular event. Probability is expressed on a scale between 0 and 1; a rare event has a probability close to 0, a very common event has a probability close to 1. The probability of an event has been defined as its long-run relative frequency.
Probability Distribution -  the relative frequencies with which a random variable takes each of its possible values, or takes values in any specified numeric range.
Probability Density Function
Polyominos -
Back to Top

- Q -

Qualitative variable: a categorical variable. See also quantitative variable.
Quality control: a statistical theory and practice aimed at improving the quality
and reliability of industrial production and other processes.
Quantitative variable: a variable whose values are numbers with real
numeric meaning. See also qualitative variable.
Quantile:
Quartile: the 25th, 50th or 75th percentile of a distribution or of a set of data. If the
values of a variable are placed in ascending (or descending) order, the quartiles divide
the ordered values into fourths. The second quartile is the median.
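Quartiles (and the interquartile range, defined under I) can be computed with Python's standard library; `statistics.quantiles` with n=4 returns the three cut points Q1, Q2 (the median), and Q3. The "inclusive" method treats the data as containing the population minimum and maximum:

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
assert (q1, q2, q3) == (2.5, 4.0, 5.5)

iqr = q3 - q1            # interquartile range, Q3 - Q1
assert iqr == 3.0
```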
Query
Back to Top

- R -

Random Number
Random Variable -
Range -
Redundancy
Reliability -  a measure of how consistent repeated measurements are when performed under comparable conditions.
Relative Frequency -
Repeat -
Response bias
Robust - a statistic or statistical method that is relatively insensitive to unusual or erroneous data points.
Back to Top

- S -

Sample: A selected subset of a population. See also random sample.
Sample space: the set of all possible outcomes of a trial.
Sample statistic: An estimate of a population parameter obtained from a sample. The value will vary from sample to sample according to the sampling distribution of the statistic.
Sample Variance
Sampling
Seed
Set
Set Theory
Small/big -
Statistical inference: The process of drawing conclusions about a population based on data from a sample.
Statistics
Steiner System - A Steiner System S(v,k,t,=b) is a t-design, usually with l=1.
Steiner Quadruple System
Steiner Triple System: A Steiner Triple System of order v is a collection of triples (3-subsets) of a set X of size v such that each pair of elements of X occurs in exactly one triple. In other words, a Steiner Triple System is a 2-design with parameters (v, 3, 1, (v-1)/2, v(v-1)/6). Since the design parameters must be integers, it is necessary that v = 6n + 1 or v = 6n + 3. Kirkman showed that this condition is also sufficient for a Steiner Triple System to exist.
Symmetric Design: A design in which the parameters (v, k, lambda, r, b) are such that v = b and k = r .
Back to Top

- T -

t-design: A t-design t-(v,k,l,=b) is an exact Covering with m = t and the least possible number of blocks. t-designs exist only for some admissible parameters, for which certain necessary numerical conditions are met.
t Distribution
Time Series  -   a (usually long) sequence of observations made on a variate. Each observation may depend on (be correlated with) one or more preceding observations.
Time series of a Lottery -
Treatment: an experimental procedure applied to groups of subjects or other observational units; the factor of interest in an analysis of variance.
Trend: a long-term linear (i.e. constant rate) change in the level of a variable in a time series .
Trivial
Type I error: A type I error occurs if, based on the sample data, we decide to reject the null hypothesis when in fact (i.e., in the population ) the null hypothesis is true.
Type II error: A type II error occurs if, based on the sample data, we decide not to reject the null hypothesis when in fact (i.e., in the population) the null hypothesis is false.
Turan's Theorem: Let t and n be positive integers with t >= 2 and n >= t. The maximum number of edges of a graph of order n that doesn't contain a complete subgraph of order t is
the sum of n_i * n_j over all pairs 1 <= i < j <= t-1,
where the n_i form a partition of n into t-1 parts which are as equal as possible.
Furthermore, the complete (t-1)-partite graph with parts of size n_1, n_2, ..., n_(t-1) is the only graph whose number of edges attains this bound while still not containing a complete subgraph of order t.

 
Back to Top

- U -

Uniform Distribution -
Union
Back to Top

- V -

Variance - Variance 2 -
Variance of Sample -
Venn Diagrams
Back to Top

- W -

Wheel -
Perfect Wheel - (see list of perfect wheels)
Back to Top

- X -

X-bar chart: a quality control chart for the mean of a process.
X variable: see independent variable
Back to Top

- Y -

(empty)
Back to Top

- Z -

Z-score: the number of standard deviations a value lies from the mean. For a value from a normal distribution, the z-score is found by subtracting the mean of the distribution and dividing by the standard deviation. Most commonly used for test statistics, since the z-score can be referred to tables of the standard normal distribution to determine the p-value.
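A sketch in Python (the helper names are illustrative; the standard normal CDF is built from math.erf, so no table is needed):

```python
import math

def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def standard_normal_cdf(z):
    """Pr(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# An IQ-style score of 130 on a scale with mean 100 and sd 15:
z = z_score(130, 100, 15)
assert z == 2.0

# Two-sided p-value for a test statistic of z = 2.0 is about 0.0455.
p = 2 * (1 - standard_normal_cdf(z))
assert abs(p - 0.0455) < 0.001
```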
Back to Top

- # -

3 : 3 is the only Integer which is the sum of the preceding Positive Integers (1+2=3) and the only number which is the sum of the Factorials of the preceding Positive Integers (1!+2!=3). It is also the first Odd Prime. A quantity taken to the Power 3 is said to be Cubed.
 
Back to Top

Revised: August 20, 1998.
Copyright © 1997-1998  by   The Lottery Institute ™
All trademarks or product names mentioned herein are the property of their respective owners.