The correlation measure that students typically first encounter is Pearson's product-moment correlation coefficient. This coefficient is simply a standardized version of the covariance between two random variables (say, X and Y):
$$\rho_{XY} = \frac{\text{cov}(X, Y)}{\text{s.d.}(X)\,\text{s.d.}(Y)} , \qquad (1)$$
where "s.d." denotes "standard deviation".
In the case of sample data, the corresponding formula is:

$$\rho_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\left[\sum_{i=1}^{n}(X_i - \bar{X})^2 \; \sum_{i=1}^{n}(Y_i - \bar{Y})^2\right]^{1/2}} , \qquad (2)$$

where n is the sample size, and $\bar{X}$ and $\bar{Y}$ are the sample averages of the X and Y variables.
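To make formula (2) concrete, here is a minimal Python sketch (the data values are made up purely for illustration) that computes the sample correlation directly from the formula and checks it against NumPy's built-in np.corrcoef:

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation computed directly from formula (2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()          # deviations from the sample means
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Illustrative data (arbitrary values)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9]

print(pearson_r(x, y))          # formula (2), computed from scratch
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in version agrees
```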
Scaling the covariance in this way to create the correlation coefficient ensures that (i) the latter is unitless; and (ii) it takes values in the interval [-1, +1]. The first of these two properties facilitates meaningful comparisons of correlations involving data measured in different units. The second property provides a metric that enables us to think about the "degree" of correlation in a meaningful way. (In contrast, a covariance can take any real value - there are no upper or lower bounds.)
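As a quick numerical illustration of property (i), the following sketch (again with made-up data) rescales X by a factor of 100, as if converting metres to centimetres. The covariance is scaled by the same factor, while the correlation is untouched and stays inside [-1, +1]:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # say, metres
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9])

x_cm = 100.0 * x                            # the same data in centimetres

# The covariance is inflated by the factor of 100 ...
print(np.cov(x, y)[0, 1], np.cov(x_cm, y)[0, 1])

# ... but the correlation is unchanged (unit-free), and lies in [-1, 1].
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x_cm, y)[0, 1])
```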
Result (i) above is obvious. Result (ii) can be established in a variety of ways.
(a) If you're familiar with the Cauchy-Schwarz inequality, the result that -1 ≤ ρ ≤ 1 is immediate: that inequality gives |cov(X, Y)| ≤ s.d.(X) s.d.(Y), and dividing through by the right-hand side bounds |ρ| by 1.
(b) If you like working with vectors, then it's easy to show that ρ is the cosine of the angle between the two mean-centered data vectors, $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ and $(Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})$. As cos(θ) is bounded below by -1 and above by +1 for any θ, we have our result for the range of ρ right away. See this post by Pat Ballew for access to the proof. (A small numerical check of this identity appears at the end of the post.)
(c) However, what about a proof that requires even less background knowledge? Suppose that you're a student who knows how to solve for the roots of a quadratic equation, and who knows a couple of basic results relating to variances. Then, proving that -1 ≤ ρ ≤ 1 is still straightforward:
Let $Z = X + tY$, for any scalar $t$. Because a variance is non-negative,

$$\text{var}(Z) = t^2\,\text{var}(Y) + 2t\,\text{cov}(X, Y) + \text{var}(X) \geq 0 .$$

Or, in obvious notation, $at^2 + bt + c \geq 0$ for every value of $t$.

This implies that the quadratic can have at most one real root (a repeated root), which in turn implies that the discriminant satisfies $b^2 - 4ac \leq 0$.

Recalling that $a = \text{var}(Y)$, $b = 2\,\text{cov}(X, Y)$, and $c = \text{var}(X)$, the inequality becomes $4\,\text{cov}(X, Y)^2 - 4\,\text{var}(X)\,\text{var}(Y) \leq 0$, i.e. $\text{cov}(X, Y)^2 \leq \text{var}(X)\,\text{var}(Y)$. Dividing both sides by $\text{var}(X)\,\text{var}(Y)$ gives $\rho^2 \leq 1$, and hence $-1 \leq \rho \leq 1$.
A complete version of this proof is provided by David Darmon, here.
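For readers who like to see these facts numerically, here is a minimal Python sketch (with arbitrary illustrative data) that checks both the cosine identity from (b) and the non-negative quadratic from (c):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 9.9])

dx, dy = x - x.mean(), y - y.mean()       # mean-centered data vectors
r = (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

# (b) r equals the cosine of the angle between the centered vectors:
cos_theta = (dx @ dy) / (np.linalg.norm(dx) * np.linalg.norm(dy))
print(r, cos_theta)                       # identical, and in [-1, 1]

# (c) var(X + tY) is a quadratic in t that never goes negative,
#     so its discriminant b^2 - 4ac must be <= 0:
a, b, c = np.var(y), 2 * np.cov(x, y, ddof=0)[0, 1], np.var(x)
print(b**2 - 4 * a * c <= 1e-12)          # True (up to rounding)

# Spot-check that var(X + tY) >= 0 over a grid of t values:
t = np.linspace(-10, 10, 201)
print(np.all(a * t**2 + b * t + c >= 0))  # True
```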