Last week, in a post titled "It's a Wonderful Life", I discussed the OECD's recently released "Better Life Index" (BLI). The index is made up of eleven different components that are then aggregated, with equal weights, into a single BLI, which you can see here. In addition, the OECD provides an interactive tool so you can select your own weights across the 11 measures, and then make cross-country comparisons.
At the end of that earlier post I noted that one question that I hadn't addressed was: "What's really driving the variability in the country rankings?" I promised to return to this question - so here we are.
I've downloaded the OECD data into an Excel workbook that's available on the Data page that goes with this blog. The second worksheet in that book has some new data, as discussed below.
I thought it would be interesting to look at a decomposition of the correlation structure of the data for the eleven measures that underlie the BLI. Specifically, what I'm going to do is perform a Principal Components analysis of the data. This will reveal the structure that I'm interested in, and it will also show us the extent to which the information in the 11 measures can be compressed into a smaller number of components. Finally, I can then use the results to construct a set of weights for the 11 measures that in a sense is "optimal".
I thought it would be interesting to see how this modified BLI compared with the naïve, equal-weights version. The EViews workfile for this can be downloaded from the Code page.
First, here are the Scree Plot and the cumulative eigenvalue plot:
We see that the first three eigenvalues dominate the correlation matrix, and that the first four principal components account for about 85% of the variation in the data. Indeed, the first principal component by itself "explains" nearly 50% of the variation.
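For readers who want to reproduce this kind of decomposition outside EViews, here is a minimal sketch in Python, assuming NumPy is available. The data are synthetic stand-ins (34 rows, 11 columns), not the OECD figures, and the function name `pca_from_correlation` is my own. It eigen-decomposes the correlation matrix and reports the proportion of variation attributed to each component.

```python
import numpy as np

def pca_from_correlation(data):
    """PCA on the correlation matrix of `data` (rows = countries, columns = measures)."""
    corr = np.corrcoef(data, rowvar=False)
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]          # columns are the loadings
    proportion = eigenvalues / eigenvalues.sum()   # share of total variation
    return eigenvalues, eigenvectors, proportion, np.cumsum(proportion)

# Synthetic illustration only: one common factor plus noise, 34 "countries", 11 "measures"
rng = np.random.default_rng(0)
common = rng.normal(size=(34, 1))
data = common + 0.8 * rng.normal(size=(34, 11))

eigenvalues, eigenvectors, proportion, cumulative = pca_from_correlation(data)
for k in range(4):
    print(f"PC {k + 1}: eigenvalue {eigenvalues[k]:.3f}, "
          f"proportion {proportion[k]:.3f}, cumulative {cumulative[k]:.3f}")
```

Because the decomposition is applied to a correlation matrix, the eigenvalues sum to the number of measures (11 here). The sign of each eigenvector is arbitrary, so a package may report a column of loadings with every sign flipped.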
In the following Biplot for the first two principal components, observations that are outliers from zero (at the 10% level, in terms of Mahalanobis distance) have been numbered. What's interesting to see is that the first principal component has positive loadings for all eleven measures. I'm going to return to the values of these loadings to construct a weighted-average BLI.
The second principal component has positive loadings for Education, Safety, Work-Life Balance, and Environment. The loadings are negative for the other 7 measures, though those for Housing, Community, and Income are numerically negligible.
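The outlier numbering on the Biplot can be mimicked roughly as follows. This is a sketch under my own assumptions, not a description of what EViews does internally: I assume the two component scores are uncorrelated with variances equal to their eigenvalues, and that the squared Mahalanobis distance from the origin is compared with a chi-square(2) critical value. The scores and eigenvalues below are made up for illustration.

```python
import math

def flag_outliers(scores, eigenvalues, level=0.10):
    """Return 1-based indices of observations whose squared Mahalanobis
    distance from the origin exceeds the chi-square(2) critical value."""
    # With 2 degrees of freedom, P(X > x) = exp(-x / 2), so the upper-tail
    # critical value at `level` has the closed form -2 * ln(level).
    cutoff = -2.0 * math.log(level)
    flagged = []
    for i, (s1, s2) in enumerate(scores, start=1):
        # Uncorrelated scores: the distance is a sum of scaled squares
        d2 = s1 ** 2 / eigenvalues[0] + s2 ** 2 / eigenvalues[1]
        if d2 > cutoff:
            flagged.append(i)
    return flagged

# Hypothetical eigenvalues and (PC 1, PC 2) scores, for illustration only
example_eigenvalues = (5.4, 1.6)
example_scores = [(0.5, 0.3), (5.5, -0.2), (-1.0, 3.0), (2.0, 1.0)]
print(flag_outliers(example_scores, example_eigenvalues))
```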
The Biplot also tells us that the loadings for the variables in the first PC are actually fairly similar. Notice that the red dots all lie between about 3 and 5 on the horizontal axis. Indeed, here are the values of the loadings for the first two principal components:
Eigenvectors (loadings):
Variable | PC 1 | PC 2
HOUSING | 0.398846 | -0.006881
INCOME | 0.276797 | -0.058020
JOBS | 0.274583 | -0.351368
COMMUNITY | 0.370817 | -0.034115
EDUCATION | 0.242666 | 0.416222
ENVIRONMENT | 0.251778 | 0.400908
GOVERNANCE | 0.208597 | -0.209589
HEALTH | 0.355808 | -0.266248
LIFE_SATISFACTION | 0.342246 | -0.378623
SAFETY | 0.247171 | 0.428836
WORK_LIFE_BALANCE | 0.284232 | 0.309715
If we now use the loadings for the first PC as weights, we can construct a weighted-average Better Life Index that reflects the relative contributions of the different measures to the overall correlation matrix for the data. Not surprisingly, the results are actually very close to what we get with the equal-weights index that is the starting point for the OECD. Here are the two sets of rankings of the 34 OECD-member countries (1st = best):
Country | Equal Wts. Rank | PC1 Wts. Rank
Australia | 1 | 2
 | 14= | 15
 | 17 | 16
 | 2 | 1
 | 32 | 32
 | 23 | 23
 | 6 | 6
 | 31 | 31
 | 9 | 12
 | 18 | 18
 | 16 | 17
 | 27 | 25
 | 29 | 30
 | 12 | 11
 | 14= | 14
 | 20 | 19
 | 24 | 24
 | 19 | 21
 | 26 | 27
 | 11 | 9=
 | 33 | 33
 | 10 | 8
 | 4 | 4
 | 5 | 5
 | 25 | 26
 | 30 | 29
 | 28 | 28
 | 21 | 22
 | 22 | 20
 | 3 | 3
 | 8 | 7
 | 34 | 34
 | 13 | 13
 | 7 | 9=
Not surprisingly, Spearman's rank correlation coefficient between these two rankings is 0.994.
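For anyone who wants to compute a rank correlation like this by hand, here is a small Python sketch. It is one common way of computing Spearman's coefficient (the Pearson correlation of the ranks, with tied values given the average of their positions) and may differ in its treatment of ties from whatever the software behind the figure above does. The five-country rankings are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values (1 = smallest); tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical rankings of five countries under two weighting schemes;
# a tie such as "2=" is entered as the same number twice
equal_wts_rank = [1, 2, 2, 4, 5]
pc1_wts_rank = [2, 1, 3, 4, 5]
print(round(spearman(equal_wts_rank, pc1_wts_rank), 3))
```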
One thing that's nice, though: Canada now beats out Australia for the number one spot!
© 2011, David E. Giles
Hi Dave
I got similar results here and here
using average correlation with other elements as a basis for weighting.
One point worth noting is that there seems to be a lot less variation in the index itself than in the rankings of countries.
Winton: Thanks for the comment - yes, you're right.
DG
Sir,
Thank you for this post. In your post, you have constructed a weighted-average Better Life Index using the PC 1 loadings as weights. Your PCA-based BLI mostly matches the OECD BLI. However, I was wondering: can we improve the PCA-BLI if we use the sum of the first two PCs (i.e., PC 1 and PC 2)?
Second, you have computed the PCA-BLI manually. Can't we compute the PCA-BLI directly in EViews?
Thank you.
SK
Yes; and yes.