Wednesday, October 23, 2019
The Relationship Between Life Expectancy at Birth and GDP Per Capita
The relationship between Life Expectancy at birth and GDP per capita (PPP)

Candidate: Teacher: Candidate number: Date of submission: Word Count: 2907

Section 1: Introduction

In a given country, Life Expectancy at birth is the expected number of years of life from birth. Gross domestic product (GDP) per capita is defined as the market value of all final goods and services produced within a country in one year, divided by the size of the population of that country. The main objective of the present project is to establish the existence of a statistical relation between Life Expectancy at birth (y) and GDP per capita (x).

First, in Section 2 we present the data, from an official governmental source, containing Life Expectancy at birth and GDP per capita of 48 countries in the year 2003. We put this data in a table ordered alphabetically, and at the end of the section we perform some basic statistical analysis of it. These statistics include the mean, median, modal class and standard deviation, for both Life Expectancy and GDP per capita. In Section 3 we find the regression line which best fits our data and the corresponding correlation coefficient r.

It is natural to ask whether there is a non-linear model which better describes the statistical relation between GDP per capita and Life Expectancy. This question is studied in Section 4, where we see whether a logarithmic relation of type $y = A\ln(x + C) + B$ is a better model. In Section 5 we perform a chi square test to get evidence of the existence of a statistical relation between the variables x and y. In the last section of the project, besides summarizing the results obtained, we present several possible directions for further investigation.

Section 2: Data collection

The following table shows the GDP per capita (PPP) (in US Dollars), denoted $x_i$, and the mean Life Expectancy at birth (in years), denoted $y_i$, in 48 countries in the year 2003.
The data has been collected from an online website [2]. According to this website it represents official world records.

Country | GDP per capita ($x_i$) | Life Expectancy at birth ($y_i$)
1. Argentina | 11200 | 75.48
2. Australia | 29000 | 80.13
3. Austria | 30000 | 78.17
4. Bahamas, The | 16700 | 65.71
5. Bangladesh | 1900 | 61.33
6. Belgium | 29100 | 78.29
7. Brazil | 7600 | 71.13
8. Bulgaria | 7600 | 71.08
9. Burundi | 600 | 43.02
10. Canada | 29800 | 79.83
11. Central African Republic | 1100 | 41.71
12. Chile | 9900 | 76.35
13. China | 5000 | 72.22
14. Colombia | 6300 | 71.14
15. Congo, Republic of the | 700 | 50.02
16. Costa Rica | 9100 | 76.43
17. Croatia | 10600 | 74.37
18. Cuba | 2900 | 76.08
19. Czech Republic | 15700 | 75.18
20. Denmark | 31100 | 77.01
21. Dominican Republic | 6000 | 67.96
22. Ecuador | 3300 | 71.89
23. Egypt | 4000 | 70.41
24. El Salvador | 4800 | 70.62
25. Estonia | 12300 | 70.31
26. Finland | 27400 | 77.92
27. France | 27600 | 79.28
28. Georgia | 2500 | 64.76
29. Germany | 27600 | 78.42
30. Ghana | 2200 | 56.53
31. Greece | 20000 | 78.89
32. Guatemala | 4100 | 65.23
33. Guinea | 2100 | 49.54
34. Haiti | 1600 | 51.61
35. Hong Kong | 28800 | 79.93
36. Hungary | 13900 | 72.17
37. India | 2900 | 63.62
38. Indonesia | 3200 | 68.94
39. Iraq | 1500 | 67.81
40. Israel | 19800 | 79.02
41. Italy | 26700 | 79.04
42. Jamaica | 3900 | 75.85
43. Japan | 28200 | 80.93
44. Jordan | 4300 | 77.88
45. South Africa | 10700 | 46.56
46. Turkey | 6700 | 71.08
47. United Kingdom | 27700 | 78.16
48. United States | 37800 | 77.14

Table 1: GDP per capita and Life Expectancy at birth in 48 countries in 2003 (source: reference [2])

Statistical analysis:

First we compute some basic statistics of the data collected in the above table.

Basic statistics for the GDP per capita:

Mean: $\bar{x} = \frac{1}{48}\sum_{i=1}^{48} x_i = 12900$ (rounded to three significant figures; the deviations in Table 5 use the unrounded value 12865).

In order to compute the median, we need to order the GDP values: 600, 700, 1100, 1500, 1600, 1900, 2100, 2200, 2500, 2900, 2900, 3200, 3300, 3900, 4000, 4100, 4300, 4800, 5000, 6000, 6300, 6700, 7600, 7600, 9100, 9900, 10600, 10700, 11200, 12300, 13900, 15700, 16700, 19800, 20000, 26700, 27400, 27600, 27600, 27700, 28200, 28800, 29000, 29100, 29800, 30000, 31100, 37800.

Since there are 48 values, the median is the midpoint of the two central values (the 24th and the 25th):

$\text{Median} = \frac{7600 + 9100}{2} = 8350$

In order to compute the modal class, we need to split the data into classes. If we consider classes of USD 1000 (0-999, 1000-1999, …) we have the following table of frequencies:

Class | Frequency
0-999 | 2
1000-1999 | 4
2000-2999 | 5
3000-3999 | 3
4000-4999 | 4
5000-5999 | 1
6000-6999 | 3
7000-7999 | 2
8000-8999 | 0
9000-9999 | 2
10000-10999 | 2
11000-11999 | 1
12000-12999 | 1
13000-13999 | 1
14000-14999 | 0
15000-15999 | 1
16000-16999 | 1
17000-17999 | 0
18000-18999 | 0
19000-19999 | 1
20000-20999 | 1
21000-21999 | 0
22000-22999 | 0
23000-23999 | 0
24000-24999 | 0
25000-25999 | 0
26000-26999 | 1
27000-27999 | 4
28000-28999 | 2
29000-29999 | 3
30000-30999 | 1
31000-31999 | 1
32000-32999 | 0
33000-33999 | 0
34000-34999 | 0
35000-35999 | 0
36000-36999 | 0
37000-37999 | 1

Table 2: Frequencies of GDP per capita with classes of USD 1000

With this choice of classes, the modal class is 2000-2999 (with a frequency of 5). If instead we consider classes of USD 5000 (0-4999, 5000-9999, …) the modal class is the first: 0-4999 (with a frequency of 18).
Class | Frequency
0-4999 | 18
5000-9999 | 8
10000-14999 | 5
15000-19999 | 3
20000-24999 | 1
25000-29999 | 10
30000-34999 | 2
35000-39999 | 1

Table 3: Frequencies of GDP per capita with classes of USD 5000

Standard deviation: $S_x = \sqrt{\frac{1}{48}\sum_{i=1}^{48}(x_i - \bar{x})^2} = 11100$

Basic statistics for the Life Expectancy:

Mean: $\bar{y} = \frac{1}{48}\sum_{i=1}^{48} y_i = 70.13$

As before, in order to compute the median, we need to order the Life Expectancies: 41.71, 43.02, 46.56, 49.54, 50.02, 51.61, 56.53, 61.33, 63.62, 64.76, 65.23, 65.71, 67.81, 67.96, 68.94, 70.31, 70.41, 70.62, 71.08, 71.08, 71.13, 71.14, 71.89, 72.17, 72.22, 74.37, 75.18, 75.48, 75.85, 76.08, 76.35, 76.43, 77.01, 77.14, 77.88, 77.92, 78.16, 78.17, 78.29, 78.42, 78.89, 79.02, 79.04, 79.28, 79.83, 79.93, 80.13, 80.93.

The median is the midpoint of the two central values (the 24th and the 25th):

$\text{Median} = \frac{72.17 + 72.22}{2} = 72.195$

To find the modal class of Life Expectancy we consider classes of one year (class 41 contains the values from 41.00 to 41.99, and so on). The table of frequencies is the following:

Class | Frequency
41 | 1
42 | 0
43 | 1
44 | 0
45 | 0
46 | 1
47 | 0
48 | 0
49 | 1
50 | 1
51 | 1
52 | 0
53 | 0
54 | 0
55 | 0
56 | 1
57 | 0
58 | 0
59 | 0
60 | 0
61 | 1
62 | 0
63 | 1
64 | 1
65 | 2
66 | 0
67 | 2
68 | 1
69 | 0
70 | 3
71 | 5
72 | 2
73 | 0
74 | 1
75 | 3
76 | 3
77 | 4
78 | 5
79 | 5
80 | 2

Table 4: Frequencies of Life Expectancy at birth with classes of 1 year

It appears from the table above that there are three modal classes: 71, 78 and 79 (each with a frequency of 5).

Standard deviation: $S_y = \sqrt{\frac{1}{48}\sum_{i=1}^{48}(y_i - \bar{y})^2} = 10.31$

The standard deviations $S_x$ and $S_y$ have been found using the following table of data:

Country | GDP | Life exp. | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $y_i - \bar{y}$ | $(y_i - \bar{y})^2$ | $(x_i - \bar{x})(y_i - \bar{y})$
Argentina | 11200 | 75.48 | -1665 | 2770838 | 5.35 | 28.64 | -8907.60
Australia | 29000 | 80.13 | 16135 | 260351671 | 10.00 | 100.03 | 161374.34
Austria | 30000 | 78.17 | 17135 | 293622504 | 8.04 | 64.66 | 137790.17
Bahamas, The | 16700 | 65.71 | 3835 | 14710421 | -4.42 | 19.53 | -16947.75
Bangladesh | 1900 | 61.33 | -10965 | 120222088 | -8.80 | 77.42 | 96474.63
Belgium | 29100 | 78.29 | 16235 | 263588754 | 8.16 | 66.61 | 132501.29
Brazil | 7600 | 71.13 | -5265 | 27715838 | 1.00 | 1.00 | -5271.16
Bulgaria | 7600 | 71.08 | -5265 | 27715838 | 0.95 | 0.90 | -5007.93
Burundi | 600 | 43.02 | -12265 | 150420004 | -27.11 | 734.88 | 332477.52
Canada | 29800 | 79.83 | 16935 | 286808338 | 9.70 | 94.11 | 164294.71
Central African Republic | 1100 | 41.71 | -11765 | 138405421 | -28.42 | 807.63 | 334334.75
Chile | 9900 | 76.35 | -2965 | 8788754 | 6.22 | 38.70 | -18443.41
China | 5000 | 72.22 | -7865 | 61851671 | 2.09 | 4.37 | -16446.81
Colombia | 6300 | 71.14 | -6565 | 43093754 | 1.01 | 1.02 | -6638.43
Congo, Republic of the | 700 | 50.02 | -12165 | 147977088 | -20.11 | 404.36 | 244614.57
Costa Rica | 9100 | 76.43 | -3765 | 14172088 | 6.30 | 39.71 | -23721.58
Croatia | 10600 | 74.37 | -2265 | 5128338 | 4.24 | 17.99 | -9604.66
Cuba | 2900 | 76.08 | -9965 | 99292921 | 5.95 | 35.42 | -59301.73
Czech Republic | 15700 | 75.18 | 2835 | 8039588 | 5.05 | 25.52 | 14322.40
Denmark | 31100 | 77.01 | 18235 | 332530421 | 6.88 | 47.35 | 125482.46
Dominican Republic | 6000 | 67.96 | -6865 | 47122504 | -2.17 | 4.70 | 14887.57
Ecuador | 3300 | 71.89 | -9565 | 91481254 | 1.76 | 3.10 | -16845.62
Egypt | 4000 | 70.41 | -8865 | 78580838 | 0.28 | 0.08 | -2493.16
El Salvador | 4800 | 70.62 | -8065 | 65037504 | 0.49 | 0.24 | -3961.73
Estonia | 12300 | 70.31 | -565 | 318754 | 0.18 | 0.03 | -102.33
Finland | 27400 | 77.92 | 14535 | 211278338 | 7.79 | 60.70 | 113249.07
France | 27600 | 79.28 | 14735 | 217132504 | 9.15 | 83.75 | 134847.48
Georgia | 2500 | 64.76 | -10365 | 107424588 | -5.37 | 28.82 | 55644.86
Germany | 27600 | 78.42 | 14735 | 217132504 | 8.29 | 68.74 | 122175.02
Ghana | 2200 | 56.53 | -10665 | 113733338 | -13.60 | 184.93 | 145025.00
Greece | 20000 | 78.89 | 7135 | 50914171 | 8.76 | 76.76 | 62515.17
Guatemala | 4100 | 65.23 | -8765 | 76817921 | -4.90 | 24.00 | 42935.50
Guinea | 2100 | 49.54 | -10765 | 115876254 | -20.59 | 423.90 | 221629.32
Haiti | 1600 | 51.61 | -11265 | 126890838 | -18.52 | 342.94 | 208606.00
Hong Kong | 28800 | 79.93 | 15935 | 253937504 | 9.80 | 96.06 | 156187.00
Hungary | 13900 | 72.17 | 1035 | 1072088 | 2.04 | 4.17 | 2113.54
India | 2900 | 63.62 | -9965 | 99292921 | -6.51 | 42.36 | 64856.98
Indonesia | 3200 | 68.94 | -9665 | 93404171 | -1.19 | 1.41 | 11488.77
Iraq | 1500 | 67.81 | -11365 | 129153754 | -2.32 | 5.38 | 26351.63
Israel | 19800 | 79.02 | 6935 | 48100004 | 8.89 | 79.05 | 61664.52
Italy | 26700 | 79.04 | 13835 | 191418754 | 8.91 | 79.41 | 123290.86
Jamaica | 3900 | 75.85 | -8965 | 80363754 | 5.72 | 32.73 | -51288.62
Japan | 28200 | 80.93 | 15335 | 235175004 | 10.80 | 116.67 | 165641.67
Jordan | 4300 | 77.88 | -8565 | 73352088 | 7.75 | 60.08 | -66386.23
South Africa | 10700 | 46.56 | -2165 | 4685421 | -23.57 | 555.49 | 51016.52
Turkey | 6700 | 71.08 | -6165 | 38002088 | 0.95 | 0.90 | -5864.06
United Kingdom | 27700 | 78.16 | 14835 | 220089588 | 8.03 | 64.50 | 119146.94
United States | 37800 | 77.14 | 24935 | 621775004 | 7.01 | 49.16 | 174828.44

Table 5: Statistical analysis of the data collected in Table 1

From the last column we can compute the covariance of GDP per capita and Life Expectancy:

$S_{xy} = \frac{1}{48}\sum_{i=1}^{48}(x_i - \bar{x})(y_i - \bar{y}) = 73011.6$

Section 3: Linear regression

We start our investigation by studying the line of best fit for the data in Table 1. This will allow us to see whether there is a relation of linear dependence between GDP per capita and Life Expectancy. The regression line for the variables x and y is given by the following formula:

$y - \bar{y} = \frac{S_{xy}}{S_x^2}(x - \bar{x})$

By using the values found above we get:

$y = 62.51 + 0.5926 \times 10^{-3}\, x$

The Pearson correlation coefficient is:

$r = \frac{S_{xy}}{S_x S_y} = 0.6380$

The following graph shows the data in Table 1 together with the line of best fit computed above.

Figure 1: Linear regression.

The value of the correlation coefficient, r ~ 0.6, is evidence of a moderate positive linear correlation between the variables x and y.
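The arithmetic behind the regression line and the correlation coefficient can be re-checked from the summary statistics alone. The following is a minimal sketch in Python, not the spreadsheet procedure used in the project; the function name `regression_from_summary` is hypothetical, and the mean 12865 is an assumption, namely the unrounded value implied by the deviations in Table 5 (for example 11200 - 12865 = -1665).

```python
def regression_from_summary(mean_x, mean_y, s_x, s_y, s_xy):
    """Return (slope, intercept, r) of the least-squares line y = intercept + slope * x.

    Uses the formula y - mean_y = (s_xy / s_x**2) * (x - mean_x)
    and Pearson's r = s_xy / (s_x * s_y).
    """
    slope = s_xy / s_x ** 2
    intercept = mean_y - slope * mean_x
    r = s_xy / (s_x * s_y)
    return slope, intercept, r


# Summary statistics reported in Section 2.
# Assumption: the unrounded mean of x (12865) is used instead of the rounded 12900.
slope, intercept, r = regression_from_summary(
    mean_x=12865, mean_y=70.13, s_x=11100, s_y=10.31, s_xy=73011.6
)

print(f"slope     = {slope:.4e}")
print(f"intercept = {intercept:.2f}")
print(f"r         = {r:.4f}")
```

With the rounded mean 12900 the intercept comes out slightly lower, which suggests that the reported value 62.51 was obtained from the unrounded mean.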
On the other hand it is apparent from Figure 1 that the relation between the variables is not exactly linear. In the next section we speculate on the reason for this non-linear relation and on what type of statistical relation can exist between GDP per capita and Life Expectancy.

Section 4: Logarithmic regression

As explained in reference [3], "the main reason for this non-linear relationship [between GDP per capita and Life Expectancy] is because people consume both needs and wants. People consume needs in order to survive. Once a person's needs are satisfied, they could then spend the rest of their money on non-necessities. If everyone's needs are satisfied, then any increase in GDP per capita would barely affect Life Expectancy."

There are various other reasons that one can think of to explain the non-linear relationship between GDP per capita and Life Expectancy. For example, the GDP per capita is the average wealth, while one should also consider how the total wealth is distributed among the population of a given country. With this in mind, to have a more complete picture of the statistical relation between the economy of a country and Life Expectancy, one should also take into consideration other economic parameters, such as the Inequality Index, that describe the distribution of wealth among the population.

Moreover, the wealth of the population is not the only factor affecting Life Expectancy: one should also take into account, for example, the governmental policies of a nation towards health and poverty. For example Cuba, a country with a very low GDP per capita ($2900), has a relatively high Life Expectancy (76.08 years), mostly due to the fact that the government provides basic needs and health assistance to the population. Some of these aspects will be discussed in Section 6.

Let's try to guess what could be a reasonable relation between the variables x (GDP per capita) and y (Life Expectancy).
According to the above observations we can regard the GDP per capita as the sum of two parts: $x = x_n + x_w$, where $x_n$ denotes the part of wealth spent on necessities and $x_w$ denotes the part spent on wants. It is reasonable to make the following assumptions:

1. Life Expectancy depends linearly on the part of wealth spent on necessities: $y = a x_n + b$. (1)
2. The fraction $x_n/x$ of wealth spent on necessities is close to 1 when x is close to 0 (someone with little money will spend most of it on necessities), and is close to 0 when x is very large (someone with a very large amount of money will spend only a small fraction of it on necessities).
3. We make the following choice of $x_n$ as a function of x satisfying the above requirements: $x_n = \frac{\ln(cx + 1)}{c}$, (2) where c is some positive parameter.

This function is chosen mainly for two reasons. On one hand it satisfies the requirements described in assumption 2, as can be seen from the graph of the ratio $f(x) = \frac{x_n}{x} = \frac{\ln(cx + 1)}{cx}$:

Figure 2: Graph of the function $y = \frac{\ln(cx + 1)}{cx}$ for c = 0.5 (blue), 1 (black) and 10 (red).

As appears from the graph, in all cases f(x) tends to 1 as x approaches 0, and f(x) is small for large values of x. On the other hand, the function chosen allows us to use the statistical tools at our disposal in the Excel software to derive some interesting conclusions about the statistical relation between x and y. This is what we are going to do next.

First we want to find the relation between x and y under the above assumptions. Putting together equations (1) and (2) we get:

$y = \frac{a}{c}\ln(cx + 1) + b$, (3)

which shows that there is a logarithmic dependence between x and y. Since $\ln(cx + 1) = \ln c + \ln\left(x + \frac{1}{c}\right)$, equation (3) can be rewritten in the following equivalent form: if we denote $A = a/c$, $B = b + (a/c)\ln c$ and $C = 1/c$, then

$y = A\ln(x + C) + B$. (4)

We can now study the curve of type (4) which best fits the data in Table 1, using the statistical tools of the Excel spreadsheet.
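For a fixed value of C, a curve of type (4) is a straight line in the transformed variable u = ln(x + C), so A and B can be obtained by ordinary least squares on the pairs (u, y). The following is a minimal sketch of that idea in Python, offered as an illustration rather than as the spreadsheet procedure used in the project; the function name `fit_log_curve` and the seven-country subset of Table 1 are choices made here for brevity.

```python
import math


def fit_log_curve(xs, ys, C=0.0):
    """Least-squares fit of y = A * ln(x + C) + B for a fixed shift C.

    Returns (A, B, r), where r is the Pearson correlation coefficient
    between ln(x + C) and y.
    """
    if any(x + C <= 0 for x in xs):
        raise ValueError("x + C must be positive for every data point")
    us = [math.log(x + C) for x in xs]  # transformed independent variable
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    s_uu = sum((u - mean_u) ** 2 for u in us) / n
    s_yy = sum((y - mean_y) ** 2 for y in ys) / n
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys)) / n
    A = s_uy / s_uu
    B = mean_y - A * mean_u
    r = s_uy / math.sqrt(s_uu * s_yy)
    return A, B, r


# Illustrative subset of Table 1 (Burundi, Cuba, Brazil, Argentina,
# Greece, Australia, United States); NOT the full data set.
sample_x = [600, 2900, 7600, 11200, 20000, 29000, 37800]
sample_y = [43.02, 76.08, 71.13, 75.48, 78.89, 80.13, 77.14]

for shift in (0.0, 1.0, 100.0):
    A, B, r = fit_log_curve(sample_x, sample_y, C=shift)
    print(f"C = {shift:6.1f}: A = {A:.3f}, B = {B:.3f}, r = {r:.4f}")
```

Running the function on all 48 rows of Table 1, for several values of C, is one way to cross-check the coefficients produced by the spreadsheet.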
Unfortunately Excel allows us to plot only a curve of type $y = A\ln(x) + B$ (i.e. an equation of type (4) where C is equal to 0). For this choice of C, we get the following logarithmic curve of best fit together with the corresponding value of the correlation coefficient.

Figure 3: Logarithmic regression.

To find the analogous curve of best fit for a given value of C (positive, arbitrarily chosen) we can simply add C to all the x values and redo the same plot as for C = 0 with the new independent variable $x_1 = x + C$. We omit the graphs containing the curve of best fit for all the possible values of C and simply report, in the following table, the correlation coefficient r for some appropriately chosen values of C.

C | r
0.00 | 0.77029
0.01 | 0.77029
0.1 | 0.77028
1 | 0.77025
10 | 0.76991
100 | 0.76666

Table 8: Correlation coefficient r for the curve of best fit $y = A\ln(x + C) + B$, for some values of C.

The above data indicate that the optimal choice of C is between 0.00 and 0.01, since in this case r is the closest to 1; r decreases slowly as C grows. Comparing the results obtained with the linear regression (r ~ 0.6) and the logarithmic regression (r ~ 0.8), we can conclude that the latter appears to be a better model to describe the relation between GDP per capita and Life Expectancy, since the value of the correlation coefficient is significantly bigger.

In Figure 3 one data point is very far from the curve of best fit, so we may decide to discuss it separately and do the regression without it. This data point corresponds to South Africa, with a GDP per capita of 10700 and a Life Expectancy at birth of 46.56 (much lower than any other country with a comparable GDP). It is reasonable to think that this anomaly is due to the peculiar history of South Africa which, after the end of apartheid, had to face uncontrolled violence. It is therefore difficult to fit this country in a statistical model and we can decide to remove it from our data. Doing so, we get the following new plot.
Figure 4: Logarithmic regression for the data in Table 1 excluding South Africa.

The new value of correlation coefficient r ~ 0. 3 indicates that, excluding the anomalous data of South Africa, there is a strong positive logarithmic correlation between GDP per capita and Life Expectancy at birth.

Section 5: Chi square test ($\chi^2$ test)

We conclude our investigation with a chi square test. This will allow us to confirm the existence of a relation between the variables x and y. For this purpose we formulate the following null and alternative hypotheses.

H0: GDP and Life Expectancy are not correlated.
H1: GDP and Life Expectancy are correlated.

* Observed frequency: The observed frequencies are obtained directly from the data in Table 1, by counting the countries that lie below or above the mean $\bar{x}$ of GDP per capita and below or above the mean $\bar{y}$ of Life Expectancy:

 | Below $\bar{x}$ | Above $\bar{x}$ | Total
Below $\bar{y}$ | 14 | 1 | 15
Above $\bar{y}$ | 16 | 17 | 33
Total | 30 | 18 | 48

Table 6: Observed frequencies for the chi square test

* Expected frequency: The expected frequencies are obtained by the formula:

$f_e = \frac{(\text{row total}) \times (\text{column total})}{\text{total sum}}$

 | Below $\bar{x}$ | Above $\bar{x}$ | Total
Below $\bar{y}$ | 9.375 | 5.625 | 15
Above $\bar{y}$ | 20.625 | 12.375 | 33
Total | 30 | 18 | 48

Table 7: Expected frequencies for the chi square test.

We can now calculate the chi square variable:

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 8.85$

In order to decide whether or not to reject the null hypothesis H0 in favor of the alternative hypothesis H1, we need to find the number of degrees of freedom (df) and to fix a significance level $\alpha$. The number of degrees of freedom is:

df = (number of rows - 1)(number of columns - 1) = 1

The corresponding critical values of chi square, depending on the choice of the significance level $\alpha$, are given in the following table (see reference [4]):

df | 0.10 | 0.05 | 0.025 | 0.01 | 0.005
1 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879

Table 9: Critical values of chi square with one degree of freedom.

Since the value of chi square is greater than all of the above critical values, we conclude that even at the significance level $\alpha$ = 0.005 we can reject H0 and accept the alternative hypothesis H1: GDP and Life Expectancy are related.

The above test shows that there is some relation between the two variables x (GDP per capita) and y (Life Expectancy at birth). Our goal is to further investigate this relation.

Section 6: Conclusions

Interpretation of results

Our study of the statistical relation between GDP per capita and Life Expectancy brings us to the following conclusions. The chi square test gives strong evidence of a statistical relation between the two variables (at the significance level $\alpha$ = 0.005). The study of linear regression shows that there is a moderate positive linear correlation between the two variables, with a correlation coefficient r ~ 0.6. This linear model can be greatly improved by replacing the linear dependence with a different type of relation. In particular we considered a logarithmic relation between the variables x (GDP) and y (Life Expectancy). With this new relation we get a correlation coefficient r ~ 0.77. In fact, if we remove the data related to the anomalous country of South Africa (which should be discussed separately and does not fit well in our statistical analysis), we get an even higher correlation coefficient (the value reported with Figure 4). This is evidence of a strong positive logarithmic dependence between x and y.

Validity and Areas of improvement

Of course, one possible improvement of this project would be to consider a much larger collection of data on which to do the statistical analysis. For example, one could consider a larger list of countries and data related to different years (other than 2003), and one could even think of studying data referring to local regions within a single country. All this can be found in the literature, but we decided to restrict ourselves to the data presented in this project because we considered it sufficient as an application of the mathematical and statistical tools used in the project.
A second, probably more interesting, possible improvement of the project would be to consider other economic factors that can affect the Life Expectancy at birth of a country. Indeed the GDP per capita is just a measure of the average wealth of a country and it does not take into account the distribution of the wealth. There are however several economic indices that measure the dispersion of wealth in the population and could be considered, together with the GDP per capita, as factors influencing Life Expectancy. For example, it would be interesting to study a linear regression model in which the dependent variable y is the Life Expectancy, with two (or more) independent variables $x_i$, one of which would be the GDP per capita while another could be, for example, the Gini inequality index (measuring the dispersion of wealth in a country). This would have been very interesting but, perhaps, it would have been out of context in a project studying GDP per capita and Life Expectancy.

Probably the most important direction of improvement of the present project is related to the somewhat arbitrary choice of the logarithmic model used to describe the relation between GDP and Life Expectancy. Our choice of the function $y = A\ln(x + C) + B$ was mainly dictated by the statistical package at our disposal in the Excel software used in this project. Nevertheless we could have considered different, and probably more appropriate, choices of functional relations between the variables x and y. For example we could have considered a mixed linear and hyperbolic regression model of type $y = A + Bx + C/(x + D)$, as is sometimes considered in the literature (see reference [4]).

Bibliography:

1. Gapminder World. Web. 4 Jan. 2012. <http://www.gapminder.org>.
2. "GDP - per Capita (PPP) vs. Infant Mortality Rate." Index Mundi - Country Facts. Web. 4 Jan. 2012. <http://www.indexmundi.com/g/correlation.aspx?v1=67>.
3. "Life Expectancy at Birth versus GDP per Capita (PPP)." Statistical Consultants Ltd. Web. 4 Jan. 2012. <http://www.statisticalconsultants.co.nz/weeklyfeatures/WF6.html>.
4. "Table: Chi-Square Probabilities." Faculty & Staff Webpages. Web. 4 Jan. 2012. <http://people.richland.edu/james/lecture/m170/tbl-chi.html>.