
You can of course not make up data when doing statistical analysis. But transforming variables is something completely different: we manipulate the variables in systematic ways so that they are better suited to answering a specific question, or fit better into a certain model.

This might seem like a strange thing to do with data, but in many cases it makes theoretical sense. Consider, for instance, the relationship between quality of life and income. For a poor person, a small increase in income can make a big difference. In his book Factfulness, Hans Rosling describes the important difference between having sandals and not having sandals, or between having a bike and not having a bike. But for someone who is already rich, income would have to increase substantially in order to really affect that person's life. Every extra dollar has a little less impact than the previous one. It is therefore reasonable to look at increases in percent, which we can do with logarithms.

In many cases our data has a skewed distribution. One common case is when we have many small values but only a few very large ones. Such patterns often arise when we are dealing with self-reinforcing processes. Every year new kids are born in the city, and the larger the city, the more children are born. Larger cities will therefore grow faster in absolute numbers than small cities. A city will often also become more attractive as a destination when it grows, which means that it attracts more movers. In this way, the larger cities increase their distance to the smaller cities. "For whosoever hath, to him shall be given, and he shall have more abundance: but whosoever hath not, from him shall be taken away even that he hath," says the Bible. The meaning might be about faith, but it is a good descriptive principle for economics: it is easier to make money if you are already rich. Distributions of income are therefore often skewed. Below is a diagram of the distribution of GDP per capita in the countries of the world, according to the QoG Basic dataset.

Logarithmic scales can be interpreted as showing ratios, or relationships, rather than absolute differences. Every step up on the base 10 logarithm means that the original number increases tenfold: to increase the base 10 logarithm from 1 to 2 we need to increase the original number from 10 to 100, but to increase it from 2 to 3 we need to increase the original number from 100 to 1000. The same principle applies to the natural logarithm: every step up means that the original number is multiplied by a factor of roughly 2.72. We can see in the table that the logarithms increase more and more slowly the higher up we are on the scale. The natural logarithm is the more useful of the two because we can more easily interpret changes in it as changes in percent.

It is very straightforward to do a logarithmic transformation in Stata. We use the generate command and write ln() if we want the natural logarithm, or log10() if we want the base 10 logarithm. In the code below, we do one of each for the GDP per capita variable.

In the two graphs below we can see the relationship between life expectancy and GDP per capita in the countries of the world. The left graph shows GDP in absolute numbers, the right one the natural logarithm. I have also drawn lines that show roughly the moving average (the parts of the code in parentheses that start with lowess).
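To make the step properties of the two logarithms concrete, here is a small Python sketch of the same transformations. (The tutorial's own code uses Stata's generate with ln() and log10(); the values below are made-up GDP per capita figures for illustration, not taken from the QoG data.)

```python
import math

# Hypothetical GDP per capita values (dollars), made up for illustration:
# many small values and a few large ones, as in a skewed distribution.
gdp_pc = [500, 1_000, 5_000, 10_000, 50_000, 100_000]

# Python analogues of Stata's
#   generate gdp_ln = ln(gdp_pc)
#   generate gdp_log10 = log10(gdp_pc)
gdp_ln = [math.log(x) for x in gdp_pc]
gdp_log10 = [math.log10(x) for x in gdp_pc]

# One step up on the base-10 log means the original number grew tenfold:
# between 1,000 and 10,000 dollars, log10 rises from 3 to 4.
print(gdp_log10[1], gdp_log10[3])

# One step up on the natural log means multiplication by e, roughly 2.72:
# adding 1 to ln(1000) and exponentiating gives 1000 * e.
print(math.exp(gdp_ln[1] + 1))
```

This is why a one-unit change in a logged variable can be read as a fixed *percentage* change in the original variable, regardless of where on the scale it happens.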

In Stata, you can test normality by either graphical or numerical methods. The former include drawing a stem-and-leaf plot, scatterplot, box-plot, histogram, probability-probability (P-P) plot, and quantile-quantile (Q-Q) plot. The latter involve computing the Shapiro-Wilk, Shapiro-Francia, and Skewness/Kurtosis tests.

I would imagine the DCC suffers the same limitations as the regular correlation with non-normal data. That is, there isn't an assumption of normality, but non-normal data can cause odd findings; see the Anscombe quartet, for example. As for kurtosis, taking the log can certainly make it worse. Take this example of the uniform distribution, set.seed(2810101), where a normally distributed variable has kurtosis of 3 for reference. On the other hand, in this example, set.seed(2829101), taking the log improves kurtosis and skewness. However, you mention skewed data with kurtosis. Was your data right skew or left skew? Since the former is more common, I'll guess that it is. Taking the log had almost no effect on kurtosis. As always, try plotting the data to see what is going on in your correlation.
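The R code behind the two examples above survives only as the set.seed() calls, so as a rough stand-in, here is a Python sketch (numpy/scipy, with an arbitrary seed of my own, not the answer's R seeds) of both behaviours: the log making kurtosis worse for uniform data, and fixing skewness and kurtosis for right-skewed lognormal data.

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(42)  # arbitrary seed for illustration

# Case 1: uniform data. The log of a Uniform(0, 1) sample is strongly
# left-skewed and heavy-tailed, so taking the log makes kurtosis worse.
u = rng.uniform(size=100_000)
ku_raw = kurtosis(u, fisher=False)          # population value is 1.8
ku_log = kurtosis(np.log(u), fisher=False)  # population value is 9
print(ku_raw, ku_log)  # both compared to 3 for a normal variable

# Case 2: right-skewed (lognormal) data. Taking the log restores
# normality, so skewness goes toward 0 and kurtosis toward 3.
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)
sk_raw, sk_log = skew(x), skew(np.log(x))
kx_raw, kx_log = kurtosis(x, fisher=False), kurtosis(np.log(x), fisher=False)
print(sk_raw, sk_log)
print(kx_raw, kx_log)
```

Whether a log transform helps or hurts therefore depends entirely on the shape of the original distribution, which is exactly why plotting the data first is good advice.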

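For readers working outside Stata, the Shapiro-Wilk and Skewness/Kurtosis tests mentioned above have close counterparts in scipy (scipy.stats.normaltest is the D'Agostino-Pearson omnibus skewness/kurtosis test, similar in spirit to Stata's sktest; Shapiro-Francia has no direct scipy equivalent). A sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed for illustration
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # right-skewed sample

# Shapiro-Wilk test (Stata: swilk), on the raw and logged data.
w_raw, p_raw = stats.shapiro(x)
w_log, p_log = stats.shapiro(np.log(x))

# Omnibus skewness/kurtosis test (roughly Stata's sktest).
k2_raw, psk_raw = stats.normaltest(x)
k2_log, psk_log = stats.normaltest(np.log(x))

print(f"raw:    Shapiro-Wilk p={p_raw:.3g}, skewness/kurtosis p={psk_raw:.3g}")
print(f"logged: Shapiro-Wilk p={p_log:.3g}, skewness/kurtosis p={psk_log:.3g}")
# Expect the raw, skewed sample to be rejected as normal (tiny p-values),
# while the logged sample usually is not.
```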