Krisna Gupta
Associate researcher at CIPS.
Economics PhD student at Australian National University.
Double-degree Master in Economics from UI / VU Amsterdam.
Research on trade & industry in Indonesia.
Blogs & sings in spare time. L’Arc~en~Ciel fan.
I assume y’all have never worked with data before
I'll try to allocate more time for Q&A.
Efficient & powerful to support your story.
Shows how good you are at understanding an issue.
Objective, most of the time.
Everyone uses ‘em these days. Tough luck for data haters.
Cross-sectional data: a snapshot of many subjects/individuals (people, countries, firms, etc.) at a given point in time.
Time-series data: one subject observed over a long(-ish) period of time.
Panel data: combination of the two.
Not the best visualization but very flexible. [ex](https://comtrade.un.org/data/)
The most mainstream tools are Microsoft Excel & Google Sheets.
.csv
or .xlsx
or something similar. I certainly prefer working with these formats (among others).
Never lose sight of the units of your value.
Especially important if you use various data sources.
Always read what’s X and Y.
How do we process information on the income of 1 million people?
When we have data on 1 million people, it’s impractical to look at 1 million values.
We look for one number that represents these 1 million values.
We also need to understand how the value is distributed.
If we group values, take frequency, then sort them, we can make a distribution plot.
We can make a smooth approximation of the distribution plot with functions.
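The group, count, sort recipe above can be sketched in a few lines of Python. This is only an illustration: the income figures and the bracket width are made up, not real data.

```python
from collections import Counter

# Hypothetical monthly incomes (in million rupiah) for 12 people.
incomes = [2.1, 2.4, 3.0, 3.2, 3.3, 3.9, 4.1, 4.5, 5.2, 5.8, 7.5, 12.0]

BIN_WIDTH = 2  # assumption: group incomes into 2-million-wide brackets


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the bracket that `value` falls into."""
    return int(value // width) * width


# Step 1 and 2: group the values and take the frequency of each group.
frequency = Counter(bin_start(x) for x in incomes)

# Step 3: sort the groups from the lowest bracket to the highest.
distribution = sorted(frequency.items())

# A crude text version of the distribution plot: one '#' per person.
for start, count in distribution:
    print(f"{start:>2}-{start + BIN_WIDTH:<2} | {'#' * count}")
```

Most people land in the lowest brackets, while a few sit far to the right. That long right side is what a tail looks like.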
The most famous distribution is the normal distribution.
Normal distribution’s characteristics:
The median is the value lying in the middle of the whole group once we sort the values.
If we have 1 million people:
The median is often used in the presence of a non-trivial number of extreme values (i.e., a fat tail).
Income is often not normally distributed, so the median is usually the better summary.
Example in Excel.
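For those who prefer code to a spreadsheet, here is a rough Python sketch of why the median copes better with extreme values. The ten incomes are hypothetical, with one very rich person added on purpose.

```python
from statistics import mean, median

# Hypothetical incomes: nine ordinary earners and one extreme value.
incomes = [3, 4, 4, 5, 6, 7, 8, 9, 10, 1000]

average_income = mean(incomes)   # pulled up strongly by the single 1000
median_income = median(incomes)  # middle of the sorted values

print(f"mean:   {average_income:.1f}")
print(f"median: {median_income:.1f}")
```

The mean lands above what nine of the ten people actually earn, while the median stays among the typical values.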
Find more statistics at Statista
We use currency to express many economic variables.
We can’t aggregate car + food.
But really what we want is the car and the food, not the money.
We need to take into account changes in prices (i.e., inflation).
Say a firm can make 1 car (priced at 200) and 100 units of food (priced at 0.05 each) in 2020.
The firm’s GDP is $1 \times 200 + 100 \times 0.05 = 205$
In 2021, the car’s price increases to 210, hence GDP becomes $1 \times 210 + 100 \times 0.05 = 215$.
Increased GDP?
It’s easy to imagine the complexity of this stuff in reality.
One thing is clear though: we want to exclude GDP increases that come purely from price effects.
To avoid the price effect, we use 2020 prices so we can compare 2020 GDP with 2021 GDP.
Real GDP = GDP calculated using the old (base-year) prices.
Obviously to keep comparing, we still need to use 2020 prices when we calculate 2022 GDP.
Same when we calculate GDP in 2023, etc.
Because we keep using 2020 prices, we say ‘constant price’.
The base year used for constant prices gets updated from time to time.
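The car-and-food example can be written as a small Python sketch. It uses the numbers from the slides and assumes the firm produces the same quantities in 2021 as in 2020; the `gdp` helper is just an illustrative name.

```python
# Quantities produced each year (assumed unchanged between 2020 and 2021).
quantities = {
    2020: {"car": 1, "food": 100},
    2021: {"car": 1, "food": 100},
}

# Prices each year: only the car gets more expensive in 2021.
prices = {
    2020: {"car": 200, "food": 0.05},
    2021: {"car": 210, "food": 0.05},
}


def gdp(quantity_year, price_year):
    """Value the output of `quantity_year` at the prices of `price_year`."""
    return sum(
        quantities[quantity_year][good] * prices[price_year][good]
        for good in quantities[quantity_year]
    )


nominal_2020 = gdp(2020, 2020)  # current prices
nominal_2021 = gdp(2021, 2021)  # current prices
real_2021 = gdp(2021, 2020)     # constant 2020 prices

print(f"nominal 2020: {nominal_2020:.0f}")
print(f"nominal 2021: {nominal_2021:.0f}")
print(f"real 2021 (2020 prices): {real_2021:.0f}")
```

Nominal GDP rises, but real GDP at 2020 prices equals 2020 GDP: the firm did not produce anything more, the car just got pricier.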
GDP is an aggregate of the whole economy.
GDP per capita is the mean/average: GDP divided by population.
Singapore vs Indonesia: rich vs powerful.
A fraction is usually expressed as a percentage.
We use fraction to express how important an individual is to the group/population.
“India imports 3.05 billion USD of CPO from Indonesia” doesn’t say a lot on its own.
Growth is important to reflect how fast something is changing.
Percent change is nice cuz it’s unit-free.
It linearizes non-linear things, which is both good and bad.
If your income drops by 50% today, will a 50% increase tomorrow get you back to your old income?
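A quick way to check the question above is to run the numbers. This is a minimal Python sketch with a made-up starting income of 100; the function name is only for illustration.

```python
def apply_percent_change(value, percent):
    """Return `value` after changing it by `percent` percent."""
    return value * (1 + percent / 100)


income = 100  # hypothetical starting income

after_drop = apply_percent_change(income, -50)     # income falls by 50%
after_rise = apply_percent_change(after_drop, 50)  # then rises by 50%

# Percent increase needed to get from the reduced income back to the start.
needed_rise = (income / after_drop - 1) * 100

print(f"after -50%: {after_drop:.0f}")
print(f"after +50%: {after_rise:.0f}")
print(f"rise needed to recover: {needed_rise:.0f}%")
```

The 50% increase applies to the smaller base, so you only get back to three quarters of your old income. Recovering fully takes a 100% increase.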
Index is prolly the most confusing thing.
An index can take many forms, with many different weights.
For example, the consumer price index (CPI) measures the change in the price level of many consumer goods.
Indeks Kedalaman Kemiskinan (Poverty Gap Index) shows how deep poverty is in a given area.
CPI and many other indices are shown as numbers near 100.
That’s because CPI is calculated relative to a base period whose value is set to 100.
For example, if CPI in 2010=100 while 2020=154, that means prices in 2020 are 54% higher than in 2010.
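Reading an index as a percent change is just a ratio. Here is a small Python sketch using the CPI values from the example above; the helper name is hypothetical.

```python
def percent_change_between(index_start, index_end):
    """Percent change implied by moving from one index value to another."""
    return (index_end / index_start - 1) * 100


# CPI values from the example: 2010 is the base year (= 100).
cpi = {2010: 100, 2020: 154}

change = percent_change_between(cpi[2010], cpi[2020])
print(f"Prices in 2020 are {change:.0f}% higher than in 2010")
```

The same function works between any two years, not only against the base year, which is handy when the base year is far in the past.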