Going back to my favorite data source the UN statistics division I wanted to show how to link two large data sets in excel.
First I downloaded each set as a .CSV( Comma separated values) file. Next I move a copy of each worksheet to the same workbook and save. I do this because I will want an original copy of the raw data to look at to check what I am doing sometimes and to make the linking easier. You can link across workbooks, but that is a little more complex. Also any time you are dealing with large sets of data in excel you should save often. Excel handles large data like a Vegas juggler handles bowling pins it does it well but sometimes everything crashes and it crashes quickly. Saving often will keep you from having to try and pick up the pieces.
Next I will remove the duplicates based upon country and year. I could leave them in but I know from experience that I want a simple end result which means getting rid of the stuff. In excel under data there is a remove duplicates button. Now I want to combine the data into one table so that I could pivot it for patterns. Also I need to clean up the two data columns so the stuff I am comparing in the function match. In plain speak if I have Albania in one tab I need to have Albania in the other. Note if I have “Albania” in one tab and “Albania ” in the other it will fail because of the space on the end. Always clean the data first. Once you put in a complex function it will recalculate and eat up cpu cycles slowing you down greatly if you are still cleaning data. So in the Wages tab I will go to the last column and add the following formula. Column H Row 2
Then hit ctrl+shift+enter to make it an array function
One advantage of the Excel index and match functions is that the lookup value can be in any column in the array, unlike the VLOOKUP function, in which the lookup value must be in the first column.
Now what this did is match the data in “Wages and salaries” by country and year to the population for that country and year in “Total population, both sexes combined (thousands)” when there was a match it would copy the population and put it in the cell. Copying the formula all the way down column H and you will have the population for the country year and occupation.