Download the associated Workbook for this lesson
Extracting data from tables and databases is something that is done quite frequently from within Excel. We haven't seen too many spreadsheets where extracting data is not needed. The information below is meant as a generic guide to extracting data from a database or table. The one thing you should always look for when faced with the problem of how to extract information is "what won't change!" In other words you must establish a constant from all the variables. Lets look at some ways this can be done!
In the Lookup & Reference example below we will work through it with you to again to try and demonstrate the possible thought process we would use to arrive at a solution. This will end up being a relatively deeply nested formula and will include a number of different functions. I'm using this as an example because it will end up being a very generic solution.
Quite possibly the most common of these type of Functions is the VLOOKUP. This because it will look within the first column of a nominated table array for a nominated value and return the value from the same row in a nominated column. We looked at the VLOOKUP in the lesson on "Specific Worksheet Formulas" so we will move on to a more difficult situation.
When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number. Once we have this we can extract the data from the table or database. Remember that Excel uses a grid method to locate data, a very simple but effective method. If it works for Excel it will work for us!
Let's say we have a huge table of data that consists of 40 columns and 5000 rows. We need to find 6 single cells within a table of 200,000 cells (40*5000). We have of course set up our spreadsheet by the rule book. Our table has been named AllInfo in other words AllInfo refers to the range A1:T5000. but the data we need to extract poses the following problems.
It could be in any row.
It could be in any column.
The row number can and does vary.
The column number can and does vary.
The data wanted is the names of the 3 largest and the 3 smallest tax payers and these change daily.
All we know for certain is that the column heading Tax does not change, but it's positions can! The names are always in the column immediately before the Tax column.
So in a nutshell, what we need to do is, find the 3 largest and the 3 smallest tax figures in the Tax column and then return the name immediately to the left.
The very first thing we are going to do is give the top row (headings) a named range. We'll call it TpRwAllData and it will refer to A1:T1.
The VLOOKUP requires that we know the column position and we have been told that "The column number can and does vary". It also requires we look for a known value in the first column of our table and we have no idea what that might be. So VLOOKUP can virtually be ruled out!
Let us use some positive information and work from there. The biggest help is the column heading Tax. If we can find that and get a reference to it we should then also be able to get a reference to the column immediately before it.
Click through the Lookup & Reference functions and the functions that appear to be of use are ADDRESS, HLOOKUP, INDEX, MATCH, INDIRECT, and OFFSET. If we can create a reference to the Tax column we could look within it for the 3 largest and smallest tax figures.
I like the look of MATCH because it "Returns the relative position in an array...". So lets try it and see if we could use it to find the Tax column.
=MATCH("Tax",TpRwAllData,0) gives a result of 7 (column G)
Where "Tax" is the value to find.
TpRwAllData is the first row in our table AllData
0 (zero) tells it to find an exact match.
So our result of 7 is the position of Tax in the range A1:T1.
Now we have this we can very confident that we are on the right track. Remember "When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number" We have the Column number, so now I'm going to use this nested with the ADDRESS function to create a reference to the Tax column.
=ADDRESS(1,MATCH("Tax",TpRwAllData,0)) gives a result of $G$1
Where 1 is the Row number.
MATCH("Tax",TpRwAllData,0) is the column number.
Now we can apply the same function again but instead of 1 as the Row number we will use 5000 (the last row of our table)
=ADDRESS(5000,MATCH("Tax",TpRwAllData,0)) gives a result of $G$5000
If we now join these using the & (Ampersand) and include a : (Colon) we can create a reference to our Tax column that will always be the address of the Tax column no matter what position it is in. In other words it will be a variable range.
=ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0))
This returns the result $G$1:$G$5000.
Now we can use this to find our 3 largest and the 3 smallest tax figures. There is one very important thing to remember here though and that is the result $G$1:$G$5000 will be seen by Excel as a Text string and not a range reference! To force Excel to see this as a range reference we only need to nest it within the INDIRECT function. The INDIRECT function "Returns the reference returned by a text string".
What we will show you now is a neat little trick that is very much like creating your very own Custom Function. The same can be applied to just about any nested or non-nested function and comes in very handy!
Push Ctrl + ` (circumflex) This will show all formulas on the Worksheet.
Push Ctrl + F3 This will display the Define name dialog box.
In the Names in workbook box type the name: TaxCol
In the Refers to box type: =INDIRECT()
Now type the formula: =ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0)) within the parenthesis of "=INDIRECT()" It should look like:
=INDIRECT(ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0))) Use the formula showing on the Worksheet as your guide.
Click Add then click OK
Push Ctrl + ` (circumflex). This will now hide all formulas on the Worksheet.
What we have now is a variable named range that will always refer to our Tax column!
We now need to find the 3 largest and the 3 smallest tax figures in the named range TaxCol so let's type the word Largest in the Search for a Function box in the Insert Function dialog and then Smallest. This is an easy one as there are two functions that will do exactly what we need, they are the LARGE and SMALL functions.
=LARGE(TaxCol,1)
Where TaxCol is the range to look in.
1 is telling it we want the largest value.
What we intend to do now is to use the INDEX function to create a reference to the name of the largest tax payer.
The syntax for INDEX is =index(array,row_num,column_num)
=INDEX(AllData,MATCH(LARGE(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)
As you can see we have colour coded the formula to help you see what each nested function is being used for. The -1 is returning the column number of the column immediately before the Tax column. We can now easily find the 3 largest tax payers within our table all we need to do is substitute LARGE(TaxCol,1) with LARGE(TaxCol,2) and LARGE(TaxCol,3).
For the 3 smallest tax payers we substitute the LARGE function with the SMALL function, eg; =INDEX(AllData,MATCH(SMALL(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)
We realise that this can seem a bit daunting at first, but take your time and work through it, as the solution to this one difficult problem can be applied in many situations.