Ozgrid, Experts in Microsoft Excel Spreadsheets

Extracting Data From Excel

Excel Training Level 2 Lesson 13-Excel 97-2003

 

Download the associated  Workbook for this lesson

<< PREVIOUS LESSON |NEXT LESSON>> | BACK TO EXCEL LEVEL 2 TRAINING INDEX | Information Helpful? Why Not Donate.

FREE EXCEL HELP

Extracting Data

Extracting data from tables and databases is something that is done quite frequently from within Excel.  We haven't seen too many spreadsheets where extracting data is not needed. The information below is meant as a generic guide to extracting data from a database or table. The one thing you should always look for when faced with the problem of how to extract information is "what won't change!" In other words you must establish a constant from all the variables. Lets look at some ways this can be done!

In the Lookup & Reference example below we will work through it with you to again to try and demonstrate the possible thought process we would use to arrive at a solution. This will end up being a relatively deeply nested formula and will include a number of different functions. I'm using this as an example because it will end up being a very generic solution.

Lookup and Reference

Quite possibly the most common of these type of Functions is the VLOOKUP. This because it will look within the first column of a nominated table array for a nominated value and return the value from the same row in a nominated column. We looked at the VLOOKUP in the lesson on "Specific Worksheet Formulas" so we will move on to a more difficult situation.

When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number. Once we have this we can extract the data from the table or database. Remember that Excel uses a grid method to locate data, a very simple but effective method. If it works for Excel it will work for us!

Let's say we have a huge table of data that consists of 40 columns and 5000 rows. We need to find 6 single cells within a table of 200,000 cells (40*5000). We have of course set up our spreadsheet by the rule book. Our table has been named AllInfo in other words AllInfo refers to the range A1:T5000. but the data we need to extract poses the following problems.

  • It could be in any row.

  • It could be in any column.

  • The row number can and does vary.

  • The column number can and does vary.

The data wanted is the names of the 3 largest and the 3 smallest tax payers and these change daily.

All we know for certain is that the column heading Tax does not change, but it's positions can! The names are always in the column immediately before the Tax column.

So in a nutshell, what we need to do is, find the 3 largest and the 3 smallest tax figures in the Tax column and then return the name immediately to the left.

The very first thing we are going to do is give the top row (headings) a named range. We'll call it TpRwAllData and it will refer to A1:T1.

The VLOOKUP requires that we know the column position and we have been told that "The column number can and does vary". It also requires we look for a known value in the first column of our table and we have no idea what that might be.  So VLOOKUP can virtually be ruled out!

Let us use some positive information and work from there. The biggest help is the column heading Tax.  If we can find that and get a reference to it we should then also be able to get a reference to the column immediately before it. 

Click through the Lookup & Reference functions and the functions that appear to be of use are ADDRESS, HLOOKUP, INDEX, MATCH, INDIRECT, and OFFSET. If we can create a reference to the Tax column we could look within it for the 3 largest and smallest tax figures.

I like the look of MATCH because it "Returns the relative position in an array...". So lets try it and see if we could use it to find the Tax column.

Now we have this we can very confident that we are on the right track. Remember "When we need to extract some data from within a table or database using a Lookup & Reference function there is really only one thing we need to do and that is, find it's Row and/or Column number" We have the Column number, so now I'm going to use this nested with the ADDRESS function to create a reference to the Tax column.

Now we can apply the same function again but instead of 1 as the Row number we will use 5000 (the last row of our table)

=ADDRESS(5000,MATCH("Tax",TpRwAllData,0)) gives a result of $G$5000

If we now join these using the & (Ampersand) and include a : (Colon) we can create a reference to our Tax column that will always be the address of the Tax column no matter what position it is in. In other words it will be a variable range.

=ADDRESS(1,MATCH("Tax",TpRwAllData,0)) & ":" & ADDRESS(5000,MATCH("Tax",TpRwAllData,0))

This returns the result $G$1:$G$5000.

Now we can use this to find our 3 largest and the 3 smallest tax figures. There is one very important thing to remember here though and that is the result $G$1:$G$5000 will be seen by Excel as a Text string and not a range reference! To force Excel to see this as a range reference we only need to nest it within the INDIRECT function. The INDIRECT function "Returns the reference returned by a text string".

What we will show you now is a neat little trick that is very much like creating your very own Custom Function. The same can be applied to just about any nested or non-nested function and comes in very handy!

What we have now is a variable named range that will always refer to our Tax column!

We now need to find the 3 largest  and the 3 smallest  tax figures in the named range TaxCol  so let's type the word Largest in the Search for a Function  box in the Insert Function  dialog and then Smallest. This is an easy one as there are two functions that will do exactly what we need, they are the LARGE  and SMALL  functions.

What we intend to do now is to use the INDEX function to create a reference to the name of the largest tax payer.

The syntax for INDEX is =index(array,row_num,column_num)

=INDEX(AllData,MATCH(LARGE(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)

As you can see we have colour coded the formula to help you see what each nested function is being used for. The -1 is returning the column number of the column immediately before the Tax column. We can now easily find the 3 largest tax payers within our table all we need to do is substitute LARGE(TaxCol,1) with LARGE(TaxCol,2) and LARGE(TaxCol,3).

For the 3 smallest tax payers we substitute the LARGE function with the SMALL function, eg; =INDEX(AllData,MATCH(SMALL(TaxCol,1),TaxCol,0),MATCH("Tax",TpRwAllData,0)-1)

We realise that this can seem a bit daunting at first, but take your time and work through it, as the solution to this one difficult problem can be applied in many situations. 

<< PREVIOUS LESSON |NEXT LESSON>> | BACK TO EXCEL LEVEL 2 TRAINING INDEX