Previous Page
Next Page

Hack 44. Sort Any Arbitrary String of Characters

The Access query grid is great for sorting your data, but you need to help it sort on characters in the middle of a field.

I love the query grid. It's very helpful for doing all sorts of sorts (pun intended). But did you ever notice that sorting on text data always occurs on the whole field, going left to right? This makes sense because this is the most common sorting requirement. Imagine, though, the problem of having to sort on, say, just the fifth character, or the last three characters, or in any other way that isn't the norm.

This is an issue especially when Access is used with data that comes from other systems. Accounting systems are notorious for this. They often lump together a bunch of disparate data into a fixed-width field. Here's another classic problem: you are given a list of people's names in one field, structured as first name/last name, but you need to sort on just the last name.

5.6.1. Sorting in the Middle

Figure 5-12 shows a table filled with sales records. The records follow a strict format comprising a two-character vendor code and a six-character date; the remaining digits are the sales amount, and the last two of those digits are the decimal part of the amount. Therefore, the first SalesData record (CT1023044595) breaks down like this:

  • Vendor code is CT.

  • The date is October 23, 2004.

  • The amount is $45.95.

Let's say you need to sort the records by date. As shown in Figure 5-12, in each record, the date starts in position 3 and takes up six places.

Have you ever worked with data such as this? You need a record layout to go with the data; otherwise, you can't tell what kind of data it is. What if you had to guess which characters make up the date? Garbage in, garbage out, as the saying goes.

The best way to tackle a problem such as this is to use the Mid function. Mid is one of the functions that let you manipulate textual data. It works by isolating a part of a larger text string. You have to tell Mid tHRee things: the string of data, the position you want to start from, and how many characters to include. The syntax looks like this:

	Mid(string, starting position, length)

Even though we are conceptually working with dates in this example, the stored information is in text format. Therefore, it's easy to manipulate the date with standard string functions.

Figure 5-13 shows a query design in which the Mid function is used. The first column is the SalesData field itself, and the second column is a calculated field using the Mid function. Within the function, SalesData is enclosed in brackets. This is the standard way to put a field name in a function. Mid's parameters are set to isolate six characters starting at position 3 (the date, in other words).

Figure 5-12. A vendor code, date, and amount, combined in one field

When the query runs, the second column has just the date in it because Mid does the job of grabbing the characters from positions 3 through 8. The second column receives the sort because, after all, the date is what we need to sort on. So, where the Sort row and the second column meet, set the choice to sort in ascending order by selecting Ascending from the drop-down menu.

Note in Figure 5-13 that the Show checkbox for the calculated field is unchecked. You don't have to actually display the column when the query is run. It is used just to make the sort happen, but it doesn't necessarily have to appear in the results.

Figure 5-13. Using the Mid function to isolate the date for sorting

Figure 5-14 shows the result of running the query. Now the sales records are sorted by date. The first returned record (MR0104047011) contains 010404, the equivalent of January 4, 2004.

Figure 5-14. Records sorted by date

5.6.2. Sorting on Mixed Fixed Positions

What if you have to sort on both the date and the amount? What if the sort has to show the date in ascending order and the amount in descending order? This is a common requirement: to see money amounts sorted from high to low. Can you do this?

But of course! In this case, the technique is to have two columns with expressions, one each for the date and the amount. Figure 5-15 shows how you do this, with the amount starting in the ninth position. The length parameter for the Mid function that processes the amount is set to 5. Usually, a length is known, but not always. In this example, the amounts among the records might be four or five digits long, so setting the length to 5 works for all records.

Figure 5-15. A query design for sorting on two subsections of the field

In this example, as before, only the actual SalesData field is shown when the query runs. Therefore, the second and third columns both have unchecked Show boxes. The second and third columns both use Mid to work on different substrings within the same full SalesData string.

Now the result is slightly different. Figure 5-16 displays the returned data. Comparing this result to the result shown in Figure 5-14, you can see that records 6 through 8 have been reordered. These records share the same date of January 16, 2004 (011604), but now the amounts are reordered based on the query specification.

Figure 5-16. Sorting on date and amount, which returns a different order

5.6.3. Sorting on Characters When Their Position Is Unknown

Often, you need to manipulate data imported from external systems before you can use it in your application. This is a common issue with names. Your database table might have separate fields for first and last names. This of course makes it a no-brainer to sort on last name. But imagine the difficulty when you are given full names in one field. What if the names are in the order of first and then last name, with a space in the middle, and you need to sort on the last name? The difference here, compared to the previous sales information example, is that you can't know, record by record, in which position the last name starts.

The trick to sorting by last name is to first determine the position of the space. In this case, you use the InStr function with the Mid function. Instead of hard-coding the position of the space, InStr returns the position of the space.

The InStr function tells you the starting position of the first occurrence of a substring inside a larger string. In this example, the string being searched is the Client field, and the substring is a space. By itself, InStr looks like this:

	InStr([Client]," ")

Here we use the InStr function to tell the Mid function the position from which it should start counting. InStr is embedded inside the Mid function. Together, they look like this:

	Mid([Client],InStr([Client]," ")+1,10)

Note that although the InStr function returns the position of the space, we are interested in the starting position of the last name. This is one position to the right of the space, and for this reason, 1 is added after the InStr function. The returned value of InStr plus the value 1 is used as the starting position parameter in the Mid function.

Figure 5-17 shows how to set up a query using these nested functions. The value of 10 is arbitrarily used here as the length of the last name. Last names vary in length, but using 10 characters to sort on all but guarantees the sort will be in the right order.

Figure 5-17. Using nested functions in a sort

Figure 5-18 shows the result of the query. Clients are sorted by last name, within a single field that contains full first and last names. Neat!

5.6.4. Hacking the Hack

Sorting on names isn't difficult when first and last names are all you have to work with. But what about middle names, titles, and suffixes? How can you handle these? Let's up the ante on this hack and include a custom function in the query.

Figure 5-18. Clients sorted by last name

The function we need will examine the names in the Client field to determine the position of the space. Here's the catch: now there could be more than one space. My name is Ken S. Bluttman; that's two spacesone on each side of the middle initial. Some names have three, four, or even five spaces. The function is meant to simply figure out the best space to use; it figures out the position of that space and tells the Mid function where it is.

First, you write the function in a VBA code module. To do this, from the database window, go to the Modules tab, and select to create a new module. Enter this code:

  Function find_space(client_name As String)
  Dim name_length As Integer
  Dim space_loop As Integer
  Dim space_count As Integer
  Dim partial_name As String
  Dim first_space_position As Integer
  'count spaces in full name
  space_count = 0
  name_length = Len(client_name)
  For space_loop = 1 To name_length
    If Mid(client_name, space_loop, 1) = " " Then
      space_count = space_count + 1
    End If
  Next space_loop
  'parse the full name using assumptions in each Case
  Select Case space_count
   Case 0
     'no spaces found!    
     'return 1 as the position
     find_space = 1
   Case 1
     'a first name and last name
     'split after first space
     find_space = InStr(client_name, " ")
   Case 2, 3
     'assume a first name, Middle name, and last name (2 spaces)
     'or a first name, Middle name, last name, and suffix (3 spaces)
     'split after second space
     find_space = InStr(client_name, " ")
     first_space_position = find_space
     partial_name = _
        Mid(client_name, find_space, name_length - find_space)
     find_space = InStr(partial_name, " ") + first_space_position - 1
   Case Else
     'difficult to make assumption on name structure
     'split after first space
     find_space = InStr(client_name, " ")
End Select
End Function

In a nutshell, the function takes a client name, counts how many spaces are in it, and then determines which space is best. The position of that space is used in the Mid function as before.

In the query grid, the call to the function, named find_space, is embedded in the Mid function, like this:


Figure 5-19 shows how to set up the query.

Figure 5-19. The Mid function, using the find_space function

When the query runs, each client name is examined in the find_space function. The function returns the best space position, and the names are sorted. Figure 5-20 shows the results of running the query.

Figure 5-20. Sorting by last name when middle names and suffixes are present

Looking closely, you will see that the sort isn't without problems. The way the function is written, it assumes that when there are two spaces, the format of the name is first name, last name, suffix. This works for a name such as Alex Avakian III. The function assumes the last name starts after the first space.

Unfortunately, a name such as Tammy Jill Adams doesn't end up with the other last names beginning with A. The function assumes the best space is the first, and the name is sorted as though the last name starts with J. Tammy's last name starts after the second space. Sorry, Tammy!

Splitting names apart is traditionally a thorny problem. Some names always confound the best intentions of a name-parsing routine. That must be why I keep getting catalogs addressed to Mr. Ken.

5.6.5. See Also

    Previous Page
    Next Page