
2.6. Strings

The System.String class (aliased by the C# keyword string) is a reference type that is represented internally as a sequence of 16-bit Unicode (UTF-16) characters. Unlike other reference types, C# treats a string like a primitive type: It can be declared as a constant, and it can be assigned a literal string value.

String Literals

Literal values assigned to string variables take two forms: regular literals enclosed in quotation marks, and verbatim strings that begin with @" and end with a closing double quote ("). The difference between the two is how they handle escape sequences. Regular literals interpret escape sequences, whereas verbatim strings treat a backslash as an ordinary character. Table 2-9 summarizes the escape sequences that can be placed in regular strings.

Table 2-9. String Escape Characters

Escape Character   Description
\'                 Inserts a single quote into a string
\"                 Inserts a double quote
\\                 Inserts a backslash; useful for file paths
\a                 System alert (bell)
\b                 Backspace
\f                 Form feed
\n                 Inserts a new line
\r                 Carriage return
\t                 Horizontal tab
\uXXXX             Unicode character given by four hexadecimal digits (e.g., \u0066)
\v                 Vertical tab
\0                 Null character
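To make the table concrete, the following sketch (a C# 9+ top-level program) places the eleven single-character escape sequences from Table 2-9 into one regular literal and prints the numeric code of each resulting character. It shows that every escape sequence, however many characters it takes to type, compiles to exactly one char in the string.

```csharp
using System;

// The single-character escape sequences from Table 2-9, in table order (\u is shown separately).
string escapes = "\'\"\\\a\b\f\n\r\t\v\0";

// Each sequence becomes one UTF-16 character; print its numeric code.
foreach (char c in escapes)
{
    Console.Write((int)c + " ");
}
Console.WriteLine();
// prints: 39 34 92 7 8 12 10 13 9 11 0

// \u is followed by exactly four hexadecimal digits: 0x0066 is 'f'.
Console.WriteLine("\u0066" == "f");   // True
```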


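A verbatim string serves the purpose its name implies: it includes every character placed between the opening and closing double quotes exactly as written. The one exception is the double quote itself, which must be doubled ("") inside a verbatim string. The following segment provides several examples of using literals: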


string myQuote, path;
myQuote = @"The solution is in the problem.";
myQuote = "The solution\nis in the problem.";
myQuote = "The Unicode representation of f is \u0066";

// The next two statements assign the same value to myQuote.
myQuote = @"""The solution is in the problem. """;
myQuote = "\"The solution is in the problem. \"";

// The next two statements assign the same value to path.
path    = @"c:\my documents\notes.txt";
path    = "c:\\my documents\\notes.txt";
// path = "c:\my documents\notes.txt";   // Fails to compile: \m is not a valid escape sequence


The regular literal string is normally the best choice because it supports escape sequences. The verbatim form is preferable when the text contains backslashes; its most common uses are file paths and regular expression patterns (discussed in Chapter 5, "C# Text Manipulation and File I/O").
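The equivalences claimed in the segment above can be checked directly. This sketch, a C# 9+ top-level program with illustrative variable names, writes the same text once as a verbatim literal and once as a regular literal and compares them. The regular expression pattern is only a sample string here; no Regex object is created.

```csharp
using System;

// Each pair holds the same text, written once as a verbatim literal
// and once as a regular literal with escape sequences.
string path1 = @"c:\my documents\notes.txt";
string path2 = "c:\\my documents\\notes.txt";

string quote1 = @"""The solution is in the problem.""";   // "" yields one quote
string quote2 = "\"The solution is in the problem.\"";    // \" yields one quote

// Regular expression patterns are full of backslashes, which is where the
// verbatim form pays off. This pattern matches digits, a period, and digits.
string pattern1 = @"\d+\.\d+";
string pattern2 = "\\d+\\.\\d+";

Console.WriteLine(path1 == path2);       // True
Console.WriteLine(quote1 == quote2);     // True
Console.WriteLine(pattern1 == pattern2); // True
```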

String Manipulation

The System.String class contains a variety of string manipulation members. These include ways to determine a string's length, extract a substring, compare strings, and convert a string to upper- or lowercase. The following examples illustrate some of the more common operations.

Indexing Individual Characters in a String

The foreach and while loops offer the easiest way to iterate through the characters in a string. In both cases the access is read-only: the string indexer has no setter, so individual characters cannot be changed in place.


// Example 1 - using foreach statement
string myQuote = "The solution is in the problem.";
foreach (char cc in myQuote)
{
   Console.Write(cc.ToString());
}

// Example 2 - using while loop
int ndx = 0;
while (ndx < myQuote.Length)
{
   Console.Write(myQuote[ndx].ToString());
   ndx += 1;
}


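Note that Console.Write has an overload that accepts a char, so the ToString() calls above are optional when displaying characters. A char cannot, however, be assigned directly to a string variable; it must first be converted, for example with ToString().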
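Because a string's characters are read-only, changing one means building a new string. The following sketch (a C# 9+ top-level program) copies the characters into a mutable char array with ToCharArray, modifies the copy, and constructs a new string from it; the original string is left untouched.

```csharp
using System;

string myQuote = "The solution is in the problem.";

// myQuote[0] = 't';   // would not compile: the string indexer is read-only

// Copy the characters into a mutable array, change one, and build a new string.
char[] chars = myQuote.ToCharArray();
chars[0] = 't';
string changed = new string(chars);

Console.WriteLine(changed);   // the solution is in the problem.
Console.WriteLine(myQuote);   // The solution is in the problem.
```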

String Concatenation

The + operator concatenates two strings: s1 + s2. Only one of the operands has to be a string; the other can be any type, and its ToString method is called automatically to convert it (a null operand is treated as an empty string).


string s1 = "My age = ";
int myAge = 28;
string cat = s1 + myAge;         // My age = 28
MyClass clStr = new MyClass();   // assumes MyClass does not override ToString
cat = "Class Name = " + clStr;   // Class Name = MyClass
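A few more cases of the automatic conversion described above are sketched here as a C# 9+ top-level program using only built-in types. Two details are worth noticing: a null operand contributes nothing, and + is evaluated left to right, so numeric addition happens first if both leading operands are numbers.

```csharp
using System;

string s1 = "My age = ";
int myAge = 28;
string nothing = null;

string a = s1 + myAge;                 // "My age = 28": int converted via ToString
string b = "Value: " + nothing + "!";  // "Value: !": null treated as an empty string
string c = "Object: " + new object();  // "Object: System.Object": default ToString gives the type name
string d = 1 + 2 + " apples";          // "3 apples": 1 + 2 is added as ints first
string e = "apples " + 1 + 2;          // "apples 12": each + is now a concatenation

Console.WriteLine(a);
Console.WriteLine(b);
Console.WriteLine(c);
Console.WriteLine(d);
Console.WriteLine(e);
```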


The concatenation operation is simple to use, but it is important to understand what happens behind the scenes. Strings are immutable, so concatenation cannot extend an existing string; instead, it allocates a new string large enough to hold the result and copies the characters of both operands into it. This is an acceptable use of resources as long as the number of concatenations is small. However, if concatenation occurs inside a long loop, each pass copies everything accumulated so far, the total amount of copying grows roughly with the square of the number of iterations, and an application's performance can suffer.

Consider an example where an HTML fragment is constructed by appending a <br> tag after each name in a list.


// assume names is an array containing 1000 names
string nameList = "";
foreach (string custName in names)
{
   // This is inefficient and should be avoided.
   nameList = nameList + custName + "<br>";
}


Each pass through the loop creates a new string consisting of the previous string plus the newly appended name and tag. A better approach is to use the StringBuilder class in place of the concatenation operator. A StringBuilder maintains a mutable buffer that grows as needed, so text can be appended without copying the entire accumulated string on every operation. It includes methods to append, insert, remove, and replace characters. StringBuilder is discussed in Chapter 5.
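As a preview, here is a minimal sketch of the name-list loop rewritten with StringBuilder (from the System.Text namespace), written as a C# 9+ top-level program. BuildNameList and the sample names are hypothetical, used only for illustration. Append returns the same StringBuilder, which is why the two calls can be chained; only the final ToString call produces a string.

```csharp
using System;
using System.Text;

// Hypothetical helper: builds the same "<br>"-terminated list as the loop above,
// but appends into one StringBuilder buffer instead of allocating a new string each pass.
string BuildNameList(string[] custNames)
{
    var sb = new StringBuilder();
    foreach (string custName in custNames)
    {
        sb.Append(custName).Append("<br>");
    }
    return sb.ToString();   // the single result string is created here
}

string[] sample = { "Smith", "Jones", "Lee" };
Console.WriteLine(BuildNameList(sample));   // Smith<br>Jones<br>Lee<br>
```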

Extracting and Locating Substrings

The Substring method extracts selected portions of a string. Its two overloads are illustrated here:


string poem = "In Xanadu did Kubla Khan";
string poemSeg;
// first argument is the zero-based starting position
poemSeg = poem.Substring(10);     // did Kubla Khan
// second argument specifies length
poemSeg = poem.Substring(0,9);    // In Xanadu


The IndexOf method locates the first occurrence of a character or substring within a string, searching either from the beginning of the string or from a specified starting position. It returns the zero-based position of the match, or -1 if no match is found. Listing 2-1 illustrates this.

IndexOf() performs a case-sensitive search. To count both upper- and lowercase instances, you could convert both the source string and the search text to lowercase (ToLower()) before searching. Note that there is also a LastIndexOf method that locates the last instance of a character pattern within a string.
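IndexOf and Substring are often used together: find where some text starts, then extract from that position. The sketch below, a C# 9+ top-level program reusing the poem string from the Substring example, also shows the -1 result of a failed case-sensitive search, the lowercase workaround, and LastIndexOf.

```csharp
using System;

string poem = "In Xanadu did Kubla Khan";

// Find where "Kubla" starts (zero-based), then extract from there to the end.
int start = poem.IndexOf("Kubla");              // 14
string tail = poem.Substring(start);            // "Kubla Khan"

// The search is case-sensitive, so "kubla" is not found and -1 is returned.
int notFound = poem.IndexOf("kubla");           // -1

// Lowercasing the source (the search text is already lowercase) ignores case.
int found = poem.ToLower().IndexOf("kubla");    // 14

// LastIndexOf finds the final occurrence, here of the character 'a'.
int lastA = poem.LastIndexOf('a');              // 22

Console.WriteLine($"{start} {tail} {notFound} {found} {lastA}");
```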

Listing 2-1. Locating Text Occurrences in a String

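using System;

public class MyApp
{
   // Method to count the occurrences of text in a given string
   // (overlapping occurrences are counted)
   public static int CharCount(String strSource, String strToFind)
   {
      if (String.IsNullOrEmpty(strToFind))
         return 0;                // an empty pattern would match at every position
      int iCount = 0;             // number of occurrences found so far
      int iPos = strSource.IndexOf(strToFind);
      while (iPos != -1)
      {
         iCount++;
         // resume searching one character past the start of the last match
         iPos = strSource.IndexOf(strToFind, iPos + 1);
      }
      return iCount;
   }

   static void Main()
   {
      string txt = "In Xanadu did Kubla Khan";
      int ct = CharCount(txt, "a"); // ct = 4
      Console.WriteLine(ct);
   }
}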
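Listing 2-1 is case-sensitive and counts overlapping matches (it restarts one character past each match). The following variant is a sketch, written as a C# 9+ top-level program with a hypothetical local function CountIgnoreCase. It uses the IndexOf overload that takes a StringComparison value, so no lowercase copy is needed, and it skips past each whole match so that occurrences do not overlap.

```csharp
using System;

// Hypothetical variant of CharCount: case-insensitive, non-overlapping.
int CountIgnoreCase(string source, string toFind)
{
    if (string.IsNullOrEmpty(toFind)) return 0;   // an empty pattern matches everywhere
    int count = 0;
    int pos = source.IndexOf(toFind, 0, StringComparison.OrdinalIgnoreCase);
    while (pos != -1)
    {
        count++;
        // continue after the end of this match, so matches never overlap
        pos = source.IndexOf(toFind, pos + toFind.Length, StringComparison.OrdinalIgnoreCase);
    }
    return count;
}

Console.WriteLine(CountIgnoreCase("In Xanadu did Kubla Khan", "k"));  // 2
Console.WriteLine(CountIgnoreCase("AaAa", "aa"));                      // 2
```

For comparison, Listing 2-1's overlapping approach would report 3 matches of "aa" in "aaaa", whereas this version reports 2.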


Comparing Strings

This topic is more complex than one would expect. The first hint of this is when you look at the System.String members and discover that there are four comparison methods: Compare, CompareOrdinal, CompareTo, and Equals. The choice of a comparison method is based on factors such as whether the comparison should be case sensitive and whether it should take culture into account.

The .NET environment is designed to handle international character sets, currencies, and dates. To support this, the handling and representation of strings can be tailored to different countries and cultures. Consider, for example, the same date written in U.S. and European format. The dates "12/19/04" and "19/12/04" are logically equal, but they do not contain the same characters, so no string comparison treats them as equal. Recognizing them as the same date requires interpreting each string according to its culture's date conventions, for example by parsing it into a DateTime. Chapter 5 explains how the various comparison methods work and the factors to be considered in selecting one.
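The sketch below (a C# 9+ top-level program) shows the two date strings differing as text, while interpreting each with the day/month order its region uses yields the same DateTime value. To keep the result independent of the culture data installed on a machine, it passes explicit format patterns to DateTime.ParseExact with the invariant culture rather than creating U.S. and European CultureInfo objects; the patterns stand in for those cultures' conventions.

```csharp
using System;
using System.Globalization;

string usDate = "12/19/04";   // month/day/year
string euDate = "19/12/04";   // day/month/year

// As text, the two strings are simply different.
bool sameText = usDate == euDate;   // False

// Interpreting each string with its own day/month order gives the same date.
DateTime d1 = DateTime.ParseExact(usDate, "MM/dd/yy", CultureInfo.InvariantCulture);
DateTime d2 = DateTime.ParseExact(euDate, "dd/MM/yy", CultureInfo.InvariantCulture);
bool sameDate = d1 == d2;           // True

Console.WriteLine($"{sameText} {sameDate}");   // False True
```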

For the majority of applications, nothing more than the standard equality (==) operator is required. This code segment illustrates its use:


bool isMatch;
string title = "Ancient Mariner";
isMatch = (title           == "ANCIENT MARINER");    // false
isMatch = (title.ToUpper() == "ANCIENT MARINER");    // true
isMatch = (title           == "Ancient"+" Mariner"); // true
isMatch =  title.Equals("Ancient Mariner");          // true


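Note that the == operator is overloaded for strings to compare their values: it calls the static String.Equals method, so it performs the same case-sensitive, ordinal comparison as Equals(), and any speed difference between the two is negligible. One practical difference is that == handles null operands safely, whereas calling Equals() on a null reference throws a NullReferenceException.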
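The following sketch (a C# 9+ top-level program; the variable names are illustrative) contrasts how == and the static String.Equals method treat a null operand, and shows the String.Equals overload that takes a StringComparison value. That overload performs a case-insensitive match without creating an uppercase copy as the ToUpper() line in the preceding segment does.

```csharp
using System;

string title = "Ancient Mariner";
string missing = null;

bool a = missing == "Ancient Mariner";     // False: == tolerates a null operand
bool b = string.Equals(missing, title);    // False: the static Equals tolerates null too
// bool c = missing.Equals(title);         // would throw NullReferenceException

// Case-insensitive comparison without building an uppercase copy of either string.
bool d = string.Equals(title, "ANCIENT MARINER", StringComparison.OrdinalIgnoreCase);   // True

Console.WriteLine($"{a} {b} {d}");   // False False True
```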
