
5.2. The String Class

The System.String class was introduced in Chapter 2. This section expands that discussion to include a more detailed look at creating, comparing, and formatting strings. Before proceeding to these operations, let's first review the salient points from Chapter 2:

  • The System.String class is a reference type with value semantics. This means that, unlike with most reference types, string comparisons using == or Equals are based on the values of the strings rather than their memory locations.

  • A string is a sequence of Char values. Any individual character accessed within a string, for example through the indexer s[i], is treated as a char.

  • Strings are immutable. This means that after a string is created, it cannot be changed at its current memory location: You cannot shorten it, append to it, or change a character within it. A string variable can, of course, be assigned a modified value, but the modified string is stored in a new memory location. The original string remains in memory until it is no longer referenced and the Garbage Collector reclaims it.

  • The System.Text.StringBuilder class provides a set of methods to construct and manipulate strings within a buffer. When the operations are completed, the contents are converted to a string. StringBuilder should be used when an application makes extensive use of concatenation and string modifications.
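The first few points can be seen in a short program. This is a minimal sketch: the class StringBasics and the helper JoinNumbers are hypothetical names used only for illustration, and the comments show the output expected when Main runs.

```csharp
using System;
using System.Text;

public static class StringBasics
{
    // Builds "0,1,...,n-1" inside one StringBuilder buffer instead of
    // creating a new string object on every concatenation.
    public static string JoinNumbers(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(i);
        }
        return sb.ToString(); // convert the buffer contents to a string
    }

    public static void Main()
    {
        string a = "Kubla Khan";
        string b = new string("Kubla Khan".ToCharArray()); // distinct object, same value

        // Value semantics: == compares contents, not references.
        Console.WriteLine(a == b);                       // True
        Console.WriteLine(object.ReferenceEquals(a, b)); // False

        // Immutability: ToUpperInvariant returns a new string; a is unchanged.
        string upper = a.ToUpperInvariant();
        Console.WriteLine(a);     // Kubla Khan
        Console.WriteLine(upper); // KUBLA KHAN

        // Indexing a string yields a char.
        char first = a[0];
        Console.WriteLine(first); // K

        Console.WriteLine(JoinNumbers(5)); // 0,1,2,3,4
    }
}
```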

Creating Strings

A string is created by declaring a variable as a string type and assigning a value to it. The value may be a literal string or dynamically created using concatenation. This is often a perfunctory process and not an area that most programmers consider when trying to improve code efficiency. In .NET, however, an understanding of how literal strings are handled can help a developer improve program performance.

String Interning

One of the points of emphasis in Chapter 1, "Introduction to .NET and C#," was to distinguish how value and reference types are stored in memory. Recall that value types are stored on a stack, whereas reference types are placed on a managed heap. It turns out that the CLR also sets aside a third area in memory called the intern pool, where it stores the string literals used in the code. The purpose of this pool is to prevent duplicate string values from being stored.

Consider the following code:

string poem1 = "Kubla Khan";
string poem2 = "Kubla Khan";
string poem3 = String.Copy(poem2); // Create new string object
string poem4 = "Christabel";

Figure 5-2 shows a simplified view of how the strings and their values are stored in memory.

Figure 5-2. String interning

The intern pool is implemented as a hash table. The hash table key is the actual string, and the associated value is a pointer that references the string object on the managed heap. When the JIT compiler compiles the preceding code, it places the first instance of "Kubla Khan" (poem1) in the pool and creates a reference to the string object on the managed heap. When it encounters the second string reference to "Kubla Khan" (poem2), the CLR sees that the string already exists in memory and, instead of creating a new string, simply points poem2 at the same object as poem1. This process is known as string interning. Continuing with the example, the String.Copy method creates a new string object on the managed heap and assigns it to poem3; because this object is created at run time, it is not added to the intern pool. Finally, the string literal associated with poem4 is added to the pool.

To examine the practical effects of string interning, let's extend the previous example. We add code that uses the equality (==) operator to compare string values and the Object.ReferenceEquals method to compare their references (memory addresses).

Console.WriteLine(poem1 == poem2);                // true
Console.WriteLine(poem1 == poem3);                // true
Console.WriteLine(ReferenceEquals(poem1, poem3)); // false


                  "Kubla Khan"));                 // true

The first two statements compare the values of the variables and, as expected, return true. The third statement compares the memory locations of the variables poem1 and poem3. Because they reference different objects in the heap, it returns false.

The .NET designers decided to exclude dynamically created values from the intern pool because checking the pool each time a string was created would hamper performance. However, they did include the String.Intern method as a way to selectively add dynamically created strings to the intern pool.

string khan = " Khan";
string poem5 = "Kubla" + khan;
Console.WriteLine(ReferenceEquals(poem5, poem1)); // false
// Place the contents of poem5 in the intern pool, if not already there
poem5 = String.Intern(poem5);
Console.WriteLine(ReferenceEquals(poem5, poem1)); // true

The String.Intern method searches for the value of poem5 ("Kubla Khan") in the intern pool; because it is already in the pool, there is no need to add it. The method returns a reference to the already existing object (Object1) and assigns it to poem5. Because poem5 and poem1 now point to the same object, the comparison in the final statement is true. Note that the original object created for poem5 is no longer referenced and becomes eligible for garbage collection.
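The lookup that String.Intern performs can be modeled with an ordinary hash table. The following is a hypothetical, user-level model: the class SimpleInternPool and its members are invented for illustration and are not the CLR's implementation, which is internal to the runtime. It stores each distinct value once and hands back the first object registered for that value, just as poem5 is handed back poem1's object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of an intern pool. The key is the string value; the
// stored value is the single canonical object for that value.
public sealed class SimpleInternPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the canonical object for s's value, registering s itself
    // if the value has not been seen before.
    public string Intern(string s)
    {
        if (s == null) throw new ArgumentNullException(nameof(s));
        if (_pool.TryGetValue(s, out string existing))
            return existing;
        _pool.Add(s, s);
        return s;
    }

    public int Count => _pool.Count;

    public static void Main()
    {
        var pool = new SimpleInternPool();
        // Three distinct objects; two of them share the value "Kubla Khan".
        string a = new string("Kubla Khan".ToCharArray());
        string b = new string("Kubla Khan".ToCharArray());
        string c = new string("Christabel".ToCharArray());

        string ia = pool.Intern(a);
        string ib = pool.Intern(b);
        pool.Intern(c);

        Console.WriteLine(object.ReferenceEquals(a, b));   // False
        Console.WriteLine(object.ReferenceEquals(ia, ib)); // True: both are a
        Console.WriteLine(pool.Count);                     // 2
    }
}
```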

Core Recommendation

Use the String.Intern method to allow a string variable to take advantage of comparison by reference, but only if it is involved in numerous comparisons.
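A minimal sketch of this recommendation, assuming a tokenizer-style workload in which the same few keywords are compared many times. KeywordMatcher and CountKeywords are hypothetical names. Each token pays for one String.Intern lookup, after which every keyword test is a reference comparison rather than a character-by-character comparison.

```csharp
using System;

public static class KeywordMatcher
{
    // Interned once, up front. Both sides of every later comparison come
    // from String.Intern, so equal values share one object.
    private static readonly string[] Keywords =
    {
        String.Intern("if"), String.Intern("else"), String.Intern("while")
    };

    // Counts the tokens that are keywords, comparing by reference.
    public static int CountKeywords(string[] tokens)
    {
        int count = 0;
        foreach (string token in tokens)
        {
            string interned = String.Intern(token); // one hash lookup per token
            foreach (string keyword in Keywords)
            {
                if (object.ReferenceEquals(interned, keyword))
                {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        // Split creates new, non-interned string objects at run time.
        string[] tokens = "if x then y else if z".Split(' ');
        Console.WriteLine(CountKeywords(tokens)); // 3
    }
}
```

The trade-off is that String.Intern itself performs a hash lookup on the string's value, so it saves time only when the interned string is compared repeatedly. The .NET documentation for String.Intern also notes that memory used by interned strings is not likely to be released until the CLR terminates.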

Overview of String Operations

The System.String class provides a large number of static and instance methods, most of which have several overloads. For discussion purposes, they can be grouped into four major categories based on their primary function:

  • String Comparisons. The String.Equals, String.Compare, and String.CompareOrdinal methods offer different ways to compare string values. The choice depends on whether an ordinal or lexical comparison is needed, and whether case or culture should influence the operation.

  • Indexing and Searching. A string is a sequence of Unicode characters that may be searched by iterating through it like an array or by using index methods such as IndexOf and LastIndexOf to locate characters and substrings.

  • String Transformations. This is a catchall category that includes methods for inserting, padding, removing, replacing, trimming, and splitting character strings.

  • Formatting. .NET provides format specifiers that are used in conjunction with String.Format to represent numeric and DateTime values in a number of standard and custom formats.
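The four categories can be sampled in a single sketch. StringOpsDemo and Run are hypothetical names; the ordinal comparisons and the explicit CultureInfo.InvariantCulture are deliberate choices so that the results shown in the comments do not depend on the machine's culture settings.

```csharp
using System;
using System.Globalization;

public static class StringOpsDemo
{
    // Returns one sample result per operation so the output is easy to check.
    public static string[] Run()
    {
        string poem = "Kubla Khan";
        CultureInfo inv = CultureInfo.InvariantCulture;

        return new[]
        {
            // 1. Comparisons (ordinal, so results do not depend on culture).
            String.Equals(poem, "KUBLA KHAN", StringComparison.OrdinalIgnoreCase)
                .ToString(),                                              // True
            (String.CompareOrdinal("a", "B") > 0).ToString(),             // True: 'a' (97) > 'B' (66)
            (String.Compare("a", "B", StringComparison.OrdinalIgnoreCase) < 0)
                .ToString(),                                              // True: "A" < "B"

            // 2. Indexing and searching.
            poem[6].ToString(),                                           // K
            poem.IndexOf("Khan", StringComparison.Ordinal).ToString(inv), // 6
            poem.LastIndexOf('a').ToString(inv),                          // 8

            // 3. Transformations: each call returns a new string.
            poem.Insert(5, ","),                                          // Kubla, Khan
            poem.PadLeft(12, '*'),                                        // **Kubla Khan
            poem.Remove(5),                                               // Kubla
            String.Join("|", poem.Split(' ')),                            // Kubla|Khan

            // 4. Formatting with an explicit culture for predictable output.
            String.Format(inv, "{0:N2}", 1234.5),                         // 1,234.50
            String.Format(inv, "{0:D5}", 42),                             // 00042
            String.Format(inv, "{0:yyyy-MM-dd}", new DateTime(2004, 1, 2)) // 2004-01-02
        };
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```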

Many of the string methods, particularly those for formatting and comparisons, are culture dependent. Where applicable, we look at how culture affects the behavior of a method.
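Culture dependence is easiest to see in numeric formatting. To keep the sketch deterministic, it does not load a real culture; instead it clones the invariant culture's NumberFormatInfo and swaps the decimal and group separators, roughly the convention used by cultures such as German. CultureFormatDemo and FormatBoth are hypothetical names. Passing a real culture such as new CultureInfo("de-DE") would typically give a similar result, but the exact output depends on the culture data available on the system.

```csharp
using System;
using System.Globalization;

public static class CultureFormatDemo
{
    // Formats the same value with the invariant culture and with a
    // hypothetical "comma decimal" NumberFormatInfo.
    public static (string Invariant, string CommaDecimal) FormatBoth(double value)
    {
        CultureInfo inv = CultureInfo.InvariantCulture;

        // Clone() returns a writable copy of the read-only invariant settings.
        var commaDecimal = (NumberFormatInfo)inv.NumberFormat.Clone();
        commaDecimal.NumberDecimalSeparator = ",";
        commaDecimal.NumberGroupSeparator = ".";

        // NumberFormatInfo implements IFormatProvider, so it can be passed
        // to String.Format in place of a CultureInfo.
        return (String.Format(inv, "{0:N2}", value),
                String.Format(commaDecimal, "{0:N2}", value));
    }

    public static void Main()
    {
        var (invariant, commaDecimal) = FormatBoth(1234.5);
        Console.WriteLine(invariant);    // 1,234.50
        Console.WriteLine(commaDecimal); // 1.234,50
    }
}
```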
