
Memory and Class Management in the Common Language Runtime

This section of the chapter deals with how the Common Language Runtime manages memory, and how you can make some adjustments in your code to be more accommodating of the Common Language Runtime's memory manager. You will be introduced to the concepts of boxing and unboxing and how those concepts apply when dealing with collections and arrays. In addition, this section discusses string management, how the Common Language Runtime deals with strings, and what you can do to increase string performance.

Boxing and Unboxing

Boxing and unboxing refer to the ability to convert between value types and reference types. A value type is a simple type such as an integer, a decimal, or a float. A value type can also be a struct, which is a lightweight, value-based counterpart to a class. A reference type is a type whose value is not contained in the variable; rather, the variable contains a reference that points to a location on the managed heap where the actual data is stored. Class instances, strings, and arrays are reference types.
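The difference between the two kinds of types shows up most clearly on assignment. The following is a minimal sketch; the PointValue and PointReference types and the method names are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical types for illustration only.
struct PointValue       // value type: the variable holds the data itself
{
    public int X;
}

class PointReference    // reference type: the variable holds a reference to heap data
{
    public int X;
}

class ValueVersusReference
{
    // Copies a struct, changes the copy, and returns the original's X.
    public static int OriginalAfterCopyingStruct()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // copies the whole value
        b.X = 99;           // changes only the copy
        return a.X;         // still 1
    }

    // Copies a reference, changes the object through the copy,
    // and returns the X seen through the original variable.
    public static int OriginalAfterCopyingReference()
    {
        PointReference a = new PointReference();
        a.X = 1;
        PointReference b = a;   // copies only the reference
        b.X = 99;               // changes the one shared object
        return a.X;             // now 99
    }

    static void Main()
    {
        Console.WriteLine("struct original X = {0}", OriginalAfterCopyingStruct());
        Console.WriteLine("class original X = {0}", OriginalAfterCopyingReference());
    }
}
```

Assigning a struct copies its data, so the two variables are independent. Assigning a class instance copies only the reference, so both variables see the same object.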

Boxing

Boxing is the process by which a value type is treated as an object. Many people think that when a value type is boxed, a live reference to the original variable is created. That is not the case. For example, assume that you box an integer variable that contains the value 10,000. Then you change the original integer value to 452. The boxed object will not reflect the change. When you box a value type, a copy of it is placed on the managed heap (as opposed to the stack, where local value-type variables normally reside) and a reference to that copy is placed in the object variable. After the boxing operation, there is no relation between the original value and the boxed value. Listing 14.2 demonstrates boxing a value and shows that the boxed value and the original value are not linked in any way.

Listing 14.2. A Boxing Demonstration
using System;

namespace Boxing
{
  /// <summary>
  /// Summary description for Class1.
  /// </summary>
  class Class1
  {
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
      int x = 10000;
      object ob = x;

      Console.WriteLine("Value of X = {0}", x );
      Console.WriteLine("Value of Ob = {0}", ob );

      x = 452;
      Console.WriteLine("Value of X after change = {0}", x );
      Console.WriteLine("Value of Ob after change = {0}", ob );
      Console.ReadLine();
    }
  }
}

Here is the output of the code in Listing 14.2:

Value of X = 10000
Value of Ob = 10000
Value of X after change = 452
Value of Ob after change = 10000

Remember that excessive numbers of allocations are among the things that slow down the GC. Every time a value type is boxed, it is a new allocation. The difficult part is that you never see the allocation. Boxing itself also incurs a performance penalty. You should be aware of when your code is boxing, and avoid it if you can.
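Because the allocation is invisible in the source, it can help to observe it indirectly. The following sketch (class and method names are hypothetical) boxes the same integer twice and checks whether the two boxes are the same object.

```csharp
using System;

class BoxingAllocations
{
    // Boxes the same int twice and reports whether both boxes are one object.
    public static bool BoxesShareOneObject(int value)
    {
        object first = value;    // boxing: a copy is allocated on the managed heap
        object second = value;   // boxing again: a second, separate allocation
        return Object.ReferenceEquals(first, second);
    }

    // Boxes the same int twice and compares the boxed values.
    public static bool BoxesHoldEqualValues(int value)
    {
        object first = value;
        object second = value;
        return first.Equals(second);
    }

    static void Main()
    {
        Console.WriteLine("Same object:  {0}", BoxesShareOneObject(10000));    // False
        Console.WriteLine("Equal values: {0}", BoxesHoldEqualValues(10000));   // True
    }
}
```

The two boxes hold equal values but are distinct heap objects, which is why boxing inside a frequently executed code path adds work for the GC.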

Unboxing

Unboxing is the opposite of boxing. When a boxed value is unboxed, it is converted from a reference type back to a value type; its value is copied from the managed heap onto the stack.

When a value is unboxed, the object instance is checked to make sure that it is indeed a boxed value of the right type. If this check succeeds, the value is copied from the heap onto the stack and assigned to the appropriate value type variable; if it fails, an InvalidCastException is thrown. As with boxing, unboxing a variable incurs some performance overhead. Whereas boxing creates a new allocation on the managed heap to store the boxed copy, unboxing copies the value into stack storage for the unboxed value. The following few lines of code, adapted from Listing 14.2, show an unboxing operation:

int x = 10000;
object ob = x;

Console.WriteLine("Value of X = {0}", x );
Console.WriteLine("Value of Ob = {0}", ob );

x = 452;
Console.WriteLine("Value of X after change = {0}", x );
Console.WriteLine("Value of Ob after change = {0}", ob );
Console.WriteLine("Value of Ob unboxed to int = {0}", (int)ob);

The output of the preceding code is as follows:

Value of X = 10000
Value of Ob = 10000
Value of X after change = 452
Value of Ob after change = 10000
Value of Ob unboxed to int = 10000
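The type check performed during unboxing is strict: a boxed int can be unboxed only to int, even though an int converts implicitly to long everywhere else. The following sketch illustrates this; the class and method names are hypothetical.

```csharp
using System;

class UnboxingTypeCheck
{
    // Attempts to unbox directly to long; reports whether the cast succeeded.
    public static bool TryUnboxToLong(object boxed, out long result)
    {
        try
        {
            result = (long)boxed;   // succeeds only if the box actually holds a long
            return true;
        }
        catch (InvalidCastException)
        {
            result = 0;
            return false;
        }
    }

    // Unboxes to the exact boxed type first, then widens the value.
    public static long UnboxIntThenWiden(object boxedInt)
    {
        return (long)(int)boxedInt;
    }

    static void Main()
    {
        object ob = 10000;   // a boxed int
        long direct;
        Console.WriteLine("Direct unbox to long succeeded: {0}", TryUnboxToLong(ob, out direct));
        Console.WriteLine("Unbox to int, then widen: {0}", UnboxIntThenWiden(ob));
    }
}
```

The direct cast fails because the runtime compares the boxed type with the requested type before copying anything. Unboxing to int first and then converting is the usual way around it.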

Collections and Boxing

Collections (and other weakly typed classes such as DataSets), by their nature and the way they are used, perform a large amount of boxing and unboxing. For example, assume that you are using an ArrayList to store integers, as in the following code:

ArrayList al = new ArrayList();

// load arraylist from some source

foreach (int x in al)
{
    // do something with integer
}

There are a few issues with the loop in the preceding code. The first is that each iteration unboxes one element, because the ArrayList stores its items as object references (each integer was also boxed when it was added). Depending on the size of the ArrayList, this can become slow and costly. The second is that foreach works through the collection's general-purpose enumerator, which might be slower than an index-based for loop. Future compiler and runtime versions may narrow that gap. Although the foreach loop is easier to read, it might not always be the fastest solution.

The bottom line is that the performance penalties for boxing and unboxing are multiplied by the size of a collection whenever you perform a boxing or unboxing operation within an iteration through a collection. The next time you find yourself writing a for loop, double-check the contents of the loop to see whether you might be doing something expensive during each iteration.
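One possible way to avoid the per-element cost is a strongly typed collection. The following sketch assumes a version of the .NET Framework that includes generics (2.0 or later), which might not match the version this chapter targets. The class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class TypedCollections
{
    // Sums integers stored as object references; every read unboxes.
    public static int SumArrayList(ArrayList items)
    {
        int total = 0;
        for (int i = 0; i < items.Count; i++)
        {
            total += (int)items[i];   // unboxing on each iteration
        }
        return total;
    }

    // Sums integers stored directly as int; no boxing or unboxing.
    public static int SumList(List<int> items)
    {
        int total = 0;
        for (int i = 0; i < items.Count; i++)
        {
            total += items[i];
        }
        return total;
    }

    static void Main()
    {
        ArrayList boxed = new ArrayList();
        List<int> typed = new List<int>();
        for (int i = 1; i <= 100; i++)
        {
            boxed.Add(i);   // Add takes an object, so i is boxed
            typed.Add(i);   // Add takes an int, so the value is stored as is
        }
        Console.WriteLine(SumArrayList(boxed));   // 5050
        Console.WriteLine(SumList(typed));        // 5050
    }
}
```

Both methods produce the same sum. The difference is that the ArrayList version performs 100 boxing operations while loading and 100 unboxing operations while reading. Where generics are not available, a typed array such as int[] avoids boxing in the same way.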

Using the StringBuilder Class

One thing that seems to take people a while to grasp fully is that the .NET Framework treats strings as immutable. In many unmanaged languages, you typically allocate a contiguous block of memory in which to store a string. You can continue along in your code, making changes to the string at will as long as you don't exceed its allocated space.

Consider the following few lines of code:

string sample = "This is a sample string";
sample = "This is another sample string";
sample = sample.Replace("sample", "cool");
Console.WriteLine(sample);

If this were an unmanaged language, the preceding code would have allocated enough memory to store the phrase "This is a sample string". Then, on the second line, it would have modified the same piece of memory and extended the allocation. The third line would have modified the same area of memory yet again.

.NET, however, treats strings as immutable. Once a string has been created, it cannot be changed or modified. This might make you think that an operation such as Replace would be impossible to perform on strings. In fact, when you modify strings in C#, you are actually creating additional strings that represent the changed values. For example, when you execute the preceding code, the following strings are allocated and stored on the heap:

This is a sample string
This is another sample string
This is another cool string

In the preceding code, each assignment or modification of the single variable actually created a new string in memory. Concatenation behaves the same way. Consider the following for loop:

string myString = "Hello, ";
for (int i=0; i < 500; i++)
{
  myString += i.ToString();
}

The preceding for loop has two problems that might not be immediately obvious. The first is that i.ToString() allocates a temporary string during each iteration. (Calling ToString() directly on an int does not box it; boxing would occur if i were instead passed where an object is expected.) The second, larger problem is that the string is built with the += operator. As you now know, you cannot modify existing strings in C#; you can only create new strings on the heap. When you iterate through a loop 500 times, concatenating onto an existing string, you end up with 501 versions of myString allocated on the heap, only one of which is live (that is, only one still has a valid reference pointing to it). The other 500 are garbage that must be reclaimed during the next garbage collection.
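That += produces a new string, rather than changing the existing one, can be observed directly. The following is a small sketch; the class and method names are hypothetical.

```csharp
using System;

class StringImmutability
{
    // Appends a non-empty suffix and reports whether the result
    // is still the original string object.
    public static bool ConcatReusesOriginal(string original, string suffix)
    {
        string current = original;
        current += suffix;   // creates a new string; original is untouched
        return Object.ReferenceEquals(original, current);
    }

    static void Main()
    {
        string greeting = "Hello, ";
        string alias = greeting;    // second reference to the same string
        greeting += "world";        // greeting now refers to a new string

        Console.WriteLine("[{0}]", alias);      // [Hello, ]
        Console.WriteLine("[{0}]", greeting);   // [Hello, world]
        Console.WriteLine(ConcatReusesOriginal("Hello, ", "world"));   // False
    }
}
```

The alias variable still sees the old text because the original string was never modified. Only the greeting variable was pointed at a newly allocated string.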

There is a way around this performance problem. Whenever you construct a string through concatenation or modify an already allocated string, you can use the StringBuilder class (in the System.Text namespace) instead of simple concatenation. StringBuilder appends into an internal buffer that it grows as needed, so repeated concatenations do not leave a new discarded string behind for each step. The following code shows you a more efficient way to perform repeated concatenations:

StringBuilder sb = new StringBuilder();
sb.Append("Hello ");
for (int x=0; x < 500; x++)
{
    sb.AppendFormat("{0}", x);
}

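The preceding code still has a boxing issue, because AppendFormat accepts its arguments as objects and x is therefore boxed on every call, but at least you don't have 500 unused strings sitting on the heap after the loop.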
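The remaining boxing can be avoided with the StringBuilder.Append overload that takes an int. The following is a minimal sketch; the class name, the method name, and the count parameter are hypothetical.

```csharp
using System;
using System.Text;

class AppendWithoutBoxing
{
    // Builds "Hello " followed by the numbers 0 through count - 1.
    public static string BuildGreeting(int count)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("Hello ");
        for (int x = 0; x < count; x++)
        {
            sb.Append(x);   // Append(int) overload: no object parameter, so no boxing
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildGreeting(5));   // Hello 01234
    }
}
```

AppendFormat remains the better choice when the value needs a format string. For plain appends, the typed Append overloads do the same job without the extra allocation.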
