1.2. Common Language Runtime

The Common Language Runtime manages the entire life cycle of an application: it locates code, compiles it, loads associated classes, manages its execution, and ensures automatic memory management. Moreover, it supports cross-language integration to permit code generated by different languages to interact seamlessly. This section peers into the inner workings of the Common Language Runtime to see how it accomplishes this. It is not an in-depth discussion, but is intended to help you become comfortable with the terminology, appreciate the language-neutral architecture, and understand what's actually happening when you create and execute a program.

Compiling .NET Code

Compilers that are compliant with the CLR generate code that is targeted for the runtime, as opposed to a specific CPU. This code, known variously as Common Intermediate Language (CIL), Intermediate Language (IL), or Microsoft Intermediate Language (MSIL), is an assembler-type language that is packaged in an EXE or DLL file. Note that these are not standard executable files: they require the runtime's Just-in-Time (JIT) compiler to convert the IL in them to machine-specific code when an application actually runs. Because the Common Language Runtime is responsible for managing this IL, the code is known as managed code.

This intermediate code is one of the keys to meeting the .NET Framework's formal objective of language compatibility. As Figure 1-3 illustrates, the Common Language Runtime neither knows—nor needs to know—which language an application is created in. Its interaction is with the language-independent IL. Because applications communicate through their IL, output from one compiler can be integrated with code produced by a different compiler.

Figure 1-3. Common Language Runtime functions

Another .NET goal, platform portability, is addressed by localizing the creation of machine code in the JIT compiler. This means that IL produced on one platform can be run on any other platform that has its own framework and a JIT compiler that emits its own machine code.

In addition to producing IL, compilers that target the CLR must emit metadata into every code module. The metadata is a set of tables that allows each code module to be self-descriptive. The tables contain information about the assembly containing the code, as well as a full description of the code itself. This information includes what types are available, the name of each type, type members, the scope or visibility of the type, and any other type features. Metadata has many uses:

The most important use is by the JIT compiler, which gathers all the type information it needs for compiling directly from the metadata. It also uses this information for code verification to ensure the program performs correct operations. For example, the JIT ensures that a method is called correctly by comparing the calling parameters with those defined in the method's metadata.
Metadata is used in the Garbage Collection process (memory management). The garbage collector (GC) uses metadata to know when fields within an object refer to other objects so that the GC can determine what objects can and can't have their memory reclaimed.
.NET provides a set of classes that provide the functionality to read metadata from within a program. This functionality is known collectively as reflection. It is a powerful feature that permits a program to query the code at runtime and make decisions based on its discovery. As we will see later in the book, it is the key to working with custom attributes, which are a C#-supported construct for adding custom metadata to a program.
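The reflection use described above can be sketched in a few lines of C#. This is a minimal illustration, not code from the chapter: the Conversion class, its InchesToCentimeters method, and the DescribeMethods helper are hypothetical names introduced here. The sketch reads method names, return types, and parameter types back out of the metadata at runtime.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Conversion
{
    public double InchesToCentimeters(double inches) { return 2.54 * inches; }
}

public static class ReflectionDemo
{
    // Builds "ReturnType Name(ParamType, ...)" for each public instance
    // method declared directly on the given type, using only metadata.
    public static string[] DescribeMethods(Type type)
    {
        MethodInfo[] methods = type.GetMethods(
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly);
        string[] lines = new string[methods.Length];
        for (int i = 0; i < methods.Length; i++)
        {
            ParameterInfo[] parameters = methods[i].GetParameters();
            string[] paramTypes = new string[parameters.Length];
            for (int j = 0; j < parameters.Length; j++)
            {
                paramTypes[j] = parameters[j].ParameterType.Name;
            }
            lines[i] = methods[i].ReturnType.Name + " " + methods[i].Name +
                       "(" + string.Join(", ", paramTypes) + ")";
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in DescribeMethods(typeof(Conversion)))
        {
            Console.WriteLine(line);   // Double InchesToCentimeters(Double)
        }
    }
}
```

Nothing in DescribeMethods refers to Conversion at compile time; everything it reports comes from the metadata tables the compiler emitted.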

IL and metadata are crucial to providing language interoperability, but its real-world success hinges on all .NET compilers supporting a common set of data types and language specifications. For example, two languages cannot be compatible at the IL level if one language supports a 32-bit signed integer and the other does not. They may differ syntactically (for example, C# int versus a Visual Basic Integer), but there must be agreement on which base types each will support.

As discussed earlier, the CLI defines a formal specification, called the Common Type System (CTS), which is an integral part of the Common Language Runtime. It describes how types are defined and how they must behave in order to be supported by the Common Language Runtime.

Common Type System

The CTS provides a base set of data types for each language that runs on the .NET platform. In addition, it specifies how to declare and create custom types, and how to manage the lifetime of instances of these types. Figure 1-4 shows how .NET organizes the Common Type System.

Figure 1-4. Base types defined by Common Type System

Two things stand out in this figure. The most obvious is that types are categorized as reference or value types. This taxonomy is based on how the types are stored and accessed in memory: reference types are allocated in a special memory area (called the heap) and accessed via pointers, whereas value types are stored directly, typically on the program stack or inline within the object that contains them. The other thing to note is that all types, both custom and .NET defined, must inherit from the predefined System.Object type. This ensures that all types support a basic set of inherited methods and properties.
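The practical consequence of the two categories shows up on assignment. The following is a small sketch under stated assumptions: PointValue, PointRef, and TypeCategoryDemo are hypothetical names invented for this illustration. It shows that assigning a value type copies the data, while assigning a reference type copies only the reference, and that both categories ultimately derive from System.Object.

```csharp
using System;

// Hypothetical types for illustration only.
public struct PointValue { public int X; }   // value type
public class PointRef { public int X; }      // reference type

public static class TypeCategoryDemo
{
    // Assigning a value type copies the data, so the original is unchanged.
    public static int MutateCopyOfValue()
    {
        PointValue original = new PointValue();
        original.X = 1;
        PointValue copy = original;
        copy.X = 99;
        return original.X;
    }

    // Assigning a reference type copies only the reference, so both
    // variables refer to the same object on the heap.
    public static int MutateCopyOfReference()
    {
        PointRef original = new PointRef();
        original.X = 1;
        PointRef copy = original;
        copy.X = 99;
        return original.X;
    }

    public static void Main()
    {
        Console.WriteLine(MutateCopyOfValue());           // 1
        Console.WriteLine(MutateCopyOfReference());       // 99
        Console.WriteLine(typeof(PointValue).BaseType);   // System.ValueType
        Console.WriteLine(typeof(PointRef).BaseType);     // System.Object
    }
}
```

System.ValueType itself inherits from System.Object, which is how a struct ends up with methods such as ToString and Equals.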

Core Note

In .NET, "type" is a generic term that refers to a class, structure, enumeration, delegate, or interface.

A compiler that is compliant with the CTS specifications is guaranteed that its types can be hosted by the Common Language Runtime. This alone does not guarantee that the language can communicate with other languages. There is a more restrictive set of specifications, appropriately called the Common Language Specification (CLS), that provides the ultimate rules for language interoperability. These specifications define the minimal features that a compiler must include in order to target the CLR.

Table 1-1 contains some of the CLS rules to give you a flavor of the types of features that must be considered when creating CLS-compliant types (a complete list is included with the .NET SDK documentation).

Table 1-1. Selected Common Language Specification Features and Rules
| Feature | Rule |
| --- | --- |
| Visibility (Scope) | The rules apply only to those members of a type that are available outside the defining assembly. |
| Characters and casing | For two variables to be considered distinct, they must differ by more than just their case. |
| Primitive types | The following primitive data types are CLS compliant: Byte, Int16, Int32, Int64, Single, Double, Boolean, Char, Decimal, IntPtr, and String. |
| Constructor invocation | A constructor must call the base class's constructor before it can access any of its instance data. |
| Array bounds | All dimensions of arrays must have a lower bound of zero (0). |
| Enumerations | The underlying type of an enumeration (enum) must be of the type Byte, Int16, Int32, or Int64. |
| Method signature | All return and parameter types used in a type or member signature must be CLS compliant. |

These rules are both straightforward and specific. Let's look at a segment of C# code to see how they are applied:


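public class Conversion
{
   // Converts inches to centimeters
   public double Metric(double inches)
   { return (2.54 * inches); }

   // Converts miles to kilometers; the name differs from Metric only by case
   public double metric(double miles)
   { return (miles / 0.62); }
}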

Even if you are unfamiliar with C# code, you should still be able to detect where the code fails to comply with the CLS rules. The second rule in the table dictates that different names must differ by more than case. Obviously, the method names Metric and metric fail to meet this rule. This code runs fine in C#, but a program written in Visual Basic.NET, which is case-insensitive, would be unable to distinguish between the uppercase and lowercase references.
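One way to have the compiler check these rules for you is the CLSCompliant attribute in the System namespace. The sketch below is an illustration rather than the chapter's own code: the method names InchesToCentimeters, MilesToKilometers, and WholeCentimeters are hypothetical. With the assembly marked as CLS compliant, the C# compiler reports a warning for publicly visible members that break a rule, such as names that differ only by case or a return type that is not on the CLS primitive list. Here the names are distinct, and the one member that exposes a non-compliant type (UInt32) is explicitly marked as an exception.

```csharp
using System;

// Ask the compiler to check publicly visible members against the CLS rules.
[assembly: CLSCompliant(true)]

public class Conversion
{
    // Distinct names, so the casing rule is satisfied.
    public double InchesToCentimeters(double inches) { return 2.54 * inches; }
    public double MilesToKilometers(double miles) { return miles / 0.62; }

    // UInt32 is not in the list of CLS-compliant primitive types, so this
    // member is explicitly flagged as non-compliant to acknowledge that.
    [CLSCompliant(false)]
    public uint WholeCentimeters(double inches) { return (uint)(2.54 * inches); }
}

public static class ClsDemo
{
    public static void Main()
    {
        Conversion c = new Conversion();
        Console.WriteLine(c.WholeCentimeters(10));   // 25
    }
}
```

Because the first rule in the table limits the checks to members visible outside the assembly, private and internal members may still use types such as uint freely.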

Assemblies

All of the managed code that runs in .NET must be contained in an assembly. Logically, the assembly is referenced as one EXE or DLL file. Physically, it may consist of a collection of one or more files that contain code or resources such as images or XML data.

An assembly is created when a .NET compatible compiler converts a file containing source code into a DLL or EXE file. As shown in Figure 1-5, an assembly contains a manifest, metadata, and the compiler-generated Intermediate Language (IL). Let's take a closer look at these:

Manifest. Each assembly must have one file that contains a manifest. The manifest is a set of tables containing metadata that lists the names of all files in the assembly, references to external assemblies, and information such as name and version that identify the assembly. Strongly named assemblies (discussed later) also include a unique digital signature. When an assembly is loaded, the CLR's first order of business is to open the file containing the manifest so it can identify the members of the assembly.
Metadata. In addition to the manifest tables just described, the C# compiler produces definition and reference tables. The definition tables provide a complete description of the types contained in the IL. For instance, there are tables defining types, methods, fields, parameters, and properties. The reference tables contain information on all references to types and other assemblies. The JIT compiler relies on these tables to convert the IL to native machine code.
IL. The role of Intermediate Language has already been discussed. Before the CLR can use IL, it must be packaged in an EXE or DLL assembly. The two are not identical: an EXE assembly must have an entry point that makes it executable; a DLL, on the other hand, is designed to function as a code library holding type definitions.

Figure 1-5. Single file assembly

The assembly is more than just a logical way to package executable code. It forms the very heart of the .NET model for code deployment, version control, and security:

All managed code, whether it is a stand-alone program, a control, or a DLL library containing reusable types, is packaged in an assembly. It is the most atomic unit that can be deployed on a system. When an application begins, only those assemblies required for initialization must be present. Other assemblies are loaded on demand. A judicious developer can take advantage of this to partition an application into assemblies based on their frequency of use.
In .NET jargon, an assembly forms a version boundary. The version field in the manifest applies to all types and resources in the assembly. Thus, all the files comprising the assembly are treated as a single unit with the same version. By decoupling the physical package from the logical, .NET can share a logical attribute among several physical files. This is the fundamental characteristic that separates an assembly from a system based on the traditional DLLs.
An assembly also forms a security boundary on which access permissions are based. C# uses access modifiers to control how types and type members in an assembly can be accessed. Two of these use the assembly as a boundary: public permits unrestricted access from any assembly; internal restricts access to types and members within the assembly.
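The two assembly-scoped access modifiers can be seen in a short sketch. The names PublicApi, Helper, and AccessDemo are hypothetical and exist only for this illustration; the code assumes all three types are compiled into the same assembly.

```csharp
using System;

// Visible to code in any assembly that references this one.
public class PublicApi
{
    // Code inside the assembly may use the internal Helper freely.
    public string Describe() { return new Helper().Tag(); }
}

// Visible only to code compiled into this same assembly.
internal class Helper
{
    internal string Tag() { return "helper"; }
}

public static class AccessDemo
{
    public static void Main()
    {
        // The visibility recorded in metadata can be read back with reflection.
        Console.WriteLine(typeof(PublicApi).IsPublic);    // True
        Console.WriteLine(typeof(Helper).IsNotPublic);    // True
        Console.WriteLine(new PublicApi().Describe());    // helper
    }
}
```

A second assembly that references this one could create a PublicApi instance, but any attempt to name Helper from it would be rejected by the compiler.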

As mentioned, an assembly may contain multiple files. These files are not restricted to code modules, but may be resource files such as graphic images and text files. A common use of these files is to permit resources that enable an application to provide a screen interface tailored to the country or language of the user. There is no limit to the number of files in the assembly. Figure 1-6 illustrates the layout of a multi-file assembly.

Figure 1-6. Multi-file assembly

In the multi-file assembly diagram, notice that the assembly's manifest contains the information that identifies all files in the assembly.

Although most assemblies consist of a single file, there are several cases where multi-file assemblies are advantageous:

They allow you to combine modules created in different programming languages. A programming shop may rely on Visual Basic.NET for its Rapid Application Development (RAD) and C# for component or enterprise development. Code from both can coexist and interact in the .NET assembly.
Code modules can be partitioned to optimize how code is loaded into the CLR. Related and frequently used code should be placed in one module; infrequently used code in another. The CLR does not load the modules until they are needed. If creating a class library, go a step further and group components with common life cycle, version, and security needs into separate assemblies.
Resource files can be placed in their own module separate from IL modules. This makes it easier for multiple applications to share common resources.

Multi-file assemblies can be created by executing the C# compiler from the command line or using the Assembly Linker utility, Al.exe. An example using the C# compiler is provided in the last section of this chapter. Notably, Visual Studio 2005 does not support the creation of multi-file assemblies.

Private and Shared Assemblies

Assemblies may be deployed in two ways: privately or globally. Assemblies that are located in an application's base directory or a subdirectory are called privately deployed assemblies. The installation and updating of a private assembly could not be simpler. It only requires copying the assembly into the directory, called the AppBase, where the application is located. No registry settings are needed. In addition, an application configuration file can be added to override settings in an application's manifest and permit an assembly's files to be moved within the AppBase.

A shared assembly is one installed in a global location, called the Global Assembly Cache (GAC), where it is accessible by multiple applications. The most significant feature of the GAC is that it permits multiple versions of an assembly to execute side-by-side. To support this, .NET overcomes the name conflict problem that plagues DLLs by using four attributes to identify an assembly: the file name, a culture identity, a version number, and a public key token.

Shared assemblies are usually located in the assembly directory beneath the system directory of the operating system (WINNT\ on a Microsoft Windows 2000 operating system). As shown in Figure 1-7, the assemblies are listed in a special format that displays their four attributes (the .NET Framework includes a DLL file that extends Windows Explorer to enable it to display the GAC contents). Let's take a quick look at these four attributes:

Assembly Name. Also referred to as the friendly name, this is the file name of the assembly minus the extension.
Version. Every assembly has a version number that applies to all files in the assembly. It consists of four numbers in the format
<major number>.<minor number>.<build>.<revision>
Typically, the major and minor version numbers are updated for changes that break backward compatibility. A version number can be assigned to an assembly by including an AssemblyVersion attribute in the assembly's source code.
Culture Setting. The contents of an assembly may be associated with a particular culture or language. This is designated by a culture code, either a two-letter language code such as "en" for English or "fr" for French, or a language-region pair such as "fr-CA" for Canadian French. It can be assigned with an AssemblyCulture attribute placed in source code:
[assembly: AssemblyCulture ("fr-CA")]
Public Key Token. To ensure that a shared assembly is unique and authentic, .NET requires that the creator mark the assembly with a strong name. This process, known as signing, requires the use of a public/private key pair. When the compiler builds the assembly, it uses the private key to generate a strong name. Because the public key is so large, a shorter token is created by hashing the public key and taking the last eight bytes of the hash. This token is placed in the manifest of any client assembly that references a shared assembly and is used to identify the assembly during execution.
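The four identifying attributes can also be read programmatically. The following sketch assumes the AssemblyName class in System.Reflection; the IdentityDemo class and its Describe helper are hypothetical names introduced for this illustration. It prints the identity of the assembly that defines System.Object, so the exact values depend on the runtime installed on your machine.

```csharp
using System;
using System.Reflection;

public static class IdentityDemo
{
    // Formats the four attributes that identify an assembly.
    public static string Describe(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();

        // The token is empty (or null) for assemblies without a strong name.
        byte[] token = name.GetPublicKeyToken();
        string tokenText = (token == null || token.Length == 0)
            ? "(none)"
            : BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        // An empty culture name means the assembly is culture neutral.
        string culture = (name.CultureInfo == null || name.CultureInfo.Name.Length == 0)
            ? "neutral"
            : name.CultureInfo.Name;

        return name.Name + ", Version=" + name.Version +
               ", Culture=" + culture + ", PublicKeyToken=" + tokenText;
    }

    public static void Main()
    {
        // The assembly that defines System.Object is used as a convenient example.
        Console.WriteLine(Describe(typeof(object).Assembly));
    }
}
```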

Figure 1-7. Partial listing of Global Assembly Directory

Core Note

An assembly that is signed with a public/private key is referred to as a strongly named assembly. All shared assemblies must have a strong name.

Precompiling an Assembly

After an assembly is loaded, the IL must be compiled to the machine's native code. If you are used to working with executables already in a machine code format, this should raise questions about performance and whether it's possible to create equivalent "executables" in .NET. The answer to the second question is yes; .NET does provide a way to precompile an assembly.

The .NET Framework includes a Native Image Generator (Ngen) tool that is used to compile an assembly into a "native image" that is stored in a native image cache—a reserved area of the GAC. Any time the CLR loads an assembly, it checks the cache to see if it has an associated native image available; if it does, it loads the precompiled code. On the surface, this seems a good idea to improve performance. However, in reality, there are several drawbacks.

Ngen creates an image for a hypothetical machine architecture, so that it will run, for example, on any machine with an x86 processor. In contrast, when the JIT in .NET runs, it is aware of the specific machine it is compiling for and can accordingly make optimizations. The result is that its output often outperforms that of the precompiled assembly. Another drawback to using a native image is that changes to a system's hardware configuration or operating system—such as a service pack update—often invalidate the precompiled assembly.

Core Recommendation

As a rule, a dynamically compiled assembly provides performance equal to, or better than, that of a precompiled executable created using Ngen.

Code Verification

As part of the JIT compile process, the Common Language Runtime performs two types of verification: IL verification and metadata validation. The purpose is to ensure that the code is verifiably type-safe. In practical terms, this means that parameters in a calling and called method are checked to ensure they are the same type, or that a method returns only the type specified in its return type declaration. In short, the CLR searches through the IL and metadata to make sure that any value assigned to a variable is of a compatible type; if not, an exception occurs.

Core Note

By default, code produced by the C# compiler is verifiably type-safe. However, there is an unsafe keyword that can be used to relax memory access restrictions within a C# program (such as referencing beyond an array boundary).

A benefit of verified code is that the CLR can be certain that the code cannot affect another application by accessing memory outside of its allowable range. Consequently, the CLR is free to safely run multiple applications in a single process or address space, improving performance and reducing the use of OS resources.
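The memory-range guarantee can be observed from ordinary safe C# code. The sketch below is only an illustration, and BoundsDemo and ReadPastEnd are hypothetical names. Note that what it shows is the runtime's bounds check on an array access rather than the JIT-time verification step itself; the two work together to keep safe code inside its own memory.

```csharp
using System;

public static class BoundsDemo
{
    // Attempts to read one element past the end of the array.
    // Returns true if the runtime rejected the access.
    public static bool ReadPastEnd(int[] data)
    {
        try
        {
            int value = data[data.Length];   // index is one past the last element
            Console.WriteLine(value);        // never reached
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadPastEnd(new int[3]));   // True
    }
}
```

Code in an unsafe block that uses pointers bypasses this check, which is why such code is not verifiably type-safe.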
