11.4. DataSets, DataTables, and the Disconnected Model

The ADO.NET disconnected model is based on using a DataSet object as an in-memory cache. A DataAdapter serves as the intermediary between the DataSet and the data source that loads the cache with data. After it has completed its task, the DataAdapter returns the connection object to the pool, thus disconnecting the data from the data source. Interestingly, the DataAdapter is actually a wrapper around a data provider's DataReader, which performs the actual data loading.

The DataSet Class

In many ways, a DataSet object plays the role of an in-memory database. Its Tables property exposes a collection of DataTables that contain data and a data schema describing the data. The Relations property returns a collection of DataRelation objects that define how tables are interrelated. In addition, DataSet methods are available to Copy, Merge, and Clear the contents of the DataSet.

Keep in mind that the DataSet and DataTable are core parts of ADO.NET and—unlike the Connection, DataReader, and DataAdapter—they are not tied to a specific data provider. An application can create, define, and populate a DataSet with data from any source.

Besides tables and their relations, a DataSet can also contain custom information defined by the application. A look at Figure 11-6 shows the major collection classes in the DataSet hierarchy. Among these is PropertyCollection, which is a set of custom properties stored in a hash table and exposed through the DataSet.ExtendedProperties property. It is often used to hold a time stamp or descriptive information such as column validation requirements for tables in the data set.

Figure 11-6. `DataSet` class hierarchy

The discussion of the DataSet class begins with its most important member—the DataTable collection.

DataTables

One step below the DataSet in the disconnected model hierarchy is the DataTable collection. This collection—accessed through the DataSet.Tables property—stores data in a row-column format that mimics tables in a relational database. The DataTable class has a rich set of properties and methods that make it useful as a stand-alone data source or as part of a table collection in a DataSet. The most important of these are the Columns and Rows properties, which define the layout and content of a table.

DataColumns

The DataTable.Columns property exposes a collection of DataColumn objects that represent each data field in the DataTable. Taken together, the column properties produce the data schema for the table. Table 11-5 summarizes the most important properties.

Table 11-5. Properties of the DataColumn Class
Method
Description
ColumnName
Name of the column.
DataType
Type of data contained in this column.
Example: col1.DataType = System.Type.GetType("System.String")
MaxLength
Maximum length of a text column. -1 if there is no maximum length.
ReadOnly
Indicates whether the values in the column can be modified.
AllowDBNull
Boolean value that indicates whether the column may contain null values.
Unique
Boolean value that indicates whether the column may contain duplicate values.
Expression
An expression defining how the value of a column is calculated.
Example: colTax.Expression = "colSales * .085";
Caption
The caption displayed in the user interface.
DataTable
The name of the DataTable containing this column.

DataTable columns are created automatically when the table is filled with the results of a database query or from reading an XML file. However, for applications that fill a table dynamically—such as from user input or real-time data acquisition—it may be necessary to write the code that defines the table structure. It's a worthwhile exercise in its own right that enhances a developer's understanding of the DataSet hierarchy.

The following segment creates a DataTable object, creates DataColumn objects, assigns property values to the columns, and adds them to the DataTable. To make things interesting, a calculated column is included.


DataTable tb = new DataTable("Order");

DataColumn dCol = new DataColumn("ID",

      Type.GetType("System.Int16"));

dCol.Unique = true;  // ID must be unique for each data row

dCol.AllowDBNull = false;

tb.Columns.Add(dCol);

dCol= new DataColumn("Price", Type.GetType("System.Decimal"));

tb.Columns.Add(dCol);

dCol=new DataColumn("Quan",Type.GetType("System.Int16"));

tb.Columns.Add(dCol);

dCol= new DataColumn("Total",Type.GetType("System.Decimal"));

dCol.Expression= "Price * Quan";

tb.Columns.Add(dCol);

// List column names and data type

foreach (DataColumn dc in tb.Columns)

{

   Console.WriteLine(dc.ColumnName);

   Console.WriteLine(dc.DataType.ToString());

}

Note that the ID column is defined to contain unique values. This constraint qualifies the column to be used as a key field in establishing a parent-child relationship with another table in a DataSet. To qualify, the key must be unique—as in this case—or defined as a primary key for the table. You assign a primary key to a table by setting its PrimaryKey field to the value of an array containing the column(s) to be used as the key. Here is an example that specifies the ID field a primary key:


DataColumn[] col = {tb.Columns["ID"]};

tb.PrimaryKey = col;

We'll see how to use a primary key to create table relationships and merge data later in this section.

Core Note

If a primary key consists of more than one column—such as a first and last name—you can enforce a unique constraint on these columns in three steps: by creating an array to hold the columns, creating a UniqueConstraint object by passing the array to its constructor; and adding the constraint to the data table's Constraints collection:
DataColumn[] cols = {tb.Columns["fname"] tb.Columns["lname"]}; tb.Constraints.Add(new UniqueConstraint("nameKey", cols));

DataRows

Data is added to a table by creating a new DataRow object, filling it with column data, and adding the row to the table's DataRow collection. Here is an example that places data in the table created in the preceding example.


DataRow row;

row = tb.NewRow();      // Create DataRow

row["Title"] = "Casablanca";

row["Price"] = 22.95;

row["Quan"] = 2;

row["ID"] = 12001;

tb.Rows.Add(row);       // Add row to Rows collection

Console.WriteLine(tb.Rows[0]["Total"].ToString()); // 45.90

A DataTable has methods that allow it to commit and roll back changes made to the table. In order to do this, it keeps the status of each row in the DataRow.RowState property. This property is set to one of five DataRowState enumeration values: Added, Deleted, Detached, Modifed, or Unchanged. Let's extend the preceding example to demonstrate how these values are set:


tb.Rows.Add(row);                    // Added

tb.AcceptChanges();                  // ...Commit changes

Console.Write(row.RowState);         // Unchanged

tb.Rows[0].Delete();                 // Deleted

// Undo deletion

tb.RejectChanges();                  // ...Roll back

Console.Write(tb.Rows[0].RowState);  // Unchanged

DataRow myRow;

MyRow = tb.NewRow();                 // Detached

The two DataTable methods AcceptChanges and RejectChanges are equivalent to the commit and rollback operations in a database. These apply to all changes made from the time the table was loaded or since AcceptChanges was previously invoked. In this example, we are able to restore a deleted row because the deletion is not committed before RejectChanges is called. Note that the changes are to the data table—not the original data source.

For each column value in a row, ADO.NET maintains a current and original value. When RejectChanges is called, the current values are set to the original values. The opposite occurs if AcceptChanges is called. The two sets of values can be accessed concurrently through the DataRowVersion enumerations Current and Original:


DataRow r = tb.Rows[0];

r["Price"]= 14.95;

r.AcceptChanges();

r["Price"]= 16.95;

Console.WriteLine("Current: {0} Original: {1} ",

      r["Price",DataRowVersion.Current],

      r["Price",DataRowVersion.Original]);

// output:  Current: 16.95  Original: 14.95

Keeping track of table row changes takes on added importance when the purpose is to update an underlying data source. We'll see later in this section how the DataAdapter updates database tables with changes made to DataTable rows.

Loading Data into a DataSet

Now that we have seen how to construct a DataTable and punch data into it row-by-row, let's look at how data and a data schema can be automatically loaded from a relational database into tables in a DataSet. For details on loading XML data, refer to Section 11.5, "XML and ADO.NET," on page 533.

Using the DataReader to Load Data into a DataSet

A DataReader object can be used in conjunction with a DataSet or DataTable to fill a table with the rows generated by a query. This requires creating a DataReader object and passing it as a parameter to the DataTable.Load method:


cmd.CommandText = "SELECT * FROM movies WHERE movie_Year < 1945";

DBDataReader rdr =

      cmd.ExecuteReader(CommandBehavior.CloseConnection);

DataTable dt = new DataTable("movies");

dt.Load(rdr);   // Load data and schema into table

Console.WriteLine(rdr.IsClosed);   // True

The DataReader is closed automatically after all of the rows have been loaded. The CloseConnection parameter ensures that the connection is also closed.

If the table already contains data, the Load method merges the new data with the existing rows of data. Merging occurs only if rows share a primary key. If no primary key is defined, rows are appended. An overloaded version of Load takes a second parameter that defines how rows are combined. This parameter is a LoadOption enumeration type having one of three values: OverwriteRow, PreserveCurrentValues, or UpdateCurrentValues. These options specify whether the merge operation overwrites the entire row, original values only, or current values only. This code segment illustrates how data is merged with existing rows to overwrite the current column values:


cmd.CommandText = "SELECT * FROM movies WHERE movie_Year < 1945";

DBDataReader rdr = cmd.ExecuteReader( );

DataTable dt = new DataTable("movies");

dt.Load(rdr);                    // Load rows into table

Console.Write(dt.Rows[0]["movie_Title"]);  // Casablanca



// Assign primary key so rows can be merged

DataColumn[] col = new DataColumn[1];

col[0] = dt.Columns["movie_ID"];

dt.PrimaryKey = col;

DataRow r = dt.Rows[0];           // Get first row of data

r["movie_Title"] = "new title";   // Change current column value

// Since reader is closed, must fill reader again

rdr = cmd.ExecuteReader(CommandBehavior.CloseConnection);

// Merge data with current rows. Overwrites current values

dt.Load(rdr, LoadOption.UpdateCurrentValues );

// Updated value has been overwritten

Console.Write(dt.Rows[0]["movie_Title"]);  // Casablanca

Using the DataAdapter to Load Data into a DataSet

A DataAdapter object can be used to fill an existing table, or create and fill a new table, with the results from a query. The first step in this process is to create an instance of the DataAdapter for a specific data provider. As the following code shows, several constructor overloads are available:


// (1) The easiest: a query and connection string as arguments

String sql = "SELECT * FROM movies";

SqlDataAdapter da = new SqlDataAdapter(sql, connStr);



// (2) Assign a command object to the SelectCommand property

SqlDataAdapter da = new SqlDataAdapter();

SqlConnection conn = new SqlConnection(connStr);

da.SelectCommand = new SqlCommand(sql,conn);



// (3) Pass in a query string and connection object

SqlConnection conn = new SqlConnection(connStr);

SqlDataAdapter da = new SqlDataAdapter(sql, conn);

Of these, the first version is the simplest. It accepts two strings containing the query and connection. From these, it constructs a SqlCommand object that is assigned internally to its SelectCommand property. Unlike the other constructors, there is no need to write code that explicitly creates a SqlCommand or SqlConnection object.

In the overloads that accept a connection object as a parameter, the opening and closing of the connection is left to the DataAdapter. If you add a statement to explicitly open the connection, you must also include code to close it. Otherwise, the DataAdapter leaves it open, which locks the data in the database.

After the DataAdapter object is created, its Fill method is executed to load data into a new or existing table. In this example, a new table is created and assigned the default name Table:


DataSet ds = new DataSet();

// Create DataTable, load data, and add to DataSet

// Could use da.Fill(ds,"movies") to specify table name.

int numrecs = da.Fill(ds);  // Returns number of records loaded

For an existing table, the behavior of the Fill command depends on whether the table has a primary key. If it does, those rows having a key that matches the key of the incoming data are replaced. Incoming rows that do not match an existing row are appended to the DataTable.

Using the DataAdapter to Update a Database

After a DataAdapter has loaded data into a table, the underlying connection is closed, and subsequent changes made to the data are reflected only in the DataSet—not the underlying data source. To apply changes to the data source, a DataAdapter is used to restore the connection and send the changed rows to the database. The same DataAdapter used to fill the DataSet can be used to perform this task.

The DataAdapter has three properties?tt>InsertCommand, DeleteCommand, and UpdateCommand—that are assigned the actual SQL commands to perform the tasks that correspond to the property name. These commands are executed when the Upate method of the DataAdapter is invoked. The challenge lies in creating the SQL commands that post the changes and assigning them to the appropriate DataAdapter properties. Fortunately, each data provider implements a CommandBuilder class that can be used to handle this task automatically.

The CommandBuilder Object

A CommandBuilder object generates the commands necessary to update a data source with changes made to a DataSet. It's amazingly self-sufficient. You create an instance of it by passing the related DataAdapter object to its constructor; then, when the DataAdapter.Update method is called, the SQL commands are generated and executed. The following segment shows how changes to a DataTable are flushed to the database associated with the DataAdapter:


DataTable dt= ds.Tables["movies"]; // Shortcut to reference table



// (1) Use command builder to generate update commands

SqlCommandBuilder sb = new SqlCommandBuilder(da);



// (2) Add movie to table

DataRow drow = dt.NewRow();

drow["movie_Title"] = "Taxi Driver";

drow["movie_Year"] = "1976";

dt.Rows.Add(drow);



// (3) Delete row from table

dt.Rows[4].Delete();



// (4) Edit Column value

dt.Rows[5]["movie_Year"] = "1944";



// (5) Update underlying Sql Server table

int updates = da.Update(ds, "movies");

MessageBox.Show("Rows Changed: " +updates.ToString());  // 3

There are a couple of restrictions to be aware of when using the CommandBuilder: The Select command associated with the DataAdapter must refer to a single table, and the source table in the database must include a primary key or a column that contains unique values. This column (or columns) must be included in the original Select command.

Core Note

You can create your own update commands without using a CommandBuilder. Although it can be a lengthy process, it can also yield more efficient commands. For applications that require considerable database updates, you may want to consult an ADO.NET book for details on coding update logic directly.

Synchronizing the DataSet and the DataBase

As demonstrated in this example, the use of a DataAdapter simplifies and automates the process of updating a database—or any data store. However, there is a rock in this snowball: the problem of multi-user updates. The disconnected model is based on optimistic concurrency, an approach in which the rows of the underlying data source are not locked between the time they are read and the time updates are applied to the data source. During this interval, another user may update the data source. Fortunately, the Update method recognizes if changes have occurred since the previous read and fails to apply changes to a row that has been altered.

There are two basic strategies for dealing with a concurrency error when multiple updates are being applied: roll back all changes if a violation occurs, or apply the updates that do not cause an error and identify the ones that do so they can be reprocessed.

Using Transactions to Roll Back Multiple Updates

When the DataAdapter.ContinueUpdateonErrors property is set to false, an exception is thrown when a row update cannot be completed. This prevents subsequent updates from being attempted, but does not affect updates that occurred prior to the exception. Because updates may be interdependent, applications often require an all-or-none strategy. The easiest way to implement this strategy is to create a .NET transaction in which all of the update commands execute. To do so, create a SqlTransaction object and associate it with the SqlDataAdapater.SelectCommand by passing it to its constructor. If an exception occurs, the transaction's Rollback method is used to undo any changes; if no exceptions occur, the Commit method is executed to apply all the update commands. Listing 11-4 is an example that wraps the updates inside a transaction.

Listing 11-4. Using Transaction to Roll Back Database Updates


SqlDataAdapter da = new SqlDataAdapter();

SqlCommandBuilder sb = new SqlCommandBuilder(da);

SqlTransaction tran;

SqlConnection conn = new SqlConnection(connStr);

conn.Open();      // Must open to use with transaction

// (1) Create a transaction

SqlTransaction tran = conn.BeginTransaction();

// (2) Associate the SelectCommand with the transaction

da.SelectCommand = new SqlCommand(sql, conn, tran);

DataSet ds = new DataSet();

da.Fill(ds, "movies");

//

// Code in this section makes updates to DataSet rows

try

{

   int updates = da.Update(ds, "movies");

   MessageBox.Show("Updates: "+updates.ToString());

}

// (3) If exception occurs, roll back all updates in transaction

catch (Exception ex)

{

   MessageBox.Show(ex.Message);   // Error updating

   if (tran != null)

   {

      tran.Rollback();      // Roll back any updates

      tran = null;

      MessageBox.Show("All updates rolled back.");

   }

}

finally

{

// (4) If no errors, commit all updates

   if (tran != null)

   {

       tran.Commit();

       MessageBox.Show("All updates successful. ");

       tran = null;

   }

}

conn.Close();

Identifying Rows That Cause Update Errors

When DataAdapter.ContinueUpdateonErrors is set to TRue, processing does not halt if a row cannot be updated. Instead, the DataAdapter updates all rows that do not cause an error. It is then up to the programmer to identify the rows that failed and determine how to reprocess them.

Rows that fail to update are easily identified by their DataRowState property (discussed earlier in the description of DataRows). Rows whose update succeeds have a value of Unchanged; rows that fail have their original Added, Deleted, or Modified value. A simple code segment demonstrates how to loop through the rows and identify those that are not updated (see Listing 11-5).

Listing 11-5. Identify Attempts to Update a Database That Fails


// SqlDataAdapter da loads movies table

da.ContinueUpdateOnError = true;

DataSet ds = new DataSet();

try

{

   da.Fill(ds, "movies");

   DataTable dt = ds.Tables["movies"];

   SqlCommandBuilder sb = new SqlCommandBuilder(da);

   // ... Sample Update operations

   dt.Rows[29].Delete();                      // Delete

   dt.Rows[30]["movie_Year"] = "1933";        // Update

   dt.Rows[30]["movie_Title"] = "King Kong";  // Update

   dt.Rows[31]["movie_Title"] = "Fantasia";   // Update

   DataRow drow = dt.NewRow();

   drow["movie_Title"] = "M*A*S*H";

   drow["movie_Year"] = "1970";

   dt.Rows.Add(drow);                         // insert

   // Submit updates

   int updates = da.Update(ds, "movies");

   // Following is true if any update failed

   if (ds.HasChanges())

   {

      // Load rows that failed into a DataSet

      DataSet failures = ds.GetChanges();

      int rowsFailed = failures.Rows.Count;

      Console.WriteLine("Update Failures: "+rowsFailed);

      foreach (DataRow r in failures.Tables[0].Rows )

      {

         string state = r.RowState.ToString());

         // Have to reject changes to show deleted row

         if (r.RowState == DataRowState.Deleted)

               r.RejectChanges();

         string ID= ((int)r["movie_ID"]).ToString();

         string msg= state + " Movie ID: "+ID;

         Console.WriteLine(msg);

      }

   }

Note that even though the delete occurs first, it does not affect the other operations. The SQL statement that deletes or updates a row is based on a row's primary key value—not relative position. Also, be aware that updates on the same row are combined and counted as a single row update by the Update method. In this example, updates to row 30 count as one update.

Handling concurrency issues is not a simple task. After you identify the failures, the next step—how to respond to the failures—is less clear, and depends on the application. Often times, it is necessary to re-read the rows from the database and compare them with the rows that failed in order to determine how to respond. The ability to recognize RowState and the current and original values of rows is the key to developing code that resolves update conflicts.

Defining Relationships Between Tables in a DataSet

A DataRelation is a parent/child relationship between two DataTables. It is defined on matching columns in the two tables. The columns must be the same DataType, and the column in the parent table must have unique values. The syntax for its constructor is


public DataRelation(

   string relationName,

   DataColumn parentColumn,

   DataColumn childColumn)

A DataSet has a Relations property that provides access to the collection of DataRelations defined for tables contained in the DataSet. Use the Relations.Add method to place relations in the collection. Listing 11-6 illustrates these ideas. It contains code to set up a parent/child relationship between the directors and movies tables in order to list movies by each director.

Listing 11-6. Create a Relationship Between the Directors and Movies Tables


DataSet ds = new DataSet();

// (1) Fill table with movies

string sql = "SELECT movie_ID,movie_Title,movie_DirectorID,

      movie_Year FROM movies";

SqlConnection conn = new SqlConnection(connStr);

SqlCommand cmd = new SqlCommand();

SqlDataAdapter da = new SqlDataAdapter(sql, conn);

da.Fill(ds, "movies");

// (2) Fill table with directors

sql = "SELECT director_id,(first_name + ' '+ last_name) AS

      fullname FROM directors";

da.SelectCommand.CommandText = sql;

da.Fill(ds, "directors");

// (3) Define relationship between directors and movies

DataTable parent = ds.Tables["directors"];

DataTable child = ds.Tables["movies"];

DataRelation relation = new DataRelation("directormovies",

      parent.Columns["director_ID"],

      child.Columns["movie_DirectorID"]);

// (4) Add relation to DataSet

ds.Relations.Add(relation);

// (5) List each director and his or her movies

foreach (DataRow r in parent.Rows)

{

   Console.WriteLine(r["fullname"];      // Director name

   foreach (DataRow rc in

      r.GetChildRows("directormovies"))

   {

      Console.WriteLine("  "+rc["movie_title"]);

   }

}

/*

   Sample Output:

      David Lean

         Lawrence of Arabia

         Bridge on the River Kwai, The

      Victor Fleming

         Gone with the Wind

         Wizard of Oz, The

*/

Relations and Constraints

When a relationship is defined between two tables, it has the effect of adding a ForeignKeyConstraint to the Constraints collections of the child DataTable. This constraint determines how the child table is affected when rows in a parent table are changed or deleted. In practical terms, this means that if you delete a row in the parent table, you can have the related child row(s) deleted—or optionally, have their key value set to null. Similarly, if a key value is changed in the parent table, the related rows in the child can have their key value changed or set to null.

The rule in effect is determined by the value of the DeleteRule and UpdateRule properties of the constraint. These can take one of four Rule enumeration values:

Cascade. Deletes or updates related rows in child table. This is the default.
None. Takes no action.
SetDefault. Sets key values in child rows to column's default value.
SetNull. Sets key values in child rows to null.

This code segment illustrates how constraints affect the capability to add a row to a child table and delete or change a row in the parent table. The tables from the preceding example are used.


// (1) Try to add row with new key to child table

DataRow row = child.NewRow();

row["movie_directorID"] = 999;

child.Rows.Add(row);   // Fails ?999 does not exist in parent

// (2) Delete row in parent table

row = parent.Rows[0];

row.Delete();      // Deletes rows in child having this key

// (3) Relax constraints and retry adding row

ds.EnforceConstraints = false;

row["movie_directorID"] = 999;

child.Rows.Add(row);   // Succeeds

ds.EnforceConstraints = true;    // Turn back on

// (4) Change constraint to set rows to null if parent changed

((ForeignKeyConstraint)child.Constraints[0]).DeleteRule =

       Rule.SetNull ;

Note that setting the EnforceConstraints property to false turns off all constraints—which in database terms eliminates the check for referential integrity. ^[3] This allows a movie to be added even though its movie_DirectorID column (foreign key) does not have a corresponding row in the directors table. It also permits a director to be deleted even though a movie by that director exists in the movies table. This clearly compromises the integrity of the database and should be used only when testing or populating individual tables in a database.

^[3] The foreign key in any referencing table must always refer to a valid row in the referenced table.

Choosing Between the Connected and Disconnected Model

The DataReader and DataSet offer different approaches to processing data—each with its advantages and disadvantages. The DataReader provides forward-only, read-only access to data. By processing a row at a time, it minimizes memory requirements. A DataSet, on the other hand, offers read/write access to data, but requires enough memory to hold a copy of the data fetched from a data source. From this, you can derive a couple of general rules: If the application does not require the capability to update the data source and is used merely for display and selection purposes, a DataReader should be the first consideration; if the application requires updating data, a DataSet should be considered.

Of course, the general rules have to be weighed against other factors. If the data source contains a large number of records, a DataSet may require too many resources; or if the data requires only a few updates, the combination of DataReader and Command object to execute updates may make more sense. Despite the gray areas, there are many situations where one is clearly preferable to the other.

A DataSet is a good choice when the following apply:

Data need to be serialized and/or sent over the wire using HTTP.
Multiple read-only controls on a Windows Form are bound to the data source.
A Windows Form control such as a GridView or DataView is bound to an updatable data source.
A desktop application must edit, add, and delete rows of data.

A DataReader is a good choice when the following apply:

A large number of records must be handled so that the memory requirements and time to load make a DataSet impracticable.
The data is read-only and bound to a Windows or Web Form list control.
The database is highly volatile, and the contents of a DataSet might be updated often.

< Day Day Up >