Internet Primer

You can't write a good Winsock program without understanding the concept of a socket, which is used to send and receive packets of data across the network. To fully understand sockets, you need a thorough knowledge of the underlying Internet protocols. This section contains a concentrated dose of Internet theory. It should be enough to get you going, but you might want to refer to one of the TCP/IP textbooks if you want more theory.

Network Protocols—Layering

All networks use layering for their transmission protocols, and the collection of layers is often called a stack. The application program talks to the top layer, and the bottom layer talks to the network. Figure 34-1 shows you the stack for a local area network (LAN) running TCP/IP. Each layer is logically connected to the corresponding layer at the other end of the communications channel. The server program, as shown at the right in Figure 34-1, continuously listens on one end of the channel, while the client program, as shown on the left, periodically connects with the server to exchange data. Think of the server as an HTTP-based World Wide Web server, and think of the client as a browser program running on your computer.

The Internet Protocol

The Internet Protocol (IP) layer is the best place to start in your quest to understand TCP/IP. IP defines packets called datagrams, which are the fundamental units of Internet communication. These packets, typically fewer than 1000 bytes long, go bouncing all over the world when you open a Web page, download a file, or send e-mail. Figure 34-2 shows a simplified layout of an IP datagram.

Notice that the IP datagram contains 32-bit addresses for both the source and destination computers. These IP addresses uniquely identify computers on the Internet and are used by routers (specialized computers that act like telephone switches) to direct the individual datagrams to their destinations. The routers don't care about what's inside the datagrams—they're only interested in that datagram's destination address and total length. Their job is to resend the datagram as quickly as possible.

The IP layer doesn't tell the sending program whether a datagram has successfully reached its destination. That's a job for the next layer up the stack. The receiving program can look only at the checksum to determine whether the IP datagram header was corrupted.
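To see what the receiver's checksum test amounts to, here is a minimal Python sketch of the standard IP header checksum: the ones' complement of the ones' complement sum of the header's 16-bit words. The sample header bytes and the name internet_checksum are illustrative assumptions, not part of any Windows API.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A 20-byte sample header (hypothetical traffic from 192.168.0.1 to
# 192.168.0.199) with the checksum field, bytes 10 and 11, set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)

# The sender stores the checksum in the header. The receiver recomputes the
# checksum over the whole header and expects zero if nothing was corrupted.
stamped = header[:10] + struct.pack("!H", checksum) + header[12:]
header_ok = internet_checksum(stamped) == 0
```

Note that the checksum covers only the header, so a clean result says nothing about the data block that follows it.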

The User Datagram Protocol

The TCP/IP protocol should really be called TCP/UDP/IP because it includes the User Datagram Protocol (UDP), which is a peer of TCP. All IP-based transport protocols store their own headers and data inside the IP data block. First let's look at the UDP layout in Figure 34-3.

Figure 34-4. The relationship between the IP datagram and the UDP datagram.

UDP is only a small step up from IP, but applications almost never use IP directly. Like IP, UDP doesn't tell the sender when the datagram has arrived. That's up to the application. The sender could, for example, require that the receiver send a response, and the sender could retransmit the datagram if the response didn't arrive within, say, 20 seconds. UDP is good for simple one-shot messages and is used by the Internet Domain Name System (DNS), which is explained later in this chapter. UDP is also used for transmitting live audio and video, for which some lost or out-of-sequence data is not a big problem.

Figure 34-3 shows that the UDP header does convey some additional information—namely the source and destination port numbers. The application programs on each end use these 16-bit numbers. For example, a client program might send a datagram addressed to port 1700 on the server. The server program is listening for any datagram that includes 1700 in its destination port number, and when the server finds one, it can respond by sending another datagram back to the client, which is listening for a datagram that includes 1701 in its destination port number.
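The port exchange described above can be sketched in a few lines of Python. The sketch assumes the standard 8-byte UDP header (source port, destination port, length, and checksum, each 16 bits in network byte order); the helper names build_udp_datagram and parse_udp_datagram are hypothetical.

```python
import struct

# Source port, destination port, length, checksum: four 16-bit fields.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)
    # A checksum of 0 means "not computed" for UDP carried over 32-bit IP.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload

def parse_udp_datagram(datagram: bytes):
    src_port, dst_port, length, _checksum = UDP_HEADER.unpack_from(datagram)
    return src_port, dst_port, datagram[UDP_HEADER.size:length]

# The client listens on port 1701 and addresses the server's port 1700.
request = build_udp_datagram(1701, 1700, b"hello")
src, dst, data = parse_udp_datagram(request)

# The server replies by swapping the two port numbers.
reply = build_udp_datagram(dst, src, b"hi")
```

The server learns where to send its reply from the source port in the request, which is why the client must be listening on that same port.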

IP Address Format—Network Byte Order

You know that IP addresses are 32 bits long. You might think that 2³² (more than 4 billion) uniquely addressed computers could exist on the Internet, but that's not true. Part of the address identifies the LAN on which the host computer is located, and part of it identifies the host computer within the network. Most IP addresses are Class C addresses, which are formatted as shown in Figure 34-5.

This means that slightly more than 2 million (2²¹) networks can exist, and each of those networks can have 2⁸ (256) addressable host computers. The Class A and Class B IP addresses, which allow more host computers on a network, are all used up.
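The arithmetic is easy to check with a short Python sketch. It assumes the conventional class layout, in which the leading bits of the first byte select the class and a Class C address has 21 network bits followed by 8 host bits; the function name split_classful is hypothetical.

```python
def split_classful(address: str):
    """Return (class letter, network bytes, host bytes) for a dotted address."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IP address: {address!r}")
    first = octets[0]
    if first < 128:        # leading bit 0
        letter, network_bytes = "A", 1
    elif first < 192:      # leading bits 10
        letter, network_bytes = "B", 2
    elif first < 224:      # leading bits 110
        letter, network_bytes = "C", 3
    else:
        raise ValueError("addresses above 223.x.x.x are not host networks")
    return letter, octets[:network_bytes], octets[network_bytes:]

# 24 network-side bits minus the 3 fixed class bits leaves 21 network bits.
CLASS_C_NETWORKS = 2 ** 21
CLASS_C_HOST_ADDRESSES = 2 ** 8

example = split_classful("194.128.198.201")
```

For the address 194.128.198.201, the first three bytes name the network and the last byte names the host within it.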

The Internet "powers-that-be" have recognized the shortage of IP addresses, so they have proposed a new standard, the IP Next Generation (IPng) protocol. IPng defines a new IP datagram format that uses 128-bit addresses instead of 32-bit addresses. With IPng, you'll be able, for example, to assign a unique Internet address to each light switch in your house, so you can switch off your bedroom light from your portable computer from anywhere in the world. IPng implementation doesn't yet have a schedule.

By convention, IP addresses are written in dotted-decimal format. The four parts of the address refer to the individual byte values. An example of a Class C IP address is 194.128.198.201. In a computer with an Intel CPU, the address bytes are stored low-order-to-the-left, in so-called little-endian order. In most other computers, including the UNIX machines that first supported the Internet, bytes are stored high-order-to-the-left, in big-endian order. Because the Internet imposes a machine-independent standard for data interchange, all multibyte numbers must be transmitted in big-endian order. This means that programs running on Intel-based machines must convert between network byte order (big-endian) and host byte order (little-endian). This rule applies to 2-byte port numbers as well as to 4-byte IP addresses.
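A short Python sketch makes the difference between the two byte orders visible. It uses only the standard struct and socket modules; in a Winsock program, functions such as htons and htonl do the equivalent conversion.

```python
import socket
import struct

address = "194.128.198.201"

# inet_aton returns the four address bytes in network (big-endian) order.
packed = socket.inet_aton(address)

# The same four bytes read as a 32-bit integer in each byte order.
as_network_order = struct.unpack("!I", packed)[0]   # big-endian
as_intel_order = struct.unpack("<I", packed)[0]     # little-endian

# A 2-byte port number as it travels on the wire and as an Intel CPU stores it.
port_on_wire = struct.pack("!H", 1700)
port_in_intel_memory = struct.pack("<H", 1700)
```

Here as_network_order is 0xC280C6C9 and as_intel_order is 0xC9C680C2, so forgetting the conversion doesn't produce a slightly wrong address; it produces a completely different one.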

The Transmission Control Protocol

You've learned about the limitations of UDP. What you really need is a protocol that supports error-free transmission of large blocks of data. Obviously, you want the receiving program to be able to reassemble the bytes in the exact sequence in which they are transmitted, even though the individual datagrams might arrive in the wrong sequence. TCP is that protocol, and it's the principal transport protocol for all Internet applications, including HTTP and File Transfer Protocol (FTP). Figure 34-6 shows the layout of a TCP segment. (It's not called a datagram.) The TCP segment fits inside an IP datagram, as shown in Figure 34-7.

The TCP protocol establishes a full-duplex, point-to-point connection between two computers, and a program at each end of this connection uses its own port. The combination of an IP address and a port number is called a socket. The connection is first established with a three-way handshake. The initiating program sends a segment with the SYN flag set, the responding program sends a segment with both the SYN and ACK flags set, and then the initiating program sends a segment with the ACK flag set.
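The handshake can be modeled as three small records. This Python sketch assumes the standard TCP numbering rule, in which each side acknowledges the other's initial sequence number plus one; the Segment class and the initial sequence numbers are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    flags: frozenset
    seq: int
    ack: int = 0

def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments that open a TCP connection."""
    wrap = 2 ** 32  # sequence numbers are 32-bit values
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    syn_ack = Segment(frozenset({"SYN", "ACK"}),
                      seq=server_isn, ack=(syn.seq + 1) % wrap)
    ack = Segment(frozenset({"ACK"}),
                  seq=(syn.seq + 1) % wrap, ack=(syn_ack.seq + 1) % wrap)
    return [syn, syn_ack, ack]

# Hypothetical initial sequence numbers chosen by the two programs.
segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

After the third segment, both ends know the other's starting sequence number, which is what makes the byte-stream bookkeeping possible.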

After the connection is established, each program can send a stream of bytes to the other program. TCP uses the sequence number fields together with ACK flags to control this flow of bytes. The sending program doesn't wait for each segment to be acknowledged but instead sends a number of segments together and then waits for the first acknowledgment. If the receiving program has data to send back to the sending program, it can piggyback its acknowledgment and outbound data together in the same segments.

The sending program's sequence numbers are not segment indexes but rather indexes into the byte stream. The receiving program sends back the sequence numbers (in the acknowledgment number field) to the sending program, thereby ensuring that all bytes are received and assembled in sequence. The sending program resends unacknowledged segments.
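Because sequence numbers index the byte stream, the receiver can put segments back in order no matter how they arrive. The following Python sketch shows the idea under a simplifying assumption: segments never partially overlap, so each one starts exactly where another ends. The function name reassemble is hypothetical.

```python
def reassemble(segments, initial_seq=0):
    """Rebuild the byte stream from (sequence number, data) pairs.

    Returns the in-order bytes received so far and the next expected
    sequence number, which is the value the receiver would send back
    in the acknowledgment number field.
    """
    pending = {}
    for seq, data in segments:
        pending[seq] = data  # a retransmitted duplicate simply overwrites
    stream = bytearray()
    next_seq = initial_seq
    while next_seq in pending:
        data = pending.pop(next_seq)
        stream += data
        next_seq += len(data)  # advance by bytes, not by segment count
    return bytes(stream), next_seq

# Segments arrive out of order, and one arrives twice.
arrived = [(5, b" worl"), (0, b"hello"), (10, b"d"), (0, b"hello")]
stream, ack_number = reassemble(arrived)

# With the middle segment missing, the receiver can acknowledge only 5 bytes.
partial, partial_ack = reassemble([(0, b"hello"), (10, b"d")])
```

In the second call, the acknowledgment number stays at 5, which tells the sender to resend the segment that begins at byte 5.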

Each program closes its end of the TCP connection by sending a segment with the FIN flag set, which must be acknowledged by the program on the other end. A program can no longer receive bytes on a connection that has been closed by the program on the other end.

Don't worry about the complexity of the TCP protocol. The Winsock and WinInet APIs hide most of the details, so you don't have to worry about ACK flags and sequence numbers. Your program calls a function to transmit a block of data, and Windows takes care of splitting the block into segments and stuffing them inside IP datagrams. Windows also takes care of delivering the bytes on the receiving end, but that gets tricky, as you'll see later in this chapter.

The Domain Name System

When you surf the Web, you don't use IP addresses. Instead, you use human-friendly names like microsoft.com or www.cnn.com. A significant portion of Internet resources is consumed when host names (such as microsoft.com) are translated into IP addresses that TCP/IP can use. A distributed network of name server (domain server) computers performs this translation by processing DNS queries. The entire Internet namespace is organized into domains, starting with an unnamed root domain. Under the root is a series of top-level domains such as com, edu, gov, and org.

Do not confuse Internet domains with Microsoft Windows NT domains. The latter are logical groups of networked computers that share a common security database.

Let's look at the server end first. Suppose a company named SlowSoft has two host computers connected to the Internet, one for World Wide Web (WWW) service and the other for FTP service. By convention, these host computers are named www.slowsoft.com and ftp.slowsoft.com, respectively, and both are members of the second-level domain slowsoft, which SlowSoft has registered with an organization called InterNIC. (See http://www.internic.com/.)

Now SlowSoft must designate two (or more) host computers as its name servers. The name servers for the com domain each have a database entry for the slowsoft domain, and that entry contains the names and IP addresses of SlowSoft's two name servers. Each of the two slowsoft name servers has database entries for both of SlowSoft's host computers. These servers might also have database entries for hosts in other domains, and they might have entries for name servers in third-level domains. Thus, if a name server can't provide a host's IP address directly, it can redirect the query to a lower-level name server. Figure 34-8 illustrates SlowSoft's domain configuration.

A top-level name server runs on its own host computer. InterNIC manages (at last count) nine computers that serve the root domain and top-level domains. Lower-level name servers could be programs running on host computers anywhere on the Net. SlowSoft's Internet service provider (ISP), ExpensiveNet, can furnish one of SlowSoft's name servers. If the ISP is running Windows NT Server, the name server is usually the DNS program that comes bundled with the operating system. That name server might be designated ns1.expensivenet.com.

Now for the client side. A user types http://www.slowsoft.com in the browser. (The http:// prefix tells the browser to use the HTTP protocol when it eventually finds the host computer.) The browser must then resolve www.slowsoft.com into an IP address, so it uses TCP/IP to send a DNS query to the DNS server address for which TCP/IP is configured. (This setting is separate from the default gateway address, which identifies the router that forwards datagrams off the local network.) The DNS server address identifies a local name server, which might have the needed host IP address in its cache. If not, the local name server relays the DNS query up to one of the root name servers. The root server looks up slowsoft in its database and sends the query back down to one of SlowSoft's designated name servers. In the process, the IP address for www.slowsoft.com will be cached for later use if it was not cached already. If you want to go the other way, name servers are also capable of converting an IP address to a name.
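The caching behavior of the local name server can be sketched with a pair of Python dictionaries. Everything here is a toy model: the class name, the delegation table, and the IP addresses (taken from the 192.0.2.0 range reserved for documentation) are assumptions, and no real DNS traffic is involved.

```python
# Hypothetical records held by SlowSoft's designated name servers.
SLOWSOFT_RECORDS = {
    "www.slowsoft.com": "192.0.2.10",
    "ftp.slowsoft.com": "192.0.2.11",
}

# Hypothetical delegation data held higher in the hierarchy: each
# second-level domain points to the records its name servers hold.
DELEGATIONS = {"slowsoft.com": SLOWSOFT_RECORDS}

class LocalNameServer:
    def __init__(self, delegations):
        self.delegations = delegations
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, host: str) -> str:
        if host in self.cache:
            return self.cache[host]      # answered locally
        self.upstream_queries += 1       # query relayed up the hierarchy
        domain = ".".join(host.split(".")[-2:])
        records = self.delegations.get(domain)
        if records is None or host not in records:
            raise LookupError(f"cannot resolve {host}")
        self.cache[host] = records[host]  # remember the answer for next time
        return self.cache[host]

server = LocalNameServer(DELEGATIONS)
first = server.resolve("www.slowsoft.com")
second = server.resolve("www.slowsoft.com")
```

The second lookup never leaves the local name server, which is how caching reduces the Internet resources that name translation would otherwise consume.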

HTTP Basics

You're going to be doing some Winsock programming soon, but just sending raw byte streams back and forth isn't very interesting. You need to use a higher-level protocol in order to be compatible with existing Internet servers and browsers. HTTP is a good place to start because it's the protocol of the popular World Wide Web and it's relatively simple.

HTTP is built on TCP, and this is the way it works: First a server program listens on port 80. Then some client program (typically a browser) connects to the server (www.slowsoft.com, in this case) after receiving the server's IP address from a name server. Using its own port number, the client sets up a two-way TCP connection to the server. As soon as the connection is established, the client sends a request to the server, which might look like this:

The server identifies the request as a GET, the most common type, and it concludes that the client wants a file named newproducts.html that's located in a server directory known as /customers (which might or might not be \customers on the server's hard disk). Immediately following are request headers, which mostly describe the client's capabilities.

The If-Modified-Since header tells the server not to bother to transmit newproducts.html unless the file has been modified since March 26, 1997. This implies that the browser already has a dated copy of this file stored in its cache. The blank line at the end of the request is crucial; it provides the only way for the server to tell that it is time to stop receiving and start transmitting, and that's because the TCP connection stays open.
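A request with these properties can be assembled in a few lines of Python. This is a sketch under stated assumptions: the HTTP/1.0 version string, the header values, the browser name, and the time of day in the date are all illustrative; only the path and the March 26, 1997 date come from the discussion above.

```python
def build_get_request(path: str, headers: dict) -> bytes:
    """Build a GET request: request line, headers, then a blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CR LF, and an empty line marks the end.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

def request_complete(received: bytes) -> bool:
    """A server can stop reading once it has seen the blank line."""
    return b"\r\n\r\n" in received

request = build_get_request(
    "/customers/newproducts.html",
    {
        "Accept": "*/*",
        "User-Agent": "ExampleBrowser/1.0",  # hypothetical browser name
        "If-Modified-Since": "Wed, 26 Mar 1997 00:00:00 GMT",
    },
)
```

Because the bytes arrive as a stream, the server might receive the request in several pieces; it keeps reading until request_complete returns True.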

Now the server springs into action. It sends newproducts.html, but first it sends an OK response:

You're looking at elementary HyperText Markup Language (HTML) code here, and the resulting Web page won't win any prizes. We won't go into details because dozens of HTML books are already available. From these books, you'll learn that HTML tags are contained in angle brackets and that there's often an "end" tag (with a / character) for every "start" tag. Some tags, such as <a> (hypertext anchor), have attributes. In the example above, the line

<a href="default.htm">SlowSoft's Home Page</a>

creates a link to another HTML file. The user clicks on "SlowSoft's Home Page," and the browser requests default.htm from the original server.

Actually, newproducts.html references two server files, default.htm and /images/clouds.jpg. The clouds.jpg file is a JPEG file that contains a background picture for the page. The browser downloads each of these files as a separate transaction, establishing and closing a separate TCP connection each time. The server just dishes out files to any client that asks for them. In this case, the server doesn't know or care whether the same client requested newproducts.html and clouds.jpg. To the server, clients are simply IP addresses and port numbers. In fact, the port number is different for each request from a client. For example, if ten of your company's programmers are surfing the Web via your company's proxy server (more on proxy servers later), the server sees the same IP address for each client.
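On the client side, the browser must separate the server's status line and headers from the file that follows. The Python sketch below shows one way to do that. The sample response is an assumption built from the details above (an OK status, an HTML body that references default.htm and /images/clouds.jpg); the header set and the function name parse_response are hypothetical.

```python
def parse_response(raw: bytes):
    """Split a response into status code, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

# An illustrative page body and the response that would carry it.
page = (
    b'<html><body background="/images/clouds.jpg">'
    b'<a href="default.htm">SlowSoft\'s Home Page</a>'
    b"</body></html>"
)
raw_response = (
    b"HTTP/1.0 200 OK\r\n"
    + b"Content-Type: text/html\r\n"
    + b"Content-Length: " + str(len(page)).encode("ascii") + b"\r\n"
    + b"\r\n"
    + page
)

code, reason, headers, body = parse_response(raw_response)
```

The browser would then open a new connection for each file the page references and repeat the same request and response cycle.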

Web pages use two graphics formats, GIF and JPEG. GIF files are compressed images that retain all the detail of the original uncompressed image but are usually limited to 256 colors. They support transparent regions and animation. JPEG files are smaller, but they don't carry all the detail of the original file. GIF files are often used for small images such as buttons, and JPEG files are often used for photographic images for which detail is not critical. Visual C++ can read, write, and convert both GIF and JPEG files, but the Win32 API cannot handle these formats unless you supply a special compression/decompression module.

The HTTP standard includes a PUT request type that enables a client program to upload a file to the server. Client programs and server programs seldom implement PUT.

FTP Basics

The File Transfer Protocol handles the uploading and downloading of server files plus directory navigation and browsing. A Windows command-line program called ftp (it doesn't work through a Web proxy server) lets you connect to an FTP server using UNIX-like keyboard commands. Browser programs usually support the FTP protocol (for downloading files only) in a more user-friendly manner. You can protect an FTP server's directories with a user-name/password combination, but both strings are passed over the Internet as clear text. FTP is based on TCP. Two separate connections are established between the client and server, one for control and one for data.

Internet vs. Intranet

Up to now, we've been assuming that client and server computers were connected to the worldwide Internet. The fact is you can run exactly the same client and server software on a local intranet. An intranet is often implemented on a company's LAN and is used for distributed applications. Users see the familiar browser interface at their client computers, and server computers supply simple Web-like pages or do complex data processing in response to user input.

An intranet offers a lot of flexibility. If, for example, you know that all your computers are Intel-based, you can use ActiveX controls and ActiveX document servers. If necessary, your server and client computers can run custom TCP/IP software that allows communication beyond HTTP and FTP. To secure your company's data, you can separate your intranet completely from the Internet or you can connect it through a firewall, which is often implemented as a proxy server.