Just about all of the world’s HTTP communication is carried over TCP/IP, a popular layered set of packet-switched network protocols spoken by computers and network devices around the globe. A client application can open a TCP/IP connection to a server application, running just about anywhere in the world. Once the connection is established, messages exchanged between the client’s and server’s computers will never be lost, damaged, or received out of order.[1]
Say you want the latest power tools price list from Joe’s Hardware store:
http://www.joes-hardware.com:80/power-tools.html
When given this URL, your browser performs the steps shown in Figure 4-1. In Steps 1-3, the IP address and port number of the server are pulled from the URL. A TCP connection is made to the web server in Step 4, and a request message is sent across the connection in Step 5. The response is read in Step 6, and the connection is closed in Step 7.
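To make Steps 1-3 concrete, here is a minimal sketch of pulling the hostname and port number out of the URL with Python's standard `urllib.parse` module. The helper name `host_and_port` is our own, and the fallback to port 80 assumes a plain `http` URL. Turning the hostname into an IP address needs a DNS lookup (for example, `socket.gethostbyname`), which we leave out so the sketch runs without a network.

```python
from urllib.parse import urlsplit

def host_and_port(url):
    """Return (hostname, port) for an http URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Assumption: when the URL carries no explicit port, use HTTP's default, 80.
    port = parts.port if parts.port is not None else 80
    return parts.hostname, port

host, port = host_and_port("http://www.joes-hardware.com:80/power-tools.html")
print(host, port)
```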
HTTP connections really are nothing more than TCP connections, plus a few rules about how to use them. TCP connections are the reliable connections of the Internet. To send data accurately and quickly, you need to know the basics of TCP.[2]
TCP gives HTTP a reliable bit pipe. Bytes stuffed in one side of a TCP connection come out the other side correctly, and in the right order (see Figure 4-2).
Figure 4-2. TCP carries HTTP data in order, and without corruption
TCP sends its data in little chunks called IP packets (or IP datagrams). In this way, HTTP is the top layer in a “protocol stack” of “HTTP over TCP over IP,” as depicted in Figure 4-3a. A secure variant, HTTPS, inserts a cryptographic encryption layer (called TLS or SSL) between HTTP and TCP (Figure 4-3b).
When HTTP wants to transmit a message, it streams the contents of the message data, in order, through an open TCP connection. TCP takes the stream of data, chops up the data stream into chunks called segments, and transports the segments across the Internet inside envelopes called IP packets (see Figure 4-4). This is all handled by the TCP/IP software; the HTTP programmer sees none of it.
Each TCP segment is carried by an IP packet from one IP address to another IP address. Each of these IP packets contains:
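The chopping and reassembly can be pictured with a toy sketch. This is not how a real TCP stack is written: real TCP also handles acknowledgments, retransmission, and flow control. The function names and the 16-byte segment size are assumptions chosen only for illustration. The sketch shows a byte stream split into offset-tagged chunks that can be put back in order even if they arrive shuffled.

```python
def segment(stream: bytes, max_segment_size: int):
    """Chop a byte stream into (offset, chunk) pairs (toy model of TCP segments)."""
    return [(off, stream[off:off + max_segment_size])
            for off in range(0, len(stream), max_segment_size)]

def reassemble(segments):
    """Rebuild the stream by ordering chunks on their offsets."""
    return b"".join(chunk for _, chunk in sorted(segments))

message = b"GET /power-tools.html HTTP/1.0\r\nHost: www.joes-hardware.com\r\n\r\n"
segments = segment(message, 16)          # 16 bytes is an arbitrary, tiny size
shuffled = list(reversed(segments))      # pretend the chunks arrived out of order
restored = reassemble(shuffled)
print(restored == message)
```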
An IP packet header (usually 20 bytes)
A TCP segment header (usually 20 bytes)
A chunk of TCP data (0 or more bytes)
The IP header contains the source and destination IP addresses, the size, and other flags. The TCP segment header contains TCP port numbers, TCP control flags, and numeric values used for data ordering and integrity checking.
A computer might have several TCP connections open at any one time. TCP keeps all these connections straight through port numbers.
Port numbers are like employees’ phone extensions. Just as a company’s main phone number gets you to the front desk and the extension gets you to the right employee, the IP address gets you to the right computer and the port number gets you to the right application. A TCP connection is distinguished by four values:
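As a rough illustration of what lives in the usual 20-byte TCP segment header, the sketch below packs and unpacks one with Python's `struct` module, following the standard field layout (ports, sequence and acknowledgment numbers, data offset and control flags, window, checksum, urgent pointer). The sample values (source port 3105, sequence number 1000, and so on) are made up, and the checksum is left at zero rather than computed.

```python
import struct

# Fixed part of a TCP header in network byte order: 20 bytes with no options.
TCP_HEADER = struct.Struct("!HHIIHHHH")

def parse_tcp_header(raw: bytes):
    """Decode the fixed 20-byte TCP header into a dict (illustrative helper)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                                # used for data ordering
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,    # data offset is in 32-bit words
        "flags": offset_flags & 0xFF,              # control bits such as SYN, ACK, FIN
        "window": window,
        "checksum": checksum,                      # used for integrity checking
    }

SYN = 0x02
# Hypothetical header for a connection attempt from port 3105 to port 80.
sample = TCP_HEADER.pack(3105, 80, 1000, 0, (5 << 12) | SYN, 65535, 0, 0)
hdr = parse_tcp_header(sample)
print(hdr["src_port"], hdr["dst_port"], hdr["header_len"])
```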
<source-IP-address, source-port, destination-IP-address, destination-port>
Together, these four values uniquely define a connection. Two different TCP connections are not allowed to have the same values for all four address components (but different connections can have the same values for some of the components).
In Figure 4-5, there are four connections: A, B, C, and D. The relevant information for each port is listed in Table 4-1.
Table 4-1. TCP connection values
Connection | Source IP address | Source port | Destination IP address | Destination port |
---|---|---|---|---|
A | 209.1.32.34 | 2034 | 204.62.128.58 | 4133 |
B | 209.1.32.35 | 3227 | 204.62.128.58 | 4140 |
C | 209.1.32.35 | 3105 | 207.25.71.25 | 80 |
D | 209.1.33.89 | 5100 | 207.25.71.25 | 80 |
Note that some of the connections share the same destination port number (C and D both have destination port 80). Some of the connections have the same source IP address (B and C). Some have the same destination IP address (A and B, and C and D). But no two different connections share all four identical values.
Operating systems provide different facilities for manipulating their TCP connections. Let’s take a quick look at one TCP programming interface, to make things concrete. Table 4-2 shows some of the primary interfaces provided by the sockets API. This sockets API hides all the details of TCP and IP from the HTTP programmer. The sockets API was first developed for the Unix operating system, but variants are now available for almost every operating system and language.
Table 4-2. Common socket interface functions for programming TCP connections
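The same observations can be checked mechanically. The sketch below models each row of Table 4-1 as a four-value tuple; the `Connection` type and its field names are our own invention for illustration. Because the tuples are compared on all four components, putting them in a set shows that no two connections collide, even though individual components repeat.

```python
from collections import namedtuple

# Hypothetical record type for the four values that identify a TCP connection.
Connection = namedtuple("Connection", "src_ip src_port dst_ip dst_port")

connections = {
    "A": Connection("209.1.32.34", 2034, "204.62.128.58", 4133),
    "B": Connection("209.1.32.35", 3227, "204.62.128.58", 4140),
    "C": Connection("209.1.32.35", 3105, "207.25.71.25", 80),
    "D": Connection("209.1.33.89", 5100, "207.25.71.25", 80),
}

# All four tuples are distinct, so the set keeps every one of them.
all_unique = len(set(connections.values())) == len(connections)

# Individual components may still repeat across connections.
same_dst_port = connections["C"].dst_port == connections["D"].dst_port
same_src_ip = connections["B"].src_ip == connections["C"].src_ip
print(all_unique, same_dst_port, same_src_ip)
```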
Sockets API call | Description |
---|---|
s = socket(<parameters>) | Creates a new, unnamed, unattached socket. |
bind(s, <local IP:port>) | Assigns a local port number and interface to the socket. |
connect(s, <remote IP:port>) | Establishes a TCP connection between a local socket and a remote host and port. |
listen(s,...) | Marks a local socket as legal to accept connections. |
s2 = accept(s) | Waits for someone to establish a connection to a local port. |
n = read(s,buffer,n) | Tries to read n bytes from the socket into the buffer. |
n = write(s,buffer,n) | Tries to write n bytes from the buffer into the socket. |
close(s) | Completely closes the TCP connection. |
shutdown(s,<side>) | Closes just the input or the output of the TCP connection. |
getsockopt(s, . . . ) | Reads the value of an internal socket configuration option. |
setsockopt(s, . . . ) | Changes the value of an internal socket configuration option. |
The sockets API lets you create TCP endpoint data structures, connect these endpoints to remote server TCP endpoints, and read and write data streams. The TCP API hides all the details of the underlying network protocol handshaking and the segmentation and reassembly of the TCP data stream to and from IP packets.
In Figure 4-1, we showed how a web browser could download the power-tools.html web page from Joe’s Hardware store using HTTP. The pseudocode in Figure 4-6 sketches how we might use the sockets API to highlight the steps the client and server could perform to implement this HTTP transaction.
We begin with the web server waiting for a connection (Figure 4-6, S4). The client determines the IP address and port number from the URL and proceeds to establish a TCP connection to the server (Figure 4-6, C3). Establishing a connection can take a while, depending on how far away the server is, the load on the server, and the congestion of the Internet.
Once the connection is set up, the client sends the HTTP request (Figure 4-6, C5) and the server reads it (Figure 4-6, S6). Once the server gets the entire request message, it processes the request, performs the requested action (Figure 4-6, S7), and writes the data back to the client. The client reads it (Figure 4-6, C6) and processes the response data (Figure 4-6, C7).
[1] Though messages won’t be lost or corrupted, communication between client and server can be severed if a computer or network breaks. In this case, the client and server are notified of the communication breakdown.
[2] If you are trying to write sophisticated HTTP applications, and especially if you want them to be fast, you’ll want to learn a lot more about the internals and performance of TCP than we discuss in this chapter. We recommend the “TCP/IP Illustrated” books by W. Richard Stevens (Addison Wesley).
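Several of the calls in Table 4-2 have direct counterparts in Python's standard `socket` module, which we use here as one convenient variant of the sockets API. The sketch below creates a socket, sets and reads back a configuration option, binds to a local address, and marks the socket as ready to accept connections. The loopback address and the choice of port 0 (letting the operating system pick a free port) are assumptions made so the example runs anywhere.

```python
import socket

# s = socket(<parameters>): a new, unattached TCP (stream) socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# setsockopt / getsockopt: change and read back a socket configuration option.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

# bind: assign a local interface and port (port 0 asks the OS for a free one).
s.bind(("127.0.0.1", 0))
local_ip, local_port = s.getsockname()

# listen: mark the socket as legal to accept connections.
s.listen(1)

# close: release the socket.
s.close()
print(local_ip, local_port, bool(reuse))
```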
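The exchange just described can be approximated with a small, self-contained sketch that runs both sides on one machine. It is not the pseudocode of Figure 4-6. The loopback address, the OS-chosen port, the canned response body, and the function names `serve_once` and `fetch` are all assumptions made for illustration, and the "server" handles exactly one request.

```python
import socket
import threading

# Canned response; the body "price list" is 10 bytes, matching Content-Length.
RESPONSE = (b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
            b"Content-Length: 10\r\n\r\nprice list")

def serve_once(listener):
    """Accept one connection, read one request, and write the canned response."""
    conn, _addr = listener.accept()            # wait for a connection
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:      # read until the end of the headers
            data = conn.recv(1024)
            if not data:
                break
            request += data
        conn.sendall(RESPONSE)                 # write the data back to the client

def fetch(host, port, path):
    """Connect, send a GET request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = s.recv(1024)
            if not data:                       # empty read: connection closed
                break
            chunks.append(data)
    return b"".join(chunks)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))                # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
response = fetch("127.0.0.1", port, "/power-tools.html")
server.join()
listener.close()
print(response.split(b"\r\n", 1)[0].decode("ascii"))
```

Reading until the peer closes the connection is the simplest way for this client to know the response is complete; it works here because the server closes the connection as soon as it has written the response.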