
TCP/IP Concepts 1



TCP segment:
Thus, we have simply “passed the buck” to TCP, which must take the stream from the application
and divide it into discrete messages for IP. These messages are called TCP segments.

At regular intervals, TCP forms segments to be transmitted using IP. The size of a segment is
controlled by two primary factors. The first issue is that there is an overall limit to the size
of a segment, chosen to prevent unnecessary fragmentation at the IP layer. This is governed by a
parameter called the maximum segment size (MSS), which is determined during connection establishment.
The second is that TCP is designed so that once a connection is set up, each of the devices tells the
other how much data it is ready to accept at any given time. If this is lower than the MSS value, a
smaller segment must be sent. This is part of the sliding window system described in the next topic.

Since TCP works with individual bytes of data rather than discrete messages, it must use an
identification scheme that works at the byte level to implement its data transmission and tracking
system. This is accomplished by assigning each byte TCP processes a sequence number.
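A toy sketch of this byte-level numbering (the function name and ISN are hypothetical, chosen for illustration):

```python
def number_bytes(isn, data):
    # Every byte in the stream gets its own sequence number,
    # counting up from the initial sequence number (ISN) chosen
    # at connection setup.
    return {isn + i: b for i, b in enumerate(data)}

print(number_bytes(1000, b"abc"))  # {1000: 97, 1001: 98, 1002: 99}
```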

Since applications send data to TCP as a stream of bytes and not prepackaged messages, each
application must use its own scheme to determine where one application data element ends and the
next begins.
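A common such scheme is length-prefix framing. This is a minimal sketch, not part of TCP itself; the function names are hypothetical:

```python
import struct

def frame(payload: bytes) -> bytes:
    # Prefix each message with a 4-byte big-endian length header,
    # so the receiver can find message boundaries in the byte stream.
    return struct.pack("!I", len(payload)) + payload

def deframe(stream: bytes):
    # Split a received byte stream back into discrete messages.
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        if offset + 4 + length > len(stream):
            break  # incomplete message; wait for more bytes
        messages.append(stream[offset + 4 : offset + 4 + length])
        offset += 4 + length
    return messages

# Two application messages arrive as one undifferentiated byte stream:
stream = frame(b"hello") + frame(b"world")
print(deframe(stream))  # [b'hello', b'world']
```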



In addition to the dictates of the current window size, each TCP device also has associated with it a ceiling on segment size: a segment size that will never be exceeded regardless of how large the current window is. This is called the maximum segment size (MSS). When deciding how much data to put into a segment, each device in the TCP connection will choose the amount based on the current window size, in conjunction with the various algorithms described in the reliability section, but it will never send so much data that the amount exceeds the MSS of the device to which it is sending.

Note: I need to point out that the name "maximum segment size" is in fact misleading. The value actually refers to the maximum amount of data that a segment can hold; it does not include the TCP headers. So if the MSS is 100, the actual maximum segment size could be 120 (for a regular TCP header) or larger (if the segment includes TCP options).

The default MSS of 536 was computed by starting with the minimum MTU for IP networks, 576 bytes, and subtracting 20 bytes each for the IP and TCP headers.

Devices can indicate that they wish to use a different MSS value from the default by including
a Maximum Segment Size option in the SYN message they use to establish a connection. Each
device in the connection may use a different MSS value.
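On Linux (and most Unix-like stacks), the negotiated or default MSS can be inspected through the `TCP_MAXSEG` socket option; the value reported before the connection is established varies by platform, so treat this as a platform-dependent sketch:

```python
import socket

# Query the MSS the stack would use for this socket. On an unconnected
# socket Linux typically reports the protocol default; after connection
# establishment it reflects the value negotiated via the SYN option.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
print(mss)
s.close()
```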


Delayed ACK algorithm:


In a simpleminded implementation of TCP, every data packet that comes in is immediately acknowledged with an ACK packet. (ACKs help to provide the reliability TCP promises.)

In modern stacks, ACKs are delayed for a short time (up to 200ms, typically) for three reasons: a)
to avoid the silly window syndrome; b) to allow ACKs to piggyback on a reply frame if one is ready
to go when the stack decides to do the ACK; and c) to allow the stack to send one ACK for several
frames, if those frames arrive within the delay period.

The stack is only allowed to delay the ACK for up to two full-sized segments of data, and for no more than 500 ms (RFC 1122).
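A toy model of that rule, ignoring the timer and modeling only the "ACK every second full segment" part (function and parameter names are hypothetical):

```python
def delayed_acks(segments, max_unacked=2):
    # Simplified RFC 1122 delayed-ACK behavior: the receiver withholds
    # its ACK until a second full-sized segment arrives (the ~200 ms
    # fallback timer is not modeled), so one ACK covers two segments.
    acks, unacked = [], 0
    for seq in segments:
        unacked += 1
        if unacked == max_unacked:
            acks.append(seq)  # cumulative ACK covering pending segments
            unacked = 0
    return acks

print(delayed_acks([1, 2, 3, 4, 5]))  # [2, 4]; segment 5 waits for the timer
```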


Nagle algorithm:

Nagle's algorithm, named after John Nagle, is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network.

Nagle's document, Congestion Control in IP/TCP Internetworks (RFC896) describes what he called the 'small packet problem', where an application repeatedly emits data in small chunks, frequently only 1 byte in size. Since TCP packets have a 40 byte header (20 bytes for TCP, 20 bytes for IPv4), this results in a 41 byte packet for 1 byte of useful information, a huge overhead. This situation often occurs in Telnet sessions, where most keypresses generate a single byte of data which is transmitted immediately. Worse, over slow links, many such packets can be in transit at the same time, potentially leading to congestion collapse.

Nagle's algorithm works by coalescing a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet's worth of output, so that output can be sent all at once.

Algorithm:
if there is new data to send
  if the window size >= MSS and available data is >= MSS
    send complete MSS segment now
  else
    if there is unconfirmed data still in the pipe
      enqueue data in the buffer until an acknowledgment is received
    else
      send data immediately
    end if
  end if
end if
where MSS = maximum segment size
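The send test above can be sketched as a small Python decision function; this is a toy model of the RFC 896 logic, and all names are hypothetical:

```python
def nagle_decision(data_len, mss, window, unacked_data):
    """Toy model of Nagle's send test: full segments go out at once;
    small writes are held back only while earlier data is unacknowledged."""
    if data_len >= mss and window >= mss:
        return "send full segment now"
    if unacked_data:
        return "buffer until ACK arrives"
    return "send small segment immediately"

# A lone keystroke with nothing in flight goes out at once...
print(nagle_decision(1, 1460, 65535, unacked_data=False))
# ...but while a segment is unacknowledged, small writes are coalesced.
print(nagle_decision(1, 1460, 65535, unacked_data=True))
```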

This algorithm interacts badly with TCP delayed acknowledgments, a feature introduced into TCP at roughly the same time in the early 1980s, but by a different group. With both algorithms enabled, applications which do two successive writes to a TCP connection, followed by a read, experience a constant delay of up to 500 milliseconds, the "ACK delay". For this reason, TCP implementations usually provide applications with an interface to disable the Nagle algorithm. This is typically called the TCP_NODELAY option. The first major application to run into this problem was the X Window System.

The tinygram problem and silly window syndrome are sometimes confused. The tinygram problem occurs when the window is almost empty. Silly window syndrome occurs when the window is almost full.

3.17 - What is the Nagle algorithm?
The Nagle algorithm is an optimization to TCP that makes the stack wait until all data is acknowledged on the connection before it sends more data. The exception is that Nagle will not cause the stack to wait for an ACK if it has enough enqueued data that it can fill a network frame. (Without this exception, the Nagle algorithm would effectively disable TCP's sliding window algorithm.) For a full description of the Nagle algorithm, see RFC 896.

So, you ask, what's the purpose of the Nagle algorithm?

The ideal case in networking is that each program always sends a full frame of data with each call to send(). That maximizes the percentage of useful program data in a packet.

The basic TCP and IPv4 headers are 20 bytes each. The worst case protocol overhead percentage, therefore, is 40/41, or 98%. Since the maximum amount of data in an Ethernet frame is 1500 bytes, the best case protocol overhead percentage is 40/1500, less than 3%.
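Those two percentages fall out of simple arithmetic over the 40 bytes of TCP+IPv4 headers; a quick check (helper name hypothetical):

```python
# Per-packet overhead: 20-byte TCP header + 20-byte IPv4 header.
OVERHEAD = 20 + 20

def overhead_pct(payload):
    # Fraction of the packet that is header rather than useful data.
    return 100.0 * OVERHEAD / (OVERHEAD + payload)

print(round(overhead_pct(1), 1))     # 1-byte tinygram: ~97.6% overhead
print(round(overhead_pct(1460), 1))  # full 1500-byte Ethernet frame: ~2.7%
```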

While the Nagle algorithm is causing the stack to wait for data to be ACKed by the remote peer, the local program can make more calls to send(). Because TCP is a stream protocol, it can coalesce the data in those send() calls into a single TCP packet, increasing the percentage of useful data.

Imagine a simple Telnet program: the bulk of a Telnet conversation consists of sending one character, and receiving an echo of that character back from the remote host. Without the Nagle algorithm, this results in TCP's worst case: one byte of user data wrapped in dozens of bytes of protocol overhead. With the Nagle algorithm enabled, the TCP stack won't send that one Telnet character out until the previous characters have all been acknowledged. By then, the user may well have typed another character or two, reducing the relative protocol overhead.

This simple optimization interacts with other features of the TCP protocol suite, too:

Most stacks implement the delayed ACK algorithm: this causes the remote stack to delay ACKs under certain circumstances, which allows the local stack a bit of time to "Nagle" some more bytes into a single packet.

The Nagle algorithm tends to improve the percentage of useful data in packets more on slow networks than on fast networks, because ACKs take longer to come back.

TCP allows an ACK packet to also contain data. If the local stack decides it needs to send out an ACK packet and the Nagle algorithm has caused data to build up in the output buffer, the enqueued data will go out along with the ACK packet.

The Nagle algorithm is on by default in Winsock, but it can be turned off on a per-socket basis with the TCP_NODELAY option of setsockopt(). This option should not be turned off except in a very few situations.
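The same option exists in Berkeley-style sockets; a minimal sketch of disabling Nagle on one socket:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Nagle is on by default; setting TCP_NODELAY = 1 disables it
# for this socket only, so small writes go out immediately.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay != 0)  # True
s.close()
```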

Beware of depending on the Nagle algorithm too heavily. send() is a system call, so every call to send() costs much more than a regular function call. Your application should coalesce its own data as much as is practical to minimize the number of calls to send().


Sliding Window Acknowledgment System:
A basic technique for ensuring reliability in communications uses a rule that requires a device to send back an acknowledgment each time it successfully receives a transmission. If a transmission is not acknowledged after a period of time, it is retransmitted by its sender. This system is called positive acknowledgment with retransmission (PAR). One drawback with this basic scheme is that the transmitter cannot send a second message until the first has been acknowledged.


The sliding window serves several purposes:
(1) it guarantees the reliable delivery of data;
(2) it ensures that the data is delivered in order;
(3) it enforces flow control between the sender and the receiver.
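The throughput win over PAR can be shown with a toy round-trip count: PAR is just a sliding window of size 1. This is an illustrative simulation, not TCP's actual algorithm, and the function name is hypothetical:

```python
def rounds_to_send(total_segments, window):
    # Each round trip the sender may have up to `window` unacknowledged
    # segments in flight; stop-and-wait (PAR) is the degenerate
    # window=1 case, paying one round trip per segment.
    rounds, sent = 0, 0
    while sent < total_segments:
        sent += min(window, total_segments - sent)
        rounds += 1
    return rounds

print(rounds_to_send(8, 1))  # PAR / stop-and-wait: 8 round trips
print(rounds_to_send(8, 4))  # sliding window of 4: 2 round trips
```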

------------------to be continued

posted on 2008-04-18 10:02 Kevin Lynx


