This blog is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Fox <游戏人生>


Common Errors When Using IOCP

Posted on 2009-09-12 00:20 by Fox. Category: T技术碎语 (Tech Notes)

This post is synced from 游戏人生.

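When using IOCP, the most important APIs are GetQueuedCompletionStatus, WSARecv, and WSASend. Data I/O and its completion status are obtained through these interfaces and then processed further.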

GetQueuedCompletionStatus attempts to dequeue an I/O completion packet from the specified I/O completion port. If there is no completion packet queued, the function waits for a pending I/O operation associated with the completion port to complete.

BOOL WINAPI GetQueuedCompletionStatus(
  __in   HANDLE CompletionPort,
  __out  LPDWORD lpNumberOfBytes,
  __out  PULONG_PTR lpCompletionKey,
  __out  LPOVERLAPPED *lpOverlapped,
  __in   DWORD dwMilliseconds
);

If the function dequeues a completion packet for a successful I/O operation from the completion port, the return value is nonzero. The function stores information in the variables pointed to by the lpNumberOfBytes, lpCompletionKey, and lpOverlapped parameters.

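Beyond this API's in and out parameters (the first few lines of the MSDN page already cover those), we care more about what the different combinations of return value and out parameters mean, because for all sorts of known and unknown reasons our program does not always get a correct return value and outputs.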

If *lpOverlapped is NULL and the function does not dequeue a completion packet from the completion port, the return value is zero. The function does not store information in the variables pointed to by the lpNumberOfBytes and lpCompletionKey parameters. To get extended error information, call GetLastError. If the function did not dequeue a completion packet because the wait timed out, GetLastError returns WAIT_TIMEOUT.
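The two MSDN passages quoted above cover the success case and the case where no packet is dequeued. The same MSDN page also documents a third case: when the return value is zero but *lpOverlapped is not NULL, a completion packet for a failed I/O operation was dequeued, and GetLastError reports why. A minimal sketch of that decision table follows. The names `gqcs_outcome` and `classify_gqcs` are hypothetical, and the constant is a local copy of the Win32 `WAIT_TIMEOUT` value so that the sketch compiles without `<windows.h>`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local copy of the Win32 WAIT_TIMEOUT value (258); an assumption made so
   this sketch builds without <windows.h>. */
enum { SKETCH_WAIT_TIMEOUT = 258 };

/* Hypothetical names, not part of any Windows API. */
typedef enum {
    GQCS_IO_OK,      /* nonzero return: packet for a successful I/O dequeued   */
    GQCS_IO_FAILED,  /* zero return, *lpOverlapped != NULL: failed I/O packet  */
    GQCS_TIMED_OUT,  /* zero return, *lpOverlapped == NULL, WAIT_TIMEOUT       */
    GQCS_PORT_ERROR  /* zero return, *lpOverlapped == NULL, any other error    */
} gqcs_outcome;

/* Classify one GetQueuedCompletionStatus result.
   returned_nonzero: the BOOL return value
   overlapped:       the value stored through lpOverlapped
   last_error:       GetLastError(), read immediately after the call */
gqcs_outcome classify_gqcs(bool returned_nonzero,
                           const void *overlapped,
                           unsigned long last_error)
{
    if (returned_nonzero)
        return GQCS_IO_OK;
    if (overlapped != NULL)
        return GQCS_IO_FAILED;   /* out parameters describe the failed I/O */
    if (last_error == SKETCH_WAIT_TIMEOUT)
        return GQCS_TIMED_OUT;   /* cannot happen when waiting with INFINITE */
    return GQCS_PORT_ERROR;      /* out parameters are not filled in */
}
```

In a worker loop, only the first two outcomes carry a valid completion key and OVERLAPPED pointer. The error codes discussed in the rest of this post show up in the `GQCS_IO_FAILED` branch.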

Assume we pass INFINITE for dwMilliseconds.

The common errors here are:

WSA_OPERATION_ABORTED (995): Overlapped operation aborted.

The I/O operation has been aborted because of either a thread exit or an application request.

MSDN: An overlapped operation was canceled due to the closure of the socket, or the execution of the SIO_FLUSH command in WSAIoctl. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.

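Cause: This error usually appears after the socket for a peer has been closed with closesocket or WSACleanup; the overlapped I/O operations still pending on that socket are aborted.

Solution: For a socket, you should generally call shutdown to disable I/O first, and only then call closesocket.

Severity: minor, easy to handle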

WSAENOTSOCK (10038): Socket operation on nonsocket.

MSDN: An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.

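Cause: An operation was attempted on something that is not a socket.

After a socket has been closed with closesocket, any operation on that now-invalid socket gets this error.

Solution: If multiple threads operate on the same socket, keep the I/O operations on that socket in a consistent logical order, and disconnect the socket gracefully.

Severity: minor, easy to handle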
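One way to keep that ordering, sketched here as an assumption rather than as the author's method, is per-connection bookkeeping: refuse to post new I/O once a close has been requested, let exactly one thread perform the shutdown/closesocket, and release the connection's context only after every posted operation has completed. All names below (`conn_state`, `conn_begin_io`, and so on) are hypothetical. The sketch leaves out locking; in a real multithreaded server each call would have to run under a lock owned by the connection.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection bookkeeping; not part of Winsock.
   Every function assumes the caller holds this connection's lock. */
typedef struct {
    int  pending;  /* overlapped operations posted but not yet completed */
    bool closing;  /* set once some thread has requested the close       */
} conn_state;

/* True once the close was requested and no operation is outstanding,
   i.e. the per-connection context may be released. */
bool conn_drained(const conn_state *c)
{
    return c->closing && c->pending == 0;
}

/* Call before WSARecv/WSASend. Returns false if the connection is closing,
   in which case the caller must not touch the socket handle. */
bool conn_begin_io(conn_state *c)
{
    if (c->closing)
        return false;
    c->pending++;
    return true;
}

/* Call once for every completion dequeued for this connection, whether the
   I/O succeeded or failed. Returns true when the context can be released. */
bool conn_end_io(conn_state *c)
{
    assert(c->pending > 0);
    c->pending--;
    return conn_drained(c);
}

/* Request the close. Returns true only for the first caller, which is then
   the one thread allowed to call shutdown and closesocket. */
bool conn_request_close(conn_state *c)
{
    if (c->closing)
        return false;
    c->closing = true;
    return true;
}
```

If `conn_request_close` returns true while nothing is pending, `conn_drained` is already true and the closing thread can release the context itself, since no further completion will arrive to do it.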

WSAECONNRESET (10054): Connection reset by peer.

An existing connection was forcibly closed by the remote host.

MSDN: An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

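Cause: When using WSAAccept, WSARecv, WSASend, and similar interfaces, if the peer application stops abruptly (for the reasons described above), operations posted to the corresponding socket will fail.

Solution: If the remote host or program stopped unexpectedly, there is nothing you can do about it. But if that program is yours and you simply did a hard close, you have nobody else to blame. At the very least, once you know this error has occurred, stop wasting effort posting or waiting for further operations.

Severity: minor, easy to handle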
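The advice to stop posting once the connection is known to be gone can be expressed as a small policy function. The sketch below is only one possible policy, not something taken from MSDN. The function name `connection_is_gone` is hypothetical, and the constants are local copies of the Windows header values so that the code builds without `<winsock2.h>`.

```c
#include <stdbool.h>

/* Local copies of Windows error values (assumption: copied here so the
   sketch compiles on any platform). */
enum {
    SK_ERROR_NETNAME_DELETED = 64,
    SK_WSA_OPERATION_ABORTED = 995,
    SK_WSA_IO_PENDING        = 997,
    SK_WSAENOTSOCK           = 10038,
    SK_WSAECONNABORTED       = 10053,
    SK_WSAECONNRESET         = 10054,
    SK_WSAENOBUFS            = 10055,
    SK_WSAESHUTDOWN          = 10058
};

/* Returns true for errors after which posting more I/O on the same socket
   is pointless. Codes not listed here return false, which only means
   "not known to be fatal"; the caller still has to decide what to do. */
bool connection_is_gone(unsigned long err)
{
    switch (err) {
    case SK_ERROR_NETNAME_DELETED:
    case SK_WSA_OPERATION_ABORTED:
    case SK_WSAENOTSOCK:
    case SK_WSAECONNABORTED:
    case SK_WSAECONNRESET:
    case SK_WSAESHUTDOWN:
        return true;
    default:
        /* e.g. SK_WSA_IO_PENDING just means the operation was queued, and
           SK_WSAENOBUFS is a resource problem rather than a dead peer. */
        return false;
    }
}
```

A worker thread would call this with the error from a failed completion and, on true, mark the connection as closing instead of re-posting WSARecv or WSASend.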

WSAECONNREFUSED (10061): Connection refused.

No connection could be made because the target machine actively refused it.

MSDN: No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.

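Cause: With connect or WSAConnect, the server is not running or its listen queue is full. With WSAAccept, the client's connection request was rejected by the condition function.

Solution: Call connect or WSAConnect again for the same socket. Wait for the server to start or for its listen queue to free up, or find out why you were rejected. Are you too ugly, did you not pay enough, or did the server turn down a sky-high salary and go off to start its own business?

Severity: minor, easy to handle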

WSAENOBUFS (10055): No buffer space available.

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

MSDN: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

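Cause: This is the error I cared about most after going through the error log. The server places explicit limits on message sending and receiving, so a buffer shortage should have been handled long before send/recv could fail. This error also almost never appeared in earlier versions. It is the main subject of this post. A buffer shortage during connect or accept is understandable and not very dangerous, but if send/recv causes congestion and the congestion feeds on itself, that is serious trouble. At the very least it shows that the earlier validation logic has a gap.

The reason WSASend fails is: The Windows Sockets provider reports a buffer deadlock. The text speaks of a buffer deadlock, which is evidently caused by posting I/O improperly from multiple threads.

Solution: Before sending or receiving a message, check and limit the total number and total size of pending messages.

Severity: serious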
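The check described in the solution for WSAENOBUFS can be sketched as a small budget that caps both the number and the total size of outstanding operations. This is an illustration under assumptions, not the server's actual code: `io_budget` and its functions are hypothetical names, the limits are picked by the application, and a multithreaded server would need to guard the structure with a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical budget for outstanding overlapped operations.
   Invariant: ops <= max_ops and bytes <= max_bytes. */
typedef struct {
    size_t max_ops;    /* limit on operations in flight (application choice) */
    size_t max_bytes;  /* limit on buffer bytes in flight                    */
    size_t ops;        /* operations currently in flight                     */
    size_t bytes;      /* buffer bytes currently in flight                   */
} io_budget;

/* Call before posting WSASend/WSARecv with a buffer of len bytes.
   Returns false when posting would exceed either limit; the caller should
   then queue the message in user space or drop the connection. */
bool budget_try_reserve(io_budget *b, size_t len)
{
    if (b->ops >= b->max_ops)
        return false;
    if (len > b->max_bytes - b->bytes)  /* written this way to avoid overflow */
        return false;
    b->ops++;
    b->bytes += len;
    return true;
}

/* Call when the completion for that operation is dequeued, or right away
   if the post itself failed. */
void budget_release(io_budget *b, size_t len)
{
    assert(b->ops > 0 && b->bytes >= len);
    b->ops--;
    b->bytes -= len;
}
```

Every successful `budget_try_reserve` must be paired with exactly one `budget_release`, including for completions that report an error; otherwise the budget leaks and eventually rejects everything.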

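This post is mainly based on MSDN.

************* Note *************

Fox has only analyzed, with reference to MSDN, the few errors and APIs he cares about, and offers no further help.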

Feedback

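# re: Common Errors When Using IOCP

2009-09-15 18:11 by foxriver
GetQueuedCompletionStatus is a terrifying mess. After running for a while it starts throwing baffling errors. I will never use it again in my life... Long live plain sockets.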

BOOL bSuccess = GetQueuedCompletionStatus(m_hCompletionPort, &dwNumberBytes,
&CompletionKey, (LPOVERLAPPED*)&overlap, 10);//10ms for cpu eat.

time_check();

if (overlap) personal = overlap->content;

if (bSuccess == FALSE)
{
DWORD LastError = GetLastError();
if (LastError == WAIT_TIMEOUT)
continue;

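// 2 - The system cannot find the file specified.
// 121 - The semaphore timeout period has expired.

// 1450 - Insufficient system resources exist to complete the requested service.
// 995 - The I/O operation has been aborted because of either a thread exit or an application request.

// 64 - The specified network name is no longer available.
// 10053 - An established connection was aborted by the software in your host machine.
// 10054 - An existing connection was forcibly closed by the remote host.
// 10058 - A request to send or receive data was disallowed because the socket had already been shut down in that direction with a previous shutdown call.

// 0 - The operation completed successfully.
// 997 - Overlapped I/O operation is in progress.
// 998 - Invalid access to memory location. (when fread() filesize > 3G)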
