﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>C++博客-&amp;豪</title><link>http://www.cppblog.com/qywyh/</link><description>豪-&gt;blog</description><language>zh-cn</language><lastBuildDate>Sat, 04 Apr 2026 23:26:13 GMT</lastBuildDate><pubDate>Sat, 04 Apr 2026 23:26:13 GMT</pubDate><ttl>60</ttl><item><title>[Repost] A summary of common Linux troubleshooting commands</title><link>http://www.cppblog.com/qywyh/archive/2010/11/21/134208.html</link><dc:creator>豪</dc:creator><author>豪</author><pubDate>Sun, 21 Nov 2010 04:25:00 GMT</pubDate><guid>http://www.cppblog.com/qywyh/archive/2010/11/21/134208.html</guid><wfw:comment>http://www.cppblog.com/qywyh/comments/134208.html</wfw:comment><comments>http://www.cppblog.com/qywyh/archive/2010/11/21/134208.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.cppblog.com/qywyh/comments/commentRss/134208.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/qywyh/services/trackbacks/134208.html</trackback:ping><description><![CDATA[<font  color="#3F3F3F" face="Arial, sans-serif" size="3"><span  style="border-collapse: collapse; font-size: 12px; line-height: 24px;"><div><br></div>
<div>1. Checking CPU load: mpstat</div>
<div>mpstat -P ALL [interval [count]]</div>
<div><br></div>
<div>The parameters are:</div>
<div>-P ALL: monitor all CPUs</div>
<div>interval: seconds between two consecutive samples</div>
<div>count: number of samples</div>
<div><br></div>
<div>mpstat reads its data from /proc/stat.</div>
<div>The output columns are (percentages are computed over the sampling interval):</div>
<div><br></div>
<div>CPU: processor ID</div>
<div>user: time spent in user mode, excluding processes with a negative nice value (%): Δuser/Δtotal*100</div>
<div>nice: time spent by processes with a negative nice value (%): Δnice/Δtotal*100</div>
<div>system: time spent in kernel mode (%): Δsystem/Δtotal*100</div>
<div>iowait: time spent waiting for disk I/O (%): Δiowait/Δtotal*100</div>
<div>irq: time spent servicing hardware interrupts (%): Δirq/Δtotal*100</div>
<div>soft: time spent servicing software interrupts (%): Δsoftirq/Δtotal*100</div>
<div>idle: time the CPU was idle for any reason other than waiting for disk I/O (%): Δidle/Δtotal*100</div>
<div><br></div>
<div>intr/s: number of interrupts received by the CPU per second (a rate, not a percentage): Δintr/interval</div>
<div>Total CPU working time: total_cur = user + system + nice + idle + iowait + irq + softirq</div>
<div><br></div>
<div>total_pre = pre_user + pre_system + pre_nice + pre_idle + pre_iowait + pre_irq + pre_softirq</div>
<div>user = user_cur - user_pre</div>
<div>total = total_cur - total_pre</div>
<div><br></div>
<div>Here _cur denotes the current value and _pre the value one interval earlier. All values above are reported to two decimal places.</div>
<div><br></div>
<div>2. Checking disk I/O and CPU load: vmstat</div>
<div>usage: vmstat [-V] [-n] [delay [count]]</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-V prints version.</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-n causes the headers not to be reprinted regularly.</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-a print inactive/active page stats.</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-d prints disk statistics</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-D prints disk table</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-p prints disk partition statistics</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-s prints vm table</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-m prints slabinfo</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;-S unit size</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;delay is the delay between updates in seconds.</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;unit size k:1000 K:1024 m:1000000 M:1048576 (default is K)</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;count is the number of updates.</div>
<div><br></div>
<div>vmstat obtains its data from the /proc filesystem.</div>
<div><br></div>
<div>The output fields are:</div>
<div>FIELD DESCRIPTION FOR VM MODE</div>
<div>&nbsp;&nbsp; Procs</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; r: The number of processes waiting for run time.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; b: The number of processes in uninterruptible sleep.</div>
<div><br></div>
<div>&nbsp;&nbsp; Memory</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; swpd: the amount of virtual memory used.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; free: the amount of idle memory.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; buff: the amount of memory used as buffers.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; cache: the amount of memory used as cache.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; inact: the amount of inactive memory. (-a option)</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; active: the amount of active memory. (-a option)</div>
<div><br></div>
<div>&nbsp;&nbsp; Swap</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; si: Amount of memory swapped in from disk (/s).</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; so: Amount of memory swapped to disk (/s).</div>
<div><br></div>
<div>&nbsp;&nbsp; IO</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; bi: Blocks received from a block device (blocks/s).</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; bo: Blocks sent to a block device (blocks/s).</div>
<div><br></div>
<div>&nbsp;&nbsp; System</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; in: The number of interrupts per second, including the clock.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; cs: The number of context switches per second.</div>
<div><br></div>
<div>&nbsp;&nbsp; CPU</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; These are percentages of total CPU time.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; us: Time spent running non-kernel code. (user time, including nice time)</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; sy: Time spent running kernel code. (system time)</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; st: Time spent in involuntary wait. Prior to Linux 2.6.11, shown as zero.</div>
<div><br></div>
<div>3. Checking memory usage: free</div>
<div>usage: free [-b|-k|-m|-g] [-l] [-o] [-t] [-s delay] [-c count] [-V]</div>
<div>&nbsp;&nbsp;-b,-k,-m,-g show output in bytes, KB, MB, or GB</div>
<div>&nbsp;&nbsp;-l show detailed low and high memory statistics</div>
<div>&nbsp;&nbsp;-o use old format (no -/+buffers/cache line)</div>
<div>&nbsp;&nbsp;-t display total for RAM + swap</div>
<div>&nbsp;&nbsp;-s update every [delay] seconds</div>
<div>&nbsp;&nbsp;-c update [count] times</div>
<div>&nbsp;&nbsp;-V display version information and exit</div>
<div><br></div>
<div>[root@Linux /tmp]# free</div>
<div><br></div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;total &nbsp; &nbsp; used &nbsp; &nbsp; &nbsp; &nbsp;free &nbsp; &nbsp; &nbsp; shared &nbsp; &nbsp;buffers &nbsp; cached</div>
<div>Mem: &nbsp; &nbsp; &nbsp; 255268 &nbsp; &nbsp;238332 &nbsp; &nbsp; &nbsp;16936 &nbsp; &nbsp; &nbsp; &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp;85540 &nbsp; 126384</div>
<div>-/+ buffers/cache: &nbsp; 26408 &nbsp; &nbsp; &nbsp; 228860</div>
<div>Swap: &nbsp; &nbsp; &nbsp;265000 &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; 265000</div>
<div><br></div>
<div>Mem: physical memory statistics.</div>
<div>-/+ buffers/cache: physical memory statistics adjusted for buffers and cache.</div>
<div>Swap: usage of the on-disk swap partition; we will not look at it here.</div>
<div>The machine has 255268 KB (256 MB) of physical memory in total, but the memory actually available is not the 16936 KB marked free in the first row: that figure only counts memory that has not been allocated at all.</div>
<div><br></div>
<div>Row 1, Mem:</div>
<div>total: total physical memory.</div>
<div>used: memory currently in use; this includes buffers and cache, parts of which are not actively needed and can be reclaimed.</div>
<div>free: memory not yet allocated.</div>
<div>shared: shared memory; generally not used by the system, and not discussed here.</div>
<div>buffers: memory used by kernel buffers.</div>
<div>cached: memory used by the page cache. The difference between buffer and cache is explained below.</div>
<div>total = used + free</div>
<div>Row 2, -/+ buffers/cache:</div>
<div>used: row 1's used - buffers - cached, i.e. the memory genuinely in use.</div>
<div>free: unallocated memory plus buffers and cache, i.e. the memory actually available right now.</div>
<div>free2 = buffers1 + cached1 + free1 (free2 is the row-2 value; buffers1, cached1 and free1 are row-1 values)</div>
<div><br></div>
<div>The difference between buffer and cache:</div>
<div>A buffer is something that has yet to be "written" to disk.</div>
<div>A cache is something that has been "read" from the disk and stored for later use.</div>
<div>The two viewpoints:</div>
<div>To the operating system (the Mem row), buffers/cached count as used, so it reports only 16936 KB as free.</div>
<div>To applications (the -/+ buffers/cache row), buffers/cached count as available: they exist to speed up file access, and are reclaimed quickly whenever an application needs the memory.</div>
<div>So from an application's point of view: available memory = free + buffers + cached.</div>
<div><br></div>
<div>swap</div>
<div>Swap is the Linux virtual-memory partition: once physical memory is exhausted, disk space (the swap partition) is used as if it were memory.</div>
<div><br></div>
<div>4. Checking the network interfaces: sar</div>
<div>See man sar for details.</div>
<div>4.1 Interface traffic: sar -n DEV delay count</div>
<div>The maximum traffic an interface can sustain is determined by the card itself: 10M, 10/100 autosensing, 100M and above, or gigabit. Ordinary servers usually have 100M cards; some use gigabit.</div>
<div><br></div>
<div>Output fields:</div>
<div>IFACE</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Name of the network interface for which statistics are reported.</div>
<div><br></div>
<div>rxpck/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Total number of packets received per second.</div>
<div><br></div>
<div>txpck/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Total number of packets transmitted per second.</div>
<div><br></div>
<div>rxbyt/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Total number of bytes received per second.</div>
<div><br></div>
<div>txbyt/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Total number of bytes transmitted per second.</div>
<div><br></div>
<div>rxcmp/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of compressed packets received per second (for cslip etc.).</div>
<div><br></div>
<div>txcmp/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of compressed packets transmitted per second.</div>
<div><br></div>
<div>rxmcst/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of multicast packets received per second.</div>
<div><br></div>
<div>4.2 Interface errors: sar -n EDEV delay count</div>
<div>Output fields:</div>
<div>IFACE</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Name of the network interface for which statistics are reported.</div>
<div><br></div>
<div>rxerr/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Total number of bad packets received per second.</div>
<div><br></div>
<div>txerr/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Total number of errors that happened per second while transmitting packets.</div>
<div><br></div>
<div>coll/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of collisions that happened per second while transmitting packets.</div>
<div><br></div>
<div>rxdrop/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of received packets dropped per second because of a lack of space in linux buffers.</div>
<div><br></div>
<div>txdrop/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of transmitted packets dropped per second because of a lack of space in linux buffers.</div>
<div><br></div>
<div>txcarr/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of carrier-errors that happened per second while transmitting packets.</div>
<div><br></div>
<div>rxfram/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of frame alignment errors that happened per second on received packets.</div>
<div><br></div>
<div>rxfifo/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of FIFO overrun errors that happened per second on received packets.</div>
<div><br></div>
<div>txfifo/s</div>
<div>&nbsp;&nbsp; &nbsp; &nbsp; Number of FIFO overrun errors that happened per second on transmitted packets.</div>
<div><br></div>
<div><br></div>
<div>5. Locating a problem process: top, ps</div>
<div>top -d delay; see man for details.</div>
<div>ps aux shows detailed information for each process.</div>
<div>ps axf shows the process tree.</div>
<div><br></div>
<div>6. Checking which files a process is using: lsof</div>
<div>Root privileges are needed to see everything; otherwise you only see what the logged-in user is permitted to see.</div>
<div><br></div>
<div>lsof -p 77 // which files the process with PID 77 has open</div>
<div>lsof -d 4 // processes using file descriptor 4</div>
<div>lsof abc.txt // processes that have abc.txt open</div>
<div>lsof -i :22 // processes using port 22</div>
<div>lsof -i tcp // processes using the TCP protocol</div>
<div>lsof -i tcp:22 // processes using TCP port 22</div>
<div>lsof +d /tmp // files under /tmp opened by processes</div>
<div>lsof +D /tmp // same, but descends into subdirectories, which takes longer</div>
<div>lsof -u username // files opened by processes owned by user username</div>
<div><br></div>
<div>7. Tracing a program's system calls: strace</div>
<div>usage: strace [-dffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file]</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;[-p pid] ... [-s strsize] [-u username] [-E var=val] ...</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;[command [arg ...]]</div>
<div>&nbsp;&nbsp;or: strace -c [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ...</div>
<div>&nbsp;&nbsp;&nbsp;&nbsp;[command [arg ...]]</div>
<div><br></div>
<div>Common options:</div>
<div>-f: also trace child processes of the traced process.</div>
<div>-c: count the time, calls and errors for each system call.</div>
<div>-o file: write the trace output to file instead of stderr.</div>
<div>-p pid: attach to the running process with the given pid; commonly used to debug background processes.</div>
<div><br></div>
<div>8. Checking disk usage: df</div>
<div>test@wolf:~$ df</div>
<div>Filesystem &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1K-blocks &nbsp; &nbsp; &nbsp;Used Available Use% Mounted on</div>
<div>/dev/sda1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3945128 &nbsp; 1810428 &nbsp; 1934292 &nbsp;49% /</div>
<div>udev &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;745568 &nbsp; &nbsp; &nbsp; &nbsp;80 &nbsp; &nbsp;745488 &nbsp; 1% /dev</div>
<div>/dev/sda3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 12649960 &nbsp; 1169412 &nbsp;10837948 &nbsp;10% /usr/local</div>
<div>/dev/sda4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 63991676 &nbsp;23179912 &nbsp;37561180 &nbsp;39% /data</div>
<div><br></div>
<div>9. Checking network connections: netstat</div>
<div>Commonly used: netstat -lpn</div>
<div>Option descriptions:</div>
<div>&nbsp;-p, --programs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; display PID/Program name for sockets</div>
<div>&nbsp;-l, --listening &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;display listening server sockets</div>
<div>&nbsp;-n, --numeric &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;don't resolve names</div>
<div>&nbsp;-a, --all, --listening &nbsp; display all sockets (default: connected)</div></span></font><img src ="http://www.cppblog.com/qywyh/aggbug/134208.html" width = 
"1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/qywyh/" target="_blank">豪</a> 2010-11-21 12:25 <a href="http://www.cppblog.com/qywyh/archive/2010/11/21/134208.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>A brief history of Consensus, 2PC and Transaction Commit.</title><link>http://www.cppblog.com/qywyh/archive/2010/08/12/123258.html</link><dc:creator>豪</dc:creator><author>豪</author><pubDate>Thu, 12 Aug 2010 15:37:00 GMT</pubDate><guid>http://www.cppblog.com/qywyh/archive/2010/08/12/123258.html</guid><wfw:comment>http://www.cppblog.com/qywyh/comments/123258.html</wfw:comment><comments>http://www.cppblog.com/qywyh/archive/2010/08/12/123258.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/qywyh/comments/commentRss/123258.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/qywyh/services/trackbacks/123258.html</trackback:ping><description><![CDATA[Notes:<br>*. <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><a href="http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf" style="color: #99aadd;">"Time, Clocks and the Ordering of Events in a Distributed System" (1978)</a></span></span><br>&nbsp; &nbsp; 1. The issue is that in a distributed system you cannot tell if event A happened before event B, unless A caused B in some way. 
Each observer can see events happen in a different order, except for events that cause each other, ie there is only a partial ordering of events in a distributed system.<br>&nbsp;&nbsp;&nbsp; 2. Lamport defines the "happens before" relationship and operator, and goes on to give an algorithm that provides a total ordering of events in a distributed system, so that each process sees events in the same order as every other process.<br>&nbsp;&nbsp;&nbsp; 3. Lamport also introduces the concept of a distributed state machine: start a set of deterministic state machines in the same state and then make sure they process the same messages in the same order. <br>&nbsp;&nbsp;&nbsp; 4. Each machine is now a replica of the others. The key problem is making each replica agree on the next message to process: a consensus problem. <br>&nbsp;&nbsp;&nbsp; 5. However, <span style="color: red;">the system is not fault tolerant;</span> if one process fails, the others have to wait for it to recover.<br><br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/%7EGray/papers/DBOS.pdf" style="color: #99aadd; text-decoration: none;">"Notes on Database Operating Systems" (1979)</a>.<span class="Apple-converted-space"> </span></span></span><br>&nbsp;&nbsp;&nbsp; 1. 2PC problem: Unfortunately 2PC would block if the TM (Transaction Manager) fails at the wrong time. 
<br><br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://www.cs.cornell.edu/courses/cs614/2004sp/papers/Ske81.pdf" style="color: #99aadd; text-decoration: none;">"NonBlocking Commit Protocols" (1981)</a><br></span></span>&nbsp;&nbsp;&nbsp; 1. 3PC problem: The problem was coming up with a workable 3PC algorithm; that would take nearly 25 years!<br><br>*. <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><a href="http://theory.lcs.mit.edu/tds/papers/Lynch/jacm85.pdf" style="color: #99aadd; text-decoration: underline;">"Impossibility of distributed consensus with one faulty process" (1985)</a><br></span></span>&nbsp;&nbsp;&nbsp; 1. This famous result is known as the "FLP" result<br>&nbsp;&nbsp;&nbsp; 2. By this time "consensus" was the name given to the problem of getting <span style="color: red;">a bunch of processors to agree on a value.</span><br>&nbsp;&nbsp;&nbsp; 3. 
The kernel of the problem is that you cannot tell the difference between a process that has stopped and one that is running very slowly, making dealing with faults in an asynchronous system almost impossible. <br>&nbsp;&nbsp;&nbsp; 4. a distributed algorithm has two properties: <span style="color: red;">safety and liveness</span>. 2PC is safe: no bad data is ever written to the databases, but its liveness properties aren't great: if the TM fails at the wrong point the system will block.<br>&nbsp;&nbsp;&nbsp; 5. The asynchronous case is more general than the synchronous case: an algorithm that works for an asynchronous system will also work for a synchronous system, but not vice versa. <br><br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/users/lamport/pubs/byz.pdf" style="color: #99aadd; text-decoration: none;">"The Byzantine Generals Problem" (1982)</a></span></span><br>&nbsp;&nbsp;&nbsp; 1. In this form of the consensus problem the processes can lie, and they can actively try to deceive other processes. 
<br><br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/%7EGray/papers/TandemTR88.6_ComparisonOfByzantineAgreementAndTwoPhaseCommit.pdf" style="color: #99aadd; text-decoration: none;">"A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem." (1987)<span class="Apple-converted-space">&nbsp;</span></a>.</span></span><br>&nbsp;&nbsp;&nbsp;  1. At the time the best consensus algorithm was the Byzantine Generals, but this was too expensive to use for transactions.<br><br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://infoscience.epfl.ch/getfile.py?recid=88273&amp;mode=best" style="color: #99aadd; text-decoration: none;">"Uniform consensus is harder than consensus" (2000)</a><br><span class="Apple-converted-space"></span></span></span>&nbsp;&nbsp;&nbsp;  1. 
With uniform consensus all processes must agree on a value, even the faulty ones - a transaction should only commit if all RMs are prepared to commit.<br>&nbsp;&nbsp;&nbsp; <br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf" style="color: #99aadd; text-decoration: none;">"The Part-Time Parliament" (submitted in 1990, published 1998)</a><br></span></span>&nbsp;&nbsp;&nbsp;  1.<span style="color: red;"> Paxos consensus algorithm</span><br>&nbsp;&nbsp;&nbsp; <br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/lampson/58-Consensus/Acrobat.pdf" style="color: #99aadd; text-decoration: none;">"How to Build a Highly Available System Using Consensus" (1996)</a>.<span class="Apple-converted-space"> <br></span></span></span>&nbsp; &nbsp;  1.<span style="color: red;"></span> This paper provides a good introduction to building fault tolerant systems and Paxos. 
<br><br>*.&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/users/lamport/pubs/paxos-simple.pdf" style="color: #99aadd; text-decoration: none;">"Paxos Made Simple" (2001)</a></span></span><br>&nbsp;&nbsp;&nbsp; 1. The kernel of Paxos is that given a fixed number of processes, any majority of them must have at least one process in common. For example, given three processes A, B and C, the possible majorities are: AB, AC, or BC. If a decision is made when one majority is present, eg AB, then at any time in the future when another majority is available at least one of the processes can remember what the previous majority decided. If the majority is AB then both processes will remember, if AC is present then A will remember and if BC is present then B will remember.<br>&nbsp;&nbsp;&nbsp; 2. Paxos can tolerate lost messages, delayed messages, repeated messages, and messages delivered out of order.<br>&nbsp;&nbsp;&nbsp; 3. It will reach consensus if there is a single leader for long enough that the leader can talk to a majority of processes twice. Any process, including leaders, can fail and restart; in fact all processes can fail at the same time and the algorithm is still safe. There can be more than one leader at a time.<br>&nbsp;&nbsp;&nbsp; 4. Paxos is an asynchronous algorithm; there are no explicit timeouts. 
However, it only reaches consensus when the system is behaving in a synchronous way, ie messages are delivered in a bounded period of time; otherwise it remains safe but may not make progress. There is a pathological case where Paxos will not reach consensus, in accordance with FLP, but this scenario is relatively easy to avoid in practice.<br><br>*.&nbsp;&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://theory.lcs.mit.edu/tds/papers/Lynch/jacm88.pdf" style="color: #99aadd; text-decoration: none;">"Consensus in the presence of partial synchrony" (1988)<span class="Apple-converted-space"> </span></a><br></span></span>&nbsp;&nbsp;&nbsp; 1. There are two versions of a partially synchronous system: in one, processes run at speeds within a known range and messages are delivered in bounded time, but the actual bounds are not known a priori; in the other, the range of process speeds and the upper bound on message delivery are known a priori, but they only start to hold at some unknown time in the future. <br>&nbsp;&nbsp;&nbsp; 2. 
The partially synchronous model is a better model for the real world than either the synchronous or asynchronous model; networks function in a predictable way most of the time, but occasionally go crazy.<br>&nbsp;&nbsp;&nbsp; <br>*.&nbsp;&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://research.microsoft.com/research/pubs/view.aspx?tr_id=701" style="color: #99aadd; text-decoration: none;">"Consensus on Transaction Commit" (2005)</a>.<span class="Apple-converted-space"> <br></span></span></span>&nbsp;&nbsp;&nbsp; 1. A third phase is only required if there is a fault, in accordance with the Skeen result. Given 2n+1 TM replicas Paxos Commit will complete with up to n faulty replicas.<br>&nbsp;&nbsp;&nbsp; 2. Paxos Commit does not use Paxos to solve the transaction commit problem directly, ie it is not used to solve uniform consensus, rather it is used to make the system fault tolerant.<br>&nbsp;&nbsp;&nbsp; 3.&nbsp; Recently there has been some discussion of the<span style="color: red;"> CAP conjecture</span>: Consistency, Availability and Partition tolerance. The conjecture asserts that <span style="color: red;">you cannot have all three in a distributed system</span>: a system that is consistent, that can have faulty processes and that can handle a network partition.<br>&nbsp;&nbsp;&nbsp; 4. Now take a Paxos system with three nodes: A, B and C. We can reach consensus if two nodes are working, ie we can have consistency and availability. 
Now if C becomes partitioned and C is queried, it cannot respond because it cannot communicate with the other nodes; it doesn't know whether it has been partitioned, or if the other two nodes are down, or if the network is being very slow. The other two nodes can carry on, because they can talk to each other and they form a majority. So for the CAP conjecture, Paxos does not handle a partition because C cannot respond to queries. However, we could engineer our way around this. If we are inside a data center we can use two independent networks (Paxos doesn't mind if messages are repeated). If we are on the internet, then we could have our client query all nodes A, B and C, and if C is partitioned the client can query A or B unless it is partitioned in a similar way to C.<br>&nbsp;&nbsp;&nbsp; 5. In a synchronous network, if C is partitioned it can learn that it is partitioned if it does not receive messages in a fixed period of time, and thus can declare itself down to the client.<br><br>*.&nbsp;&nbsp; <span class="Apple-style-span" style="border-collapse: separate; color: #000000; font-family: Simsun; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; font-size: medium;"><span class="Apple-style-span" style="color: #cccccc; font-family: 'Trebuchet MS',Trebuchet,Verdana,sans-serif; line-height: 20px; text-align: left; font-size: small;"><span class="Apple-converted-space"></span><a href="http://www.allhands.org.uk/2006/proceedings/papers/624.pdf" style="color: #99aadd; text-decoration: none;">"Co-Allocation, Fault Tolerance and Grid Computing" (2006)</a>.<br><br><br></span></span><a href="http://betathoughts.blogspot.com/2007/06/brief-history-of-consensus-2pc-and.html">[REF] http://betathoughts.blogspot.com/2007/06/brief-history-of-consensus-2pc-and.html</a><br>                    <img src 
="http://www.cppblog.com/qywyh/aggbug/123258.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/qywyh/" target="_blank">豪</a> 2010-08-12 23:37 <a href="http://www.cppblog.com/qywyh/archive/2010/08/12/123258.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Lock-Free</title><link>http://www.cppblog.com/qywyh/archive/2010/07/20/120886.html</link><dc:creator>豪</dc:creator><author>豪</author><pubDate>Tue, 20 Jul 2010 08:58:00 GMT</pubDate><guid>http://www.cppblog.com/qywyh/archive/2010/07/20/120886.html</guid><wfw:comment>http://www.cppblog.com/qywyh/comments/120886.html</wfw:comment><comments>http://www.cppblog.com/qywyh/archive/2010/07/20/120886.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/qywyh/comments/commentRss/120886.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/qywyh/services/trackbacks/120886.html</trackback:ping><description><![CDATA[<br>
<div>A "wait-free" procedure is guaranteed to complete in a finite number of steps, regardless of the relative speeds of the other threads.<br><br>A "lock-free" procedure guarantees progress of at least one of the threads executing the procedure. That means some threads can be delayed arbitrarily, but at least one thread is guaranteed to make progress at each step.<br><br>CAS: if the map hasn't changed since I last looked at it, swap in my updated copy; otherwise, start all over again.</div>
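A minimal sketch of that CAS loop (hypothetical names, not the article's actual code), using std::atomic::compare_exchange_weak on a shared map pointer: copy the current map, update the copy, and install it only if nobody else replaced the map in the meantime.

```cpp
#include <atomic>
#include <map>
#include <string>

// Shared, lock-free-readable map: readers just load the pointer.
static std::atomic<std::map<int, std::string>*> g_map{new std::map<int, std::string>};

void lockfree_insert(int key, const std::string& value) {
    std::map<int, std::string>* old_map = g_map.load();
    std::map<int, std::string>* new_map = nullptr;
    do {
        delete new_map;                                      // discard a failed attempt's copy
        new_map = new std::map<int, std::string>(*old_map);  // copy the map...
        (*new_map)[key] = value;                             // ...and update the copy
        // CAS: install new_map only if g_map still equals old_map;
        // on failure old_map is reloaded and we start all over again.
    } while (!g_map.compare_exchange_weak(old_map, new_map));
    // old_map is leaked here on purpose: reclaiming it safely while
    // readers may still hold it is the hard part of lock-free design
    // (hazard pointers, deferred reclamation, ...).
}
```

The loop can retry an unbounded number of times if other writers keep winning the CAS, which is exactly why this is lock-free but not wait-free.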
<div><br></div>
<div><span style="font-family: Verdana,Arial,Helvetica,sans-serif; font-size: 13px;">Delayed update: In plain English, the loop says "I'll replace the old map with a new, updated one, and I'll be on the lookout for any other updates of the map, but I'll only do the replacement when the reference count of the existing map is one."</span><span style="font-family: Verdana,Arial,Helvetica,sans-serif; font-size: 13px;">&nbsp;</span></div>
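A rough sketch of that delayed-update idea (names are hypothetical, and this is a simplification: the Dr. Dobb's scheme packs the pointer and the count into a single double-width CAS, which closes the race window between the count check and the swap that this sketch still has):

```cpp
#include <atomic>

struct Node {
    explicit Node(int v) : value(v) {}
    int value;
    std::atomic<int> refs{1};   // 1 means only the global pointer holds it
};

std::atomic<Node*> g_current{nullptr};

void delayed_replace(Node* replacement) {
    for (;;) {
        Node* old_node = g_current.load();
        // Be on the lookout: only do the replacement when the reference
        // count of the existing node is one (no reader still holds it).
        if (old_node != nullptr && old_node->refs.load() != 1)
            continue;                                  // a reader is active: retry
        if (g_current.compare_exchange_weak(old_node, replacement)) {
            delete old_node;                           // count was 1: safe to free
            return;
        }
        // CAS failed: another writer got in first; start over.
    }
}
```

In the real scheme the check and the swap are one atomic step; here they are separate, so a reader could acquire the node between the `refs` load and the CAS.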
<div><span style="font-family: Verdana,Arial,Helvetica,sans-serif; font-size: 13px;"><br></span></div>
<div><br></div>
<div>[REF]<a href="http://www.drdobbs.com/cpp/184401865">http://www.drdobbs.com/cpp/184401865</a></div><img src ="http://www.cppblog.com/qywyh/aggbug/120886.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/qywyh/" target="_blank">豪</a> 2010-07-20 16:58 <a href="http://www.cppblog.com/qywyh/archive/2010/07/20/120886.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Lessons Learned from scaling Farmville</title><link>http://www.cppblog.com/qywyh/archive/2010/07/16/120552.html</link><dc:creator>豪</dc:creator><author>豪</author><pubDate>Fri, 16 Jul 2010 07:06:00 GMT</pubDate><guid>http://www.cppblog.com/qywyh/archive/2010/07/16/120552.html</guid><wfw:comment>http://www.cppblog.com/qywyh/comments/120552.html</wfw:comment><comments>http://www.cppblog.com/qywyh/archive/2010/07/16/120552.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.cppblog.com/qywyh/comments/commentRss/120552.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/qywyh/services/trackbacks/120552.html</trackback:ping><description><![CDATA[
<span style="font-size: 12px; font-family: verdana, arial, helvetica, sans-serif; "><p class="MsoNormal" style="margin-top: 0px; "><br><o:p></o:p></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; "><strong>1.<span style="font-size: 7pt; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span>Interactive games are write-heavy</span></strong><span>. Typical web apps read more than they write so many common architectures may not be sufficient. Read heavy apps can often get by with a caching layer in front of a single database. Write heavy apps will need to partition so writes are spread out and/or use an in-memory architecture.</span><o:p></o:p></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; "><strong><span style="font-family: Verdana, sans-serif; font-size: 10pt; ">2.</span><span style="font-size: 7pt; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span>Design every component as a degradable service</span></strong><span>. Isolate components so increased latencies in one area won't ruin another. Throttle usage to help alleviate problems. Turn off features when necessary.</span><o:p></o:p></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; "><strong><span style="font-family: Verdana, sans-serif; font-size: 10pt; ">3.</span><span style="font-size: 7pt; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span>Cache Facebook data</span></strong><span>. 
When you are deeply dependent on an external component, consider caching that component's data to improve latency.</span><o:p></o:p></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; "><strong><span style="font-family: Verdana, sans-serif; font-size: 10pt; ">4.</span><span style="font-size: 7pt; ">&nbsp;&nbsp;&nbsp;&nbsp;</span><span>Plan ahead for release-related usage spikes</span></strong><span>.</span><o:p></o:p></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; ">5.<span style="font-size: 7pt; ">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><strong><span>Sample</span></strong><span>. When analyzing large streams of data, for example when looking for problems, not every piece of data needs to be processed. Sampling data can yield the same results for much less work.</span></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; "><br></p><p dir="ltr" class="MsoNormal" style="margin-top: 0px; text-indent: -0.25in; margin-left: 1in; margin-right: 0px; "><span><span  style="font-family: Georgia, 'Times New Roman', serif; font-size: 14px; color: rgb(38, 38, 38); font-style: italic; line-height: 25px; "><p style="margin-bottom: 1em; margin-top: 0em; ">The key ideas are to isolate troubled and highly latent services from causing latency and performance issues elsewhere through use of error and timeout throttling, and if needed, disable functionality in the application using on/off switches and functionality-based throttles.</p></span></span></p></span><img src ="http://www.cppblog.com/qywyh/aggbug/120552.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/qywyh/" target="_blank">豪</a> 2010-07-16 15:06 <a href="http://www.cppblog.com/qywyh/archive/2010/07/16/120552.html#Feedback" target="_blank" 
style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>php copy on write</title><link>http://www.cppblog.com/qywyh/archive/2010/05/18/115734.html</link><dc:creator>豪</dc:creator><author>豪</author><pubDate>Tue, 18 May 2010 14:45:00 GMT</pubDate><guid>http://www.cppblog.com/qywyh/archive/2010/05/18/115734.html</guid><wfw:comment>http://www.cppblog.com/qywyh/comments/115734.html</wfw:comment><comments>http://www.cppblog.com/qywyh/archive/2010/05/18/115734.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.cppblog.com/qywyh/comments/commentRss/115734.html</wfw:commentRss><trackback:ping>http://www.cppblog.com/qywyh/services/trackbacks/115734.html</trackback:ping><description><![CDATA[1. Non-reference assignment: if the zval pointed to by the source variable has is_ref=0, the new variable simply points to the same zval and refcount++; if is_ref=1, copy on write happens: the original zval's refcount is unchanged, and the new variable points to a fresh zval with is_ref=0, refcount=1.<br><br>2. Reference assignment: if the zval pointed to by the source variable has is_ref=0, copy on write happens: the original zval's refcount--, and the new variable and the source variable both point to a new zval with is_ref=1, refcount=2; if is_ref=1, the new variable points to the same zval directly and refcount++.<br><br><img src ="http://www.cppblog.com/qywyh/aggbug/115734.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.cppblog.com/qywyh/" target="_blank">豪</a> 2010-05-18 22:45 <a href="http://www.cppblog.com/qywyh/archive/2010/05/18/115734.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item></channel></rss>