2012年5月25日

Dynamic Programming 2: Numbers

1. 找出一个数组中，最大的一段连续的数的和。Find out the subarray which has the largest sum.
例如：[1, -3, 2, -4 , 5 , 6, -2, 6, 7] 最大的和就是 22 = 5 + 6 - 2 + 6 +7.
解法如下：

int subMax(int [] a)
{
    int best = 0;
    int sum = 0;
    for(int i = 0; i < a.length; i++)
    {
         sum = sum + a[i];
         if(sum < 0 )
             sum = 0;
         else if(sum > best)
             best = sum;
    }
    return best;
}

想法就是一直加接下来的数，如果小于零就变为0，大于最大的数就更新。其中一点就是，如果遇到负数，如果和不小于零就不用使sum为零。如果数组全部为负数，上面的代码有点问题，但不改了。如果想知道这个最大的和的序列是什么，只要稍微改变就可以了，不说了。

2. Ugly Number: 找出第n个能被2，3，5整除的数
例如：2, 3, 4, 5, 6, 9,10, 12, 15, 20, 25 ... 第3个是4，第4个是5，第5个是6 ... 第200是？
想法：首先是从 1开始，2，3，5分别乘1，最小的是2，接下来就是2，2的位置进1，3和5的位置不变再来一次，最小的是3，3的位置进1，2和5位置进1，再来一次，最小的是4，3和5的位置不变。。。

int uglyNum( int n)
{
   int a = new int[n+1]
   a[0] = 1;
   int i2 = 0, i3 = 0, i5 = 0;
   int n2 = 0; n3 = 0; n5 = 0;
   int m = 0;
   for(int i = 0; i <= n; i++)
   {
      n2 = a[i2] * 2;
      n3 = a[i3] * 3;
      n5 = a[i5] * 5;
      m = min(n2, n3, n5);
      if(m == n2)
      {
         a[i] = m;
         i2++;
      }
      //similar for i3 and i5
   }
   return a[n];
}

3. 最后一个问题：给 i, j 两个数，然后打印出 2^i ，5^j 的序列
例如： i = 3 j =4 就打印出：
2^0 * 5 ^0 = 1
2^1 * 5^0 = 2
2^2 * 5 ^0 = 4
2^0 * 5^1 = 5
2^3 * 5^0 = 8
2^1 * 5^1 = 10
...
解法：和上面一个解法很相似，不过注意要处理相等的情况，比如2 * 2^1 * 5 ^1 = 20 2^2 * 5^0 ^5 = 20, 代码就不写了。

posted @ 2012-05-25 12:06 MichaelCao| 编辑收藏

2012年5月23日

Dynamic Programming 1: Strings

什么是DP? 简单地来说就是把一个问题分解成两个或者多个小问题。然后先把小的问题解出来，最后利用已经得到的答案，把大的问题解决。
它和分而治之有点类似，但有所不同。DP所分解出来的小问题会相互依赖，因此就不知道从哪里分。而分而治之的小问题不相互依赖。先看
个小程序吧，生成第n个Fibnacci数，可能有人会这么写

int fib ( int n ) {
  if ( n == 0 )
     return 1;
  if ( n == 1 )
     return 1;
  return fib ( n - 1 ) + fib ( n - 2 );
}

但这个函数是2^n的递归，所以很快堆栈就会被用完的。另外如果思考一下，你会发现 fib( n - 1 ) 也已经要用到fib ( n - 2 ), 可是在
算fib ( n ) 的时候，这个值又要算一遍，那为什么不把这个值存下来呢？
好, 我们就换个DP的方式：

int fib ( int n ) {
   if ( n == 0 || n == 1 )
      return 1;
   int [] f = new int[ n ];
   f[ 0 ] = 1;
   f[ 1 ] = 1;
   for( int i = 2; i < n; i++)
   {
        f[ i ] = f[ i-1 ] + f[ i-2 ];
   }
   return f[ n-1 ];
}

可能这个比较容易了。大家都明白，就是先把以前的值给算好，然后后面的计算就可以利用前面的值。嗯，那稍微换个难点的吧。给一个n*n的0,1 matrix，然后找到最大的全是1的submatrix的大小。比如：
00011
01111
11110
01110
这个最大的那个全是1的submatrix的大小就是3.看起来挺难，其实蛮容易的。
我们先用最平常的思路来解一下吧。
先初始化另外一个同样大小的n*n的matrix
第一行和第一列很容易，和原先一样的值
00011
0
0
1
0
接下来，算第二行，和其他的行。自己动手，你就知道其实就是
s[i][j] = min(s[i][j-1],s[i-1][j],s[i-1][j-1]) + 1
我们顺便还可以加上一个max,记录最大的值。
这样这个就搞定了。DP介绍完毕。接下来开始关于String的DP

1.找到两个字符串的最大相同字串的长度
例如：abaabb aabbaa 最大的相同字串aabb长度就是4.
解法：给两个串 p,q 我们有
c(i,j) = 0 if p[i] != q[j]
c(i,j) = c(i-1,j-1) + 1 if p[i] = q[j].
代码和上面submatrix很相似。先初始化边缘，然后算出其他的值
2.找到两个字符串的最大subsequence的长度
例如：acbbab abbca 最大的subsequence is abba 长度是4.
解法：给两个串 p,q 我们有
c(i,j) = max(c(i-1,j),c(i,j-1)) if p[i] != q[j]
c(i,j) = c(i-1,j-1) + 1 if p[i] = q[j]
3.找到一个字符串最大的Palindrom
例如: abcdedcbdsa 最大的Palindrom就是bcdedcb 长度是7
解法：给一个串p
c(i,j) = max(c(i+1,j),c(i,j-1)) if p[i] != q[j]
c(i,j) = c(i+1,j-1) + 2 if p[i] = q[j]

posted @ 2012-05-23 22:03 MichaelCao| 编辑收藏

2010年11月15日

Use the rsync to back up all files

Here is the scripts:

rsync --progress --avze ssh --delete srcDir/ remoteName@remoteMachine:remoteDir/

-a quick way to specify the recursion and preserve everything.
-v verbose
-z compress

And then change ssh with no password:
Create the private keys for local machine:

ssh-keygen -t dsa

Copy the local keys to remote machine:

ssh-copy-id -i ~/.ssh/id_dsa.pub remoteuser@remotebox

Do not set the password.

After that, add an alias into your .bashrc
alias bp='. ~/backup.sh'
So you can run bp directly to backup all things.
Over ~~~

posted @ 2010-11-15 14:22 MichaelCao 阅读(274) | 评论 (0) | 编辑收藏

2009年11月5日

如何使用ramdisk in linux

摘要: 阅读全文

posted @ 2009-11-05 13:26 MichaelCao 阅读(533) | 评论 (0) | 编辑收藏

2009年11月3日

如何处理hadoop 无法启动

http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
很好的一篇文章，尤其是里面的禁用ipv6

posted @ 2009-11-03 19:38 MichaelCao 阅读(380) | 评论 (0) | 编辑收藏

2009年10月20日

如何在ubuntu中使用ssh

摘要: ssh ubuntu 阅读全文

posted @ 2009-10-20 10:07 MichaelCao 阅读(294) | 评论 (0) | 编辑收藏

2009年10月7日

Signal And Await In java

摘要: MemoryFileSystem Java signal await 阅读全文

posted @ 2009-10-07 14:31 MichaelCao 阅读(640) | 评论 (0) | 编辑收藏

2009年10月6日

如何在eclipse中设置java heap size

摘要: eclipse java heap size 阅读全文

posted @ 2009-10-06 10:31 MichaelCao 阅读(3188) | 评论 (0) | 编辑收藏

2009年9月29日

如何在本机eclipse下调试Hadoop.

摘要: Hadoop eclipse 安装阅读全文

posted @ 2009-09-29 21:21 MichaelCao 阅读(1899) | 评论 (0) | 编辑收藏

2009年9月28日

Hadoop 纠错一

在编译Hadoop后需要注意的几点
1.各个节点上的版本要相同。需要注意的是，编译后，程序运行就会从build文件夹里的class文件运行，所以即使你把jar包cp到根目录底下，仍然没用。删除build目录里面的jar包，还是没用。办法: ant clean
2.在使用一个新版本以后，可能会出现mapreduce能启动，但是dfs无法启动的情况。即使你format namenode还是不行。这个让我郁闷了好久。mapreduce 都可以启动，但就是dfs无法启动。datanode就是启动不了。想了好久，总算想明白了。因为datanode里面有数据，可是namenode里面却格式化了。办法：删除所有datanode中的数据。

使用ssh 远程执行命令
ssh gp09@***.comp.nus.edu.sg 'mkdir hadoop'
不过ssh有一个比较烦的地方，就是不能用cd命令。所以在使用的时候要小心。

在linux或者unix中安装ant
编辑.bashrc 文件
添加：
export ANT_HOME=~/files/....
export JAVA_HOME=/usr/lib/jvm/java-6-sun/jre/bin/java
export PATH=$(PATH):$(ANT_HOME)/bin
期中$表示提取变量，:表示在后面添加。

posted @ 2009-09-28 13:32 MichaelCao 阅读(365) | 评论 (0) | 编辑收藏

2008年12月10日

关于自旋锁和信号量的纠正

这个是在是应该纠正一下.因为以前什么都不知道.恩,看完linux 0.11的源代码后,顺便又看了Robert Love写的Linux Development,这里还是先推荐一下这本书吧.首先作者是大牛.不信的话,打开linux的2.6内核源代码,然后找sche.c.我想应该就能发现他的大名了.实在是令我崇拜阿.然后内容写的,整体来说还不错.尤其是前面那一部分.对于内核调度以及中断之类的分析.写的很好.后面的话,恩,个人觉得就有点不如前面的,思考的少了一点,应用多了一点.对于内核讲的就少了.而如何写驱动之类就多了.但不管怎么样,这本书真的是一本很不错的书.有看过linux 0.11源代码并且喜欢内核的可以看看.
废话不多说了.首先从自旋锁的来源来看吧.说到这个就要说SMP,linux 在2.2的内核之后就加入了SMP的支持.一直到2.6越来越好.有SMP就有多个cpu的队列.每一个cpu都有一个自己的调度队列.这样在有些时候就需要平衡这些队列.这个时候就要用到锁,让其他cpu什么也不做.让一个cpu来更新这些队列.这个时候肯定是不能用信号量的(?).这样就出现了自旋锁.当然自旋锁的用途不止这里.比如说在中断中,进入临界区.信号量也是不能用的(?).这个时候就要用自旋锁,其他方面的话,我再回去看看.这样的话应该就很清楚了.信号量只是在进程中使用的.一般来说,应用级程序,你根本不用考虑自旋锁.没有SMP,也不用考虑了.因为代码编译以后只是禁止了内核抢占.这也就是说,这段代码不会被抢占,sleep什么的根本没用.如果是开发驱动方面的话,这个在必要的时候还是应该考虑一下.什么是必要的时候呢?就是上面我说的,进入中断临界区且有多个cpu.

posted @ 2008-12-10 20:49 MichaelCao 阅读(2097) | 评论 (2) | 编辑收藏

2008年9月30日

函数指针的强制类型转换

指针应该都是4个字节,指向32位的地址.可以寻访4GB的内存.如果是64位就再说.所以对函数指针来说这个应该就有了很大的好处.因为指针大家都是4个字节不论是什么种类的函数,它肯定都是4字节.这样赋值就没问题.在这里你也可以将指针直接看成是一个整数.这样会更明白些.而对于另外一个问题.函数参数和返回值,则完全由函数的定义来决定.嗯.这样就可以有很大的自由空间.来段代码.

#include<iostream>
2

using namespace std ;
3

typedef void (*pfn) (void);
5

union msg
6

{
7

pfn first ;
8

int (* ifn)(int a ,int b );
9

void(*vfn)(int ,int );
10

};
11

int OnInt(int a ,int b )
12

{
13

cout<<a<<" "<<b<<endl;
14

return a ;
15

}
16

void OnVoid(int a ,int b )
17

{
18

cout<<a <<" "<<b<<endl;
19

}
20

int main()
21

{
22

pfn p=(pfn)(int (*)(int ,int ))OnInt;
23

msg m;
24

m.first=p;
25

cout<<(m.ifn)(5,6)<<endl;
26

p=(pfn)(void (*)(int, int ))OnVoid;
28

m.first=p;
29

m.vfn(10,15);
30

return 0;
31

}

看了这段代码会让人想到什么呢?想到的应该是MFC中那些消息函数吧.不同的消息,参数不一样,返回值也不一样.而在定义的时候只是一个指针,可是在调用的时候却有各种各样的方式.另外这段代码最有意思的就是打破常规,就用了union同时只有一个变量在起作用,平时书上总是说其他变量都不能用,今天就用给你看看,用的还很牛...

posted @ 2008-09-30 11:26 MichaelCao 阅读(5671) | 评论 (0) | 编辑收藏

2008年7月6日

关于子进程和父进程--fork函数

我总算有点眉目了.原来在fork()之后系统就有两个一样的进程了.以前一直晕,两个一样的进程?那有什么用啊?其实是fork()这个函数会返回两次而已.对于子进程,得到的是0,而对于父进程,得到却是子进程的pid,这样根据得到不同的pid,然后两个进程就可以进行不一样的运行了.并且子进程继承了父进程的数据段,代码段,这个也就是说变量阿还是有的,代码阿还是会运行的.
贴点代码稍稍解释一下:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>

int main(void)
{
        pid_t pid=fork();
        if(pid==0)
        {
                int j ;
                for(j=0;j<10;j++)
                {
                        printf("child: %d\n",j);
                        sleep(1);
                }
        }
        else if (pid>0)
        {
                int i;
                for(i=0;i<10;i++)
                {
                        printf("parent: %d\n",i);
                        sleep(1);
                }
        }
        else
        {
                fprintf(stderr,"can't fork ,error %d\n",errno);
                exit(1);
        }
        printf("This is the end !");
}

运行了这段代码,我想应该所有人都应该了解fork了吧.运行的时候可以查看进程(ps -aux),会发现有两个一样的进程,运行结束后最后一句printf会运行两次,因为每个进程都会运行一次.中间的交替就是进程的调度了.我也是刚刚明白,还有很多东西要深刻理解.总算有点眉目了.很爽.

posted @ 2008-07-06 23:40 MichaelCao 阅读(9388) | 评论 (8) | 编辑收藏

2008年4月30日

用代码来思考自旋锁和信号量

觉得昨天的思考似乎还是不怎么过瘾,有些问题还不是很清楚.到底应用方面两个有什么区别呢?
自己学着别人小小的动了下手.
先贴信号量的代码.

#include<pthread.h>
#include<stdio.h>
#include<sys/time.h>

#define MAX 10
pthread_t thread[2];
pthread_mutex_t mut;
int number=0,i;

void * thread1()
{
   printf("thread1: I'm thread 1 \n");
   for(i =0;i<MAX ;i++)
   {
       printf("thread 1: number=%d \n",number);
       pthread_mutex_lock(&mut);
       number++;
       pthread_mutex_unlock(&mut);
       sleep(2);
   }
   printf("thread1: 主函数在等我完成任务吗？\n");
   pthread_exit(NULL);
}
void * thread2()
{
   printf("thread2: I'm thread 2 \n");
   for(i =0; i<MAX;i++)
   {
       printf("thread2 : number=%d\n",number);
       pthread_mutex_lock(&mut);
       number++;
       pthread_mutex_unlock(&mut);
       sleep(3);
   }
   printf("thread2 : 主函数在等我完成任务么？\n");
   pthread_exit(NULL);

}

void thread_create(void)
{
   /*创建线程*/
   pthread_create(&thread[0],NULL,thread1,NULL);
   printf("线程1被创建！\n");
   pthread_create(&thread[1],NULL,thread2,NULL);
   printf("线程2被创建！\n");
}
void thread_wait(void)
{
   /*等待线程结束*/
   pthread_join(thread[0],NULL);
   printf("线程1已经结束！\n");
   pthread_join(thread[1],NULL);
   printf("线程2已经结束!\n");
}
int main()
{
   /*用默认属性初始化互斥锁*/
   pthread_mutex_init(&mut,NULL);
   printf("我是主函数，我正在创建线程！\n");
   thread_create();
   printf("我是主函数，我正在等待线程完成任务！\n");
   thread_wait();
}

执行的结果是:

我是主函数，我正在创建线程！
thread1: I'm thread 1
thread 1: number=0
线程1被创建！
thread2: I'm thread 2
thread2 : number=1
线程2被创建！
我是主函数，我正在等待线程完成任务！
thread 1: number=2
thread2 : number=3
thread 1: number=4
thread 1: number=5
thread2 : number=6
thread 1: number=7
thread2 : number=8
thread 1: number=9
thread2 : number=10
thread1: 主函数在等我完成任务吗？
线程1已经结束！
thread2 : 主函数在等我完成任务么？
线程2已经结束!

重要:这个执行的过程大概要10秒!!!!!!
而我们用自旋锁,代码:

/*
* time :2008.4.30
* author:will cao
* Email:sei_michael@126.com
* 探索自旋锁与信号量的区别
*/
#include<pthread.h>
#include<stdio.h>

pthread_t thread[2];
pthread_spinlock_t lock ;

#define MAX 10

int number=0,i;

void * thread1()
{
   printf ("thread 1 :I began to run !");
   for(i=0;i<MAX;i++)
   {
       printf("thread 1 :number=%d \n",number);
       pthread_spin_lock(&lock);
       number++;
       pthread_spin_unlock(&lock);
   }
   printf("ok ,I am over !\n");
   pthread_exit(NULL);
}
void * thread2 ()
{
   printf("thread2 : I start !!!\n");
   for(i=0;i<MAX;i++)
   {
       printf("thread2 : number = %d \n",number);
       pthread_spin_lock(&lock);
       number++;
       pthread_spin_unlock(&lock);
   }
   printf("thread 2: I am over!!!");
   pthread_exit(NULL);
}

void thread_create(void)
{
   /*create the threads */
   pthread_create(&thread[0],NULL,thread1,NULL);
   printf("create the thread 1\n ");
   pthread_create(&thread[1],NULL,thread2,NULL);
   printf("create the thread 2 \n");
}
void thread_wait(void )
{
   /*wait for the thread to be over */
   pthread_join(thread[0],NULL);
   printf("the thread 1 is over !\n");
   pthread_join(thread[1],NULL);
   printf("the thread 2 is over ! \n");
}
int main()
{
   /* init the spin lock */
   pthread_spin_init(&lock,0);
   printf("i am the main,and I am creating the threads ");
   thread_create();
   printf("i am the main,and I am wait for the thread to be over!");
   thread_wait();
}

执行结果为:

i am the main,and I am creating the threads thread 1 :I began to run !thread 1 :number=0
thread 1 :number=1
thread 1 :number=2
thread 1 :number=3
thread 1 :number=4
thread 1 :number=5
thread 1 :number=6
thread 1 :number=7
thread 1 :number=8
thread 1 :number=9
ok ,I am over !
create the thread 1
thread2 : I start !!!
create the thread 2
i am the main,and I am wait for the thread to be over!thread2 : number = 10
thread2 : number = 11
thread2 : number = 12
thread2 : number = 13
thread2 : number = 14
thread2 : number = 15
thread2 : number = 16
thread2 : number = 17
thread2 : number = 18
thread2 : number = 19
thread 2: I am over!!!the thread 1 is over !
the thread 2 is over !

执行时间:我没用系统调用,但肯定是用不了0.1秒的...
总结:从表面上来看,很明显的区别是当我们用的是信号量的时候,这个时候是有调度的.因为从运行结果上来看,主线程在创建其他两个线程后,其他线程开始运行.并且主线程也在运行.但怎么运行这个是无法确定的,这是一个并发的过程.
当使用自旋锁后,这个就不一样了.当运行到临界区的时候,它是直接的过去,不是会产生一个等待,或者一个调度.
不知道编译器是怎么编译的.很想知道编译后二进制代码有什么区别.但这个好像有点太难....不过我觉得从运行结果上来看这么多,应该差不多了.

posted @ 2008-04-30 16:45 MichaelCao 阅读(977) | 评论 (4) | 编辑收藏

信号量与自旋锁

   刚刚开始想这个问题的时候,觉得好像这个根本就不是一个问题.学操作系统的进程间的通信时,就是先说用互斥锁解决两个进程同时访问临界区的方法.但是后来Dijkstra对于哲学家进餐的问题的解答使用了信号量,于是我们接受了信号量.在看pthread的时候,发现还有个自旋锁.于是有点晕,这两个不都是控制对临界区的访问的么?怎么都上来了?他们之间有什么区别,他们又都是怎么实现的?
   首先说自旋锁.这个实现基本上是和TSL相同.TSL指令,首先是要共享一个lock,当进入临界区时,首先将lock复制到寄存器,然后将lock置为1,接下来看寄存器中的值是否为0,为0进入.不为0返回.而最重要的是它能保证指令执行的不可分割性,也就是说在这条指令结束之前,其他指令不允许访问内存.实现的是方式是在指令执行之前将内存总线禁止.结束后在打开内存总线.而自旋锁实现就是这个样子.只不过多循环了几次.为了更好的让cpu调度,在尝试一定次数后返回.因为他是一直在那边循环所以叫做自旋锁.可见这种锁很耗资源.但是速度上来说很快.一旦锁释放,立刻可以得到资源.
   再来看看信号量,信号量的实现就不这般精准了.如果使用一个信号量来控制一个临界区的话.就会有很多情况,首先最明显的是读者-写者问题.可以有多个读者,写者只可以有一个.并且信号量的实现也和自旋锁有者一定的区别.当一个信号量不能访问后.进程不会在那里循环,会被睡眠掉.当信号量可以使用的时候,调度器会从可以调度的进程选择一个.
   基本上就这个样子.

posted @ 2008-04-30 01:32 MichaelCao| 编辑收藏

2008年4月28日

小斋寄语

回首,大学三年已经过去,人生最精华的部分也在渐渐的流逝.时常想到什么,学到什么总会在这写写,在那画画,虽然历经许多,但不成系统,总之似乎想抓住些什么,但总有滑过,故在此建一小屋.
望多思,多想,多记.珍惜青春.

posted @ 2008-04-28 00:13 MichaelCao 阅读(263) | 评论 (0) | 编辑收藏

仅列出标题

非澹泊无以明志,非宁静无以致远

随笔分类

文章分类

随笔档案

最新随笔

最新评论

News

常用链接

我参与的团队

评论排行榜

阅读排行榜