
from http://docs.continuum.io/anaconda-cluster/examples/spark-caffe

Deep Learning (Spark, Caffe, GPU)

Description

To demonstrate running a distributed PySpark job that uses GPUs, this example uses the neural network library Caffe. The example is deliberately trivial: every Spark task runs the same Caffe training script, so the work is redundant, but it shows that neural networks can be trained on GPUs from within a Spark cluster.

For this example, we recommend the use of the AMI ami-2cbf3e44 and the instance type g2.2xlarge. An example profile (to be placed in ~/.acluster/profiles.d/gpu_profile.yaml) is shown below:

name: gpu_profile
node_id: ami-2cbf3e44 # Ubuntu 14.04 - IS HVM - Cuda 6.5
user: ubuntu
node_type: g2.2xlarge
num_nodes: 3
provider: aws
plugins:
  - spark-yarn
  - notebook

Download

To execute this example, download the spark-caffe.py example script or the spark-caffe.ipynb example notebook.

Installation

The Spark + YARN plugin can be installed on the cluster using the following command:

$ acluster install spark-yarn

Once the Spark + YARN plugin is installed, you can view the YARN UI in your browser using the following command:

$ acluster open yarn

Dependencies

First, we need to bootstrap Caffe and its dependencies on all of the nodes. We provide a bash script that will install Caffe from source: bootstrap-caffe.sh. The following command can be used to upload the bootstrap-caffe.sh script to all of the nodes and execute it in parallel:

$ acluster submit bootstrap-caffe.sh --all

After a few minutes, Caffe and its dependencies will be installed on the cluster nodes and the job can be started.
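Before starting the job, it can help to confirm that the bootstrap actually finished on every node. The following is a minimal sketch and not part of the original example: the helper names missing_paths and check_node are hypothetical, and the two paths are simply the ones the job script in the next section relies on (the CUDA tools directory and the MNIST training script inside the Caffe checkout).

```python
import os
import socket

# Paths the job script relies on (assumption: the bootstrap script
# installs to these locations; adjust if yours differ).
REQUIRED_PATHS = (
    '/usr/local/cuda/bin',
    '/home/ubuntu/caffe/examples/mnist/train_lenet.sh',
)


def missing_paths(paths):
    """Return (hostname, tuple of paths that do not exist on this machine)."""
    # A tuple (not a list) keeps the result hashable, so it can be passed
    # through RDD.distinct() later.
    missing = tuple(p for p in paths if not os.path.exists(p))
    return socket.gethostname(), missing


def check_node(_):
    # Hypothetical helper: intended to be mapped over an RDD so that it
    # runs on the executors rather than on the driver.
    return missing_paths(REQUIRED_PATHS)
```

With a SparkContext named sc, as in the job script, something like sc.parallelize(range(2), 2).map(check_node).distinct().collect() would return one (hostname, missing) pair per host that ran a task. As with the hostname check in the job script, Spark does not guarantee that the tasks land on different nodes, so an empty result for one host does not prove that every node is ready.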

Running the Job

Here is the complete script to run the Spark + GPU with Caffe example in PySpark:

# spark-caffe.py
from pyspark import SparkConf
from pyspark import SparkContext

conf = SparkConf()
conf.setMaster('yarn-client')
conf.setAppName('spark-caffe')
sc = SparkContext(conf=conf)


def noop(x):
    # Report the name of the host that ran this task
    import socket
    return socket.gethostname()

rdd = sc.parallelize(range(2), 2)
hosts = rdd.map(noop).distinct().collect()
print(hosts)


def caffe_process(x):
    # Make the CUDA tools and the CUDA and LMDB shared libraries visible to Caffe
    import os
    os.environ['PATH'] = '/usr/local/cuda/bin' + ':' + os.environ['PATH']
    os.environ['LD_LIBRARY_PATH'] = '/usr/local/cuda/lib64:/home/ubuntu/pombredanne-https-gitorious.org-mdb-mdb.git-9cc04f604f80/libraries/liblmdb'
    import subprocess
    # Run the MNIST LeNet training script that ships with Caffe and capture its output
    proc = subprocess.Popen('cd /home/ubuntu/caffe && bash ./examples/mnist/train_lenet.sh', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err

rdd = sc.parallelize(range(2), 2)
ret = rdd.map(caffe_process).distinct().collect()
print(ret)
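The script prints the raw list of (returncode, stdout, stderr) tuples, which can be very long because it contains the full training log. A small post-processing helper makes the result easier to read. This is a sketch under assumptions: summarize is a hypothetical name, and it assumes the interesting part of the log is at the end of stderr, which is where Caffe usually writes its training log.

```python
def summarize(results, tail=5):
    """Turn (returncode, stdout, stderr) tuples into short reports.

    Each report is (status, last `tail` lines of stderr).
    """
    reports = []
    for returncode, _out, err in results:
        # Popen pipes yield bytes on Python 3; decode so lines can be split.
        if isinstance(err, bytes):
            err = err.decode('utf-8', 'replace')
        last_lines = err.strip().splitlines()[-tail:]
        if returncode == 0:
            status = 'ok'
        else:
            status = 'failed (exit %d)' % returncode
        reports.append((status, last_lines))
    return reports
```

For example, replacing the final print in the script with a loop over summarize(ret) that prints each status and its last few log lines shows at a glance whether every training run exited with return code 0.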

You can submit the script to the Spark cluster using the submit command.

$ acluster submit spark-caffe.py 

After the script completes, the trained Caffe model can be found at /home/ubuntu/caffe/examples/mnist/lenet_iter_10000.caffemodel on all of the compute nodes.

Posted on 2015-10-14 17:25. Categories: life, musings on artificial intelligence

Comments:
# re: Deep Learning (Spark, Caffe, GPU) 2015-10-21 18:19 | 春秋十二月
What is this?
  
