Fixing ClassNotFoundException in Oozie

If you run into errors like the following:

  java.lang.ClassNotFoundException: org.apache.hadoop.tools.DistCp
  java.lang.NoClassDefFoundError: org/apache/pig/Main
  java.lang.ClassNotFoundException: org.apache.pig.Main
  java.lang.NoClassDefFoundError: org/apache/sqoop/Sqoop
  java.lang.ClassNotFoundException: org.apache.sqoop.Sqoop
  java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
  java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
  java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found

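How do you fix them? By making the corresponding JARs available to the job.

So how does Oozie find the classes it needs when it runs a job?

There are two ways to add JARs.

First, look at the directory of an Oozie workflow.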

$ sudo -u oozie hadoop fs -ls examples/apps/demo/
Found 4 items
-rw-r--r--   1 oozie supergroup 930  2012-12-14 13:23 examples/apps/demo/id.pig
-rw-r--r--   1 oozie supergroup 1020 2012-12-14 13:23 examples/apps/demo/job.properties
drwxr-xr-x   - oozie supergroup 0    2012-12-14 13:23 examples/apps/demo/lib
-rw-r--r--   1 oozie supergroup 6136 2012-12-14 13:23 examples/apps/demo/workflow.xml

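Of these, workflow.xml and job.properties are required to run the job (strictly speaking, the oozie command-line client reads job.properties from the local file system, so the copy in HDFS is only for reference). id.pig is the Pig script this particular job runs and can be ignored here.

The key item is the lib directory. It is optional, but if it exists, Oozie automatically adds the JARs under lib to the workflow's classpath.

Tasks run through Oozie can then find those JARs.

This is the simplest approach.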

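Alternatively, you can set oozie.libpath in job.properties to point at other HDFS directories (several directories can be given, separated by commas).

The advantage of specifying JAR directories through oozie.libpath is that multiple workflows can share the same JARs.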
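
As a minimal sketch of this setting, a job.properties could look like the fragment below. The host names, ports, and the two shared directories are hypothetical placeholders, not values from this cluster; oozie.wf.application.path and oozie.libpath are the actual Oozie property names.

```properties
# Hypothetical cluster endpoints; replace with your own.
nameNode=hdfs://namenode.example.com:8020
jobTracker=jobtracker.example.com:8021

# HDFS directory that contains workflow.xml (and the optional lib/ directory).
oozie.wf.application.path=${nameNode}/user/${user.name}/examples/apps/demo

# Extra HDFS directories whose JARs are added to the classpath.
# Several directories are separated by commas; both paths here are made up.
oozie.libpath=${nameNode}/user/oozie/shared/udfs,${nameNode}/user/oozie/shared/jdbc
```

Any other workflow that lists the same directories in its own oozie.libpath picks up the same JARs, so they only have to be uploaded once.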

 

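Now for the second way: the ShareLib.

Actions such as DistCp, Streaming, Pig, Sqoop, and Hive need additional JARs to run.

The ShareLib works much like oozie.libpath. The difference is that it exists specifically for the special actions listed above and the JARs they require.

The ShareLib directory in CDH 4.1.2 looks like this: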

  drwxr-xr-x share 
  drwxr-xr-x share/lib
  drwxr-xr-x share/lib/distcp
  -rw-r--r-- share/lib/distcp/hadoop-tools-2.0.0-mr1-cdh4.1.2.jar
  drwxr-xr-x share/lib/hive
  -rw-r--r-- share/lib/hive/JavaEWAH-0.3.2.jar
  -rw-r--r-- share/lib/hive/antlr-2.7.7.jar
  -rw-r--r-- share/lib/hive/antlr-3.0.1.jar
  -rw-r--r-- share/lib/hive/antlr-runtime-3.0.1.jar
  -rw-r--r-- share/lib/hive/avro-ipc-1.7.1.cloudera.2.jar
  -rw-r--r-- share/lib/hive/avro-mapred-1.7.1.cloudera.2.jar
  -rw-r--r-- share/lib/hive/commons-beanutils-1.7.0.jar
  -rw-r--r-- share/lib/hive/commons-beanutils-core-1.8.0.jar
  -rw-r--r-- share/lib/hive/commons-collections-3.2.1.jar
  -rw-r--r-- share/lib/hive/commons-compress-1.4.1.jar
  -rw-r--r-- share/lib/hive/commons-configuration-1.6.jar
  -rw-r--r-- share/lib/hive/commons-dbcp-1.4.jar
  -rw-r--r-- share/lib/hive/commons-digester-1.8.jar
  -rw-r--r-- share/lib/hive/commons-pool-1.5.4.jar
  -rw-r--r-- share/lib/hive/datanucleus-connectionpool-2.0.3.jar
  -rw-r--r-- share/lib/hive/datanucleus-core-2.0.3.jar
  -rw-r--r-- share/lib/hive/datanucleus-enhancer-2.0.3.jar
  -rw-r--r-- share/lib/hive/datanucleus-rdbms-2.0.3.jar
  -rw-r--r-- share/lib/hive/derby-10.6.1.0.jar
  -rw-r--r-- share/lib/hive/guava-11.0.2.jar
  -rw-r--r-- share/lib/hive/haivvreo-1.0.7-cdh-4.jar
  -rw-r--r-- share/lib/hive/hive-builtins-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-cli-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-common-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-contrib-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-exec-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-metastore-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-serde-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-service-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/hive-shims-0.9.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/hive/httpclient-4.0.1.jar
  -rw-r--r-- share/lib/hive/httpcore-4.0.1.jar
  -rw-r--r-- share/lib/hive/jackson-core-asl-1.8.8.jar
  -rw-r--r-- share/lib/hive/jackson-mapper-asl-1.8.8.jar
  -rw-r--r-- share/lib/hive/jdo2-api-2.3-ec.jar
  -rw-r--r-- share/lib/hive/jetty-util-6.1.26.cloudera.2.jar
  -rw-r--r-- share/lib/hive/jline-0.9.94.jar
  -rw-r--r-- share/lib/hive/json-20090211.jar
  -rw-r--r-- share/lib/hive/jsr305-1.3.9.jar
  -rw-r--r-- share/lib/hive/jta-1.1.jar
  -rw-r--r-- share/lib/hive/libfb303-0.7.0.jar
  -rw-r--r-- share/lib/hive/libthrift-0.7.0.jar
  -rw-r--r-- share/lib/hive/netty-3.4.0.Final.jar
  -rw-r--r-- share/lib/hive/servlet-api-2.5-20081211.jar
  -rw-r--r-- share/lib/hive/stringtemplate-3.1-b1.jar
  -rw-r--r-- share/lib/hive/xz-1.0.jar
  drwxr-xr-x share/lib/mapreduce-streaming
  -rw-r--r-- share/lib/mapreduce-streaming/commons-cli-1.2.jar
  -rw-r--r-- share/lib/mapreduce-streaming/commons-codec-1.4.jar
  -rw-r--r-- share/lib/mapreduce-streaming/commons-el-1.0.jar
  -rw-r--r-- share/lib/mapreduce-streaming/commons-httpclient-3.1.jar
  -rw-r--r-- share/lib/mapreduce-streaming/commons-logging-1.1.jar
  -rw-r--r-- share/lib/mapreduce-streaming/commons-net-3.1.jar
  -rw-r--r-- share/lib/mapreduce-streaming/core-3.1.1.jar
  -rw-r--r-- share/lib/mapreduce-streaming/hadoop-core-2.0.0-mr1-cdh4.1.2.jar
  -rw-r--r-- share/lib/mapreduce-streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar
  -rw-r--r-- share/lib/mapreduce-streaming/hsqldb-1.8.0.7.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jackson-core-asl-1.8.8.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jackson-mapper-asl-1.8.8.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jasper-compiler-5.5.23.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jasper-runtime-5.5.23.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jets3t-0.6.1.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jetty-6.1.14.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jetty-util-6.1.26.cloudera.2.jar
  -rw-r--r-- share/lib/mapreduce-streaming/jsp-api-2.1.jar
  -rw-r--r-- share/lib/mapreduce-streaming/log4j-1.2.16.jar
  -rw-r--r-- share/lib/mapreduce-streaming/oro-2.0.8.jar
  -rw-r--r-- share/lib/mapreduce-streaming/servlet-api-2.5-6.1.14.jar
  -rw-r--r-- share/lib/mapreduce-streaming/servlet-api-2.5.jar
  -rw-r--r-- share/lib/mapreduce-streaming/xmlenc-0.52.jar
  drwxr-xr-x share/lib/oozie
  -rw-r--r-- share/lib/oozie/json-simple-1.1.jar
  drwxr-xr-x share/lib/pig
  -rw-r--r-- share/lib/pig/activation-1.1.jar
  -rw-r--r-- share/lib/pig/antlr-2.7.7.jar
  -rw-r--r-- share/lib/pig/antlr-runtime-3.4.jar
  -rw-r--r-- share/lib/pig/commons-beanutils-1.7.0.jar
  -rw-r--r-- share/lib/pig/commons-beanutils-core-1.8.0.jar
  -rw-r--r-- share/lib/pig/commons-collections-3.2.1.jar
  -rw-r--r-- share/lib/pig/commons-configuration-1.6.jar
  -rw-r--r-- share/lib/pig/commons-digester-1.8.jar
  -rw-r--r-- share/lib/pig/commons-io-2.1.jar
  -rw-r--r-- share/lib/pig/guava-11.0.2.jar
  -rw-r--r-- share/lib/pig/hbase-0.92.1-cdh4.1.2.jar
  -rw-r--r-- share/lib/pig/high-scale-lib-1.1.1.jar
  -rw-r--r-- share/lib/pig/hsqldb-1.8.0.7.jar
  -rw-r--r-- share/lib/pig/httpclient-4.0.1.jar
  -rw-r--r-- share/lib/pig/httpcore-4.0.1.jar
  -rw-r--r-- share/lib/pig/jaxb-api-2.2.2.jar
  -rw-r--r-- share/lib/pig/jline-0.9.94.jar
  -rw-r--r-- share/lib/pig/joda-time-1.6.jar
  -rw-r--r-- share/lib/pig/jruby-complete-1.6.5.jar
  -rw-r--r-- share/lib/pig/jsch-0.1.42.jar
  -rw-r--r-- share/lib/pig/jsr305-1.3.9.jar
  -rw-r--r-- share/lib/pig/jython-2.5.0.jar
  -rw-r--r-- share/lib/pig/libthrift-0.7.0.jar
  -rw-r--r-- share/lib/pig/metrics-core-2.1.2.jar
  -rw-r--r-- share/lib/pig/pig-0.10.0-cdh4.1.2.jar
  -rw-r--r-- share/lib/pig/protobuf-java-2.4.0a.jar
  -rw-r--r-- share/lib/pig/stax-api-1.0.1.jar
  -rw-r--r-- share/lib/pig/stringtemplate-3.2.1.jar
  -rw-r--r-- share/lib/sharelib.properties
  drwxr-xr-x share/lib/sqoop
  -rw-r--r-- share/lib/sqoop/avro-ipc-1.7.1.cloudera.2.jar
  -rw-r--r-- share/lib/sqoop/avro-mapred-1.7.1.cloudera.2.jar
  -rw-r--r-- share/lib/sqoop/commons-beanutils-1.7.0.jar
  -rw-r--r-- share/lib/sqoop/commons-beanutils-core-1.8.0.jar
  -rw-r--r-- share/lib/sqoop/commons-configuration-1.6.jar
  -rw-r--r-- share/lib/sqoop/commons-digester-1.8.jar
  -rw-r--r-- share/lib/sqoop/commons-io-2.1.jar
  -rw-r--r-- share/lib/sqoop/guava-11.0.2.jar
  -rw-r--r-- share/lib/sqoop/hbase-0.92.1-cdh4.1.2.jar
  -rw-r--r-- share/lib/sqoop/high-scale-lib-1.1.1.jar
  -rw-r--r-- share/lib/sqoop/hsqldb-1.8.0.7.jar
  -rw-r--r-- share/lib/sqoop/httpclient-4.0.1.jar
  -rw-r--r-- share/lib/sqoop/httpcore-4.0.1.jar
  -rw-r--r-- share/lib/sqoop/jsr305-1.3.9.jar
  -rw-r--r-- share/lib/sqoop/libthrift-0.7.0.jar
  -rw-r--r-- share/lib/sqoop/metrics-core-2.1.2.jar
  -rw-r--r-- share/lib/sqoop/netty-3.4.0.Final.jar
  -rw-r--r-- share/lib/sqoop/servlet-api-2.5-20081211.jar
  -rw-r--r-- share/lib/sqoop/sqoop-1.4.1-cdh4.1.2.jar

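As you can see, each of these actions depends on many JARs, and each action has its own directory. That way Oozie can add only the JARs a given action needs instead of including all of them.

In fact, this separation is necessary, because the actions do not all use the same, or even compatible, JARs. For example, the Hive action uses antlr-runtime-3.0.1.jar and fails with antlr-runtime-3.4.jar, yet the latter is what the Pig action needs.

By default, the ShareLib must live in HDFS under the home directory of the user that runs the Oozie server. That user does not have to be the one who submits Oozie jobs. The ShareLib tarball ships in the Oozie installation directory.

The oozie.service.WorkflowAppService.system.libpath property in oozie-site.xml sets the ShareLib location. The default is /user/${user.name}/share/lib, where ${user.name} is the user running the Oozie service. For detailed configuration, see http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_17_6.html

Also, because Cloudera's CDH4 supports both MRv1 and YARN, you must install the ShareLib package that matches the one you run.

To make a workflow use the ShareLib, just add oozie.use.system.libpath=true to job.properties.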
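
For instance, the relevant part of a job.properties might look like this sketch. The application path is the demo directory from earlier and nameNode is assumed to be defined elsewhere in the same file; only the last line is needed to switch the ShareLib on.

```properties
# Workflow location in HDFS (assumes nameNode is defined above in this file).
oozie.wf.application.path=${nameNode}/user/${user.name}/examples/apps/demo

# Let Oozie add the ShareLib JARs for the action types used in the workflow.
oozie.use.system.libpath=true
```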

Overriding the ShareLib

I have not needed this feature yet, so the passage below is quoted as-is from the post listed under Reference.

In CDH 4.1.0 and later (or Oozie 3.3.0 and later), you can override the ShareLib location at the action, job, and server levels. This allows users or admins to support multiple versions or a patched version of an action at the same time. The property is called oozie.action.sharelib.for.actiontype, where actiontype is the name of the action type (e.g. Pig, Sqoop); you would set its value to the name of a subfolder in the ShareLib. To set it at the action level you would put the property in that action’s <configuration>; to set it at the job level, you would put the property in that job’s job.properties; and to set it at the server level, you would put the property in oozie-site.xml.

For example, Oozie currently ships ready for Pig 0.10.x, but suppose you also want to be able to use Pig 0.9.x in the same workflow. The share/lib/pig folder is for Pig 0.10.x, but if you add a new folder with the Pig 0.9.x JARs, say share/lib/pig-9, you can put the following in the <configuration> element for the Pig 0.9.x action:

<property>
    <name>oozie.action.sharelib.for.pig</name>
    <value>pig-9</value>
</property>

Oozie will continue to use share/lib/pig for the Pig 0.10.x action but will use share/lib/pig-9 for the Pig 0.9.x action.
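
The same override can be set at the job level, as described above, by putting the property in job.properties. The fragment below is only a sketch; it reuses the hypothetical pig-9 subfolder from the example.

```properties
# Enable the ShareLib for this job.
oozie.use.system.libpath=true

# Job-level override: look for Pig JARs in share/lib/pig-9
# (pig-9 is the example subfolder name, not a folder Oozie ships).
oozie.action.sharelib.for.pig=pig-9
```

Set this way, the override applies to the job as a whole rather than to a single action, so it suits a job that should run entirely on the alternative Pig version.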

Reference: http://blog.cloudera.com/blog/2012/12/how-to-use-the-sharelib-in-apache-oozie/

Please credit the source when reposting: reposted from http://jyd.me/
