Hadoop On Azure: FileNotFoundException in Hadoop Streaming


The Hadoop Streaming example description for Hadoop on Azure contains errors in its paths, so if you follow it as written you may run into the following error:

Exception in thread "main" java.io.FileNotFoundException: File hdfs://xxx.xxx.xxx.xxx:9000/example/apps/wc.exe does not exist.
 at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:390)
 at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:287)
 at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:413)
 at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:164)
 at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:147)

The problem lies in the “hdfs://…” arguments. The sample gives the path as follows:

hdfs://xxx.xxx.xxx.xxx:9000/example/apps/wc.exe

This says the application resides in /example/apps. The path in an HDFS URL is resolved from the root directory, so Hadoop looks for an “example” directory directly under the root and does not find one. Your files actually live in /user/<your_user_name>, where “user” is a directory under the root.

Hence, the HDFS URL should be hdfs://xxx.xxx.xxx.xxx:9000/user/<your_user_name>/example/apps/wc.exe.
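The resolution rule described above can be modelled in a few lines of Python. This is a simplified sketch, not Hadoop's own path-handling code: it assumes the working directory is the default home directory /user/<your_user_name>, and the function name `resolve_hdfs_path` and the user name `alice` are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def resolve_hdfs_path(path: str, user: str) -> str:
    """Return the absolute HDFS path that `path` refers to for `user`.

    Simplified model (assumption): relative paths are joined onto the
    default working directory /user/<user>; absolute paths and full
    hdfs:// URLs are resolved from the root directory.
    """
    parsed = urlparse(path)
    if parsed.scheme:
        # A full URL such as hdfs://host:9000/example/apps/wc.exe:
        # its path component always starts at the root.
        return posixpath.normpath(parsed.path or "/")
    home = posixpath.join("/user", user)
    # posixpath.join discards `home` when `path` starts with "/",
    # which mirrors an absolute path ignoring the home directory.
    return posixpath.normpath(posixpath.join(home, path))


if __name__ == "__main__":
    user = "alice"  # hypothetical user name
    for p in [
        "hdfs://xxx.xxx.xxx.xxx:9000/example/apps/wc.exe",
        "/example/data/davinci.txt",
        "example/data/davinci.txt",
    ]:
        print(f"{p} -> {resolve_hdfs_path(p, user)}")
```

Only the last, relative form lands under /user/alice, which is where the sample files live; the first two both point at a non-existent /example directory.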

The same problem affects the other parameters, -input “/example/data/davinci.txt” -output “/example/data/StreamingOutput/wc.txt”, which specify the input data and the output directory. Here too, the leading “/” makes the “example” directory start from the root directory. Drop it and use -input “example/data/davinci.txt” -output “example/data/StreamingOutput/wc.txt”; a path starting with “example/data” resolves to “/user/<your_user_name>/example/data”.
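Putting the corrected values together, the path-related options could be assembled as below. This is a sketch under stated assumptions: `streaming_path_options` is a hypothetical helper, the name node address `10.0.0.4:9000` and user `alice` are placeholders, and the streaming jar, -mapper/-reducer options and any further -files entries (comma-separated) are left out because they do not affect the path problem.

```python
from typing import List


def streaming_path_options(namenode: str, user: str) -> List[str]:
    """Build the -files, -input and -output options with corrected paths.

    `namenode` is host:port of the cluster's name node, i.e. the
    xxx.xxx.xxx.xxx:9000 part of the URL in the error message.
    """
    # -files takes a full URL, whose path starts at the root,
    # so it must spell out the home directory explicitly.
    wc_url = f"hdfs://{namenode}/user/{user}/example/apps/wc.exe"
    return [
        "-files", wc_url,
        # No leading "/": these resolve against /user/<user>.
        "-input", "example/data/davinci.txt",
        "-output", "example/data/StreamingOutput/wc.txt",
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute your own name node and user name.
    print(" ".join(streaming_path_options("10.0.0.4:9000", "alice")))
```

The design choice is deliberate: the URL carries the home directory explicitly, while -input and -output stay relative so they follow whatever user submits the job.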