cloudera

Deploy Cloudera Manager and Cloudera's Distribution, including Apache Hadoop (CDH).

Add this module to your Puppetfile:

mod 'razorsedge-cloudera', '1.0.0'

Add this module to your Bolt project:

bolt module add razorsedge-cloudera

Manually install this module globally with Puppet module tool:

puppet module install razorsedge-cloudera --version 1.0.0


razorsedge/cloudera — version 1.0.0 Feb 19th 2014

Puppet Cloudera Manager and CDH4 Module

Introduction

This module manages the installation of Cloudera Manager. It follows the procedure described in the Cloudera Manager Installation Guide under "Installation Path B - Installation Using Your Own Method". By default, this module assumes that parcels will be used to deploy CDH, Impala, and Search. If parcels are not desired, this module can also manage the installation of Cloudera's Distribution, including Apache Hadoop (CDH), Cloudera Impala, Cloudera Search, and LZO compression.

Actions:

  • Installs the Cloudera software repository for CM.
  • Installs Oracle Java Development Kit (JDK) 6.
  • Installs CM 4 agent.
  • Configures the CM agent to talk to a CM server.
  • Starts the CM agent.
  • Separately installs the CM server and database connectivity (by default to the embedded database server).
  • Separately starts the CM server.

Optional Actions (non-parcel):

  • Installs the Cloudera software repositories for CDH, Impala, and Search.
  • Installs most components of CDH 4.
  • Installs Impala 1.
  • Installs Search 1.
  • Optionally installs GPL Extras (LZO) 4.

Software Support:

  • Cloudera Manager - tested with 4.1.2 and 4.8.0
  • CDH - tested with 4.1.2 and 4.5.0
  • Cloudera Impala - tested with 1.0 and 1.2.3
  • Cloudera Search - tested with 1.1.0
  • Cloudera GPL Extras - tested with 4.3.0

OS Support:

Cloudera's officially supported operating systems.

  • RedHat family - tested on CentOS 6.4
  • SuSE family - tested on SLES 11SP1
  • Debian family - tested on Debian 6.0.7, Ubuntu 10.04.4 LTS, and Ubuntu 12.04.2 LTS

Class documentation is available via puppetdoc.

Class Descriptions

Class['cloudera']

Meta-class that includes:

  • Class['cloudera::cm::repo']
  • Class['cloudera::java']
  • Class['cloudera::cm']

Requires the parameter cm_server_host.

Class['cloudera::cm::repo']

This class handles installing the Cloudera Manager software repository.

Class['cloudera::java']

This class handles installing the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.

Class['cloudera::java::jce']

This class handles installing the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files. Set the parameter install_jce => true in Class['cloudera']. Manual setup is required in order to download the required software from Oracle. See the files/README_JCE.md file for details.
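
As a minimal sketch of the setting described above, an agent node that should also receive the JCE policy files could be declared as follows. The hostname is a placeholder, and the declaration assumes the manual download described in files/README_JCE.md has already been completed.

```puppet
# Sketch only: 'smhost.example.com' is a placeholder CM server hostname.
# install_jce => true asks Class['cloudera'] to pull in cloudera::java::jce.
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  install_jce    => true,
}
```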

Class['cloudera::cm']

This class handles installing and configuring the Cloudera Manager Agent. This agent should be running on every node in the cluster so that Cloudera Manager can deploy software configurations to the node. Requires the parameter server_host which is passed in from Class['cloudera'].
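
When the meta-class is not used, the agent class can be declared directly. The following is a hedged sketch that chains the three classes the meta-class includes and passes server_host explicitly. The hostname is a placeholder, and the sketch assumes the repository and Java classes work with their default parameters.

```puppet
# Sketch only: declares the agent without Class['cloudera'].
# 'smhost.example.com' is a placeholder CM server hostname.
class { 'cloudera::cm::repo': } ->
class { 'cloudera::java': } ->
class { 'cloudera::cm':
  server_host => 'smhost.example.com',
}
```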

Class['cloudera::cm::server']

This class handles installing and configuring the Cloudera Manager Server. This class should only be included on one node of your environment. By default it will install the embedded PostgreSQL database on the same node. With the correct parameters, it can also connect to local or remote MySQL, PostgreSQL, and Oracle RDBMS databases.

Class['cloudera::cdh::repo']

This class handles installing the Cloudera Hadoop software repositories.

Class['cloudera::cdh']

This class handles installing the Cloudera Distribution, including Apache Hadoop. No configuration is performed on the CDH software and all daemons are forced off so that Cloudera Manager can manage them. This class installs Bigtop utils, Hadoop (HDFS, MapReduce, YARN), Hue-plugins, HBase, Hive, Oozie, Pig, ZooKeeper, and Flume-NG.

Class['cloudera::cdh::hue']

This class handles installing Hue. This class is not currently included in Class['cloudera::cdh'] as this would conflict with the Cloudera installation instructions.

Class['cloudera::impala::repo']

This class handles installing the Cloudera Impala software repositories.

Class['cloudera::impala']

This class handles installing Cloudera Impala. No configuration is performed on the Impala software and all daemons are forced off so that Cloudera Manager can manage them.

Class['cloudera::search::repo']

This class handles installing the Cloudera Search software repositories.

Class['cloudera::search']

This class handles installing Cloudera Search. No configuration is performed on the Search software and all daemons are forced off so that Cloudera Manager can manage them.

Class['cloudera::gplextras::repo']

This class handles installing the Cloudera GPL Extras software repositories.

Class['cloudera::gplextras']

This class handles installing Cloudera's GPL Extras (LZO compression libraries). No configuration is performed on any software.

Examples

Most nodes in the cluster will use this declaration:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
}

The node that will be the CM server may use this declaration:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
} ->
class { 'cloudera::cm::server': }
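
One way to combine the two declarations above in a single site manifest is with node definitions. This is a sketch under the assumption that the CM server's certname is smhost.example.com and that every other node is an agent. Adjust the node names to match your environment.

```puppet
# site.pp sketch: node names are placeholders.
# The CM server node runs both the agent and the server.
node 'smhost.example.com' {
  class { 'cloudera':
    cm_server_host => 'smhost.example.com',
  } ->
  class { 'cloudera::cm::server': }
}

# Every other node only runs the agent and points at the CM server.
node default {
  class { 'cloudera':
    cm_server_host => 'smhost.example.com',
  }
}
```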

Parcels

A parcel is an alternative binary distribution format, supported by Cloudera Manager 4.5+, that simplifies distribution of CDH and other Cloudera products. By default, this module assumes software deployment via parcels. To allow Cloudera Manager to install RPMs (or DEBs) instead of parcels, set use_parcels => false.

Nodes that will be cluster members will use this declaration:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
}

Nodes that will be Gateways may use this declaration:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
}
class { 'cloudera::cdh::hue': }
class { 'cloudera::cdh::mahout': }
class { 'cloudera::cdh::sqoop': }
# Install Oozie WebUI support (optional):
#class { 'cloudera::cdh::oozie::ext': }
# Install MySQL support (optional):
#class { 'cloudera::cdh::hue::mysql': }
#class { 'cloudera::cdh::oozie::mysql': }

The node that will be the CM server may use this declaration: (This will skip installation of the CDH software as it is not required.)

class { 'cloudera::cm::repo':
  cm_version => '4.1',
} ->
class { 'cloudera::java': } ->
class { 'cloudera::java::jce': } ->
class { 'cloudera::cm': } ->
class { 'cloudera::cm::server': }

TLS

  • Level 1: Configuring TLS Encryption only for Cloudera Manager
  • Level 2: Configuring TLS Authentication of Server to Agents and Users
  • Level 3: Configuring TLS Authentication of Agents to Server

This module's deployment of TLS provides both level 1 and level 2 configuration (encryption, and authentication of the server to the agents). Level 3 is presently much more difficult to implement. You will need to provide a TLS certificate and the signing certificate authority for the CM server. See the File resources in the example below for where the files need to be deployed.

Some settings inside CM can only be configured manually. See the Level 1 instructions for the details of what to change in the WebUI, and use the values below:

Setting                                Value
Use TLS Encryption for Agents          (check)
Path to TLS Keystore File              /etc/cloudera-scm-server/keystore
Keystore Password                      The value of server_keypw in Class['cloudera::cm::server'].
Use TLS Encryption for Admin Console   (check)

# The node that will be the CM agent may use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_tls        => true,
  install_jce    => true,
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }

# The node that will be the CM server may use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_tls        => true,
  install_jce    => true,
} ->
class { 'cloudera::cm::server':
  use_tls      => true,
  server_keypw => 'myPassWord',
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt': }
file { "/etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt": }
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key": }
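
The File resources above only name the target paths, because this module does not deploy the certificates itself. As a hedged sketch, two of them could be filled in as shown below. The site_tls module name, the ownership, and the modes are assumptions for illustration, not something this module prescribes. The files must be PEM encoded.

```puppet
# Sketch only: 'site_tls' is a hypothetical module that holds your PEM files.
# Owner, group, and mode are assumptions; adjust them to your security policy.
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/site_tls/cloudera_manager-ca.crt',
}

# The private key is kept readable by its owner only.
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0400',
  source => "puppet:///modules/site_tls/${::fqdn}-cloudera_manager.key",
}
```

Serving private keys from a module's files directory exposes them to anyone who can read the module, so a secrets backend may be a better fit for the key in production.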

LZO Compression

LZO Compression libraries are available in the GPL Extras repository. To deploy the software on a non-parcel system just add use_gplextras => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality (ignore the mention of parcels in the link to the documentation).

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
  use_gplextras  => true,
}

Notes

  • Supports Top Scope variables (i.e. via Dashboard) and Parameterized Classes.
  • Installing CDH3 will not be supported.
  • Based on the Cloudera Manager 4.1 Installation Guide
  • TLS certificates must be in PEM format and are not deployed by this module.
  • When using parcels, the CDH software is not deployed by Puppet. Puppet will only install the Cloudera Manager server/agent. You must then configure Cloudera Manager to deploy the parcels.
  • When installing packages rather than parcels on SLES, SP2 is required because the hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.sles11.x86_64 package requires netcat-openbsd, which is not available on SLES 11SP1.
  • Osfamily RedHat 5 requires the EPEL YUM repository when installing LZO support.

Issues

  • Need external module support for the Oracle Instant Client JDBC.
  • When using an external PostgreSQL server that is on the same host as the CM server, PostgreSQL must be configured to accept connections with md5 password authentication.
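
For the PostgreSQL issue above, one possible approach is to manage the pg_hba.conf entry with the separate puppetlabs/postgresql module, which this module does not require. The sketch below assumes that module is installed, that Class['postgresql::server'] is declared on the node, and that the CM database and user are both named scm. All of these are assumptions to adapt.

```puppet
# Sketch only: requires the puppetlabs/postgresql module and
# Class['postgresql::server'] on the same node.
# Database name, user name, and address are assumptions.
postgresql::server::pg_hba_rule { 'cloudera manager local md5':
  description => 'Allow the CM server to connect locally with md5 passwords',
  type        => 'host',
  database    => 'scm',
  user        => 'scm',
  address     => '127.0.0.1/32',
  auth_method => 'md5',
  order       => '001',
}
```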

TODO

See TODO.md for more items.

Deprecation Warning

The default for use_parcels will switch to true before the 1.0.0 release.

This:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
}

would become this:

class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
}

The puppetlabs/mysql dependency will update to version 2. Make sure to review its changelog in the case of an upgrade.

The class cloudera::repo will be renamed to cloudera::cdh::repo and the Impala repository will be split out into cloudera::impala::repo.

This:

class { 'cloudera::repo':
  cdh_version => '4.1',
  cm_version  => '4.1',
}

would become this:

class { 'cloudera::cdh::repo':
  version => '4.1',
}
class { 'cloudera::impala::repo':
  version => '4.1',
}

Contributing

Please see DEVELOP.md for contribution information.

License

Please see LICENSE file.

Copyright

Copyright (C) 2013 Mike Arnold <mike@razorsedge.org>

razorsedge/puppet-cloudera on GitHub

razorsedge/cloudera on Puppet Forge