cloudera
Add this module to your Puppetfile:
mod 'razorsedge-cloudera', '1.0.0'
Puppet Cloudera Manager and CDH4 Module
Introduction
This module manages the installation of Cloudera Manager. It follows the Cloudera Manager Installation Guide, Installation Path B (Installation Using Your Own Method). By default, this module assumes that parcels will be used to deploy CDH, Impala, and Search. If parcels are not desired, this module can also manage the installation of Cloudera's Distribution, including Apache Hadoop (CDH), Cloudera Impala, Cloudera Search, and LZO compression.
Actions:
- Installs the Cloudera software repository for CM.
- Installs Oracle Java Development Kit (JDK) 6.
- Installs CM 4 agent.
- Configures the CM agent to talk to a CM server.
- Starts the CM agent.
- Separately installs the CM server and database connectivity (by default to the embedded database server).
- Separately starts the CM server.
Optional Actions (non-parcel):
- Installs the Cloudera software repositories for CDH, Impala, and Search.
- Installs most components of CDH 4.
- Installs Impala 1.
- Installs Search 1.
- Optionally installs GPL Extras (LZO) 4.
Software Support:
- Cloudera Manager - tested with 4.1.2 and 4.8.0
- CDH - tested with 4.1.2 and 4.5.0
- Cloudera Impala - tested with 1.0 and 1.2.3
- Cloudera Search - tested with 1.1.0
- Cloudera GPL Extras - tested with 4.3.0
OS Support:
Cloudera's officially supported operating systems.
- RedHat family - tested on CentOS 6.4
- SuSE family - tested on SLES 11SP1
- Debian family - tested on Debian 6.0.7, Ubuntu 10.04.4 LTS, and Ubuntu 12.04.2 LTS
Class documentation is available via puppetdoc.
Class Descriptions
Class['cloudera']
Meta-class that includes:
- Class['cloudera::cm::repo']
- Class['cloudera::java']
- Class['cloudera::cm']
Requires the parameter cm_server_host.
Class['cloudera::cm::repo']
This class handles installing the Cloudera Manager software repository.
Class['cloudera::java']
This class handles installing the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
Class['cloudera::java::jce']
This class handles installing the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files. Set the parameter install_jce => true in Class['cloudera']. Manual setup is required in order to download the required software from Oracle. See the files/README_JCE.md file for details.
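As a minimal sketch, enabling JCE on a node could look like the following. It uses only parameters named in this README and assumes the policy files have already been staged as described in files/README_JCE.md.

```puppet
# Assumes the JCE policy files were downloaded from Oracle beforehand
# (see files/README_JCE.md); Puppet does not fetch them.
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  install_jce    => true,
}
```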
Class['cloudera::cm']
This class handles installing and configuring the Cloudera Manager Agent. This agent should be running on every node in the cluster so that Cloudera Manager can deploy software configurations to the node. Requires the parameter server_host, which is passed in from Class['cloudera'].
Class['cloudera::cm::server']
This class handles installing and configuring the Cloudera Manager Server. This class should only be included on one node of your environment. By default it will install the embedded PostgreSQL database on the same node. With the correct parameters, it can also connect to local or remote MySQL, PostgreSQL, and Oracle RDBMS databases.
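A rough sketch of pointing the CM server at an external MySQL database is shown below. The parameter names (db_type, db_host, db_port, db_user, db_pass) are assumptions and are not listed anywhere in this README; check them against the puppetdoc for Class['cloudera::cm::server'] before use. The host name and password are placeholders.

```puppet
# ASSUMPTION: parameter names below are not confirmed by this README;
# verify them against the class's puppetdoc.
class { 'cloudera::cm::server':
  db_type => 'mysql',
  db_host => 'db.example.com',  # placeholder host
  db_port => '3306',
  db_user => 'root',
  db_pass => 'changeMe',        # placeholder password
}
```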
Class['cloudera::cdh::repo']
This class handles installing the Cloudera Hadoop software repositories.
Class['cloudera::cdh']
This class handles installing the Cloudera Distribution, including Apache Hadoop. No configuration is performed on the CDH software and all daemons are forced off so that Cloudera Manager can manage them. This class installs Bigtop utils, Hadoop (HDFS, MapReduce, YARN), Hue-plugins, HBase, Hive, Oozie, Pig, ZooKeeper, and Flume-NG.
Class['cloudera::cdh::hue']
This class handles installing Hue. This class is not currently included in Class['cloudera::cdh'] as this would conflict with the Cloudera installation instructions.
Class['cloudera::impala::repo']
This class handles installing the Cloudera Impala software repositories.
Class['cloudera::impala']
This class handles installing Cloudera Impala. No configuration is performed on the Impala software and all daemons are forced off so that Cloudera Manager can manage them.
Class['cloudera::search::repo']
This class handles installing the Cloudera Search software repositories.
Class['cloudera::search']
This class handles installing Cloudera Search. No configuration is performed on the Search software and all daemons are forced off so that Cloudera Manager can manage them.
Class['cloudera::gplextras::repo']
This class handles installing the Cloudera GPL Extras software repositories.
Class['cloudera::gplextras']
This class handles installing Cloudera's GPL Extras (LZO compression libraries). No configuration is performed on any software.
Examples
Most nodes in the cluster will use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
}
The node that will be the CM server may use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
} ->
class { 'cloudera::cm::server': }
Parcels
Parcel is an alternative binary distribution format supported by Cloudera Manager 4.5+ that simplifies distribution of CDH and other Cloudera products. By default, this module assumes software deployment via parcel. To allow Cloudera Manager to install RPMs (or DEBs) instead of parcels, just set use_parcels => false.
Nodes that will be cluster members will use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
}
Nodes that will be Gateways may use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
}
class { 'cloudera::cdh::hue': }
class { 'cloudera::cdh::mahout': }
class { 'cloudera::cdh::sqoop': }
# Install Oozie WebUI support (optional):
#class { 'cloudera::cdh::oozie::ext': }
# Install MySQL support (optional):
#class { 'cloudera::cdh::hue::mysql': }
#class { 'cloudera::cdh::oozie::mysql': }
The node that will be the CM server may use this declaration, which skips installation of the CDH software because the server does not require it:
class { 'cloudera::cm::repo':
  cm_version => '4.1',
} ->
class { 'cloudera::java': } ->
class { 'cloudera::java::jce': } ->
class { 'cloudera::cm': } ->
class { 'cloudera::cm::server': }
TLS
- Level 1: Configuring TLS Encryption only for Cloudera Manager
- Level 2: Configuring TLS Authentication of Server to Agents and Users
- Level 3: Configuring TLS Authentication of Agents to Server
This module's deployment of TLS provides both level 1 and level 2 configuration (encryption and authentication of the server to the agents). Level 3 is presently much more difficult to implement. You will need to provide a TLS certificate and the signing certificate authority for the CM server. See the File resources in the example below for where the files need to be deployed.
There are some settings inside CM that can only be configured manually. See the Level 1 instructions for the details of what to change in the WebUI and use the values below:
| Setting | Value |
|---|---|
| Use TLS Encryption for Agents | (check) |
| Path to TLS Keystore File | /etc/cloudera-scm-server/keystore |
| Keystore Password | The value of server_keypw in Class['cloudera::cm::server']. |
| Use TLS Encryption for Admin Console | (check) |
# The node that will be the CM agent may use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_tls        => true,
  install_jce    => true,
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
# The node that will be the CM server may use this declaration:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_tls        => true,
  install_jce    => true,
} ->
class { 'cloudera::cm::server':
  use_tls      => true,
  server_keypw => 'myPassWord',
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt': }
file { "/etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt": }
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key": }
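The File resources above carry no attributes, so they only mark where the files must exist. One way to actually deploy the PEM files is sketched here, replacing the empty declarations. The `site_tls` module name, its file layout, and the ownership and mode values are all hypothetical; adjust them to your site.

```puppet
# HYPOTHETICAL: PEM files are served from a site-local module named 'site_tls'.
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/site_tls/cloudera_manager-ca.crt',
}

# Private keys should not be world-readable (mode is an assumption).
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0400',
  source => "puppet:///modules/site_tls/${::fqdn}-cloudera_manager.key",
}
```

The remaining certificate files would follow the same pattern as the first resource.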
LZO Compression
LZO Compression libraries are available in the GPL Extras repository. To deploy the software on a non-parcel system just add use_gplextras => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality (ignore the mention of parcels in the link to the documentation).
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
  use_gplextras  => true,
}
Notes
- Supports Top Scope variables (i.e. via Dashboard) and Parameterized Classes.
- Installing CDH3 will not be supported.
- Based on the Cloudera Manager 4.1 Installation Guide
- TLS certificates must be in PEM format and are not deployed by this module.
- When using parcels, the CDH software is not deployed by Puppet. Puppet will only install the Cloudera Manager server/agent. You must then configure Cloudera Manager to deploy the parcels.
- When installing packages and not parcels on SLES, SP2 is required as the hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.sles11.x86_64 package requires netcat-openbsd, which is not available on SLES 11SP1.
- Osfamily RedHat 5 requires the EPEL YUM repository when installing LZO support.
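For the EPEL note above, a sketch of an EL5 node with LZO support might look like this. It assumes Class['epel'] is provided by the stahnma/epel dependency and that declaring it ahead of Class['cloudera'] is sufficient; the explicit ordering arrow is illustrative rather than confirmed as necessary.

```puppet
# ASSUMPTION: Class['epel'] comes from the stahnma/epel module listed
# under Dependencies and sets up the EPEL YUM repository.
class { 'epel': } ->
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
  use_gplextras  => true,
}
```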
Issues
- Need external module support for the Oracle Instant Client JDBC.
- When using an external PostgreSQL server that is on the same host as the CM server, PostgreSQL must be configured to accept connections with md5 password authentication.
TODO
See TODO.md for more items.
Deprecation Warning
The default for use_parcels will switch to true before the 1.0.0 release.
This:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
}
would become this:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_parcels    => false,
}
The puppetlabs/mysql dependency will update to version 2. Make sure to review its changelog in the case of an upgrade.
The class cloudera::repo will be renamed to cloudera::cdh::repo and the Impala repository will be split out into cloudera::impala::repo.
This:
class { 'cloudera::repo':
  cdh_version => '4.1',
  cm_version  => '4.1',
}
would become this:
class { 'cloudera::cdh::repo':
  version => '4.1',
}
class { 'cloudera::impala::repo':
  version => '4.1',
}
Contributing
Please see DEVELOP.md for contribution information.
License
Please see LICENSE file.
Copyright
Copyright (C) 2013 Mike Arnold mike@razorsedge.org
- Fixing license headers and whitespace.
2014-02-19 Michael Arnold github@razorsedge.org - 1.0.0
Michael Arnold github@razorsedge.org (26):
- Initial Suse zyprepo support.
- Support alternatives on Suse.
- Update rspec tests for SLES support.
- Update README.md and TODO.md for SLES support.
- Remove explicit support for zypper package provider.
- Initial Debian and Ubuntu APT support.
- Fix Java deployment on Debian and Ubuntu.
- Use path to find the service command.
- Add missing aptkey support on Ubuntu.
- Disable services only on supported OSs.
- Remove the hard-coded Class order.
- Update README.md to indicate Debian and Ubuntu support.
- Revamp Java alternatives.
- Allow for use of external Java module.
- Make sure JDK is installed before other stuff.
- Improve JDK installation ordering.
- Drop prefix on yumserver-related class parameters.
- Clean out commented code.
- Collapse class anchors.
- Simplify the $use_parcels conditional.
- Do not start CM agent until all software is present.
- Require Class['epel'] on EL5 for LZO/GPL Extras.
- Update documentation.
- Call apt::source with $architecture where appropriate.
- Require dependency puppetlabs/apt 1.4.1 or newer.
- Update versions for 1.0.0 release.
2014-02-02 Michael Arnold github@razorsedge.org - 0.9.2
Michael Arnold github@razorsedge.org (4):
- Fix puppetlabs-mysql name error in Modulefile.
- Disable Service['solr-server'].
- Update TODO.md.
- Update versions for 0.9.2 release.
2014-02-02 Michael Arnold github@razorsedge.org - 0.9.1
Michael Arnold github@razorsedge.org (2):
- Fix metadata dependency invalid version.
- Update versions for 0.9.1 release.
2014-02-02 Michael Arnold github@razorsedge.org - 0.9.0
Michael Arnold github@razorsedge.org (33):
- Deal with sites that purge /etc/yum.repos.d/.
- Configure correct permissions on the keystore file.
- Fix lint errors in TLS tests.
- Fix version dependencies on SQL modules.
- Update Geppetto .project to version 4.
- Use new rspec-puppet "should compile".
- Change all module name references to $module_name.
- Install $majdistrelease pattern in params.pp.
- Update Travis-CI test matrix.
- Update documentation.
- Set tags on packages for fancy dependency chaining.
- Add some more tests (similar to the README.md).
- Switch to deploying CDH via parcels instead of RPMs.
- Update README to cover the change in parcel support.
- Add more CM server tests.
- Update MySQL/PostgreSQL dependencies.
- Update puppetdoc to reflect new requires in cm::server.
- Revert support for puppetlabs/postgresql 3.x.
- Separate Impala install from CDH.
- Move cloudera::repo to cloudera::cdh::repo.
- Added cloudera::repo backwards compatibility shim.
- Add tests for Impala deployment.
- Update TODO.md.
- Install Cloudera Search.
- Canonicalize all include statements.
- Switch all tests to use cm_server_host => localhost.
- Install Cloudera GPL Extras (LZO).
- Integrate GPL Extras with Class['cloudera'].
- Add $use_gplextras parameter to allow choice.
- Change Class['search'] to deploy the documented packages.
- Add package support for hcatalog, sentry, and sqoop2.
- Update TODO.md.
- Update versions for 0.9.0 release.
2013-09-14 Michael Arnold github@razorsedge.org - 0.8.0
Michael Arnold github@razorsedge.org (13):
- Update ERB variables to be Ruby instance variables.
- Add support for CM agent TLS.
- Add rspec tests for listening_hostname/fqdn in CM agent.
- Add support for TLS server auth to CM agent.
- Add support for CM server TLS and java_ks.
- Fix the value of $server_ca_file to match the docs.
- Update README.md TLS section.
- Functests show that verify_cert_file needs CA chain.
- Add TLS settings to Class['cloudera'].
- Add support for installing parcels.
- Add a directory filter for Geppetto.
- Add support for YUM proxy, username, and password.
- Update versions for 0.8.0 release.
2013-08-17 Michael Arnold github@razorsedge.org - 0.7.0
Michael Arnold github@razorsedge.org (21):
- Update the Modulefile summary.
- Remove execute bits from rspec test files.
- Add support for JCE unlimited strength policy files.
- Remove support for the beta version of Impala.
- Update the scm-config.ini template.
- Expand Puppet versions tested in Travis-CI.
- Rake validate to also check ERB for syntax errors.
- Remove git-log-to-changelog from Modulefile.
- Add before_script: back to Travis-CI config.
- Correct test in Travis-CI before_script.
- Really correct the test in Travis-CI before_script.
- No idea how to correct the test in Travis-CI before_script.
- Stick with puppetlabs-mysql 0.9.0.
- I hate you Travis-CI before_script.
- Nope. Stick with puppetlabs-mysql 0.8.1.
- Deal with changes to puppetlabs/mysql mysql::config.
- Use Cloudera's copy of ext-2.2.zip.
- Add contribution instructions to README.md.
- Revert to using puppetlabs-mysql HEAD.
- Drop testing support for Puppet 2.6.
- Update versions for 0.7.0 release.
2013-04-10 Michael Arnold github@razorsedge.org - 0.6.3
Michael Arnold github@razorsedge.org (2):
- Add puppet-lint to Gemfile to fix Travis.
- Update versions for 0.6.3 release.
2013-04-10 Michael Arnold github@razorsedge.org - 0.6.2
Michael Arnold github@razorsedge.org (2):
- Add puppet-lint support.
- Update versions for 0.6.2 release.
2013-02-23 Michael Arnold github@razorsedge.org - 0.6.1
Michael Arnold github@razorsedge.org (5):
- Add ripienaar/concat to fix postgresql rspec error.
- Add rspec require to Exec['scm_prepare_database'].
- Drop Travis support for testing old releases.
- Remove Note in README that was fixed in 0.6.0.
- Update versions for 0.6.1 release.
2013-02-20 Michael Arnold github@razorsedge.org - 0.6.0
Michael Arnold github@razorsedge.org (5):
- Re-enable PostgreSQL support.
- Fixed rspec tests for defined types.
- Finalize puppetlabs/postgresql dependency version.
- Add CHANGELOG generation to the Modulefile.
- Update versions for 0.6.0 release.
2013-01-30 Michael Arnold github@razorsedge.org - 0.5.0
Michael Arnold github@razorsedge.org (5):
- Remove trailing comma in class parameters.
- Add Travis-CI support for create_resources.
- Make puppet-lint happy.
- Remove PostgreSQL support.
- Update versions for 0.5.0 release.
2013-01-30 Michael Arnold github@razorsedge.org - 0.3.0
Michael Arnold github@razorsedge.org (5):
- Fix configs.
- First round of rspec tests.
- Rspec tests for cloudera::cm::server.
- Fix some spacing issues.
- Added the rest of the rspec tests.
2013-01-25 Michael Arnold github@razorsedge.org - 0.1.0
Michael Arnold github@razorsedge.org (37):
- Added standard puppet module boilerplate.
- Add Class['cloudera::repo'].
- Refactor SCM server and agent manifests.
- Rename module from scm to cloudera.
- Massive parameterized class effort.
- Stupid vim tried to be helpful.
- Manage Package['cloudera-manager-daemons'].
- Lots of fixes.
- Add puppet parser validate rake task.
- Fix missing curly brace. (Yay validation).
- Drop puppetlabs/stdlib dependency to 2.3.0.
- Disable service hadoop-httpfs.
- Do not install package cloudera-manager-daemons.
- Drop Flume support.
- Rename scm_agent and scm_server classes.
- Install flume-ng as part of cloudera::cdh.
- Stupid vim tried to be helpful.
- Refactor cloudera::cm::server.
- Add functionality to the database specific section.
- Fix $module variable and add module dependencies.
- Add support for Java installation.
- Add java alternatives to set up symlinks.
- Add headers to files managed by puppet.
- Extra Oozie support.
- Linting and Puppetdoc.
- More linting.
- Improve Oozie EXT support.
- Allow CM to control the Flume service.
- Keep it simple. Just include mysql::java.
- Add repositories to fixtures.yml.
- Add some Hive classes.
- Use safe_autoupgrade.
- Prepend cm_ to variable's documentation.
- Deal with package problems for hue-plugins.
- Add support for Hadoop HDFS FUSE.
- Update README.md.
- Fix up README.md to look better.
2011-11-17 Patrick Taylor Ramsey ptr@cloudera.com - 0.0.1
Dependencies
- puppetlabs/stdlib (>=2.3.0)
- puppetlabs/mysql (>=2.0.0 <3.0.0)
- puppetlabs/postgresql (>=2.1.0 <3.0.0)
- nanliu/staging (>=0.2.1)
- stahnma/epel (>=0.0.3)
- puppetlabs/java_ks (>=1.0.0)
- darin/zypprepo (>=1.0.0 <2.0.0)
- puppetlabs/apt (>=1.4.1)
Copyright (C) 2013 Mike Arnold <mike@razorsedge.org>
Copyright (c) 2011 Cloudera Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.