Chef:Powerful Infrastructure Automation

Writing custom extensions

With Chef, you are given immediate access to a number of resources: files, users, packages, templates, and so on. However, there will always be times when these do not provide everything that you need. Fortunately, the built-in Chef resources and LWRPs (lightweight resources and providers) are just Ruby code and were built with the intention of providing a framework for end users to build their own components. This means that you can easily build your own custom resources, and these can be shared with others just like any built-in LWRP.

Developing a custom definition

One of the simplest resources that we can build is a definition—a definition is like a resource with only one built-in provider. These can be thought of as reusable modules that you can leverage inside of your recipes. If you find yourself writing the same thing repeatedly in your recipes, then it is probably a good candidate to write a custom definition. For example, let's look at how we can build two different definitions: one for executing Python's PIP, the Python package installation tool, to install the contents of a requirements.txt file for a Python application, and another to install applications that follow the same pattern.

Organizing your code

As discussed before, cookbooks can have a definitions directory inside them. The contents of this directory are included in your cookbook runs, and there should be one file per definition. For our PIP resource, we will create a file, definitions/pip_requirements.rb, and for our application template, definitions/python_web_application.rb. These files will each contain the respective definitions.

Writing a definition for using PIP

Definitions look like any Chef component—they are composed of resources, variables, scripts, and anything else you use in a recipe. However, unlike a recipe, they are designed to be reused. Where a recipe is designed with a specific effect in mind such as deploying a specific application, the definition is designed to be consumed by recipes to reduce duplicate code.

Each definition is encapsulated in a define block; a no-op version of our PIP example would look like this:

define :pip_requirements do 
end

This example does absolutely nothing, but it can be used in a recipe as follows:

pip_requirements "app_requirements" do 
end

Just in the same way you would use a file, user, or template block in your recipe, you can use your custom definitions. Now, let's enhance our definition by using the name parameter—the string argument passed to the pip_requirements block in your recipe; here, it is app_requirements:

define :pip_requirements do
  name = params[:name]
end

Each invocation of a definition passes the parameters in the block to the definition; these are accessed inside the definition through the params hash. There is one special parameter, :name, which can come from the first argument before the block, as shown in the previous code, or from the name parameter inside the block. This is a convenience parameter designed to make recipes more readable by allowing the developer to write:

resource "some handy description" do 
...
end

This code is easier to read than:

resource do 
  name "some handy description"
end
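To make the params mechanism concrete, here is a minimal plain-Ruby sketch of how a definition might collect the attribute calls made in its block; the MiniDefinition class is purely illustrative and is not Chef's actual implementation:

```ruby
# Illustrative only: a tiny stand-in for how a definition gathers
# its block parameters into a params hash.
class MiniDefinition
  attr_reader :params

  def initialize(name, &block)
    # The string before the block becomes the special :name parameter
    @params = { :name => name }
    instance_eval(&block) if block
  end

  # Any bare method call inside the block is stored as a parameter
  def method_missing(key, value)
    @params[key] = value
  end
end

resource = MiniDefinition.new("some handy description") do
  user  "webapp"
  group "webapp"
end

resource.params[:name]  # => "some handy description"
resource.params[:user]  # => "webapp"
```

Chef's real define macro does considerably more (it expands into resources in the resource collection), but the params hash it exposes behaves much like this.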

Given this information, let's look at the PIP example from pip_requirements.rb:

define :pip_requirements, :action => :skip do
  name = params[:name]
  requirements_file = params[:requirements_file]
  pip = params[:pip]
  user = params[:user]
  group = params[:group]

  if params[:action] == :run
    script "pip_install_#{name}" do
      interpreter "bash"
      user "#{user}"
      group "#{group}"
      code <<-EOH
      #{pip} install -r #{requirements_file}
      EOH
      only_if { File.exists?("#{requirements_file}") and File.exists?("#{pip}") }
    end
  end
end

Here, the definition expects five arguments: the resource name, the path to the requirements.txt file, the pip binary to use, as well as the user and group to execute pip as. The reason that the resource accepts the path to pip is to allow using pip inside a Python virtual environment. By doing this, the definition becomes a little more flexible in situations where you need to install your requirements into a different location on the system.

Also note that we can define default parameters as part of the definition's signature:

define :pip_requirements, :action => :skip do

In this case, the default action is :skip, but it can be set to anything you want it to be. Here it is set to :skip so that it only gets invoked deliberately rather than by virtue of being used in a recipe.
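The interaction between declared defaults and call-site parameters boils down to a hash merge; this plain-Ruby sketch (illustrative, not Chef's code) shows the behavior:

```ruby
# Defaults declared in the define signature fill in whatever the
# recipe's block does not set.
defaults = { :action => :skip }
passed   = { :name => "app_requirements", :user => "webapp" }

params = defaults.merge(passed)
params[:action]  # => :skip (nothing runs unless asked)

# A recipe that sets action :run overrides the default
overridden = defaults.merge(passed.merge(:action => :run))
overridden[:action]  # => :run
```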

As this is a simple definition, it only contains one resource—a script block that will effectively execute pip install -r /path/to/requirements.txt as the specified user and group. An example use of this definition can be seen as follows:

pip_requirements "requirements" do
  action :run
  pip "/usr/local/bin/pip"
  user node[:app][:user]
  group node[:app][:group]
  requirements_file "#{app_root}/src/requirements.txt"
end

This single resource replaces the following use of the built-in script resource:

script "pip_install_#{name}" do 
  interpreter "bash"
  user node[:app][:user]
  group node[:app][:group]
  code <<-EOH
  /usr/local/bin/pip install -r #{app_root}/src/requirements.txt
  EOH
  only_if { 
    File.exists?("#{app_root}/src/requirements.txt") and
    File.exists?("/usr/local/bin/pip") 
  }
end

Following Chef's declarative language, building definitions such as this one makes it more obvious what is happening, rather than how it is happening. We have abstracted the shell script and guard tests behind a façade, the pip_requirements definition, whose effect is clear when you read a recipe; you don't need to examine the contents of a script block to deduce what is happening, as the resource name tells you exactly what will be done.

Defining a full application template

If you have applications that follow the same structure (think applications that use a common framework such as Rails, Django, Pyramid, Tornado, and so on), then you will likely want to define a common definition for what such an application looks like. Consider the following definition, which installs a Python web application from GitHub using some common idioms:

define :tornado_application do
  app_name = params[:name]
  app_root = params[:app_root]
  app_user = params[:user]
  app_group = params[:group]
  
  python_interpreter = params[:python_interpreter] || 
                       "/usr/bin/python3.3"
  github_repo = params[:github_repo] 
  deploy_branch = params[:deploy_branch] || "deploy"

  virtualenv = "#{app_root}/python"
  virtual_python = "#{virtualenv}/bin/python"
  app_dir = "#{app_root}/src/#{app_name}"

  # Add GitHub's host key to the system-wide SSH known hosts
  # (ssh_known_hosts_entry comes from the ssh_known_hosts cookbook)
  ssh_known_hosts_entry 'github.com'

  # Base package requirements
  package "git"
  package "libpq-dev"
  package "libxml2-dev"
  package "python3.3"
  package "python3.3-dev"
    
  directory "#{app_root}" do
    owner "#{app_user}"
    group "#{app_group}"
    mode "0755"
    action :create
    recursive true
  end

  # Create directories
  ["bin", "src", "logs", "conf", "tmp"].each do |child_dir|
    directory "#{app_root}/#{child_dir}" do
      owner "#{app_user}"
      group "#{app_group}"
      mode "0755"
      action :create
      recursive true
    end
  end


  # Install Python virtualenv
  python_virtualenv "#{virtualenv}" do 
    owner "#{app_user}"
    group "#{app_group}"
    action :create 
    interpreter "#{python_interpreter}"
  end

  # Application checkout
  git "#{app_dir}" do
    repository "#{github_repo}"
    action :sync
    user "#{app_user}"
    branch "#{deploy_branch}"
  end

  # Python dependencies for app
  pip_requirements "tornado_app[#{app_name}]" do
    action :run
    pip "#{virtualenv}/bin/pip"
    user "#{app_user}"
    group "#{app_group}"
    requirements_file "#{app_dir}/requirements.txt"
  end

end

This definition can be used as shown in the following example:

tornado_application "image_resizer" do 
  app_root "/opt/webapps"
  user "webapp"
  group "webapp"
  deploy_branch "master"
  github_repo "git@github.com:myorg/image_resizer.git"
  python_interpreter "/usr/bin/python3.3"
end

According to the previous definition, this would do the following:

  • Add a system-wide SSH known-hosts entry for github.com (required to perform a non-interactive Git clone over SSH)
  • Install any required packages that are not already present, including Git, Python 3.3, and the PostgreSQL and libxml2 development headers
  • Ensure any application-required directories exist for data such as binaries, logs, configuration, and more
  • Create a Python virtual environment based on the supplied Python interpreter (3.3) in <app_root>/python
  • Clone or sync (if it was already cloned) the source code from <github_repo> to <app_root>/src/<app_name>
  • Install the requirements specified in <app_root>/src/<app_name>/requirements.txt using the copy of pip from the virtual environment in <app_root>/python

Assuming you had another similarly structured application, but you wanted to use a different user, group, Python interpreter, and deployment branch, you can easily configure it using the following resource:

tornado_application "restful_api" do 
  app_root "/opt/webapps"
  user "restapi"
  group "restapi"
  deploy_branch "production"
  github_repo "git@github.com:myorg/restful_api.git"
  python_interpreter "/usr/bin/python3.2"
end

As you can see, definitions allow us to define reusable resources in Chef. There are three primary benefits to this approach:

  • Simplified recipes are easier to read, have clearer intent, and contain less code to audit, which makes them less error prone
  • Any changes to the definition are automatically applied to every usage of it, which means you don't need to maintain multiple variations
  • Definitions are easier to test because they are parameterized and modular

Now that you see how easy it is to write custom resources for Chef through definitions, let's examine writing a full-blown resource that has a separate provider implementation.

Building a resource

A Chef LWRP is composed of two primary components: a resource and a provider. The resource is the blueprint for what is being provided; it describes the actions that can be taken, the properties that describe the resource, and any other high-level information about it. The provider is responsible for the actual implementation of the resource. In programming terms, the resource is an abstract class or interface, whereas the provider is a concrete class or implementation. For example, one of Chef's built-in resources is the package resource; however, this is a very high-level resource. The package resource describes what a package is and what a package can do, but not how to manage packages. That work is left to the providers, including RPM, APT, FreeBSD packages, and other backend systems that are capable of managing the on-disk installation of packages.
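The abstract/concrete split can be illustrated in plain Ruby (an analogy only, not Chef internals): the resource describes the what, and each provider supplies a platform-specific how:

```ruby
# The "resource": describes a package, says nothing about installation
class PackageResource
  attr_reader :package_name

  def initialize(package_name)
    @package_name = package_name
  end
end

# Two "providers": each knows how to install on its own platform
class AptProvider
  def install(resource)
    "apt-get install -y #{resource.package_name}"
  end
end

class YumProvider
  def install(resource)
    "yum install -y #{resource.package_name}"
  end
end

git = PackageResource.new("git")
AptProvider.new.install(git)  # => "apt-get install -y git"
YumProvider.new.install(git)  # => "yum install -y git"
```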

Defining the resource

As an example, let's take a look at an S3 bucket resource:

actions :sync
default_action :sync if defined?(default_action) # Chef > 10.8

# Default action for Chef <= 10.8
def initialize(*args)
  super
  @action = :sync
end

# S3 bucket URL to sync from
attribute :source, :kind_of => String,
          :name_attribute => true
# Target folder on the host to sync with the S3 bucket
attribute :destination, :kind_of => String
# Anything to skip when syncing
attribute :skip, :kind_of => Array, :default => []
# AWS access and secret keys
attribute :access_key_id, :kind_of => String
attribute :secret_access_key, :kind_of => String
Here, our resource is an S3 bucket that declares the actions it can take along with the attributes it relies on. It has one available action, sync, which is also the default action, and its attributes describe where the data comes from, where it should be synced to, which files to skip, and the AWS credentials to use.

Implementing the provider

The provider is where the logic for the resource is placed—it is responsible for acting on the resource being described. For our S3 bucket, it looks like the following:

require 'chef/mixin/language'
require 'fileutils'
require 'tempfile'

# Only run as needed 
def whyrun_supported?
  true
end

action :sync do
  Chef::Log.debug("Checking #{new_resource} for changes")
  fetch_from_s3(new_resource.source) do |raw_file|
    Chef::Log.debug("Copying remote file from #{raw_file.path} to #{new_resource.destination}")
    FileUtils.cp raw_file.path, new_resource.destination
  end

  new_resource.updated_by_last_action(true)
end

def load_current_resource
  chef_gem 'aws-sdk' do
    action :install
  end

  require 'aws/s3'

  current_resource = new_resource.destination
  current_resource
end

def fetch_from_s3(source)
  begin
    protocol, bucket = URI.split(source).compact
    AWS::S3::Base.establish_connection!(
      :access_key_id     => new_resource.access_key_id,
      :secret_access_key => new_resource.secret_access_key
    )

    AWS::S3::Bucket.find(bucket).objects.each do |obj|
      name = obj.key

      if new_resource.skip.include?(name)
        Chef::Log.debug("Skipping #{name} because it's in the skip list")
      else
        Chef::Log.debug("Downloading #{name} from S3 bucket #{bucket}")

        file = Tempfile.new("chef-s3-file")
        file.write obj.value
        Chef::Log.debug("File #{name} is #{file.size} bytes on disk")
        begin
          yield file
        ensure
          file.close
        end
      end
    end

  rescue URI::InvalidURIError
    Chef::Log.warn("Expected an S3 URL but found #{source}")
    nil
  end
end

Let's take a look at the provider, piece by piece. The first thing the provider does, beyond requiring any needed libraries, is to inform Chef that it supports why-run mode. This is a mechanism Chef provides so that a resource can report what it would have changed without actually making the change. This allows developers to test their resources, in what is effectively a dry-run mode, before running them live against a system:

# Only run as needed 
def whyrun_supported?
  true
end
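Conceptually, why-run support means the provider can describe a change instead of making it. This plain-Ruby sketch mimics the role that Chef's converge_by plays for providers; it is an illustration, not the actual API:

```ruby
# Illustrative dry-run gate: report the change in why-run mode,
# perform it otherwise.
def converge_by(description, why_run)
  if why_run
    "Would #{description}"
  else
    yield
    "Did #{description}"
  end
end

converge_by("sync files from S3", true)        # => "Would sync files from S3"
converge_by("sync files from S3", false) { }   # => "Did sync files from S3"
```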

Next, there is an action block; this registers the provided block as the logic to be executed for the specified action (in this case, :sync). It has the following general form:

action :<action name> do
  # Real work in here
end

In this case, the only supported action is sync, and so there is only one action block:

action :sync do
  Chef::Log.debug("Checking #{new_resource} for changes")
  fetch_from_s3(new_resource.source) do |raw_file|
    Chef::Log.debug("Copying remote file from #{raw_file.path} to #{new_resource.destination}")
    FileUtils.cp raw_file.path, new_resource.destination
  end

  new_resource.updated_by_last_action(true)
end

Here, the :sync action leverages the fetch_from_s3 method, which yields a local copy of a file in the remote bucket once it has been downloaded. Then, the file is copied from the temporary location locally into the specified destination.
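The yield-a-tempfile pattern that fetch_from_s3 uses can be seen in isolation in this self-contained sketch; the with_downloaded_file helper is hypothetical, standing in for the S3 download step:

```ruby
require 'tempfile'

# Write a downloaded body to a Tempfile, hand it to the caller's
# block, and always clean up afterwards.
def with_downloaded_file(body)
  file = Tempfile.new("chef-s3-file")
  file.write(body)
  file.flush  # make the bytes visible at file.path before yielding
  begin
    yield file
  ensure
    file.close
    file.unlink  # delete the temporary file
  end
end

copied = nil
with_downloaded_file("bucket contents") do |raw_file|
  # The provider runs FileUtils.cp raw_file.path, destination here
  copied = File.read(raw_file.path)
end
copied  # => "bucket contents"
```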

Modifying resources

Inside of this action, you will notice that there is an actor, new_resource (which is actually a built-in method). This describes what the state of the named resource should be when the provider has completed its execution for the specified resource; this may or may not differ from the current state of the resource on the node. In the case of an initial run, new_resource will almost certainly be different from current_resource, but that may not always be the case on subsequent runs.

As an example, if we have a recipe with the following S3 bucket resource declared:

s3_bucket "s3://mychefbucket/.resource" do
  action :sync
  skip ["foo.txt", "bar.txt"]
  destination "/opt/app_data"
  access_key_id node[:app][:aws_access_key]
  secret_access_key node[:app][:aws_secret_key]
end

Then, the new_resource actor would have its member variables populated with the parameters passed to the s3_bucket resource. Again, this is the expected state of the resource, the way it should be when the execution by the provider is complete. In this case, when the provider code is executed, new_resource.destination will be "/opt/app_data" and new_resource.skip will be a list of "foo.txt" and "bar.txt" and so on. This allows you to pass data into the instance of the resource in the same way that was possible with the PIP and Tornado application definitions.

Loading an existing resource

One thing that is less obvious about the provider is the load_current_resource method, which is never called from within the provider itself. This method is used by Chef to find a resource on the node based on the attributes provided by the recipe. It is useful for determining whether anything needs to be done to bring an existing resource on the host (such as a file, a user account, or a directory of files) up to date with the data provided during execution of the recipe.

It might make sense to extend this provider to precompute the hashes of the files that already exist in the directory on-disk as specified by destination. This way, the provider can be updated to only download any remote files in S3 that have a different fingerprint than a similarly named resource on disk. This prevents unnecessary work from being performed, which saves time, bandwidth, and other resources.
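A sketch of that optimization, with hypothetical helper names: hash the local files once, then compare each remote object's MD5 (which S3 exposes as the ETag for non-multipart uploads) before downloading:

```ruby
require 'digest'

# Map each file name in the destination directory to its MD5 digest
def local_fingerprints(destination)
  Dir.glob(File.join(destination, "*"))
     .select { |path| File.file?(path) }
     .each_with_object({}) do |path, hashes|
       hashes[File.basename(path)] = Digest::MD5.file(path).hexdigest
     end
end

# Download only when the local copy is missing or differs
def needs_download?(fingerprints, object_key, remote_md5)
  fingerprints[object_key] != remote_md5
end
```

Note that S3's ETag equals the object's MD5 only for simple (non-multipart) uploads, so a real implementation would need a fallback for multipart objects.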

Here, however, it is also used to ensure that any dependencies needed to download files are installed; in this case, the aws-sdk gem is required to use the S3 client. This works because the load_current_resource method gets called early in the run to determine the current state of the resource. If the current and expected states are the same, then the provider has nothing to do. The current implementation just clobbers whatever files are local with the contents of the S3 bucket (more of a one-way download than a true sync).

Declaring that a resource was updated

Resources have a built-in method, updated_by_last_action, which in this example is called at the end of the :sync action. This method notifies Chef that the resource updated the node. It should only be called with true if everything was successfully updated; on failure, it should not be set to true. Knowing which resources have been updated is useful for reporting and other purposes. For example, you can use this flag in a report handler to identify the resources that were updated during a run:

module SimpleReport
  class UpdatedResources < Chef::Handler
    def report
      Chef::Log.info "Resources updated this run:"
      run_status.updated_resources.each do |r|
        Chef::Log.info "  #{r.to_s}"
      end
    end
  end
end