Kyle J. Roux

a functional web developer, in an object-oriented world

Creating An Asset Pipeline in Python with Paver

Published Apr 23, 2015. Last updated Dec 02, 2022.

What's Paver?

Lately, I have been looking for a good task runner, something similar to grunt or gulp, but one that I can use with Python to write the tasks. This is where Paver comes in.

Paver is a Python library that automates tasks. It has a clean API for creating tasks as Python functions, shell commands, or any combination of the two. Tasks can also depend on other tasks: they can pipe data between tasks, and they can have configuration values that change as the data goes through each task or that reset as each new task begins. This is a very flexible system, so I feel it will meet my needs.

What are We Going to Automate?

I've been doing a lot of JavaScript programming lately, and I really want to move to just coding in CoffeeScript, as it is much more expressive and easier to read. I also want to use Less for CSS preprocessing for the same reasons. So right there we have two things that would be great to automate, and we can throw in CSS and JavaScript minification and concatenation just to round things out.

How to Tell Paver About Your Tasks

The first thing Paver needs in order to start working is a file in your project's root: pavement.py. This is where you will define all of your Paver tasks. Then we will import the easy module from Paver, which holds all of the important items we will need from the API. After that, we start writing functions.

To tell Paver a function is a task, we just add the @easy.task decorator to the function definition. This allows us to write support functions that will not be exposed to the paver command line tool when we run tasks.
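To make that distinction concrete, here is a toy sketch of the idea in plain Python. The task decorator and TASKS registry below are hypothetical stand-ins written for this illustration, not Paver's implementation; in a real pavement.py you would use @easy.task instead.

```python
# Toy stand-in for @easy.task: a registry of functions exposed as tasks.
TASKS = {}


def task(func):
    """Register func under its name so a command line tool could look it up."""
    TASKS[func.__name__] = func
    return func


def _shout(text):
    # Plain helper: it is never registered, so it cannot be run as a task.
    return text.upper()


@task
def hello():
    return _shout("hello from a task")


task_names = sorted(TASKS)
result = TASKS["hello"]()
```

Only the decorated function ends up in the registry, which is the behavior we want from our support functions.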

Assets and Pipelines

To start, let's write a support function that will gather files for us.

from paver import easy

def _get_files(root, ext=None, recurse=False):
    # return the files under root, optionally filtered by extension
    root = easy.path(root)
    if not root.exists():
        return []
    file_list = list(root.walkfiles()) if recurse else root.files()
    if ext is None:
        return file_list
    # wrap a single extension in a tuple; iterating over a bare string
    # would compare each file name against its individual characters
    exts = (ext,) if isinstance(ext, str) else tuple(ext)
    return [f for f in file_list if f.endswith(exts)]

I don't intend for this function to be used directly by programmers who use these Paver tasks; it is merely here to help from behind the scenes. I am using the standard Python convention of marking it "private" by prefixing its name with an underscore. This way, anyone who reads or uses the code knows not to rely on how that function works: it is not part of the public API (if you said from myModule import *, the function _get_files would not be imported), and I may change its usage at any time without notice.
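If you want to see the star-import behavior for yourself, the following sketch builds a throwaway module in memory and imports everything from it. The module name my_module and the function get_js inside it are made up for the demonstration.

```python
import sys
import types

# Build a throwaway module in memory (the name "my_module" is hypothetical).
source = '''
def _get_files():
    return []

def get_js():
    return _get_files()
'''
module = types.ModuleType("my_module")
exec(source, module.__dict__)
sys.modules["my_module"] = module

# Run a star-import into a fresh namespace and see which names arrive.
namespace = {}
exec("from my_module import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)  # prints ['get_js']
```

Note that the underscore is only a convention plus a star-import rule: my_module._get_files is still reachable by anyone who asks for it explicitly.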

Now, let's look at how this function works, because we're already using Paver's stdlib and we haven't even defined a task yet. So after we import easy from the Paver library, we start our private function definition and say it will take up to three arguments, and a minimum of one. They are:

root - the root folder to start grabbing files from
ext - an optional file extension to filter by
recurse - if True, recurse into sub-directories of root; defaults to False

Then, inside the function, we first check whether we should recurse and use that information to compose a list of path objects. If ext is None, any file extension is fine and we return the whole list. Otherwise, which is probably the more common case, we filter the list by the given ext argument and return the result. If root does not exist, we return an empty list.
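To try the same gather-and-filter logic without installing Paver, here is a rough equivalent that uses pathlib from the standard library as a stand-in for Paver's path objects. The name gather_files and the sample file names are assumptions made for this sketch.

```python
import tempfile
from pathlib import Path


def gather_files(root, ext=None, recurse=False):
    """Return files under root, optionally filtered by extension(s)."""
    root = Path(root)
    if not root.exists():
        return []
    candidates = root.rglob("*") if recurse else root.iterdir()
    files = sorted(p for p in candidates if p.is_file())
    if ext is None:
        return files
    # str.endswith accepts a tuple, so one extension or several both work.
    suffixes = (ext,) if isinstance(ext, str) else tuple(ext)
    return [p for p in files if p.name.endswith(suffixes)]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "app.coffee").write_text("x = 1")
    (src / "notes.txt").write_text("todo")
    (src / "lib").mkdir()
    (src / "lib" / "util.coffee").write_text("y = 2")

    top_level = [p.name for p in gather_files(src, ".coffee")]
    everything = [p.name for p in gather_files(src, ".coffee", recurse=True)]
    missing = gather_files(src / "nope", ".coffee")
```

Without recurse only app.coffee is found; with recurse=True the file in lib is picked up as well, and a missing root simply yields an empty list.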

Build Types

One last thing we will want to do is differentiate between build types. A use case for this is having separate development and production build tasks. This way, when you build in production you get all of the minification and obfuscation you need, but in development, building will only write out compiled CoffeeScript. So our last helper will be an easy way for our tasks to determine what the current build type is:

def _get_buildtype():
    # prefer options.build.buildtype, then fall back to the BUILD_TYPE environment variable
    return getattr(options.build, 'buildtype', None) or os.environ.get('BUILD_TYPE')
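The lookup order is easier to see in isolation. This is a minimal sketch of the same "configured value first, environment variable second" rule; the function name resolve_buildtype and its default parameter are assumptions added for the illustration, not part of the pavement file.

```python
import os


def resolve_buildtype(configured=None, environ=os.environ, default=None):
    """Pick the configured value first, then BUILD_TYPE, then the default."""
    return configured or environ.get("BUILD_TYPE") or default


# An explicit setting wins over the environment.
explicit = resolve_buildtype("prod", environ={"BUILD_TYPE": "dev"})
# With nothing configured, the environment variable is used.
from_env = resolve_buildtype(None, environ={"BUILD_TYPE": "dev"})
# With neither available, we fall back to the default.
fallback = resolve_buildtype(None, environ={}, default="dev")
```

Passing the environment in as a plain dict keeps the function easy to test without touching the real process environment.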

For our first task, we will gather our source files (currently I'm writing CoffeeScript so that's what we will gather. However, we will create tasks for JavaScript and CSS as well). We will also create a variable outside of our task that will hold our data while it travels through our pipeline. It will be an instance of paver.options.Bunch, which is just a dictionary that allows us to use the dot notation to access its contents. So, instead of saying myDict['key'], you can just say myDict.key, which is easier to read and write.

We will add our Bunch object to another Bunch object, options, that we will import from paver.easy. options is how we are supposed to access command line options or default options in our tasks. Below we also set up our options object (basically our asset pipeline's configuration), which we can easily change or extend, but I'll show you more of that later.
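If the Bunch idea is new to you, this toy version shows roughly what is going on. It is a simplified stand-in modeled on the idea behind paver.options.Bunch, not a copy of Paver's class, and the sample keys are just the ones this article uses.

```python
class Bunch(dict):
    """Toy dict with attribute access (a simplified stand-in, not Paver's class)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


# Nested Bunch objects read like a small configuration tree.
assets = Bunch(js_ext=".coffee", folders=Bunch(js="src", css="static/css"))
assets.js_files = []  # attribute assignment stores a normal dict key

same_value = assets["folders"]["js"] == assets.folders.js
```

Because it is still a dict underneath, both spellings reach the same data, and dict methods such as get keep working.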

Also, notice that we want this to be usable by others, but we don't want anyone who uses it to be forced to install all dependencies that are required by all of our tasks. What if they don't need some of them? So, instead of just importing directly from the CoffeeScript module, I am putting the import inside of a try / except block. This way, if CoffeeScript isn't installed on the system, we can just throw out a warning once they try to run a task that needs it instead of throwing an exception that will kill the whole program.

Initial Setup

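import os

from paver import easy
# options must be the shared options object from paver.easy, not the paver.options module
from paver.easy import options, path, sh
from paver.options import Bunch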
try:
    from jsmin import jsmin
except ImportError:
    jsmin = None
try:
    from coffeescript import compile as compile_coffee
except ImportError:
    compile_coffee = None
try:
    from uglipyjs import compile as uglify
except ImportError:
    uglify = None

options.assets = Bunch(
    css='',
    js='',
    folders=Bunch(
        js='src',
        css='static/css',
    ),
    js_files=[],
    js_ext='.coffee',
    css_ext='.css',
    outfiles=Bunch(
        prod=Bunch(
            js='vendor/app.min.js',
            css=''
        ),
        dev=Bunch(
            js='vendor/app.js',
            css=''
        )            
    ),
)

options.build = Bunch(
    buildtype=None,    
)

@easy.task
def get_js():
    '''
        gather all source js files, or some other precompiled source files
        to gather files other than javascript, set:
            options.assets.js_ext
        to the extension to collect, e.g. for coffeescript:
            options.assets.js_ext = '.coffee'
    '''
    ext = options.assets.js_ext or '.js'
    files = _get_files(options.assets.folders.js, ext, True)
    # build a list (not a lazy map object) so later tasks can index and re-iterate it
    options.assets.js_files = [(str(f), f.text()) for f in files]


@easy.task
def get_css():
    '''
        gather all source css files, or some other precompiled source files
        to gather files other than css files, set:
            options.assets.css_ext
        to the extension to collect, e.g. for less:
            options.assets.css_ext = '.less'
    '''
    ext = options.assets.css_ext or '.css'
    files = _get_files(options.assets.folders.css, ext, True)
    # build a list (not a lazy map object) so later tasks can re-iterate it
    options.assets.css_files = [(str(f), f.text()) for f in files]

As you may notice, all we're really doing here is calling our helper function _get_files, using the value of either options.assets.js_ext or options.assets.css_ext to determine the types of files to gather in the task. This also takes care of gathering different types of source files: to gather CoffeeScript instead of JavaScript, you merely need to set options.assets.js_ext to '.coffee' and voilà! It gathers CoffeeScript.

Compiling CoffeeScript

Now, let's move on to compiling the CoffeeScript, then minification, and uglification.

@easy.task
@easy.needs('get_js')
def coffee():
    '''
        compile coffeescript files into javascript
    '''
    if compile_coffee is None:
        easy.info('coffee-script not installed! cannot compile coffeescript')
        return None
    # build a list so later tasks can index and re-iterate the results
    options.assets.js_files = [(name, compile_coffee(source, True)) for name, source in options.assets.js_files]

Here we are first using the paver.easy.needs decorator to inform Paver that it will need to perform its get_js task before it can run this task (if we haven't gathered anything, we won't have anything to compile). Then, we log a message and return early if the earlier CoffeeScript import failed. Otherwise, we run every gathered file through the coffeescript.compile function, which we imported as compile_coffee so that we don't shadow the built-in compile function.

Minification, Uglification

The minification and uglification tasks are almost exactly the same as the last task.

@easy.task
def minify():
    '''
        minify javascript source with the jsmin module
    '''
    if jsmin is None:
        easy.info('Jsmin not installed! cannot minify code')
        return None
    options.assets.js_files = [(name, jsmin(source)) for name, source in options.assets.js_files]

@easy.task
def uglifyjs():
    '''
        uglify javascript source code (obfuscation) using the ugliPyJs module
    '''
    if uglify is None:
        easy.info('ugliPyJs not installed! cannot uglify code')
        return None
    # rebuild the list in one pass instead of looking up each index while iterating
    options.assets.js_files = [(name, uglify(source)) for name, source in options.assets.js_files]
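The compile, minify, and uglify tasks all share one shape: take a list of (file name, source) pairs and replace each source with a transformed version. Here is a small sketch of that shape on its own. The helper apply_step and the toy strip_blank_lines transform are hypothetical; the latter only stands in for a real tool such as jsmin.

```python
def apply_step(files, transform):
    """Run transform over each source, keeping the file name alongside it."""
    return [(name, transform(source)) for name, source in files]


def strip_blank_lines(source):
    # Toy stand-in for a real compiler or minifier.
    return "\n".join(line for line in source.splitlines() if line.strip())


files = [("a.js", "var a = 1;\n\n"), ("b.js", "\nvar b = 2;")]

# Steps chain naturally because each one returns the same kind of list.
files = apply_step(files, strip_blank_lines)
files = apply_step(files, str.strip)
```

Returning a real list from every step, rather than a lazy iterator, means any later step can loop over the files more than once.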

Concatenation

Now, once all this is out of the way we will want all of our JavaScript in one big file, so we only have a single file to deal with. This is easy. Also, since we no longer need the file names associated with the data, we can throw that info away and just put each file on top of the next.

@easy.task
def concat():
    '''
        concatenate all javascript and css files currently in memory
    '''
    options.assets.js = ''.join(str(data) for _, data in options.assets.js_files)
    # css_files only exists once get_css has run, so fall back to an empty list
    options.assets.css = ''.join(str(data) for _, data in options.assets.get('css_files', []))

Now all we need is a task to write out our finished files, and tasks for our build types and we're all done.

       
@easy.task
def write_js(buildtype=None):
    '''
        write out all gathered javascript to the file specified in options.assets.outfiles[BUILD_TYPE].js
    '''
    if not easy.path('vendor').exists():
        easy.info('making vendor dir')
        os.mkdir('vendor')
    # fall back to 'dev' when neither the option nor BUILD_TYPE is set
    buildtype = buildtype or _get_buildtype() or 'dev'
    outfile = options.assets.outfiles[buildtype].js
    with open(outfile, 'w') as f:
        f.write(options.assets.js)
    easy.info('Wrote file: {}'.format(outfile))

    
@easy.task
def build_production():
    '''
        Full Build: gather js or coffeescript, compile coffeescript, uglify, minify, concat, write out
    '''
    get_js()
    coffee()
    uglifyjs()
    minify()
    concat()
    write_js()

@easy.task
def build_dev():
    '''
        Partial Build: gather js or coffeescript, compile coffeescript, concat, write out
    '''
    get_js()
    coffee()
    concat()
    write_js()
    
@easy.task
@easy.cmdopts([
 ('buildtype=','t','build type')
])
def build(options):
    '''
        Run Build, defaults to 'dev'
    '''
    if not hasattr(options, 'build') or options.build.get('buildtype') is None:
        buildtype = 'dev'
    else:
        buildtype = options.build.buildtype
    os.environ['BUILD_TYPE'] = buildtype
    dict(
        dev=build_dev,
        prod=build_production,
    )[buildtype]()

The thing to really pay attention to here is how we call the correct build task function. We want one item from a number of choices (here, build_dev and build_production), and we just want to call the result. An easy way to accomplish this in Python is to compose a dict that maps each choice name to its function:

dict(
    dev=build_dev,
    prod=build_production,
)

Then, just access the value you need and call it. Since the dict function returns a dictionary, we can look up the item we need and call it to perform its associated action:

dict(
    dev=build_dev,
    prod=build_production,
)[buildtype]()  # <---- this is where we access the task function we need and call it
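Here is the same dispatch pattern as a standalone sketch you can run without Paver. The two build functions are placeholders that just return a string, and the BUILDERS name and the error handling for unknown build types are additions of mine for the example.

```python
def build_dev():
    return "dev build"


def build_production():
    return "prod build"


# Map each choice name to the function that handles it.
BUILDERS = {"dev": build_dev, "prod": build_production}


def run_build(buildtype="dev"):
    """Look up the builder for buildtype and call it."""
    try:
        builder = BUILDERS[buildtype]
    except KeyError:
        raise ValueError(
            "unknown build type {!r}; expected one of {}".format(
                buildtype, sorted(BUILDERS)
            )
        )
    return builder()


default_result = run_build()
prod_result = run_build("prod")
```

Keeping the mapping in a named dict also gives you one obvious place to register a new build type later.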

Now, all we need is a task that will build our project files, and run the result through Node.js so we know it's working:

    
@easy.task
def run():
    '''
        Run production build and pipe generated js into nodejs for execution
    '''
    options.build = Bunch()
    buildtype = options.build.buildtype = 'prod'
    build(options)
    sh("nodejs {}".format(options.assets.outfiles[buildtype].js))

To check it out, just run:

$ paver run

And we have an asset pipeline. Of course, it could do more, such as:

compile less / sass files
process files with ng-annotate
minify html or css

However, I will leave these things up to you, since at the moment, this covers most of my own use cases.

To see the full code, check it out on GitHub.

Automation Python Asset pipeline Paver
