Python Practices for Efficient Code: Performance, Memory, and Usability

Published Aug 18, 2017 · Last updated Aug 30, 2017

This is an updated version of my previous blog post on a few recommended practices for optimizing your Python code.

A codebase that follows best practices is highly appreciated in today's world. It's an appealing way to engage awesome developers if your project is Open Source. As a developer, you want to write efficient and optimized code, which is:

Code that takes up minimum possible memory, executes faster, looks clean, is properly documented, follows standard style guidelines, and is easy to understand for a new developer.

The practices discussed here might help you contribute to an Open Source organization, submit a solution to an Online Judge, work on large data processing problems using machine learning, or develop your own project.

Practice 1: Try Not To Blow Off Memory!

A simple Python program may not cause many problems when it comes to memory, but memory utilization becomes critical on memory-intensive projects. It's always advisable to keep memory utilization in mind from the very beginning when working on a big project.

Unlike in C/C++, Python’s interpreter performs the memory management, and users have little direct control over it. Memory management in Python involves a private heap that contains all Python objects and data structures.

The Python memory manager internally ensures the management of this private heap. When you create an object, the Python Virtual Machine handles the memory needed and decides where it'll be placed in the memory layout.

However, greater insight into how things work and different ways to do things can help you minimize your program's memory usage.

  • Use generators to calculate large sets of results:

Generators give you lazy evaluation. You use them by iterating over them: either explicitly with 'for', or implicitly by passing them to any function or construct that iterates.

You can think of generators returning multiple items like they're returning a list — instead of returning them all at once, however, they return them one-by-one. The generator function is paused until the next item is requested. Read more about Python Generators here.
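
To make the difference concrete, here is a minimal sketch that computes the same results eagerly (as a list) and lazily (as a generator). The function names squares_list and squares_gen are made up for illustration.

```python
import sys


def squares_list(n):
    """Build every result up front and return them all at once."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield one result at a time; execution pauses at each yield."""
    for i in range(n):
        yield i * i


eager = squares_list(100000)
lazy = squares_gen(100000)

# The list stores a reference to every element; the generator object only
# stores its paused state, so its size does not grow with n.
print("list object size:     ", sys.getsizeof(eager))
print("generator object size:", sys.getsizeof(lazy))

# Consumers such as sum() pull items from the generator one by one.
print(sum(lazy))
```

Note that a generator can only be consumed once. If you need to iterate over the results several times, a list (or re-creating the generator) may still be the better fit.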

  • For large numbers/data crunching, you can use libraries like NumPy, which store data in compact, typed arrays and handle memory management gracefully.

  • Don't use + for generating long strings: in Python, str is immutable, so the left and right strings have to be copied into the new string for every pair of concatenations. If you concatenate four strings of length 10, you'll be copying (10+10) + ((10+10)+10) + (((10+10)+10)+10) = 90 characters instead of just 40 characters. Things get quadratically worse as the number and size of the strings increase. Java optimizes this case by transforming the series of concatenations to use StringBuilder some of the time, but CPython makes no such guarantee (it can sometimes extend a string in place when nothing else references it, which is why the + timings below are not dramatically worse).
    Therefore, it's advised to use .format or % syntax (however, they are slightly slower than + for short strings). Or better, if you already have the contents available in an iterable object, use ''.join(iterable_object), which is much faster.
    If you can't choose between .format and %, check out this interesting StackOverflow thread.

    def add_string_with_plus(iters):
        s = ""
        for i in range(iters):
            s += "xyz"
        assert len(s) == 3*iters
    
    def add_string_with_format(iters):
        fs = "{}"*iters
        s = fs.format(*(["xyz"]*iters))
        assert len(s) == 3*iters
    
    def add_string_with_join(iters):
        l = []
        for i in range(iters):
            l.append("xyz")
        s = "".join(l)
        assert len(s) == 3*iters
    
    def convert_list_to_string(l, iters):
        s = "".join(l)
        assert len(s) == 3*iters
    

    Output:

    >>> timeit(add_string_with_plus(10000))
    100 loops, best of 3: 9.73 ms per loop
    >>> timeit(add_string_with_format(10000))
    100 loops, best of 3: 5.47 ms per loop
    >>> timeit(add_string_with_join(10000))
    100 loops, best of 3: 10.1 ms per loop
    >>> l = ["xyz"]*10000
    >>> timeit(convert_list_to_string(l, 10000))
    10000 loops, best of 3: 75.3 µs per loop
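
If you want to reproduce this kind of comparison without IPython, the standard library's timeit module is enough. The sketch below is only one way to set it up; the helper names concat_with_plus and concat_with_join are hypothetical, and the numbers will differ across machines and Python versions.

```python
import timeit


def concat_with_plus(parts):
    """Grow the string one piece at a time with +=."""
    s = ""
    for p in parts:
        s += p
    return s


def concat_with_join(parts):
    """Let str.join size the result once and copy each piece once."""
    return "".join(parts)


parts = ["xyz"] * 10000
runs = 100

for fn in (concat_with_plus, concat_with_join):
    # timeit.timeit returns the total time in seconds for `runs` calls
    seconds = timeit.timeit(lambda: fn(parts), number=runs)
    print("{}: {:.1f} µs per call".format(fn.__name__, seconds / runs * 1e6))
```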
    
  • Use slots when defining a Python class. By setting __slots__ on the class to a fixed list of attribute names, you tell Python not to create a dynamic dict for every object and to allocate space only for that fixed set of attributes, which eliminates the per-object dict overhead. Slots also prevent arbitrary attribute assignment on an object, so the shape of the object remains the same throughout. Read more about slots here.

  • You can track memory usage at the process level with the built-in resource module, and at the object level with third-party packages like objgraph.

  • Managing memory leaks in Python can be a tough job, but luckily there are tools like heapy for debugging them. Heapy can be used along with objgraph to watch the allocation growth of different objects over time. Heapy can show which objects are holding the most memory, and objgraph can help you find back-references to understand exactly why they cannot be freed. You can read more about diagnosing memory leaks in Python here.
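
As a small illustration of the __slots__ point from the list above, the sketch below defines the same class with and without slots. The class names are invented for the example; the exact memory savings depend on your Python version, so measure them for your own objects.

```python
class PointWithDict:
    """Regular class: every instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots:
    """Slotted class: space is reserved only for the listed attributes."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


regular = PointWithDict(1, 2)
slotted = PointWithSlots(1, 2)

print(hasattr(regular, "__dict__"))  # regular instances have a per-object dict
print(hasattr(slotted, "__dict__"))  # slotted instances do not

regular.z = 3  # arbitrary attributes are allowed on the regular class
try:
    slotted.z = 3  # not listed in __slots__, so this is rejected
except AttributeError as exc:
    print("rejected:", exc)
```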

You can read in detail about Python memory management by the developers of Theano here.

Practice 2: Write Beautiful Code because - "The first impression is the last impression."

Sharing code is a rewarding endeavor. Whatever the motivation, your good intentions may not have the desired outcome if people find your code hard to use or understand. Almost every organization follows style guidelines that developers have to follow for consistency, easy debugging, and ease of collaboration. The Zen of Python is like a mini style and design guide for Python. Popular style guidelines for Python include:

  1. PEP-8 style guide
  2. Python Idioms and efficiency
  3. Google Python Style Guide

These guidelines discuss how to use whitespace, commas, and braces, how to name objects, and so on. While they may conflict in some situations, they all have the same objective: "Clean, Readable, and Debuggable standards for code."

Stick to one guide, or follow your own, but don't try to follow something drastically different from widely accepted standards.

Using static code analysis tools

There are lots of open source tools available that you can use to make your code compliant with standard style guidelines and best practices for writing code.

Pylint is a Python tool that checks a module for coding standards. Pylint can be a quick and easy way of seeing if your code has captured the essence of PEP-8 and is, therefore, ‘friendly’ to other potential users.

It also provides you with reports with insightful metrics and statistics that may help you judge code quality. You can also customize it by creating your own .pylintrc file and using it.

Pylint is not the only option: there are other tools like PyChecker and PyFlakes, and packages like pep8 and flake8.
My recommendation would be to use coala, a unified static code analysis framework that aims to provide language-agnostic code analysis via a single framework. Coala supports all the linting tools I mentioned previously, and is highly customizable.

Documenting the code properly

This aspect is critical to the usability and readability of your codebase. It is always advised to document your code as extensively as possible, so that other developers face less friction in understanding it.
Typical inline documentation of a function should include:

  • A one line summary of what the function does.
  • Interactive examples, if applicable. A new developer can refer to these to quickly see the usage and expected output of your function. You can also use the doctest module to assert the correctness of these examples (running them as tests). See the doctest documentation for examples.
  • Parameters documentation (generally one line describing the parameter and its role in the function)
  • Return type documentation (unless your function doesn't return anything!)
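
Putting the four items above together, a docstring might look something like the sketch below. The function clamp is a made-up example, and the :param:/:returns: field style is just one common convention (the one Sphinx understands out of the box); your project may prefer another.

```python
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high].

    >>> clamp(15, 0, 10)
    10
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(7, 0, 10)
    7

    :param value: the number to restrict
    :param low: smallest value that may be returned
    :param high: largest value that may be returned
    :returns: value, limited to the range [low, high]
    :rtype: int or float
    """
    return max(low, min(value, high))
```

If this lives in a file such as mymodule.py (a hypothetical name), running python -m doctest mymodule.py executes the interactive examples and reports any whose output no longer matches.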

Sphinx is a widely used tool for generating and managing your project documentation. It offers lots of handy features that reduce the effort of writing standard documentation. Moreover, you can publish your documentation on Read the Docs for free, which is the most common way of hosting documentation for projects.
The documentation section of The Hitchhiker's Guide to Python contains some interesting information that may be useful while documenting your code.

Practice 3: Speed Up Your Performance

Multiprocess, not Multi-thread

When it comes to improving the execution time of your multiple-task code, you may want to utilize multiple cores in the CPU to execute several tasks simultaneously. It may seem intuitive to spawn several threads and let them execute concurrently, but, because of the Global Interpreter Lock (GIL) in CPython, only one thread runs Python bytecode at a time, so all you're doing is making your threads take turns.

To achieve actual parallelization in Python, you might have to use Python's multiprocessing module. Another solution might be outsourcing the tasks to:

  1. The operating system (by doing multi-processing)
  2. Some external application that calls your Python code (e.g., Spark or Hadoop)
  3. Code that your Python code calls (e.g. you could have your Python code call a C function that does the expensive multi-threaded stuff).
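
For the multiprocessing route, a minimal sketch could look like the following. The worker function count_primes and the chosen limits are invented for illustration; the point is only that each call runs in its own process, so CPU-bound work can spread across cores.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20000, 30000, 40000, 50000]

    # Each limit is handed to a separate worker process, so the GIL of one
    # process does not block the others.
    with Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)

    print(dict(zip(limits, results)))
```

The if __name__ == "__main__" guard matters here: on platforms that start workers by re-importing the main module, it keeps the workers from trying to create pools of their own. Starting processes and passing data between them has a cost, so this tends to pay off only when each task does a meaningful amount of work.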

Apart from multiprocessing, there are other ways to boost your performance. Some of them include:

  • Using the latest version of Python: This is the most straightforward way because new updates generally include enhancements to already existing functionalities in terms of performance.

  • Use built-in functions wherever possible: This also aligns with the DRY principle — built-in functions are carefully designed and reviewed by some of the best Python developers in the world, so they're often the best way to go.

  • Consider using ctypes: ctypes provides an interface for calling functions in C shared libraries from your Python code. C is a language closer to machine level, so compute-heavy routines usually execute much faster than similar implementations in Python.

  • Using Cython: Cython is a superset of the Python language that allows users to call C functions and add static type declarations. It compiles to C, so the resulting code will probably execute much faster.

  • Using PyPy: PyPy is another Python implementation that has a JIT (just-in-time) compiler, which could make your code execution faster. Though I've never tried PyPy, it also claims to reduce your programs' memory consumption. Companies like Quora actually use PyPy in production.

  • Design and Data Structures: This applies to every language. Make sure you're using the right data structures for your purpose, declare variables at the right place, wisely make use of identifier scope, and cache your results wherever it makes sense, etc.
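
As one example of caching results, the standard library's functools.lru_cache (Python 3.2+) can memoize a pure function with a single decorator. The fib function below is just a toy stand-in for any expensive computation whose result depends only on its arguments.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Naive recursive Fibonacci; the cache removes the repeated subcalls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(80))
# cache_info() reports hits and misses, which helps confirm the cache is used
print(fib.cache_info())
```

An unbounded cache (maxsize=None) keeps every result alive, which trades memory for speed. For functions called with many distinct arguments, a bounded maxsize is usually the safer choice.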

A language-specific example: Python is usually slow at accessing global variables and resolving function addresses, so it can be faster to assign them to local variables in your scope and access those instead.
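
Here is a rough sketch of that trick, with made-up function names. The first version looks up math, math.sqrt, and out.append on every iteration; the second resolves them once and binds them to local names. How much this helps depends heavily on the interpreter version, so treat it as something to measure rather than assume.

```python
import math
import timeit


def roots_with_lookups(values):
    out = []
    for v in values:
        # global lookup (math) plus two attribute lookups on every iteration
        out.append(math.sqrt(v))
    return out


def roots_with_locals(values):
    sqrt = math.sqrt  # resolved once, then accessed as a fast local
    out = []
    append = out.append
    for v in values:
        append(sqrt(v))
    return out


values = list(range(10000))
runs = 50

for fn in (roots_with_lookups, roots_with_locals):
    seconds = timeit.timeit(lambda: fn(values), number=runs)
    print("{}: {:.2f} ms per call".format(fn.__name__, seconds / runs * 1e3))
```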

Practice 4: Picking the right Versions!

Python2.x or Python3.x?

On one hand, Python3 has some great new features. On the other hand, you may want to use a package that only supports Python2 (like Apple's coremltools). Moreover, Python3 is not backward-compatible, which means that running your Python2 code on a Python3.x interpreter can throw errors.

It is advisable to use the latest release of Python when starting a new project, but if for some reason you have to stick to Python 2.x, it is possible to write code in a way that works on both Python2 and Python3 interpreters. The most common way is to use packages like future, builtins, and six to maintain a single, clean Python3.x compatible codebase that supports both Python2 and Python3 with minimal overhead.

python-future is the missing compatibility layer between Python2 and Python3. It provides future and past packages with backports and forward ports of features from Python3 and Python2. It also comes with futurize and pasteurize, customized 2to3-based scripts that help you easily convert either Py2 or Py3 code to support both Python2 and Python3 in a single clean Py3-style codebase, module by module.

Please check out the excellent Cheat Sheet for writing Python 2-3 compatible code by Ed Schofield. If you're more into watching videos than reading, you may find his talk at PyCon AU 2014, “Writing 2/3 compatible code” helpful.
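
To give a feel for what these compatibility layers do for you, here is a hand-rolled sketch that uses only the standard library. The names PY2, text_type, and to_text are illustrative and are not the API of future or six; in a real project you would normally import the equivalents from one of those packages instead of writing them yourself.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    from urllib import quote
else:
    text_type = str
    from urllib.parse import quote  # the function moved in Python 3


def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string on either interpreter."""
    if isinstance(value, bytes) and not isinstance(value, text_type):
        return value.decode(encoding)
    return text_type(value)


print(to_text(b"caf\xc3\xa9"))
print(quote("a b"))
```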

Handling your pip requirements

Generally, all the pip dependencies of a project are specified in a file named requirements.txt in the root of your project. Another person trying to run your project can simply install all the requirements using this file with the command pip install -r requirements.txt. It is also a common practice to put the dependencies required for running your tests in a separate file named test-requirements.txt.

Note that pip does not use requirements.txt when your project is installed as a dependency by others. Generally, for that, you'll have to specify dependencies in the install_requires and tests_require arguments of the setuptools.setup function in your setup.py file. If you want to maintain a common dependency file for both packaging and development, you can do something like:

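from setuptools import setup

# Reuse requirements.txt as the single source of dependencies,
# skipping blank lines and comment lines.
with open('requirements.txt') as f:
    required = [line.strip() for line in f
                if line.strip() and not line.strip().startswith('#')]

setup(
    # ... your other setup() arguments ...
    install_requires=required,
)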

Also, sometimes a recent upgrade in a dependency can break your project. For this reason, it is a safe practice to pin your dependency versions. Do check out this post by Kenneth Reitz, which discusses a simple and nice workflow for handling the dependency versions of your project.

Use virtual environments

For the same reason mentioned above (a change in the version of a dependency can break parts of your project), it is often advisable to use virtual environments (lightweight, self-contained Python installations) to avoid conflicting versions of a dependency across multiple projects while developing. They are also super easy to set up; The Hitchhiker's Guide to Python discusses some basic usage here.

Versioning your project

Follow Semantic versioning, hands down! See this guide for different ways to store your project version in your package.

Practice 5: Analyzing your code

It's often helpful to analyze your code for coverage, quality, and performance. Python comes with the cProfile module to help evaluate performance. It not only gives the total running time, it also times each function separately.

It also tells you how many times each function was called, which makes it easy to determine where you should make optimizations. Here's what a sample analysis by cProfile looks like:

[Screenshot: sample cProfile output]
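
If you'd like to produce such a report yourself, the sketch below profiles a small, made-up workload (slow_sum and workload are placeholder names) and prints the most expensive entries sorted by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately plain loop so it shows up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the report in a string buffer, sorted by cumulative time,
# and keep only the top five entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

For a whole script, running python -m cProfile -s cumulative your_script.py (with your own file name) gives the same kind of table without touching the code.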

  • memory_profiler is a Python module for monitoring memory consumption of processes, as well as a line-by-line analysis of memory consumption for Python programs.

  • objgraph allows you to show the top N objects occupying your Python program’s memory, what objects have been deleted or added over a period of time, and all references to a given object in your script.

  • resource provides basic mechanisms for measuring and controlling system resources utilized by a program. The module's two prime uses include limiting the allocation of resources and getting information about the resource's current usage.
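
Of the three, resource is the only one that ships with Python, so here is a small sketch of its "getting information" side. It assumes a Unix-like system (the module is not available on Windows), and the unit of ru_maxrss differs by platform: kilobytes on Linux, bytes on macOS. The helper name peak_memory is made up.

```python
import resource


def peak_memory():
    """Peak resident set size of this process so far (platform-specific unit)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_memory()
data = [0] * 5000000  # allocate a list with a few million slots
after = peak_memory()

print("peak before allocation:", before)
print("peak after allocation: ", after)

# The "limiting" side: read the current soft/hard cap on address space.
# resource.setrlimit() can lower it, which is left out here on purpose.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("address space limit (soft, hard):", soft, hard)
```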

Practice 6: Testing and Continuous Integration

Testing:
It is good practice to write unit tests. If you think that writing tests isn't worth the effort, take a look at this StackOverflow thread. It's better to write your tests before or during coding. Python provides the unittest module for writing unit tests for your functions and classes. There are also frameworks like:

  • nose - can run unittest tests and has less boilerplate.
  • pytest - also runs unittest tests, has less boilerplate, better reporting, and lots of cool, extra features.
    To get a good comparison among these, read the introduction here.

Not to forget the doctest module, which tests your source code using the interactive examples illustrated in the inline documentation.
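
For reference, a bare-bones unittest test case could look like the sketch below. The slugify function is a made-up example of code under test. The last two lines run the suite programmatically so that the snippet is self-contained; in a real project you would more typically run python -m unittest, or let nose or pytest discover the same test class.

```python
import unittest


def slugify(title):
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Hello   World  "), "hello-world")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


# Load and run the tests in this class, collecting the outcome in `result`.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```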

Measuring coverage:
Coverage is a tool for measuring Python program code coverage. It monitors your program, notes which parts of the code have been executed, then analyzes the source to identify code that could've been executed but was not.

Coverage measurement is typically used to gauge the effectiveness of tests. It can show which parts of your code are being exercised by tests, and which are not. It is often advisable to have 100% branch coverage, meaning your tests should be able to execute and verify the output of every branch of the project.

Continuous Integration:
Having a CI system for your project from the very beginning can be very useful for your project in the long run. You can easily test various aspects of your codebase using a CI service. Some typical checks in CI include:

  • Running tests in a real world environment. There are cases when tests pass on some architectures and fail on others. A CI service can let you run your tests on different system architectures.

  • Enforcing coverage constraints on your codebase.

  • Building and deploying your code to production (you can do this across different platforms)

There are several CI services available nowadays. Some of the most popular ones are Travis CI, CircleCI (for OSX and Linux), and AppVeyor (for Windows). Newer ones like Semaphore CI also seem reliable, per my initial use. GitLab (another Git repository management platform like GitHub) also supports CI, though you'll need to configure it explicitly, like with other services.

Update: This post is based entirely on my personal experience, so there may be a lot of things that I missed (or am not aware of). If you have something interesting to share, do let me know in the comments. Someone started a thread about this post on HN; I'd recommend checking it out at https://news.ycombinator.com/item?id=15046641 for more critical discussion. I'll try to address all the suggestions and keep updating this post.

Discover and read more posts from Satwik Kansal
9 Replies
Christian Peters
a month ago

When starting a new Python project, or even just learning Python, you might find yourself with the dilemma of choosing between Python2 or Python3.

I don’t think that is a dilemma. Use Python3. Period.

It’s been released 2008, that’s almost 10 years or - as we say in IT - ages. If there is a package that does not support python3, then you don’t want to use it. It hasn’t caught up with the future in ages, dooming your project to be trapped in the past.

Satwik Kansal
a month ago

I think you’re right. That section of the post was intended to discuss ways to write Python 2-3 compatible code. Personally, I don’t want to encourage someone to use Python2 over Python3, but as someone on HN mentioned there are libraries like Apple’s coremltools which don’t have Python3 support. So my only intention is to let the readers know the ways to deal with such a situation. Thanks for your comment btw. I’ll reframe this paragraph in the next iteration.

MinJae Kwon
a month ago

Hi I’m mingrammer. This post is very interesting for me.

So I wanna translate this to korean on my personal blog.

Can I do that?

Thanks

Satwik Kansal
a month ago

Hi, MinJae! Sure you can do that, I’d appreciate a reference to the original post in your translated version. And yeah, let me know if you need any other help regarding this :)

Atis Elsts
a month ago

Interesting article. As the guys on HN show, for short strings + is faster than format(). For long strings it would be better. "".join([a, b, c, d]) is always pretty fast.

A minor point: I believe the operator “+=” is fast as well, as it does not destroy the first string. Also, when adding just two strings with s1 = s1 + s2, the interpreter is smart enough to convert that to s1 += s2, so it’s fast.

Compare:

# using "+", three strings:
>>> timeit.timeit("s1 = s1 + s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
0.25748300552368164

# using "+=", three strings:
>>> timeit.timeit("s1 += s2 + s3", setup="s1 = ' ' * 100000; s2 = ' ' * 100000; s3 = ' ' * 100000", number=100)
0.012188911437988281
Satwik Kansal
a month ago

Interesting, for two strings, s1 = s1 + s2 is even faster than s1 += s2 (maybe because of the extra effort in resolving the += operator), and join comes out to be the slowest in this case.

>>> timeit.timeit("s1 +=  s2", setup="s1= ' '*100; s2=' '*100")
0.10682392120361328

>>> timeit.timeit("s1 = s1 +  s2", setup="s1= ' '*100; s2=' '*100")
0.09906697273254395

>>> timeit.timeit("s = ''.join([s1, s2])", setup="s1= ' '*100; s2=' '*100")
0.2219560146331787

Anyways, I’ll update the blog post once again after addressing all the discussions in the HN thread. Thanks for the comment :)
