Package Management in Python - Part I
This post was originally published on http://snoozy.ninja/.
Dependency hell: the situation in which installed software packages have conflicting, incompatible dependencies on specific versions of other software packages.
Everybody knows this situation. We are starting a new project. This time everything will be perfect and completely thought out. We also plan to try out a new framework, tool or package.
When laying the groundwork for a new project, we very often start by installing the necessary packages. Typical examples are pytest, mypy and flake8. Over time, our application requires more and more external dependencies. Some packages are only useful during development (for example ipdb), while others are always required. After a while, we realize that the version of one package we need conflicts with the requirements of other packages. This is where the first problems with managing these dependencies begin.
- How do we keep track of the required packages and their versions?
- How do we separate the packages needed in production from those needed only during development?
- How can we reproduce the complete development environment simply and deterministically?
- How can we use different versions of the same package on the same system?
Over the last few years, a set of tools has been created that solve the problems above in various ways. The first group consists of package managers such as pip or easy_install.
The second group consists of environment management tools such as virtualenv or venv.
Let us now briefly discuss these tools to understand how they work and how they can help us in everyday work.
1. Pip
Pip is currently the most popular tool for managing packages. It has been installed together with Python since version 3.4; in older versions of Python, you have to download its installer (get-pip.py) and run it. Pip is nothing more than a simple package installer. Instead of searching the Internet for the packages we need and then keeping them up to date by hand, we can issue a single command that does it for us. Searching, configuring and installing all happen automatically, and updating a package also comes down to one command (pip install --upgrade followed by the package name).
For example, after issuing the command
pip install mypy
pip automatically finds the mypy package in the repository, downloads it and installs it together with its dependencies.
The default repository is PyPI (which has no meaningful competition at the moment), hosting more than 130,000 packages at the time of writing. PyPI was founded around 2003 to provide one central directory of third-party Python packages. It should be the first place we look for a package to install (we could even search it from the command line with pip search, although PyPI has since disabled the API this command relied on, so searching is now done on the pypi.org website).
However, because anyone can upload a package to PyPI, some dangerous situations can arise, so we have to make sure that the package we install is exactly what we expect. The risks fall into two categories. The first is a malicious package that imitates a popular one: its name is very similar (for example urlib3 instead of urllib3) and it behaves like the original, but it contains hidden functionality that often does something very harmful. The second is a package being changed or substituted with a different version. I will show how to deal with the latter problem below.
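Dedicated tools exist for catching look-alike names, but the core idea is easy to sketch. The example below is a minimal illustration, not a security tool: it normalizes names the way PEP 503 does and uses Python's difflib to flag names that closely resemble entries on a small, hypothetical allow-list (`KNOWN_PACKAGES`). The helper names and the 0.8 similarity cutoff are my own assumptions.

```python
import difflib
import re

# Hypothetical allow-list; in practice this would be the packages your
# project really depends on, or a list of popular PyPI projects.
KNOWN_PACKAGES = ("requests", "urllib3", "numpy", "mypy", "pytest", "flake8")


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookalikes(name: str, known=KNOWN_PACKAGES, cutoff: float = 0.8) -> list:
    """Return known names that `name` closely resembles without being identical to."""
    wanted = normalize(name)
    candidates = [normalize(k) for k in known]
    if wanted in candidates:
        return []  # exact match: this is the real package
    return difflib.get_close_matches(wanted, candidates, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(lookalikes("urlib3"))   # ['urllib3']
    print(lookalikes("urllib3"))  # []
```

String similarity alone cannot tell whether a look-alike package is actually malicious; it only tells us that a name deserves a second look before we install it.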
After finding the right package, pip automatically unpacks, configures and installs it. Most packages are currently distributed in the wheel (.whl) format defined in PEP 427, which is now the most popular distribution format. I will discuss wheels another time, because the topic deserves its own blog post.
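One detail of PEP 427 is useful right away: a wheel's file name encodes the distribution name, the version, an optional build tag and the compatibility tags (Python, ABI and platform). The sketch below splits a file name into those fields. The file names used with it are made up, and real tools (for example the `packaging` library) validate names much more thoroughly.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into the fields defined by PEP 427.

    Layout: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
    Hyphens inside the distribution name are escaped as '_' in wheel file
    names, so splitting on '-' works for well-formed names.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    # Hypothetical file name of a pure-Python wheel.
    print(parse_wheel_filename("example_pkg-1.0.0-py3-none-any.whl"))
```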
Pip itself does not offer much more functionality. Running pip without arguments lists all of its commands; apart from installing, downloading and removing packages, two of them deserve a closer look:
$ pip
Usage:
  pip <command> [options]

Commands:
  install      Install packages.
  download     Download packages.
  uninstall    Uninstall packages.
  freeze       Output installed packages in requirements format.
  list         List installed packages.
  show         Show information about installed packages.
  check        Verify installed packages have compatible dependencies.
  search       Search PyPI for packages.
  wheel        Build wheels from your requirements.
  hash         Compute hashes of package archives.
  completion   A helper command used for command completion.
  help         Show help for commands.
The first one is the pip freeze command. It lists all packages installed in the current environment, together with the versions currently in use. It is common practice to save this output to a file, usually called requirements.txt:
pip freeze > requirements.txt
The output of pip freeze follows the requirements file format, the de facto standard for storing dependencies.
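To see where the information printed by pip freeze comes from, here is a rough sketch built on the standard `importlib.metadata` module (Python 3.8+). It lists every installed distribution as a `name==version` line. It is only an approximation: unlike pip freeze, it does not handle editable installs or skip the packages pip excludes by default, and `frozen_requirements` is a hypothetical helper name.

```python
from importlib import metadata
from typing import List


def frozen_requirements() -> List[str]:
    """Installed distributions as sorted 'name==version' lines, similar to pip freeze.

    Approximation only: editable installs and pip's default exclusions are not handled.
    """
    lines = set()  # a set, because the same distribution can be found twice on sys.path
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installations without a Name field
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(frozen_requirements()))
```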
By adding the requirements.txt file to our project, we can quickly recreate the environment our application needs to run:
pip install -r requirements.txt
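pip install -r takes care of installing whatever is missing. As a complementary sketch (the helper names are hypothetical, not part of pip), the code below reads exact `name==version` pins from a requirements file and reports which of them are missing, or installed in a different version, in the current environment. It deliberately ignores version ranges, extras and environment markers; parsing those properly needs a real parser such as the one in the `packaging` library.

```python
from importlib import metadata
from pathlib import Path
from typing import Dict


def parse_pinned(text: str) -> Dict[str, str]:
    """Collect exact 'name==version' pins from requirements.txt content.

    Comments, blank lines and option lines ('-r ...', '--hash=...') are skipped;
    ranges, extras and environment markers are not supported in this sketch.
    """
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        requirement = line.split()[0]  # drop trailing options such as --hash=...
        if "==" in requirement:
            name, version = requirement.split("==", 1)
            pins[name] = version
    return pins


def check_environment(text: str) -> Dict[str, str]:
    """Map each pinned package that is missing or mismatched to a short description."""
    problems: Dict[str, str] = {}
    for name, wanted in parse_pinned(text).items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, required {wanted}"
    return problems


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.exists():
        for name, problem in check_environment(requirements.read_text()).items():
            print(f"{name}: {problem}")
```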
This is the first step towards managing the packages in our project. It is also worth mentioning pipdeptree, an enhanced alternative to the pip freeze command. It arranges the installed packages according to their dependencies and displays them as a tree, giving a clear picture of which packages depend on each other.
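pipdeptree builds its tree from the metadata of the installed packages. The sketch below only illustrates the idea on a small, hand-written graph (simplified: version constraints and optional dependencies are left out), so both the graph and the function names are assumptions for illustration. Top-level packages are those that no other package depends on, and each one is printed with its dependencies indented beneath it.

```python
from typing import Dict, Iterable, List, Optional

# Illustrative, simplified dependency graph (package -> direct dependencies).
# Version constraints and optional dependencies are deliberately left out.
GRAPH: Dict[str, List[str]] = {
    "flake8": ["pyflakes", "pycodestyle", "mccabe"],
    "mypy": ["typing-extensions", "mypy-extensions"],
    "mccabe": [],
    "pycodestyle": [],
    "pyflakes": [],
    "mypy-extensions": [],
    "typing-extensions": [],
}


def top_level(graph: Dict[str, List[str]]) -> List[str]:
    """Packages that no other package depends on, i.e. the ones installed directly."""
    required = {dep for deps in graph.values() for dep in deps}
    return sorted(name for name in graph if name not in required)


def render_tree(graph: Dict[str, List[str]],
                roots: Optional[Iterable[str]] = None,
                indent: str = "  ") -> str:
    """Render each root with its dependencies indented beneath it."""
    lines: List[str] = []

    def walk(name: str, depth: int, ancestors: frozenset) -> None:
        lines.append(indent * depth + name)
        if name in ancestors:  # dependency cycle: stop instead of recursing forever
            return
        for dep in sorted(graph.get(name, [])):
            walk(dep, depth + 1, ancestors | {name})

    for root in roots or top_level(graph):
        walk(root, 0, frozenset())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree(GRAPH))
```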
The second command worth mentioning is pip hash. It computes a hash (SHA-256 by default) of a given package archive:
pip hash <path to the package archive>
Storing this hash lets pip verify during installation that it downloads exactly the same package as the one referred to in requirements.txt. This protects us from a situation where, for some unknown reason, the content of a given package version has changed (which normally should not happen). When installing our requirements with pip, we can add the --require-hashes flag, e.g.
pip install --require-hashes -r requirements.txt
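In this mode, every requirement in the file, including indirect dependencies, must be pinned to an exact version and carry its hash. Each line of the requirements file then looks like this (the version and digest are placeholders):
package-name==<version> --hash=sha256:<digest>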
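The hash that pip hash prints is an ordinary SHA-256 digest of the archive file. The sketch below computes the same kind of digest with the standard hashlib module and formats a requirements line with the `--hash=sha256:` option. `hashed_requirement` is a hypothetical helper, not part of pip, and the archive in the demo is a made-up file rather than a real package.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_requirement(name: str, version: str, archive) -> str:
    """Build a pinned requirements line carrying a --hash=sha256:<digest> option."""
    return f"{name}=={version} --hash=sha256:{file_sha256(archive)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "example_pkg-1.0.0-py3-none-any.whl"  # hypothetical file
        archive.write_bytes(b"not a real wheel, just some bytes")
        print(hashed_requirement("example-pkg", "1.0.0", archive))
```

If even one byte of the archive changes, the digest changes completely, which is why pip refuses to install a package whose hash does not match the one in requirements.txt.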
2. Virtualenv and venv
When using pip, we very quickly notice that pip freeze returns every package installed on our system, including those that the application we are currently working on does not use at all. We can try to solve this by cleaning up requirements.txt by hand, but that is tedious, and sooner or later we will add something unnecessary or, worse, some package will "disappear" and our environment will not work properly. The classic: it works on my machine. In addition, our application may need a specific version of a package while the system uses a newer or older one. This is exactly the dependency hell described at the beginning.
virtualenv and venv were created to solve such problems. It is worth explaining right away that the main difference between them is that venv is the "newer" tool, shipped with Python since version 3.3, while virtualenv is a separate package. So for Python versions before 3.3 we have to use virtualenv, and from 3.3 on venv is available out of the box. So how do these virtual environments work?
The principle of operation is very simple. A virtual environment is nothing more than a directory in which we can install the packages we need, together with a way of convincing the Python interpreter to use this directory whenever it imports an external library or installs something new. In other words, creating a virtual environment changes the paths in which Python looks for the packages it needs. Because the packages installed in a virtual environment are kept in a completely different place than the system-wide ones, we can do whatever we want without worrying that some part of our system that requires a given package will stop working. We can even choose which Python version the environment uses. The inner workings of virtual environments are, of course, a bit more complex and have changed a lot over the course of Python's development (a perfect example of core developers taking a good idea from the community and making it an official standard). More information on the technical side of virtual environments can be found in this presentation (maybe not the newest one, but still very interesting).
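This mechanism can be observed from Python itself: inside a venv, `sys.prefix` points at the environment while `sys.base_prefix` still points at the installation it was created from (older virtualenv releases set `sys.real_prefix` instead). The sketch below, with hypothetical helper names, reports whether the current interpreter runs in a virtual environment and where it installs pure-Python packages.

```python
import sys
import sysconfig


def in_virtual_env() -> bool:
    """True if the running interpreter belongs to a virtual environment.

    venv leaves sys.base_prefix pointing at the original installation while
    sys.prefix points at the environment; older virtualenv releases set
    sys.real_prefix instead.
    """
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


def site_packages() -> str:
    """Directory where pure-Python packages are installed for this interpreter."""
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print("virtual environment:", in_virtual_env())
    print("sys.prefix:         ", sys.prefix)
    print("sys.base_prefix:    ", sys.base_prefix)
    print("site-packages:      ", site_packages())
```

Running it once with the system interpreter and once inside an activated environment shows the site-packages directory moving into the environment.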
First, I created a new environment called example-blog:
python3 -m venv example-blog
To activate the environment and start using it, run:
source example-blog/bin/activate
This script puts example-blog/bin at the front of the PATH variable and sets the VIRTUAL_ENV variable, so from now on the python and pip commands in this terminal use our virtual environment. After activating the new environment, we can check that it is empty (e.g. with pip freeze) and install a sample package. When we are done, all we have to do is run
deactivate
and we are back to the packages installed on our operating system.
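The same steps can be scripted with the standard `venv` module. This is a minimal sketch under a few assumptions: the environment is created in a temporary directory, pip is not bootstrapped (`with_pip=False`) to keep it fast, and the helper names are hypothetical. It then asks the new interpreter directly whether it considers itself part of a virtual environment; no activation is needed for that, because running the environment's own interpreter by its path is enough.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Location of the interpreter inside a venv (Windows uses Scripts/)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path) -> Path:
    """Roughly what `python3 -m venv <env_dir>` does, minus installing pip."""
    # Symlinks mirror the command-line default on POSIX; on Windows files are copied.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_python(env_dir)


def runs_isolated(python: Path) -> bool:
    """Ask the environment's own interpreter whether it runs inside a venv."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python = create_env(Path(tmp) / "example-blog")
        print(python, "isolated:", runs_isolated(python))
```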
Browsing the contents of our virtual environment, we can see that the mypy package has been installed inside it (on Linux, in example-blog/lib/pythonX.Y/site-packages). Changing its code will not affect the rest of our operating system at all, because it is not used anywhere except by applications running in this environment.
3. Part I - Summary
pip and venv are just the first step towards fully understanding how modern package management works in Python. Although pip is quite a simple tool, in most cases it is sufficient. The most important thing when managing packages in any system or language is the ability to easily install, search for and update packages, and pip is perfect for that.
However, there are other tools that approach package management in a completely different way. A good example is pipenv, which combines pip and virtual environments into a single tool. But more about it (and many other tools) next time.