
Gathering package meta-data from the PyPI ecosystem is easy once you know the right way

Published May 23, 2019


Image credits: https://cdn.filestackcontent.com/8ppPoPSsOF6RPQIGG8QM

You can check out my profile here on Medium.

In 2019, I worked on a project that helped me understand how the PyPI ecosystem works, and I faced a lot of challenges along the way. I am writing this article to help anyone who needs to gather package-related meta-data (for an individual package or otherwise).

General Meta-data

The first place to look for package information is the PyPI JSON API. With it, one can get all the package-related meta-data in a few simple steps.

GET /pypi/<project_name>/json

This endpoint returns the package information. Replace <project_name> with the appropriate PyPI project name. For example, to collect the metadata of the numpy package (one of the most downloaded packages on PyPI), request https://pypi.org/pypi/numpy/json to get the data in JSON format. If you are using Python, you can do this with the requests package:

import requests
url = "https://pypi.org/pypi/"
# Replace "your_project" with the PyPI project name, e.g. "numpy"
r = requests.get(url + "your_project" + "/json").json()

Don't forget to wrap the request in a classic try/except block to catch errors or bad requests, if there are any.
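The error handling mentioned above could look something like the following. This is only a minimal sketch: it swaps requests for the standard library's urllib so that it has no extra dependency, and the names build_url, summarize and fetch_metadata are hypothetical helpers invented for this illustration. It assumes the JSON response carries an "info" object and a "releases" mapping keyed by version.

```python
import json
import urllib.error
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def build_url(project_name):
    """Return the JSON API URL for a PyPI project."""
    return PYPI_JSON_URL.format(name=project_name)


def summarize(metadata):
    """Pick a few commonly used fields out of the JSON API response."""
    info = metadata.get("info", {})
    return {
        "name": info.get("name"),
        "latest_version": info.get("version"),
        "release_count": len(metadata.get("releases", {})),
    }


def fetch_metadata(project_name, timeout=10):
    """Return the parsed metadata, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_url(project_name), timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # Raised for bad requests, e.g. a 404 for an unknown project name.
        print("HTTP error for {}: {}".format(project_name, err.code))
    except OSError as err:
        # Covers URLError, timeouts and other network-level failures.
        print("Network error for {}: {}".format(project_name, err))
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError.
        print("Invalid JSON for {}: {}".format(project_name, err))
    return None
```

Calling fetch_metadata("numpy") and passing the result to summarize (after checking it is not None) should give the project name, its latest version and the number of releases.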

Download Counts

Getting the download counts of a particular package is trickier than getting just the meta-data. The PyPI JSON API doesn't provide that information (at least not at the time of writing this article). There is an article on this url (python.org), which I checked before anything else. But there is an even simpler way. Check out this project (one of my favorites, as it also gives total download counts for the package), which has a really cool web interface; you can easily get the information you want by writing a quick script using beautifulsoup or anything else you like. For example, you can get the download counts of your favorite package at https://pepy.tech/project/package-name. There are also some other projects available, something like this.
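As an illustration of one of those other projects, here is a hedged sketch that reads recent download counts from pypistats.org instead of scraping a web page. This is a swapped-in service, not the one described above: the endpoint path and the "data" / "last_day" / "last_week" / "last_month" fields are assumptions about its public JSON API, the function names are hypothetical, and unlike pepy.tech it reports recent rather than total downloads.

```python
import json
import urllib.request

# Assumed endpoint of the pypistats.org JSON API (not part of PyPI itself).
PYPISTATS_RECENT_URL = "https://pypistats.org/api/packages/{name}/recent"


def parse_recent_downloads(payload):
    """Extract the per-period download counts from a decoded response."""
    data = payload.get("data", {})
    return {
        period: data.get(period)
        for period in ("last_day", "last_week", "last_month")
    }


def fetch_recent_downloads(package_name, timeout=10):
    """Fetch and parse the recent download counts for one package."""
    url = PYPISTATS_RECENT_URL.format(name=package_name.lower())
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_recent_downloads(json.load(response))
```

Keeping the parsing in its own function makes it easy to check against a saved response without touching the network.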


List of packages with highest no. of downloads — data fetched by me for a project

List of all the available package versions to download

Using the above information, one can easily get all the package-related metadata. But what if someone wants to download a particular version of a package and needs a list of all the available versions? They can certainly do that by going to the PyPI website and searching for the package (e.g. pandas), but then they have to select a particular version and check which downloads are available for it. What if they want a list of the uploaded files for all versions and distributions? There is a much simpler and more elegant way of doing that. The Simple PyPI Repository contains all the downloadable files for every package in the PyPI ecosystem. For example, every available version of the package urllib3 can be downloaded from https://pypi.org/simple/urllib3/.
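A page in the Simple repository is essentially a list of links, one per uploaded file, so it can be read with a small HTML parser. The sketch below is one possible way to do that with the standard library only; SimpleIndexParser, parse_simple_page and list_files are hypothetical names made up for this example, and it assumes each file appears as an anchor whose text is the file name.

```python
import urllib.request
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect (file name, URL) pairs from the anchors of a simple page."""

    def __init__(self):
        super().__init__()
        self.files = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.files.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_simple_page(html):
    """Return every (file name, URL) pair found in the given HTML."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    return parser.files


def list_files(project_name, timeout=10):
    """Download and parse the simple page of one project."""
    url = "https://pypi.org/simple/{}/".format(project_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_simple_page(response.read().decode("utf-8"))
```

For instance, list_files("urllib3") should return one entry per source archive or wheel, covering all versions.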

List of all the packages

One can get the list of all the available packages in the PyPI ecosystem by doing something like this (in Python):

import xmlrpc.client as xc
# Connect to PyPI's XML-RPC endpoint
client = xc.ServerProxy("https://pypi.org/pypi")
# 'packages' contains a list of names of all the available packages on PyPI
packages = client.list_packages()
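The same list can also be read from the Simple repository, which may help if the XML-RPC endpoint is unavailable or rate-limited. The following is a rough sketch under the assumption that PyPI serves the JSON form of the simple index described in PEP 691 when asked through the Accept header; extract_names and fetch_all_package_names are hypothetical names used only here.

```python
import json
import urllib.request

SIMPLE_INDEX_URL = "https://pypi.org/simple/"
# Media type defined by PEP 691 for the JSON version of the simple index.
JSON_ACCEPT = "application/vnd.pypi.simple.v1+json"


def extract_names(index):
    """Return the project names from a decoded simple-index document."""
    return [project["name"] for project in index.get("projects", [])]


def fetch_all_package_names(timeout=60):
    """Download the full index and return every project name."""
    request = urllib.request.Request(
        SIMPLE_INDEX_URL, headers={"Accept": JSON_ACCEPT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_names(json.load(response))
```

The index is large, so it is probably worth saving the result locally rather than fetching it repeatedly.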

Conclusion

Collecting and analyzing package meta-data can be useful for getting in-depth information about a particular package. It can be used to gauge the popularity of that package (or, in general, to see what the hell is going on). A bigger analysis of the ecosystem can also be done by digging deep into the package repository to get some interesting information.

Discover and read more posts from Ruturaj Kiran Vaidya