Harnessing the power of Django and Python to build a configurable taxonomy

Published Feb 09, 2018. Last updated Aug 08, 2018.

This article aims to present a way to implement a fully flexible taxonomy system inside your Django app. The editing implementation will rely heavily on the use of Wagtail (a CMS built on Django) but will still be relevant if only Django is used.

Business case

The case for a taxonomy can be broad — you may be developing a blog, are bored, and really want a complicated way to tag posts. Alternatively, you may be working with a knowledge management system and need to provide a structured system to manage hierarchical categorization of your team's information.

Either way, it is important to understand your goals before writing a single line of code. Or at least write some code, get frustrated, and then come back to think about what you are trying to do.

Our goals

  • Build a flexible system to manage a nested (tree shaped) taxonomy.
  • We must be able to go arbitrarily deep.
  • We must be able to add the canonical (correct) terms but also have space to provide and search via non-canonical terms (such as abbreviations).
  • We need to minimize dependencies and stay as close to Django conventions as possible (for future maintainability).
  • Avoid any difficult-to-understand terms in the user interface (e.g. taxonomy).

What is a business taxonomy?

Glad you asked! Think of a taxonomy as a globally shared vocabulary for the business or organization. This vocabulary is often used throughout all documentation, categorization, and training, but never really written down in one place.

Taxonomies help organize content and knowledge into hierarchical relationships, adding detail to terms and concepts the further you go down the levels.

These two links add a bit more context:

Wearing the right hat

When I worked on a similar project for a client, one thing I found hard was switching between the right hats.

One hat was the business analyst, a.k.a. the guy who needs to translate what the boss has asked for. With this hat on, I found that there were legitimate concerns over how the company's information could be managed, searched, and categorized to help add value to the organization as a whole.

The next hat was that of the developer. Here, I had to work with existing code and frameworks to implement a complex solution quickly and simply, along with consideration for future development wherever possible.

Finally, the hat that matters in the long run — the one of the everyday user. It was this hat I often found the hardest to don after wearing the others for a long time.

The concepts, both abstract and data model side, made sense to me and it felt like everyone else would get on board easily. In reality, I had to remember that I had been thinking and brainstorming this project for a long time and had the chance to really internalize the goals and way to think.

In the end, we landed on a great single sentence that helped our end users grok the concept of our 'taxonomy'. We also ditched the name taxonomy altogether and used a friendlier, internally relevant term instead.

Prerequisites

Installation of Wagtail 2.0. As of publication, this is still a release candidate but is solid enough to use.

We will be using Django 2.0 and all Python 3.5+ syntax (because it is awesome!).

Finally, we will be taking advantage of an incredible Python project called django-treebeard. I first found out about this project in depth after working with Wagtail for a while.

Essentially, this library takes on all of the heavy lifting of managing a nested tree model inside a standard relational database. It is projects like this that get me excited about the power of Python and also the way Django can be extended. Shout out to @tabo for this epic project.

Note: If you have Wagtail up and running, you will not need to install django-treebeard. For a raw Django project, you will need to install the package.

Code walkthrough

1 - the 'Node' model

Naming this is hard. For now, we will just call our elements inside the taxonomy a 'node'. Our nodes will extend the django-treebeard project's Materialized Path Tree nodes, described as follows:

  • Each node has one single path in the tree (think URL paths).
  • There must be one single root node that all other nodes connect to.
  • Nodes can be ordered in relation to their siblings. Initially, we will just order them by their name field.
  • Nodes have a path, depth, and numchild field whose values should not be changed directly.
  • The default set up can have a depth of 63, which I am sure will be sufficient for our use case.
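
To make the path and depth bullets above concrete, here is a plain-Python sketch of the materialized path idea. It is not treebeard's code: the helper names are hypothetical, and the constants (a step of 4 characters per level, a 36-character alphabet, and a 255-character path column) are what I understand treebeard's defaults to be, which is where the limit of 63 levels would come from (255 // 4).

```python
# Plain-Python illustration of materialized paths (not treebeard's implementation).
STEPLEN = 4            # characters used per tree level (assumed default)
PATH_MAX_LENGTH = 255  # assumed length of the path column
ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def encode_step(position):
    """Encode a sibling position (1, 2, 3...) as one fixed-width path step."""
    if not 0 <= position < len(ALPHABET) ** STEPLEN:
        raise ValueError('position does not fit in one step')
    digits = ''
    while position:
        position, remainder = divmod(position, len(ALPHABET))
        digits = ALPHABET[remainder] + digits
    return digits.rjust(STEPLEN, '0')


def depth_of(path):
    """Depth is the number of fixed-width steps in the path (root is 1)."""
    return len(path) // STEPLEN


def parent_path(path):
    """Drop the last step to get the parent's path; the root has no parent."""
    return path[:-STEPLEN] or None


def is_descendant(path, ancestor_path):
    """A descendant's path starts with its ancestor's path."""
    return path != ancestor_path and path.startswith(ancestor_path)


root = encode_step(1)                # first (and only) root node
child = root + encode_step(2)        # second child of the root
grandchild = child + encode_step(1)  # first child of that child
max_depth = PATH_MAX_LENGTH // STEPLEN
```

Because every level adds the same number of characters, depth, parent, and descendant checks all reduce to simple string operations, which is why the path, depth, and numchild fields should only ever be changed through the tree API.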

We will be adding our own fields to the Node model:

  • name - a CharField that represents the canonical name of the Node.
  • aliases - a TextField where each line represents another potential name or abbreviation for the Node.
  • node_order_index - an IntegerField which can be used in the future if we want to implement custom ordering in the user interface.
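
As a rough illustration of how the one-alias-per-line convention for the aliases field could be consumed, here is a small sketch. Both helper names (split_aliases and matches_term) are hypothetical and not part of the article's model; they only show the parsing idea.

```python
def split_aliases(raw_aliases):
    """Turn the multi-line aliases text into a clean list, skipping blank lines."""
    return [line.strip() for line in raw_aliases.splitlines() if line.strip()]


def matches_term(name, raw_aliases, term):
    """Check a term against the canonical name and every alias, ignoring case."""
    wanted = term.strip().casefold()
    candidates = [name] + split_aliases(raw_aliases)
    return any(candidate.casefold() == wanted for candidate in candidates)


aliases_text = "KM\n\n  Knowledge Mgmt \n"
aliases = split_aliases(aliases_text)
found = matches_term('Knowledge Management', aliases_text, 'km')
```

Keeping the aliases in a single TextField avoids an extra table, at the cost of doing this kind of splitting wherever the individual aliases are needed.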

Here is our initial model definition for the Node model:

# File: my_app/models.py
from django import forms
from django.core.validators import MinLengthValidator
from django.db import models

from treebeard.mp_tree import MP_Node

from wagtail.admin.edit_handlers import FieldPanel


class Node(MP_Node):
    """Represents a single nestable Node in the corporate taxonomy."""

    # node editable fields
    name = models.CharField(
        max_length=50,
        unique=True,
        help_text='Keep the name short, ideally one word.',
        validators=[MinLengthValidator(5)]
    )
    aliases = models.TextField(
        'Also known as',
        max_length=255,
        blank=True,
        help_text="What else is this known as or referred to as?"
    )

    # node tree specific fields and attributes
    node_order_index = models.IntegerField(
        blank=True,
        default=0,
        editable=False
    )
    node_child_verbose_name = 'child'
    # important: node_order_by should NOT be changed after first Node created
    node_order_by = ['node_order_index', 'name']

    # wagtail specific - simple way to declare which fields are editable
    panels = [
        FieldPanel('parent'),  # virtual field - see NodeForm later
        FieldPanel('name'),
        FieldPanel('aliases', widget=forms.Textarea(attrs={'rows': '5'})),
    ]

After you have this model declared, you will want to run migrations in your console:

  • $ python3 ./manage.py makemigrations
  • $ python3 ./manage.py migrate

2 - The form

For the sake of simplicity, we will assume all of the code will go in the same models.py file. In practice, you would be best served splitting up into separate files, but it is easier to get up and running with everything in one place.

We will be using the Wagtail system of building forms, but you can apply the main __init__ and save overrides to any Django form or even Django's ModelAdmin.

Key items to note:

  • The django-treebeard node API reference will come in handy here. We will be using methods like get_depth and is_root from this API.
  • parent is a field that provides a user interface to select the parent of the node being edited (or created). We have extended the ModelChoiceField class to create a custom BasicNodeChoiceField where we can get a nice indication of the Node structure in our select box.
  • __init__ on our form has been modified to do a few things.
    • instance will be an instance of Node bound to the values provided when the form submits, when creating or editing a Node.
    • If we are editing the root node (instance.is_root()) or creating the first node (no Node objects exist yet), we want to ensure that the parent field is hidden and will not throw an error if not filled out.
    • If we are editing an existing node we want to pre-select the node's parent via get_parent().
  • save needs to be changed to work with the django-treebeard API, as we cannot just create or move Nodes directly.
    • First, we get the Node instance that is attempting to be saved, then we get the value of parent submitted with the form (which will be None for the root Node).
    • If we are not committing changes on this save call, we can simply return the instance provided.
    • Otherwise, we want to handle the following cases:
      • Creating the first Node, which will become the root Node, handled by the classmethod add_root.
      • Creating a Node, but not the root Node, which must be placed as a child under an existing parent Node via add_child on the parent node.
      • Making non-parent changes to any Node is handled by the normal save method.
      • Moving an existing node to a new location under a different parent Node, handled by move(parent, pos='sorted-child').
    • Finally, we tell Wagtail to use this form class when editing the Node model via Node.base_form_class = NodeForm.

# File: my_app/models.py
# ... other imports from previous sections
from django import forms

from wagtail.admin.forms import WagtailAdminModelForm


class BasicNodeChoiceField(forms.ModelChoiceField):
    def label_from_instance(self, obj):
        depth_line = '-' * (obj.get_depth() - 1)
        return "{} {}".format(depth_line, super().label_from_instance(obj))


class NodeForm(WagtailAdminModelForm):

    parent = BasicNodeChoiceField(
        required=True,
        queryset=Node.objects.all(),
        empty_label=None,
    )

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        instance = kwargs['instance']

        if instance.is_root() or Node.objects.count() == 0:
            # hide and disable the parent field
            self.fields['parent'].disabled = True
            self.fields['parent'].required = False
            self.fields['parent'].empty_label = 'N/A - Root Node'
            self.fields['parent'].widget = forms.HiddenInput()

            # update label to indicate this is the root
            self.fields['name'].label += ' (Root)'
        elif instance.id:
            self.fields['parent'].initial = instance.get_parent()

    def save(self, commit=True, *args, **kwargs):
        instance = super().save(commit=False, *args, **kwargs)
        parent = self.cleaned_data['parent']

        if not commit:
            # simply return the instance if not actually saving (committing)
            return instance

        if instance.id is None:  # creating a new node
            if Node.objects.all().count() == 0:  # no nodes, creating root
                instance = Node.add_root(instance=instance)  # add a NEW root node
            else:  # nodes exist, must be adding node under a parent
                instance = parent.add_child(instance=instance)
        else:  # editing an existing node
            instance.save()  # update existing node
            if instance.get_parent() != parent:
                instance.move(parent, pos='sorted-child')
        return instance


Node.base_form_class = NodeForm
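
The branching inside save can be easier to reason about when it is pulled out as a pure function. The sketch below is only an illustration of the four cases described earlier; choose_save_action and its string return values are hypothetical names and are not part of Wagtail, Django, or django-treebeard.

```python
def choose_save_action(is_new, tree_is_empty, current_parent=None, new_parent=None):
    """Return which tree operation the form's save would need to perform."""
    if is_new:
        # the very first node becomes the root, all others hang off a parent
        return 'add_root' if tree_is_empty else 'add_child'
    if current_parent != new_parent:
        # existing node whose parent changed: save the fields, then move it
        return 'save_then_move'
    # existing node with the same parent: a normal save is enough
    return 'save'


first_node = choose_save_action(is_new=True, tree_is_empty=True)
new_child = choose_save_action(is_new=True, tree_is_empty=False, new_parent='Root')
renamed = choose_save_action(False, False, current_parent='Root', new_parent='Root')
moved = choose_save_action(False, False, current_parent='Root', new_parent='Subject')
```

Each return value corresponds to one branch of the form: add_root, add_child on the parent, a plain save, or a save followed by move(parent, pos='sorted-child').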

3 - Wagtail modeladmin editing

We will now use the Wagtail modeladmin module. This is a powerful way to add CRUD operations to our models in the admin interface. It is similar (in concept) to Django's ModelAdmin, but not the same. It also makes extensive use of the awesome class-based views.

Note: The Class-based views provide a great way to add functionality to Django without reinventing the wheel. They are easy to customize and provide a great API that is easy to extend and gives you a great example of a structure for view classes.

We will be declaring a new class that will extend ModelAdmin:

  • model is set to our Node model class.
  • list_display has our name and aliases fields, along with get_parent, a method available on the MP_Node class.
  • inspect_view_enabled means that the users can click on a simple view page to look at details but not edit anything on the Node.

# File: my_app/models.py
# ... other imports from previous sections

from wagtail.contrib.modeladmin.options import ModelAdmin


class NodeAdmin(ModelAdmin):
    """Class for presenting topics in admin using modeladmin."""

    model = Node

    # admin menu options
    menu_icon = 'fa-cube'  # using wagtail-fontawesome
    menu_order = 800

    # listing view options
    list_per_page = 50
    list_display = ('name', 'get_parent', 'aliases')
    search_fields = ('name', 'aliases')

    # inspect view options
    inspect_view_enabled = True
    inspect_view_fields = ('name', 'get_parent', 'aliases', 'id')

We will then register our custom ModelAdmin in a new file called wagtail_hooks.py. Wagtail treats this file name specially and ensures the file is run before the admin interface is prepared.

# File: my_app/wagtail_hooks.py

from .models import NodeAdmin
from wagtail.contrib.modeladmin.options import modeladmin_register

modeladmin_register(NodeAdmin)

4 - Node model enhancements

For round two of our model definition, we will add some nice helper methods to be used later.

  • Node now also extends index.Indexed — this provides the ability for this model to be indexed for searching. See also the search_fields definition on the model for the fields we have added to the index.
  • get_as_listing_header is a method that renders a custom template that shows off the 'depth' of our Nodes. We also set the short_description and admin_order_field attributes on this method, used by modeladmin to show a nice column header.
  • get_parent is just the same method provided by MP_Node. However, we need to re-declare it on the model to set the short_description used by modeladmin.
  • delete method is overridden to block the deletion of the root Node. This is really important — if it is deleted, the node tree will be corrupted and chaos will enter the ancient forest.
  • __str__ magic method is used to show a nice string representation of our Nodes.
  • Finally, we have decided that Node is not a friendly name for our team. We have elected to use Topic instead. modeladmin will also honor this reference and automatically use it in the admin interface.

# File: my_app/models.py
from django import forms
from django.core.exceptions import PermissionDenied
from django.core.validators import MinLengthValidator
from django.db import models
from django.template.loader import render_to_string # added

from treebeard.mp_tree import MP_Node

from wagtail.admin.edit_handlers import FieldPanel
from wagtail.search import index # added


class Node(index.Indexed, MP_Node):  # Note: Now using index.Indexed in model
    """Represents a single nestable Node in the corporate taxonomy."""

    # ...name, aliases and other attributes defined above go here

    def get_as_listing_header(self):
        """Build HTML representation of node with title & depth indication."""
        depth = self.get_depth()
        rendered = render_to_string(
            'includes/node_list_header.html',
            {
                'depth': depth,
                'depth_minus_1': depth - 1,
                'is_root': self.is_root(),
                'name': self.name,
            }
        )
        return rendered
    get_as_listing_header.short_description = 'Name'
    get_as_listing_header.admin_order_field = 'name'

    def get_parent(self, *args, **kwargs):
        """Duplicate of get_parent from treebeard API."""
        return super().get_parent(*args, **kwargs)
    get_parent.short_description = 'Parent'

    search_fields = [
        index.SearchField('name', partial_match=True),
        index.SearchField('aliases', partial_match=False, boost=0.25),
    ]

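    def delete(self, *args, **kwargs):
        """Prevent users from deleting the root node."""
        if self.is_root():
            raise PermissionDenied('Cannot delete root Topic.')
        # pass through any arguments so callers using Django's delete API still work
        return super().delete(*args, **kwargs)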

    def __str__(self):
        return self.name

    class Meta:
        verbose_name = 'Topic'
        verbose_name_plural = 'Topics'

Here is the template used by our get_as_listing_header method.

{# File: my_app/templates/includes/node_list_header.html #}
{% if is_root %}
  <span style="font-size:135%;"><strong>{{ name }}</strong></span>
{% else %}
  <span>
    <span class="inline-block" style="margin-left:{{ depth_minus_1 }}em; font-size:{% if depth == 1 %}120{% elif depth == 2 %}110{% else %}100{% endif %}%;"></span>
    <i class="icon icon-fa-level-up icon-fa-rotate-90" style="display: inline-block;"></i>
    {{ name }}
  </span>
{% endif %}

Then we need to update the definition of our NodeAdmin to take advantage of our pretty get_as_listing_header method.

class NodeAdmin(ModelAdmin):
    #... other options
    # listing view options ('name' replaced with 'get_as_listing_header')
    list_display = ('get_as_listing_header', 'get_parent', 'aliases')

5 - Finishing up

We can now add a relation to our Nodes on any of our other models, where appropriate.

We can add a many-to-one relationship using ForeignKey.

class KnowledgePage(Page):
    # ... other fields
    node = models.ForeignKey(
        'my_app.Node',
        on_delete=models.CASCADE,
    )

We can add a many-to-many relationship using ManyToManyField.

class KnowledgePage(Page):
    # ... other fields
    nodes = models.ManyToManyField('my_app.Node')

We now have an interface to manage our taxonomy, along with a way to link the nodes to any other model within Django.
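
One payoff of linking pages to nodes is being able to ask for "everything under this topic", including pages tagged with any nested topic. The sketch below shows that idea with plain Python data rather than real models: the sample paths, titles, and the pages_under helper are all made up for illustration, and the paths assume a fixed step of four characters per level. In a real project you would more likely lean on treebeard's own tree methods, such as get_descendants, together with a normal Django filter.

```python
# Hypothetical sample data: node path -> topic name
nodes = {
    '0001': 'All topics',
    '00010001': 'Engineering',
    '000100010001': 'Python',
    '00010002': 'Finance',
}

# Hypothetical pages, each linked to one node by its path
pages = [
    ('Django tips', '000100010001'),
    ('Style guide', '00010001'),
    ('Expense policy', '00010002'),
]


def pages_under(topic_path, pages):
    """Return titles of pages linked to the topic or to any topic nested below it."""
    return [
        title for title, node_path in pages
        if node_path.startswith(topic_path)
    ]


engineering_pages = pages_under('00010001', pages)
all_pages = pages_under('0001', pages)
```

The prefix check works only because every level uses the same number of characters, so one path can start with another only when the second is an ancestor (or the same node).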

Bonus points - Adding icing on the root Node

Hide delete button on root Node

It is nice to not show buttons that users are not meant to use. Thankfully, modeladmin makes it easy to override how the buttons for each row are generated.

# File: my_app/models.py
from wagtail.contrib.modeladmin.helpers import ButtonHelper # add import

class NodeButtonHelper(ButtonHelper):
    def delete_button(self, pk, *args, **kwargs):
        """Ensure that the delete button is not shown for root node."""
        instance = self.model.objects.get(pk=pk)
        if instance.is_root():
            return
        return super().delete_button(pk, *args, **kwargs)

class NodeAdmin(ModelAdmin):
    #... other options
    button_helper_class = NodeButtonHelper

Add button to quickly add a child node

This is a bit more involved, but worth it to understand how to work with class-based views and modeladmin in depth.

Walkthrough:

  • NodeButtonHelper has a few changes to essentially create and insert a new button, add_child_button, which will provide a simple way to pre-fill the parent field on a create Node view.
  • AddChildNodeViewClass extends the CreateView class. Here, we do a few things:
    • __init__ gets the pk (primary key) from the request and checks it is valid via the prepared queryset and get_object_or_404.
    • get_page_title gives the user a nicer title on the create page, relevant to the parent they selected.
    • get_initial sets the initial values for our NodeForm. No changes are needed to NodeForm for this to work.
  • Inside our NodeAdmin, we override two methods:
    • add_child_view — this gives the modeladmin module a reference to a view to assign to the relevant URL.
    • get_admin_urls_for_registration — this adds our new URL for the above view to the registration process (Wagtail admin requires all admin URL patterns to be registered a specific way).

# File: my_app/models.py
from django.conf.urls import url
from django.contrib.admin.utils import quote, unquote
from django.shortcuts import get_object_or_404

from wagtail.contrib.modeladmin.helpers import ButtonHelper
from wagtail.contrib.modeladmin.views import CreateView


class NodeButtonHelper(ButtonHelper):
    # delete_button... see above

    def prepare_classnames(self, start=None, add=None, exclude=None):
        """Parse classname sets into final css classes list."""
        classnames = start or []
        classnames.extend(add or [])
        return self.finalise_classname(classnames, exclude or [])

    def add_child_button(self, pk, child_verbose_name, **kwargs):
        """Build an add child button, to easily add a child under node."""
        classnames = self.prepare_classnames(
            start=self.edit_button_classnames + ['icon', 'icon-plus'],
            add=kwargs.get('classnames_add'),
            exclude=kwargs.get('classnames_exclude')
        )
        return {
            'classname': classnames,
            'label': 'Add %s %s' % (
                child_verbose_name, self.verbose_name),
            'title': 'Add %s %s under this one' % (
                child_verbose_name, self.verbose_name),
            'url': self.url_helper.get_action_url('add_child', quote(pk)),
        }

    def get_buttons_for_obj(self, obj, exclude=None, *args, **kwargs):
        """Override the getting of buttons, appending the add child button."""
        # pass exclude through so the default buttons still honor it
        buttons = super().get_buttons_for_obj(obj, exclude, *args, **kwargs)

        add_child_button = self.add_child_button(
            pk=getattr(obj, self.opts.pk.attname),
            child_verbose_name=getattr(obj, 'node_child_verbose_name'),
            **kwargs
        )
        buttons.append(add_child_button)

        return buttons

class AddChildNodeViewClass(CreateView):
    """View class that can take an additional URL param for parent id."""

    parent_pk = None
    parent_instance = None

    def __init__(self, model_admin, parent_pk):
        self.parent_pk = unquote(parent_pk)
        object_qs = model_admin.model._default_manager.get_queryset()
        object_qs = object_qs.filter(pk=self.parent_pk)
        self.parent_instance = get_object_or_404(object_qs)
        super().__init__(model_admin)

    def get_page_title(self):
        """Generate a title that explains you are adding a child."""
        title = super().get_page_title()
        return title + ' %s %s for %s' % (
            self.model.node_child_verbose_name,
            self.opts.verbose_name,
            self.parent_instance
        )

    def get_initial(self):
        """Set the selected parent field to the parent_pk."""
        return {'parent': self.parent_pk}


class NodeAdmin(ModelAdmin):
    #... other NodeAdmin attributes...

    def add_child_view(self, request, instance_pk):
        """Generate a class-based view to provide 'add child' functionality."""
        # instance_pk will become the default selected parent_pk
        kwargs = {'model_admin': self, 'parent_pk': instance_pk}
        view_class = AddChildNodeViewClass
        return view_class.as_view(**kwargs)(request)

    def get_admin_urls_for_registration(self):
        """Add the new url for add child page to the registered URLs."""
        urls = super().get_admin_urls_for_registration()
        add_child_url = url(
            self.url_helper.get_action_url_pattern('add_child'),
            self.add_child_view,
            name=self.url_helper.get_action_url_name('add_child')
        )
        return urls + (add_child_url, )

In closing

I really hope this has been helpful from both the technical and 'thinking it through' perspective.

There is a lot of room for improvement in this implementation, but this is a solid starting point. From here, you can build your own amazing taxonomy systems in every application... that needs it.

You can view the full models.py file on a GitHub gist. There are a few minor additions and tweaks based on the project I based this blog on.

Header Photo by Will Turner on Unsplash.

Discover and read more posts from LB (Ben Johnston)
Artem Kam
a year ago

Very useful article and clean implementation, found it in many places and wondering about one thing. Suppose i want to Node be able to have some personal url and content for it representation (for example instance of or reference to Wagtail Page)

What can you say about most correct approach to reach the goal?

André Bergonse
6 years ago

Great stuff!
I’m really curious to know what was that single sentence to explain “taxonomy” and what was the friendly term you replaced it with :)

LB (Ben Johnston)
6 years ago

Good question - we ended up simply calling them Topics (hence the class Meta: verbose_name = “Topic”). Our one liner was “Topics are what pages are about and are grouped into Subjects”. Seems simple but took a while to get there and also explained our nesting (which was limited to three levels deep: Root > Subject > Topic).

Xtian
6 years ago

Will you describe the differences of your Django Taxonomy solution to a popular knowledge base solution, Semantic-MediaWiki?

LB (Ben Johnston)
6 years ago

Django is a web framework so it leaves the implementation of essentially everything the user interacts with up to the developer. Whereas Semantic-MediaWiki is an extension to MediaWiki (the wiki platform that Wikipedia uses). Semantic-MediaWiki is a full implementation of semantic data on top of Wikipedia (basically storing triples of object-entity-value).

The solution implemented above could be used as just the ‘value’ part of the semantics, eg. “PageModel hasTopic TopicNode” or it could be used as the entity-value if you enforce some rules about the second level of the node tree being the ‘entity’.

It really depends on how far you want to go down the semantic rabbit hole and how much you want to work with protocols like RDF. The closest Django specific project