Codementor Events

Python interview question: tuple vs list

Published Mar 29, 2019Last updated Mar 31, 2019
Python interview question: tuple vs list

Let me start the series of “Python interview question” posts with this pretty common question. Tuples and lists are 2 seemingly similar sequence types in Python. And I love this question, because the depth of the answer is a good indicator of the candidate’s experience.

Literal syntax. The most evident way to get required data structure is to call the appropriate type —  tuple or list. And you could also use syntactic sugar to construct either a list (using square brackets and commas to declare elements) or a tule (using commas to declare elements and optionally parenthesis to reduce ambiguity).

Mutability. Tuples are immutable, while lists are mutable. This point is the base for following ones.

Memory usage. Due to mutability, you need more memory for lists and less memory for tuples.

Extending. You can add a new element to both tuples and lists with the only difference that the id of the tuple will be changed (i.e., we’ll have new object).

Hashing. Tuples are hashable and lists are not. It means that you can use a tuple as a key in a dictionary.

Semantics. This point is more about best practice. You should use tuples as heterogeneous data structures, while lists are homogeneus sequences.

Links:

  1. “What’s the difference between lists and tuples?” question on SO;
  2. Understanding tuples vs. lists in Python.
Discover and read more posts from Dmitry Belaventsev
get started
post commentsBe the first to share your opinion
Luke Plant
5 years ago

I’d argue that number 5, semantics, really ought to be number 1. Once you understand what a tuple means, you’ll understand the reason for all the rest, and be able to choose the right one.

People are often distracted by the low level performance features of these data structures. So you see things like “use tuples because they are faster” (for iterating over), or because they use less memory etc. This can turn into a proliferation of tuples (for consistency and/or compatibility with other code that is expecting tuples). And then you start to have real performance problems:

You cannot actually extend a tuple at all - they are immutable. As you say “extending” a tuple is actually creating an entirely new tuple, with all the elements of the original one copied over (by reference). However, if you ever want to “extend” a tuple, you’ve got the wrong data-structure - semantically, a tuple is a structure with a fixed number of elements, each of which has a different meaning e.g. (name, email address) pair. It never makes any sense to extend a tuple, and that is why it is immutable.

So what does “extending” a tuple look like? Here is an example:

my_stuff = ()
for thing in things:
    my_stuff += (thing.stuff,)

I have seen code like this in the wild - do NOT write code like this! The third line is not adding a new element to the tuple - you are in fact creating an entirely new tuple, which has to copy the elements of the old one, and because of this you end up with quadratic time complexity in the length of things, which is very bad.

If you had used a list, as Guido intended, then you would be able to use list.append, which has excellent performance characteristics for this use case.

The lesson is to just use the data structure that semantically matches what you have. Everything else will follow, because the performance characteristics of these data structures, and all the other language features and stdlib behaviour, have been optimized for their intended usage (e.g. there is such a thing as a list comprehension, but no tuple comprehension).

Donald Knuth is extremely relevant:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

So we should worry about O(n²) performance (which is fixed by using the right data structure), and should not worry about tiny differences in memory/speed (which might be very different on other Python implementations, like PyPy for example).

Dominik Seifert
5 years ago

I would argue that mutability has nothing to do with memory usage. Can you explain how one affects the other?

David Tippett
5 years ago

This is a really nice guide honestly. I feel like it covers all the aspects of these data types really well mate.

Dmitry Belaventsev
5 years ago

Pleased to hear that , David. Hope that will help someone during interview preparation process.

Show more replies