Codementor Events

Effective Kotlin: Use Sequence for bigger collections with more than one processing step

Published May 21, 2018
Effective Kotlin: Use Sequence for bigger collections with more than one processing step

Originally published on: blog.kotlin-academy.com by Marcin Moskała

People often miss the difference between Iterable and Sequence. It is explainable. Especially when you compare Iterable and Sequence definition:

interface Iterable<out T> {
    operator fun iterator(): Iterator<T>
}

interface Sequence<out T> {
    operator fun iterator(): Iterator<T>
}

You can say that the only formal difference between them is the name. But Iterable and Sequence are associated with totally different usage so nearly all their processing functions work in a different way.

Sequences are lazy, so intermediate functions for Sequence processing don’t do any calculations. Instead they return new Sequence that decorates the previous one with new operation. All these computations are evaluated during terminal operation like toList or count. On the other hand, functions for Iterable processing returns a new collection.

fun main(args: Array<String>) {
    val seq = sequenceOf(1,2,3)
    print(seq.filter { it % 2 == 1 }) 
    // Prints: kotlin.sequences.FilteringSequence@XXXXXXXX

    print(seq.filter { it % 2 == 1 }.toList()) // Prints: [1, 3]

    val list = listOf(1,2,3)
    print(list.filter { it % 2 == 1 }) // Prints: [1, 3]}

Sequence filter is an intermediate operation, so it doesn’t do any calculations, but instead it decorates sequence with new processing step. Calculations are done in terminal operation like toList.

Because of that, also ordering of operation processing is different. In sequence processing we generally make whole processing for a single element, then for another etc. In Iterable processing, we process the whole collection on every step.

sequenceOf(1,2,3)
        .filter { println("Filter $it, "); it % 2 == 1 } 
        .map { println("Map $it, "); it * 2 } 
        .toList()
// Prints: Filter 1, Map 1, Filter 2, Filter 3, Map 3,

listOf(1,2,3)
        .filter { println("Filter $it, "); it % 2 == 1 } 
        .map { println("Map $it, "); it * 2 }
// Prints: Filter 1, Filter 2, Filter 3, Map 1, Map 3,

Thanks to this laziness and per-element processing order, we can make infinitive Sequence.

generateSequence(1) { it + 1 } 
        .map { it * 2 } 
        .take(10)
        .forEach(::print)
// Prints: 2468101214161820

This isn’t supposed to be new for Kotlin developer with some experience, but there is one more important fact about sequences that is not mentioned in most articles or books: Sequences are more efficient for collection processing with more than single processing step.

What do I mean by more than one processing step? I mean more than single function for collection processing. So if you compare this two functions:

fun singleStepListProcessing(): List<Product> {
    return productsList.filter { it.bought }
}

fun singleStepSequenceProcessing(): List<Product> {
    return productsList.asSequence()
            .filter { it.bought } 
            .toList()
}

You will notice that there is nearly no difference in performance or simple list processing is faster (because it isinline). Although then you compare function with more than one processing steps, like twoStepListProcessing which uses filter and then map, the difference will be visible.

fun twoStepListProcessing(): List<Double> {
    return productsList 
            .filter { it.bought } 
            .map { it.price }
}

fun twoStepSequenceProcessing(): List<Double> {
    return productsList.asSequence()
            .filter { it.bought } 
            .map { it.price } 
            .toList()
}

fun threeStepListProcessing(): Double {
    return productsList 
            .filter { it.bought } 
            .map { it.price } 
            .average()
}

fun threeStepSequenceProcessing(): Double {
    return productsList.asSequence()
            .filter { it.bought } 
            .map { it.price } 
            .average()

How important is that? Let’s see average operation times from benchmark measurements:

twoStepListProcessing 81 095 ns/op
twoStepSequenceProcessing 55 685 ns/op

twoStepListProcessingAndAcumulate 83 307 ns/op
twoStepSequenceProcessingAndAcumulate 6 928 ns/op

Two-step processing is already visibly faster when we use Sequences. In this case improvement is around 30%.

This difference grows more when we accumulate to number instead of to some list. In such case, no intermediate collection needs to be created at all.

What about some typical real-life processing? Let’s say that we need to calculate the average price of products bought by adults:

fun productsListProcessing(): Double {
    return clientsList 
           .filter { it.adult } 
           .flatMap { it.products } 
           .filter { it.bought } 
           .map { it.price } .average()
}

fun productsSequenceProcessing(): Double {
    return clientsList.asSequence()
            .filter { it.adult } 
            .flatMap { it.products.asSequence() } 
            .filter { it.bought } 
            .map { it.price } 
            .average()
}

Here are results:

SequencesBenchmark.productsListProcessing 712 434 ns/op
SequencesBenchmark.productsSequenceProcessing 572 012 ns/op

We have around 20% improvement. This is less than for direct processing (without flatMap) but is still an important difference.

This shows general rule that can be found again and again when you measure performance:

Sequence processing is generally faster than direct collection processing when we have more then one processing step.

When are Sequences not faster?

There are some rare operations where we don’t profit from Sequence usage because we have to operate on the whole collection ether way. sorted is an example from Kotlin stdlib (I believe it is the only function in Kotlin stdlib that has this problem).

sorted uses optimal implementation: It accumulates Sequence into List and then uses sort from Java stdlib. A disadvantage is this accumulation process takes additional time if we compare to the same processing on Collection (although if Iterable is not a Collection or array then the difference is not significant because it needs to be accumulated as well).

It is controversial if Sequence should have methods like sorted because it is only palatially lazy (evaluated when we need to get the first element) and don’t work on infinitive sequences. It was added because it is a popular function and it is much easier to use it this way. Although Kotlin developer should remember that is cannot be used on infinitive sequences.

generateSequence(0) { it + 1 }.sorted().take(10).toList()
// Infinitive calculation time

generateSequence(0) { it + 1 }.take(10).sorted().toList()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

sorted is a rare example of processing step which is faster on Collection then on Sequence. Still, when we do few processing steps and single sorted function (or other function that needs to work on the whole collection), we can expect performance improvement from migration to Sequence processing.

// Took around 150 482 ns
fun productsSortAndProcessingList(): Double {
    return productsList 
               .sortedBy { it.price } 
               .filter { it.bought } 
               .map { it.price } .average()
}

// Took around 96 811 ns
fun productsSortAndProcessingSequence(): Double {
    return productsList.asSequence()
            .sortedBy { it.price } 
            .filter { it.bought } 
            .map { it.price } 
            .average()
}

What about Java stream?

Java 8 introduced streams to allow collection processing. They act and look similar to Kotlin sequences.

productsList.asSequence()
    .filter { it.bought } 
    .map { it.price } 
    .average()

productsList.stream()
    .filter { it.bought } 
    .mapToDouble { it.price } 
    .average()
    .orElse(0.0)

They are also lazy and collected in last (terminal) processing step. Java streams are also more effective for collection processing then direct Kotlin collection processing functions. Two big differences between Java steams and Kotlin sequences are following:

  • Kotlin sequences have much more processing functions (because they are defined as an extension functions) and their usage is generally easier (this is a result of the fact that Kotlin sequences were designed when Java streams was already used — for instance we can collect using toList() instead of collect(Collectors.toList()))
  • Java streams can be started in parallel mode using a parallel function. This can give us huge performance improvement in contexts when we have a machine with multiple cores (what is a standard today).
  • Kotlin sequences can be used in common modules, Kotlin/JS and Kotlin/Native modules.

Except that, when we don’t use parallel mode, it is hard to give a simple answer if Java steam or Kotlin sequence is more efficient.

My suggestion is to use Java streams only for computationally heavy processing where you can profit from the parallel mode. Otherwise, use Kotlin stdlib functions to have a homogeneous and clean code.

Effective Kotlin

This is the first article about Effective Kotlin. When we see interest, we will publish next parts. In Kotlin Academy we also work on the book about this subject:

Effective Kotlin
_This book presents deep analysis of good practices, both official (Kotlin and Google best practices for Kotlin) and…_leanpub.com

It will cover a much wider range of topics and go much deeper into every single one of them. It will include also best practices published by Kotlin and Google team, experiences of members of Kotlin team we cooperate with, and subjects touched in “Effective Java in Kotlin” series. To support it and make us publish it faster, use this link and subscribe.


If you need a help with Kotlin, remember that I give consultations.

To be up-to-date with great news on Kotlin Academy, subscribe to the newsletter, observe Twitter and follow.

To reference me on Twitter, use @MarcinMoskala. Use below link to subscribe to the newsletter:

Discover and read more posts from Kotlin Academy
get started
post commentsBe the first to share your opinion
Show more replies