Fetching huge datasets using iterator protocol

Published Dec 07, 2017

Sometimes you want to retrieve a huge dataset and loop over it to perform some operation. Fetching it all into memory at once can lock up the server processes. To avoid such problems while still iterating over the whole dataset, you can use Python's iterator protocol. The following resources give a better understanding of iterators:

Python official documentation about iterators
Python Practice Book

While you iterate over the dataset, Python repeatedly calls the __next__ method of the iterator (the method is named next in Python 2). So if you want to fetch a huge dataset, you can do it in several small batches/chunks with the help of __next__: first fetch a small portion of the dataset, and when the end of that portion is reached, fetch the next one. This way you can fetch and process the entire dataset without ever holding all of it in memory.
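To make that concrete, here is a minimal sketch of roughly what a for loop does behind the scenes. The list `numbers` and the `collected` accumulator are illustrative stand-ins and are not part of the original example.

```python
numbers = [10, 20, 30]

# Roughly what `for item in numbers:` does behind the scenes.
iterator = iter(numbers)          # calls numbers.__iter__()
collected = []
while True:
    try:
        item = next(iterator)     # calls iterator.__next__() (next() in Python 2)
    except StopIteration:         # raised when no entries are left
        break
    collected.append(item)

print(collected)
```

Because the loop only ever asks for "the next entry", an iterator is free to decide where that entry comes from, including fetching a fresh batch when the current one runs out.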

The advantage of using the iterator protocol is that the programmer can treat the iteration as a single loop. They do not have to worry about fetching the results in small chunks; the iterator takes care of it.

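#!/usr/bin/python


class QueryIterator(object):
    """Iterate over the results of a query, fetching them in small batches."""

    def __init__(self, query=None):
        self.query = query
        self.results = None  # entries of the batch currently held in memory

    def __iter__(self):
        return self

    def __next__(self):
        try:
            # Logic to return the next entry in self.results
            pass
        except StopIteration:
            # The current batch is exhausted: populate the results again
            # (e.g. call populate_data()) and return the next entry in
            # self.results. Raise StopIteration once no data is left.
            pass

    next = __next__  # Python 2 looks for next() instead of __next__()

    def populate_data(self):
        # Logic to execute the query in small batches/chunks
        # and store the results in self.results
        pass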
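One possible way to fill in the skeleton is sketched below. It is only an illustration under several assumptions that the original does not state: the data lives in SQLite (via the standard `sqlite3` module), batches are cut with `LIMIT`/`OFFSET`, and the `connection`, `batch_size`, `offset` and `exhausted` names, as well as the `items` table, are hypothetical. The query should carry an `ORDER BY` so that consecutive batches do not overlap.

```python
import sqlite3


class QueryIterator(object):
    """Yield the rows of a query one at a time, fetching them in batches."""

    def __init__(self, connection, query, batch_size=1000):
        self.connection = connection
        self.query = query           # SELECT statement without LIMIT/OFFSET
        self.batch_size = batch_size
        self.offset = 0              # number of rows fetched so far
        self.results = iter(())      # iterator over the current batch
        self.exhausted = False       # True once a short batch was returned

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.results)
        except StopIteration:
            if self.exhausted:
                raise                    # no more batches: end the loop
            self.populate_data()
            return next(self.results)    # StopIteration if the batch is empty

    next = __next__  # Python 2 spelling

    def populate_data(self):
        # Fetch the next batch only; earlier batches can be garbage collected.
        paged = "{} LIMIT ? OFFSET ?".format(self.query)
        cursor = self.connection.execute(paged, (self.batch_size, self.offset))
        rows = cursor.fetchall()
        self.offset += len(rows)
        self.exhausted = len(rows) < self.batch_size
        self.results = iter(rows)


# Demo data: an in-memory table with ten rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("item-{}".format(i),) for i in range(10)],
)

# The caller sees one plain loop; batches of 4 are fetched behind the scenes.
names = []
query = "SELECT id, name FROM items ORDER BY id"
for row_id, name in QueryIterator(connection, query, batch_size=4):
    names.append(name)

print(len(names))  # 10
```

With a batch size of 4, the ten rows arrive in batches of 4, 4 and 2, yet the calling code never has to deal with that. For very large tables, `OFFSET` paging can become slow as the offset grows, so paging on an indexed key may be a better fit.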
Discover and read more posts from Akhil Lawrence