How efficient are Python generators?
Assess the performance of generators compared to normal functions using the resource module.

Topics Covered
- Introduction
- Where should you use Python generators IRL
- Python resource library
- Experimenting with return
- Experimenting with yield
- Plotting out the results in Matplotlib
Introduction
In my previous article on Python generators, I mentioned the following 3 advantages of using Python generators.
- They make iterators easier to build.
- They are memory-efficient, since they produce one item at a time.
- They can represent an infinite stream of data.
A lot of my readers had the following comment.
Good job! What I miss is real life examples? When to use what? And why?
So, in this article, I will address the following points.
- A real-life example of using Python generators.
- How fast and memory-efficient generators are compared to using return.
Where should you use Python generators IRL?
One of the most popular applications of generator functions is reading a file that contains a large volume of data.
Generators perform lazy evaluation: they compute each item when you ask for it, not when the generator is created.
This makes generators very useful when you have a very large data set to process. You can start using the first items immediately, without waiting for the whole data set to be computed or loaded.
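To make lazy evaluation concrete, here is a minimal sketch. The squares function and the computed list are hypothetical names I made up for illustration; the list simply records which items have actually been calculated so far.

```python
computed = []  # records which items have actually been calculated

def squares(limit):
    """Yield n * n for each n below limit, logging every computation."""
    for n in range(limit):
        computed.append(n)
        yield n * n

gen = squares(5_000_000)   # creating the generator computes nothing
print(computed)            # []

first = next(gen)          # the first item is computed only now
second = next(gen)
print(first, second)       # 0 1
print(computed)            # [0, 1]
```

Even though the generator could produce 5 million values, only the two that were requested have been computed.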
We are going to conduct the following experiment.
1. We will use two datasets. The first file has 100 rows, whereas the second file has 5 million rows.
2. The first program will read all the rows into a list and then return it. For both files, we will measure the time it takes and the memory it consumes.
3. The second program will use yield to read one line at a time and hand it to the caller for printing. We will again measure the time it takes and the memory it consumes for both files.
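If you want to try the experiment yourself, you need two input files of the right size. The sketch below is one way to generate them. The write_sample_csv helper, the file names, and the two-column layout are all assumptions on my part; the contents of the original datasets are not described here.

```python
import csv

def write_sample_csv(path, n_rows):
    """Write n_rows of dummy data to path (hypothetical two-column layout)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n_rows):
            writer.writerow([i, f"value-{i}"])

# Example usage (file names are placeholders):
# write_sample_csv("small.csv", 100)
# write_sample_csv("large.csv", 5_000_000)
```

The usage lines are left commented out because writing the 5-million-row file takes a while and produces a large file on disk.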
Python resource library
The resource module is part of the Python standard library, but it is Unix-only and is not available on Windows.
This module provides basic mechanisms for measuring and controlling system resources utilized by a program.
We will specifically use the resource.getrusage function.
This function returns an object that describes the resources consumed by either the current process or its children, as specified by the who parameter. We will pass the resource.RUSAGE_SELF constant, which requests the resources consumed by the calling process, that is, the sum of resources used by all threads in the process.
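As a rough sketch of how the three fields used in this article can be read, the helper below wraps resource.getrusage and returns them as a dictionary. The usage_snapshot name and the dictionary keys are my own choices for illustration, not part of the resource module.

```python
import resource

def usage_snapshot():
    """Return peak memory and CPU times for the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "peak_rss": usage.ru_maxrss,     # kilobytes on Linux, bytes on macOS
        "user_time": usage.ru_utime,     # seconds spent in user mode
        "system_time": usage.ru_stime,   # seconds spent in kernel mode
    }

before = usage_snapshot()
squares = [n * n for n in range(200_000)]   # some work to measure
after = usage_snapshot()

cpu_before = before["user_time"] + before["system_time"]
cpu_after = after["user_time"] + after["system_time"]
print("CPU seconds used:", cpu_after - cpu_before)
print("Peak memory so far:", after["peak_rss"])
```

Note that ru_maxrss is a high-water mark for the whole life of the process, so it never goes down between snapshots.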
Experimenting with return
The following code reads the entire file, stores it in memory, and prints every line of data inside a loop.
We will run the below code for the file containing only 100 rows and the file containing 5 million rows.
import resource

filename = '<filename>'

def read_file(file_name):
    # Read every line into a list; the whole file is held in memory at once.
    with open(file_name, 'r') as csv_file:
        return csv_file.readlines()

csv_data = read_file(filename)
for data in csv_data:
    print(data)

# Take a single snapshot so all three figures describe the same moment.
usage = resource.getrusage(resource.RUSAGE_SELF)
print('Peak Memory Usage =', usage.ru_maxrss)
print('User Mode Time =', usage.ru_utime)
print('System Mode Time =', usage.ru_stime)
Results for the file containing 100 rows
Peak Memory Usage = 9376
User Mode Time = 0.040309
System Mode Time = 0.012093999999999999
Results for the file containing 5 million rows
Peak Memory Usage = 943516
User Mode Time = 10.662542
System Mode Time = 14.784168
Experimenting with yield
The following code uses the yield keyword to read one line at a time and return it to the caller.
We will run the below code for the file containing only 100 rows and the file containing 5 million rows.
import resource

filename = '<filename>'

def read_file(file_name):
    # Yield one line at a time; only the current line is held in memory.
    with open(file_name, 'r') as csv_file:
        while True:
            data = csv_file.readline()
            if not data:  # readline() returns '' at end of file
                break
            yield data

data = read_file(filename)
for row in data:
    print(row)

# Take a single snapshot so all three figures describe the same moment.
usage = resource.getrusage(resource.RUSAGE_SELF)
print('Peak Memory Usage =', usage.ru_maxrss)
print('User Mode Time =', usage.ru_utime)
print('System Mode Time =', usage.ru_stime)
Results for the file containing 100 rows
Peak Memory Usage = 9424
User Mode Time = 0.016108
System Mode Time = 0.008058
Results for the file containing 5 million rows
Peak Memory Usage = 9436
User Mode Time = 11.590708
System Mode Time = 14.579287
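As an aside on the generator above: a Python file object is itself a lazy iterator over its lines, so the explicit readline loop can be shortened. The sketch below is an alternative I am suggesting, not the code that produced the measurements, and the temporary file contents are made up for the demo.

```python
import os
import tempfile

def read_file(file_name):
    """Yield lines lazily; the file is closed when the generator finishes."""
    with open(file_name, "r") as csv_file:
        yield from csv_file

# Demo on a small temporary file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,1\nb,2\nc,3\n")

rows = list(read_file(tmp.name))
os.remove(tmp.name)
print(rows)   # ['a,1\n', 'b,2\n', 'c,3\n']
```

Calling list() here is only for the demo; in real use you would loop over read_file(...) directly so that only one line is in memory at a time.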
Plotting out the results in Matplotlib
Let's summarize the results in a table and then plot them in Matplotlib. Time is user-mode plus system-mode CPU time. Memory is ru_maxrss, which Linux reports in kilobytes (macOS reports bytes); the values above are only plausible as kilobytes.

No. of rows | Return statement | Yield statement |
---|---|---|
100 rows | Memory: 9376 KB, Time: 0.0524 secs | Memory: 9424 KB, Time: 0.0242 secs |
5 million rows | Memory: 943516 KB, Time: 25.4 secs | Memory: 9436 KB, Time: 26.2 secs |
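
The plot itself can be produced with a few lines of Matplotlib. What follows is a sketch of one possible chart, not the original figure: the plot_results function, the output file name, and the choice of grouped bars on a log scale are my assumptions. The numbers are copied from the four runs above, with time taken as user plus system time.

```python
# Raw figures copied from the runs above: (peak memory, user + system time).
results = {
    "return": {
        "100 rows": (9376, 0.040309 + 0.012094),
        "5 million rows": (943516, 10.662542 + 14.784168),
    },
    "yield": {
        "100 rows": (9424, 0.016108 + 0.008058),
        "5 million rows": (9436, 11.590708 + 14.579287),
    },
}

def plot_results(path="generator_vs_return.png"):
    """Draw grouped bar charts for memory and time and save them to path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    labels = ["100 rows", "5 million rows"]
    positions = range(len(labels))
    width = 0.35
    fig, (mem_ax, time_ax) = plt.subplots(1, 2, figsize=(10, 4))

    for offset, approach in ((-width / 2, "return"), (width / 2, "yield")):
        xs = [p + offset for p in positions]
        memory = [results[approach][label][0] for label in labels]
        seconds = [results[approach][label][1] for label in labels]
        mem_ax.bar(xs, memory, width, label=approach)
        time_ax.bar(xs, seconds, width, label=approach)

    for ax, title in ((mem_ax, "Peak memory (ru_maxrss)"),
                      (time_ax, "CPU time (secs)")):
        ax.set_xticks(list(positions))
        ax.set_xticklabels(labels)
        ax.set_yscale("log")  # the values span several orders of magnitude
        ax.set_title(title)
        ax.legend()

    fig.tight_layout()
    fig.savefig(path)
```

Calling plot_results() writes the chart to a PNG file. A log scale is used because a linear axis would flatten the 100-row bars to nothing next to the 5-million-row ones.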

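While the return and yield versions perform similarly on time, memory consumption is where generators really outshine the list-returning approach. Peak memory for the generator stays essentially flat (9424 versus 9436) as the file grows from 100 to 5 million rows, whereas the return version grows about a hundredfold (9376 to 943516).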
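If you are on Windows, or just want a quick check without the resource module, the standard library's tracemalloc module can show the same pattern inside a single process. This is a different measurement from ru_maxrss (it tracks Python-level allocations rather than resident memory), and the helper names and the 50,000-line sample file are assumptions made for this sketch.

```python
import os
import tempfile
import tracemalloc

def read_all(file_name):
    """Return every line of the file as one list."""
    with open(file_name, "r") as f:
        return f.readlines()

def read_lazily(file_name):
    """Yield the file's lines one at a time."""
    with open(file_name, "r") as f:
        for line in f:
            yield line

def peak_traced_bytes(reader, file_name):
    """Consume every line from reader; return (line count, peak bytes traced)."""
    tracemalloc.start()
    line_count = 0
    for _ in reader(file_name):
        line_count += 1
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return line_count, peak

# Build a throwaway sample file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    for i in range(50_000):
        tmp.write(f"{i},sample\n")

list_count, list_peak = peak_traced_bytes(read_all, tmp.name)
gen_count, gen_peak = peak_traced_bytes(read_lazily, tmp.name)
os.remove(tmp.name)

print("list peak bytes:     ", list_peak)
print("generator peak bytes:", gen_peak)
```

Both readers see the same number of lines, but the list version has to hold all of them at its peak, while the generator version only ever holds the current line plus the file's read buffer.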