# Openpyxl tutorial: Handling Excel sheets in Python

Automate your Excel workflow by using Openpyxl module in Python 3.

** Microsoft Excel** is probably one of the highly used data storage applications. A huge proportion of small to medium size businesses fulfill their analytics requirement using Excel.

However, analyzing huge amount of data in Excel can become highly tedious and time-consuming. You could build customized data processing and analytics application using ** Visual Basic(VBA)**, the language that powers the Excel sheets. However, learning VBA could be difficult and perhaps, not worth it.

However, if you have a little knowledge of Python, you could build highly professional Business Intelligence using Excel data, without the need of a database. Using Python with Excel could be a game changer for your business.

## Sections Covered

- Basic Information about Excel
- What is Openpyxl and how to install it?
- Reading data from Excel in Python
- Reading multiple cells from Excel in Python
- Find the max row and column number of an Excel sheet in Python
- How to iterate over Excel rows and columns in Python?
- Create a new Excel file with Python
- Writing data to Excel in Python
- Appending data to Excel in Python
- Manipulating Excel Sheets in Python
- Practical usage example of data analysis of Excel sheets in Python

## Basic Information about Excel

Before beginning this Openpyxl tutorial, you need to keep the following details in mind.

- Excel files are called
.**Workbooks** - Each Workbook can contain
.**multiple sheets** - Every sheet consists of rows starting from
and columns starting from**1**.**A** - Rows and columns together make up a
.**cell** - Any type of data can be stored.

## What is Openpyxl and how to install it?

The Openpyxl module in Python is used to handle Excel files without involving third-party Microsoft application software. It is arguably, the best python excel library that allows you to perform various Excel operations and automate excel reports using Python. You can perform all kinds of tasks using Openpyxl like:-

- Reading data
- Writing data
- Editing Excel files
- Drawing graphs and charts
- Working with multiple sheets
- Sheet Styling etc.

**You can install Openpyxl module by typing pip install openpyxl in your command line.**

`pip install openpyxl`

## Reading data from Excel in Python

**To import an excel file in Python, use the load_workbook method from Openpyxl library.**

Let’s import an Excel file named `wb1.xlsx`

in Python using Openpyxl module. It has the following data as shown in the image below.

**Step 1 - **Import the `load_workbook`

method from Openpyxl.

```
from openpyxl import load_workbook
```

**Step 2 - **Provide the file location for the Excel file you want to open in Python.

` wb = load_workbook('wb1.xlsx')`

**If your Excel file is present in the same directory as the python file, you don't need to provide to entire file location.**

**Step 3 - **Choose the first active sheet present in the workbook using `wb.active`

attribute.

` sheet = wb.active`

The above points are a standard way of accessing Excel sheets using Python. You will see them being used multiple times through out this article.

**Let’s read all the data present in Row 1** **(header row).**

### Method 1 - Reading data through Excel cell name in Python

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
print(sheet["A1"].value)
print(sheet["B1"].value)
print(sheet["C1"].value)
print(sheet["D1"].value)
```

**Output**

```
ProductId
ProductName
Cost per Unit
Quantity
```

### Method 2 - Reading data from Excel using cell() method in Python

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
print(sheet.cell(row=1, column=1).value)
print(sheet.cell(row=1, column=2).value)
print(sheet.cell(row=1, column=3).value)
print(sheet.cell(row=1, column=4).value)
```

**Output**

```
ProductId
ProductName
Cost per Unit
Quantity
```

## Reading Multiple Cells from Excel in Python

You can also read multiple cells from an Excel workbook. Let’s understand this through various examples. Refer to the image of the `wb1.xlsx`

file above for clarity.

### Method 1 - Reading a range of cells in Excel using cell names

To read the data from a specific range of cells in your Excel sheet, you need to slice your sheet object through both the cells.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells from A1 to D11
print(sheet["A1:D11"])
```

**Output**

```
((<Cell 'Sheet1'.A1>, <Cell 'Sheet1'.B1>, <Cell 'Sheet1'.C1>, <Cell 'Sheet1'.D1>),
(<Cell 'Sheet1'.A2>, <Cell 'Sheet1'.B2>, <Cell 'Sheet1'.C2>, <Cell 'Sheet1'.D2>),
(<Cell 'Sheet1'.A3>, <Cell 'Sheet1'.B3>, <Cell 'Sheet1'.C3>, <Cell 'Sheet1'.D3>),
(<Cell 'Sheet1'.A4>, <Cell 'Sheet1'.B4>, <Cell 'Sheet1'.C4>, <Cell 'Sheet1'.D4>),
(<Cell 'Sheet1'.A5>, <Cell 'Sheet1'.B5>, <Cell 'Sheet1'.C5>, <Cell 'Sheet1'.D5>),
(<Cell 'Sheet1'.A6>, <Cell 'Sheet1'.B6>, <Cell 'Sheet1'.C6>, <Cell 'Sheet1'.D6>),
(<Cell 'Sheet1'.A7>, <Cell 'Sheet1'.B7>, <Cell 'Sheet1'.C7>, <Cell 'Sheet1'.D7>),
(<Cell 'Sheet1'.A8>, <Cell 'Sheet1'.B8>, <Cell 'Sheet1'.C8>, <Cell 'Sheet1'.D8>),
(<Cell 'Sheet1'.A9>, <Cell 'Sheet1'.B9>, <Cell 'Sheet1'.C9>, <Cell 'Sheet1'.D9>),
(<Cell 'Sheet1'.A10>, <Cell 'Sheet1'.B10>, <Cell 'Sheet1'.C10>, <Cell 'Sheet1'.D10>),
(<Cell 'Sheet1'.A11>, <Cell 'Sheet1'.B11>, <Cell 'Sheet1'.C11>, <Cell 'Sheet1'.D11>))
```

You can see that by slicing the `sheet`

data from `A1:D11`

, it returned us tuples of row data inside a tuple. In order to read the values of every cell returned, you can iterate over each row and use `.value`

.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells from A1 to D11
for row in sheet["A1:D11"]:
print ([x.value for x in row])
```

**Output**

```
['ProductId', 'ProductName', 'Cost per Unit', 'Quantity']
[1, 'Pencil', '$0.5', 200]
[2, 'Pen', '$1', 500]
[3, 'Eraser', '$0.25', 100]
[4, 'Sharpner', '$0.75', 100]
[5, 'Files', '$3', 50]
[6, 'A4 Size Paper', '$9', 10]
[7, 'Pencil Box', '$12', 20]
[8, 'Pen Stand', '$5.5', 10]
[9, 'Notebook', '$2', 50]
[10, 'Marker', '$1', 75]
```

### Method 2 - Reading a single row in Excel using cell name

To read a single row in your Excel sheet, just access the single row number from your `sheet`

object.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells in row 1
for data in sheet["1"]:
print(data.value)
```

**Output**

```
ProductId
ProductName
Cost per Unit
Quantity
```

### Method 3 - Reading all rows in Excel using rows attribute

To read all the rows, use `sheet.rows`

to iterate over rows with Openpyxl. You receive a tuple element per row by using the `sheet.rows`

attribute.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells in row 1
for row in sheet.rows:
print([data.value for data in row])
```

**Output**

```
['ProductId', 'ProductName', 'Cost per Unit', 'Quantity']
[1, 'Pencil', '$0.5', 200]
[2, 'Pen', '$1', 500]
[3, 'Eraser', '$0.25', 100]
[4, 'Sharpner', '$0.75', 100]
[5, 'Files', '$3', 50]
[6, 'A4 Size Paper', '$9', 10]
[7, 'Pencil Box', '$12', 20]
[8, 'Pen Stand', '$5.5', 10]
[9, 'Notebook', '$2', 50]
[10, 'Marker', '$1', 75]
```

### Method 4 - Reading a single column in Excel using cell name

Similar to reading a single row, you can read the data in a single column of your Excel sheet by its alphabet.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells in column A
for data in sheet["A"]:
print(data.value)
```

**Output**

```
ProductId
1
2
3
4
5
6
7
8
9
10
```

### Method 5 - Reading all the columns in Excel using columns attribute

To read all the data as a tuple of the columns in your Excel sheet, use `sheet.columns`

attribute to iterate over all columns with Openpyxl.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all columns
for col in sheet.columns:
print([data.value for data in col])
```

**Output**

```
['ProductId', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
['ProductName', 'Pencil', 'Pen', 'Eraser', 'Sharpner', 'Files', 'A4 Size Paper', 'Pencil Box', 'Pen Stand', 'Notebook', 'Marker']
['Cost per Unit', '$0.5', '$1', '$0.25', '$0.75', '$3', '$9', '$12', '$5.5', '$2', '$1']
['Quantity', 200, 500, 100, 100, 50, 10, 20, 10, 50, 75]
```

### Method 6 - Reading all the data in Excel

To read ** all the data** present in your Excel sheet, you don’t need to index the

`sheet`

object. You can just iterate over it.**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells in Excel
for row in sheet:
print([data.value for data in row])
```

**Output**

```
['ProductId', 'ProductName', 'Cost per Unit', 'Quantity']
[1, 'Pencil', '$0.5', 200]
[2, 'Pen', '$1', 500]
[3, 'Eraser', '$0.25', 100]
[4, 'Sharpner', '$0.75', 100]
[5, 'Files', '$3', 50]
[6, 'A4 Size Paper', '$9', 10]
[7, 'Pencil Box', '$12', 20]
[8, 'Pen Stand', '$5.5', 10]
[9, 'Notebook', '$2', 50]
[10, 'Marker', '$1', 75]
```

## Find the max row and column number of an Excel Sheet in Python

**To find the max row and column number from your Excel sheet in Python, use sheet.max_row and sheet.max_column attributes in Openpyxl.**

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
print(f"Max row in the active sheet is {sheet.max_row}")
print(f"Max column in the active sheet is {sheet.max_column}")
```

**Output**

```
Max row in the active sheet is 11
Max column in the active sheet is 4
```

** Note** - If you update a cell with a value, the

`sheet.max_row`

and `sheet.max_column`

values also change, even though you haven’t saved your changes.**Code**

```
from openpyxl import load_workbook
wb = load_workbook('pylenin.xlsx')
sheet = wb.active
sheet["A1"].value = "Lenin"
print(sheet.max_row)
sheet["A2"].value = "Mishra"
print(sheet.max_row)
# wb.save('pylenin.xlsx')
```

**Output**

```
1
2
```

## How to iterate over Excel rows and columns in Python?

**Openpyxl offers two commonly used methods called iter_rows and iter_cols to iterate over Excel rows and columns in Python.**

`iter_rows()`

- Returns one tuple element.**per row selected**`iter_cols()`

- Returns one tuple element.**per column selected**

Both the above mentioned methods can receive the following arguments for setting boundaries for iteration:

- min_row
- max_row
- min_col
- max_col

### Example 1 - iter_rows()

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells from between
# "Row 1 and Row 2" and "Column 1 and Column 3"
for row in sheet.iter_rows(min_row=1,
max_row=2,
min_col=1,
max_col=3):
print([data.value for data in row])
```

**Output**

```
['ProductId', 'ProductName', 'Cost per Unit']
[1, 'Pencil', '$0.5']
```

As you can see, only the first 3 columns of the first 2 rows are returned. **The tuples are row based.**

**You can also choose to not pass in some or any arguments in iter_rows method.**

**Code****- Not passing min_col and max_col**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# min_col and max_col arguments are not provided
for row in sheet.iter_rows(min_row=1,
max_row=2):
print([data.value for data in row])
```

**Output**

```
['ProductId', 'ProductName', 'Cost per Unit', 'Quantity']
[1, 'Pencil', '$0.5', 200]
```

All the columns from the first 2 rows are being printed.

### Example 2 - iter_cols()

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Access all cells from A1 to D11
for row in sheet.iter_cols(min_row=1,
max_row=2,
min_col=1,
max_col=3):
print([data.value for data in row])
```

**Output**

```
['ProductId', 1]
['ProductName', 'Pencil']
['Cost per Unit', '$0.5']
```

**The tuples returned are column based on using iter_cols() method.**

**You can also choose to not pass in some or any arguments in iter_cols() method.**

**Code - Not passing any argument**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# min_col and max_col arguments are not provided
for row in sheet.iter_cols():
print([data.value for data in row])
```

**Output**

```
['ProductId', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
['ProductName', 'Pencil', 'Pen', 'Eraser', 'Sharpner', 'Files', 'A4 Size Paper', 'Pencil Box', 'Pen Stand', 'Notebook', 'Marker']
['Cost per Unit', '$0.5', '$1', '$0.25', '$0.75', '$3', '$9', '$12', '$5.5', '$2', '$1']
['Quantity', 200, 500, 100, 100, 50, 10, 20, 10, 50, 75]
```

## Create a new Excel file with Python

**To create a new Excel file in Python, you need to import the Workbook class from Openpyxl library.**

**Code**

```
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
sheet['A1'] = "Pylenin"
sheet['B1'] = "loves"
sheet['C1'] = "Python"
wb.save("pylenin.xlsx")
```

This should create a new Excel workbook called `pylenin.xlsx`

with the provided data.

## Writing data to Excel in Python

There are multiple ways to write data to an Excel file in Python.

### Method 1 - Writing data to Excel using cell names

**Code**

```
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
sheet['A1'] = "Pylenin"
sheet['B1'] = "loves"
sheet['C1'] = "Python"
wb.save("pylenin.xlsx")
```

**Output**

### Method 2 - Writing data to Excel using the cell() method

**Code**

```
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
sheet.cell(row=1, column=1).value = "Pylenin"
sheet.cell(row=1, column=2).value = "loves"
sheet.cell(row=1, column=3).value = "Python"
wb.save("pylenin.xlsx")
```

**Output**

### Method 3 - Writing data to Excel by iterating over rows

**Code - Example 1**

```
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
for row in sheet["A1:D3"]:
row[0].value = "Pylenin"
row[1].value = "loves"
row[2].value = "Python"
wb.save("pylenin.xlsx")
```

**Output**

You can also use methods like `iter_rows()`

and `iter_cols()`

to write data to Excel.

**Code - Example 2 - using iter_rows() method**

```
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
for row in sheet.iter_rows(min_row=1,
max_row=3,
min_col=1,
max_col=3):
row[0].value = "Pylenin"
row[1].value = "loves"
row[2].value = "Python"
wb.save("pylenin.xlsx")
```

**Output**

**Code - Example 3 - using iter_cols() method**

```
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
for col in sheet.iter_cols(min_row=1,
max_row=3,
min_col=1,
max_col=3):
col[0].value = "Pylenin"
col[1].value = "loves"
col[2].value = "Python"
wb.save("pylenin.xlsx")
```

**Output**

## Appending data to Excel in Python

Openpyxl provides an `append()`

method, which is used to append values to an existing Excel sheet in Python.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('pylenin.xlsx')
sheet = wb.active
data = (
("Pylenin", "likes", "icecream"),
("Pylenin", "likes", "Cricket")
)
for row in data:
sheet.append(row)
wb.save('pylenin.xlsx')
```

**Output**

## Manipulating Excel Sheets in Python

Each Excel workbook can contain multiple sheets. **To get a list of all the sheet names in an Excel workbook, you can use the wb.sheetnames.**

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('pylenin.xlsx')
print(wb.sheetnames)
```

**Output**

`['Sheet']`

As you can see, `pylenin.xlsx`

has only one sheet.

**To create a new sheet in Python, use the create_sheet() method from the Openpyxl library.**

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('pylenin.xlsx')
wb.create_sheet('Pylenin')
wb.save('pylenin.xlsx')
```

**Output**

You can also create sheets at different positions in the Excel Workbook.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('pylenin.xlsx')
# insert sheet at 2nd to last position
wb.create_sheet('Lenin Mishra', -1)
wb.save('pylenin.xlsx')
```

**Output**

If your Excel workbook contains multiple sheets and you want to work with a particular sheet, you can refer the title of that sheet in your workbook object.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('pylenin.xlsx')
ws = wb["Pylenin"]
ws["A1"].value = "Pylenin"
ws["A2"].value = "loves"
ws["A3"].value = "Python"
wb.save('pylenin.xlsx')
```

**Output**

## Practical usage example of data analysis of Excel sheets in Python

Let’s perform some data analysis with `wb1.xlsx`

file as shown in the first image.

### Objective

- Add a new column showing
`Total Price per Product`

. - Calculate the
`Total Cost`

of all the items bought.

The resulting Excel sheet should look like the below image.

### Step 1 - Find the max row and max column of the Excel sheet

As mentioned before, you can use the `sheet.max_row`

and `sheet.max_column`

attributes to find the max row and max column for any Excel sheet with Openpyxl.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
print(f"Max row in the active sheet is {sheet.max_row}")
print(f"Max column in the active sheet is {sheet.max_column}")
```

**Output**

```
Max row in the active sheet is 11
Max column in the active sheet is 4
```

### Step 2 - Add an extra column in Excel with Python

To add an extra column in the active Excel sheet, with calculations, you need to first create a new column header in the first empty cell and then iterate over all rows to multiply `Quantity`

with `Cost per Unit`

.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Add new column header
sheet.cell(row=1, column=sheet.max_column+1).value = "Total Price per Product"
wb.save("wb1.xlsx")
```

**Output**

Now that an extra column header has been created, the `sheet.max_column`

value will change to 5.

Now you can calculate the `Total Price per Product`

using `iter_rows()`

method.

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
# Calculate Total Price per Product
for id, row in enumerate(sheet.iter_rows(min_row=2,
max_row = sheet.max_row)):
row_number = id + 2 # index for enumerate will start at 0
product_name = row[1].value
cost_per_unit = row[2].value
quantity = row[3].value
print(f"Total cost for {product_name} is {cost_per_unit*quantity}")
# Update cell value in the last column
current_cell = sheet.cell(row=row_number, column=sheet.max_column)
current_cell.value = cost_per_unit*quantity
# Format cell from number to $ currency
current_cell.number_format = '$#,#0.0'
print("\nSuccesfully updated Excel")
wb.save('wb1.xlsx')
```

**Output**

```
Total cost for Pencil is 100.0
Total cost for Pen is 500
Total cost for Eraser is 25.0
Total cost for Sharpner is 75.0
Total cost for Files is 150
Total cost for A4 Size Paper is 90
Total cost for Pencil Box is 240
Total cost for Pen Stand is 55.0
Total cost for Notebook is 100
Total cost for Marker is 75
Succesfully updated Excel
```

### Step 3 - Calculate sum of a column in Excel with Python

The last step is to calculate the `Total Cost`

of the last column in the Excel file.

**Access the last column and add up all the cost.**

You can read the last column by accessing the `sheet.columns`

attribute. Since it returns a generator, you first convert it to a `python list`

and access the last column.

```
last_column_data = list(sheet.columns)[-1]
# Ignore header cell
total_cost = sum([x.value for x in last_column_data[1:]])
```

**Create a new row 2 places down from the max_row and fill in Total Cost.**

```
max_row = sheet.max_row
total_cost_descr_cell = sheet.cell(row = max_row +2, column = sheet.max_column -1)
total_cost_descr_cell.value = "Total Cost"
total_cost_cell = sheet.cell(row = max_row +2, column = sheet.max_column)
total_cost_cell.value = total_cost
```

**Import Font class from openpyxl.styles to make the last row Bold.**

```
# Import the Font class from Openpyxl
from openpyxl.styles import Font
bold_font = Font(bold=True)
total_cost_descr_cell.font = bold_font
total_cost_cell.font = bold_font
total_cost_cell.number_format = "$#,#0.0"
```

### Final Code

**Code**

```
from openpyxl import load_workbook
wb = load_workbook('wb1.xlsx')
sheet = wb.active
last_column_data = list(sheet.columns)[-1]
# Ignore header cell
total_cost = sum([x.value for x in last_column_data[1:]])
max_row = sheet.max_row
total_cost_descr_cell = sheet.cell(row = max_row + 2, column = sheet.max_column -1)
total_cost_descr_cell.value = "Total Cost"
total_cost_cell = sheet.cell(row = max_row + 2, column = sheet.max_column)
total_cost_cell.value = total_cost
# Import the Font class from Openpyxl
from openpyxl.styles import Font
bold_font = Font(bold=True)
total_cost_descr_cell.font = bold_font
total_cost_cell.font = bold_font
total_cost_cell.number_format = "$#,#0.0"
print("\nSuccesfully updated Excel")
wb.save('wb1.xlsx')
```

When you run the above code, you should see all the relevant updates to your Excel sheet.