Data analysis from a CSV file in Python

Learn different ways to read and write CSV files in Python.

Topics Covered

  1. What is a CSV file?
  2. How to read CSV files using the csv module?
  3. How to skip the header row in CSV with Python?
  4. How to read CSV files as a dictionary in Python?
  5. How to write to CSV files using the csv module?
  6. How to use the delimiter parameter in csv.writer?
  7. How to write a dictionary to a CSV file in Python?

What is a CSV file?

CSV stands for comma-separated values.

You might have come across this file format while downloading data from an Excel spreadsheet or a database. CSV files are convenient for storing tabular data.

As the name suggests, values in a CSV file are separated by commas by default.

Below is an example of a CSV file containing information about a family.

my_family.csv

name,age,height(cm),weight(kg)
Lenin,30,188,90
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

Usually the first line in a CSV file is the header, which names each column. CSV itself stores no data types, so every value is plain text. Every row after the header is a data record.

In the above example, each value (whether part of the header or a data record) is separated by a comma. This separator character is called a delimiter. A CSV file may also use a delimiter other than a comma.

Examples of other delimiters:

  1. tab \t
  2. colon :
  3. semicolon ;
  4. pipe |
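
As a minimal sketch of how a non-comma delimiter is handled, the snippet below parses family data separated by pipes. The in-memory string pipe_data is a hypothetical stand-in for a real file, assumed here only so the example runs on its own.

```python
import csv
import io

# Hypothetical pipe-delimited data, held in memory instead of a file on disk.
pipe_data = "name|age|height(cm)|weight(kg)\nLenin|30|188|90\nPhil|42|178|76\n"

# Pass the delimiter explicitly so csv.reader splits on '|' instead of ','.
rows = list(csv.reader(io.StringIO(pipe_data), delimiter='|'))

print(rows[0])  # header row
print(rows[1])  # first data record
```

Every field comes back as a string, whichever delimiter is used.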

In this article, you will learn to work with CSV files using the csv module and the pandas library.


How to read CSV files using the csv module?

Reading from a CSV file is done with the csv.reader object. You can open the CSV file as a text file with Python’s built-in open() function.


Example 1

import csv

# newline='' lets the csv module handle line endings itself
with open('my_family.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # The first row is the header
            print(f'Header row - {", ".join(row)}')
        else:
            print(f'{row[0]} is {row[1]} years old, {row[2]} cm tall and {row[3]} kg heavy')
        line_count += 1
    print(f'Total: {line_count} lines')

Output

Header row - name, age, height(cm), weight(kg)
Lenin is 30 years old, 188 cm tall and 90 kg heavy
Phil is 42 years old, 178 cm tall and 76 kg heavy
Claire is 40 years old, 165 cm tall and 54 kg heavy
Alex is 18 years old, 140 cm tall and 46 kg heavy
Total: 5 lines

Since the first row is the header row (line_count is 0), it is treated differently. You can also skip the header row while reading the CSV file.


How to skip the header row in CSV with Python?


Since the csv.reader object is an iterator, you can call next(csv_reader, None) to return the header row and move past it. The second argument, None, is returned instead of raising StopIteration if the file is empty.

Example

import csv

with open('my_family.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    next(csv_reader, None)  # skip the header row
    for row in csv_reader:
        print(f'{row[0]} is {row[1]} years old, {row[2]} cm tall and {row[3]} kg heavy')
        line_count += 1
    print(f'Total: {line_count} lines')

Output

Lenin is 30 years old, 188 cm tall and 90 kg heavy
Phil is 42 years old, 178 cm tall and 76 kg heavy
Claire is 40 years old, 165 cm tall and 54 kg heavy
Alex is 18 years old, 140 cm tall and 46 kg heavy
Total: 4 lines
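
Because next() returns the row it skips, the header can be kept for later use instead of being discarded. The sketch below does that and then computes an average age. It reads from an in-memory copy of my_family.csv (the family_csv string is an assumption made so the example runs without a file on disk).

```python
import csv
import io

# In-memory copy of my_family.csv so the sketch runs without a file on disk.
family_csv = """name,age,height(cm),weight(kg)
Lenin,30,188,90
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
"""

csv_reader = csv.reader(io.StringIO(family_csv))
header = next(csv_reader, None)  # first row; None if the data is empty

# csv.reader yields every field as a string, so convert before doing arithmetic.
ages = [int(row[1]) for row in csv_reader]
average_age = sum(ages) / len(ages)

print(header)
print(average_age)
```

The int() conversion is the important step: without it, the ages would be strings and could not be summed.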

How to read CSV files as a dictionary in Python?

You can read the CSV file as a dictionary by using the csv.DictReader object.

An advantage of the DictReader object is that it turns each row into a dictionary, which makes accessing the fields by name a little easier.

Example

import csv

with open('my_family.csv', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        print(f'{row["name"]} is {row["age"]} years old, {row["height(cm)"]} cm tall and {row["weight(kg)"]} kg heavy')
    print(f'Total: {csv_reader.line_num} lines')

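The csv_reader.line_num attribute holds the number of lines read from the file so far. After the loop it equals the total number of lines in the file, header included, so the example prints Total: 5 lines.

For the csv.DictReader object, Python uses the column names from the header row as the dictionary keys.

The header row is consumed to build those keys, so it is not returned as a data record.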
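
To make the relationship between the header, the records, and line_num concrete, here is a small sketch on a shortened, in-memory version of the family data (the family_csv string is assumed for illustration).

```python
import csv
import io

# Shortened in-memory sample: one header line plus two data lines.
family_csv = "name,age\nLenin,30\nPhil,42\n"

csv_reader = csv.DictReader(io.StringIO(family_csv))
records = list(csv_reader)  # read every data record

print(csv_reader.fieldnames)  # keys taken from the header row
print(records[0])             # first data record as a dictionary
print(len(records))           # data records only, header excluded
print(csv_reader.line_num)    # lines read from the source, header included
```

Here len(records) is 2 while line_num is 3, which shows that the header line is counted by line_num but never appears as a record.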


How to write to CSV files using the csv module?



You can write to a CSV file using the csv.writer object. Be careful to open the file in write mode ('w'), which overwrites the file if it already exists.

Example

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [ ['Phil', 42, 178, 76],
        ['Alex', 18, 140, 46],
        ['Claire', 40, 165, 54] ]

filename = "my_family.csv"

# newline='' prevents extra blank lines between rows on Windows
with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output)

    # Write a single list
    csvwriter.writerow(header)

    # Writing a list of lists
    csvwriter.writerows(data)

Output

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Alex,18,140,46
Claire,40,165,54

The writerow method writes a single list of values as one row, whereas writerows writes multiple rows from an iterable of rows, such as a list of lists.
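
The difference can be seen without touching the disk. This sketch writes to an io.StringIO buffer, used here as an assumed in-memory stand-in for an open file, and then inspects the raw text that was produced.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for an open file
csvwriter = csv.writer(buffer)

csvwriter.writerow(['Name', 'Age'])                # one row
csvwriter.writerows([['Phil', 42], ['Alex', 18]])  # several rows at once

written = buffer.getvalue()
print(repr(written))
```

The repr shows that csv.writer ends each row with '\r\n' by default. That is why a real file is best opened with newline='': it stops Python from translating those line endings a second time.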


How to use the delimiter parameter in csv.writer?

Notice that no delimiter was specified when creating the csv.writer object. In such cases, the comma (,) is used as the default delimiter. You can choose a different delimiter by passing the delimiter parameter.

Example

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [ ['Phil', 42, 178, 76],
        ['Alex', 18, 140, 46],
        ['Claire', 40, 165, 54] ]

filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.writer(output, delimiter='|')

    # Write a single list
    csvwriter.writerow(header)

    # Writing a list of lists
    csvwriter.writerows(data)

Output

Name|Age|Height(cm)|Weight(kg)
Phil|42|178|76
Alex|18|140|46
Claire|40|165|54
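
A natural follow-up question is what happens when a value itself contains the delimiter. The sketch below, which uses a hypothetical Note column and an in-memory buffer, suggests that csv.writer quotes such a value by default and that csv.reader restores it when given the same delimiter.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for an open file
csvwriter = csv.writer(buffer, delimiter='|')
csvwriter.writerow(['Name', 'Note'])              # 'Note' is a made-up column
csvwriter.writerow(['Phil', 'realtor|magician'])  # value contains the delimiter

text = buffer.getvalue()
print(repr(text))

# Read the text back with the same delimiter to recover the original values.
rows = list(csv.reader(io.StringIO(text), delimiter='|'))
print(rows[1])
```

The value is wrapped in double quotes in the written text, so the extra pipe is not mistaken for a column boundary when the data is read again.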

How to write a dictionary to a CSV file in Python?


You can write dictionaries to a CSV file using the csv.DictWriter class. The fieldnames parameter is required: it lists the dictionary keys to write and sets the column order of the header.

Example

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [
    {"Name":"Phil", "Age": 42, "Height(cm)":178, "Weight(kg)":76},
    {"Name":"Claire", "Age": 40, "Height(cm)":165, "Weight(kg)":54},
    {"Name":"Alex", "Age": 18, "Height(cm)":140, "Weight(kg)":46}
]

filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.DictWriter(output, fieldnames=header)
    csvwriter.writeheader()
    for row in data:
        csvwriter.writerow(row)

Output

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46

You can also use writerows to write all the dictionaries to the CSV file at once.


Example

import csv

header = ['Name', 'Age', 'Height(cm)', 'Weight(kg)']

data = [
    {"Name":"Phil", "Age": 42, "Height(cm)":178, "Weight(kg)":76},
    {"Name":"Claire", "Age": 40, "Height(cm)":165, "Weight(kg)":54},
    {"Name":"Alex", "Age": 18, "Height(cm)":140, "Weight(kg)":46}
]

filename = "my_family.csv"

with open(filename, 'w', newline='') as output:
    csvwriter = csv.DictWriter(output, fieldnames=header)
    csvwriter.writeheader()
    csvwriter.writerows(data)

Output

Name,Age,Height(cm),Weight(kg)
Phil,42,178,76
Claire,40,165,54
Alex,18,140,46
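
Since fieldnames drives the output, it also decides the column order and what is written when a key is missing. The following sketch illustrates this with a shortened header, an in-memory buffer, and the optional restval parameter of csv.DictWriter. The 'NA' placeholder is an arbitrary choice made for this example.

```python
import csv
import io

header = ['Name', 'Age', 'Height(cm)']

buffer = io.StringIO()  # in-memory stand-in for an open file
# restval is written for any key in fieldnames that a row does not provide.
csvwriter = csv.DictWriter(buffer, fieldnames=header, restval='NA')
csvwriter.writeheader()

# Keys are in a different order than the header and 'Height(cm)' is missing.
csvwriter.writerow({'Age': 18, 'Name': 'Alex'})

written = buffer.getvalue()
print(repr(written))
```

The columns follow the order in header, not the order of the dictionary keys, and the missing height is filled with the placeholder.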
