Python Excel automation

#PythonAutomation

Python Excel automation

Overview

Traditionally, spreadsheets have been the dominant way of managing many business processes, And users can spend hours doing repetitive and boring tasks, Let's help them out.

There're loads of Tools out there that can automate your work (e.g. RPA tools, PowerShell scripts, Bash scripts, Python Scrips, etc..)

In this Section will figure out together the basic capabilities of Python automation, We gonna use openpyxl Library, However, There's a load of Modules, Packages, and libraries out there: Python Package Index "PyPi"

Prerequisites

Prerequisites to install openpyxl library

  • Install the latest version of python3:
sudo apt install python3

It's Okay if you're using another OS, Download python

  • install the latest version of pip package management:

      sudo apt install python3-pip
    

    Install pip

  • Install the library by the following command:

      pip install openpyxl
    
  • Discover the library packages

# Change directory under your project/site-packages, e.g. the below
/home/pythonuser/Documents/PythonAutomation_Excel/venv/lib/python3.8/site-packages/openpyxl

Let's discover openpyxl library

Open your Code Editor (e.g. PyCharm, VsCode, etc...,) Or Open your Terminal and type python3 to open IDLE "interactive development environment"

Let's start out with a very simple code:

import openpyxl

# Load the spreadsheet in a variable to use it later
wb = openpyxl.load_workbook('spreadSheet_sweigart.xlsx')

# Load the spreadsheet sheet names in a var to print it in the next step
sheet_list = wb.sheetnames

print(sheet_list)  # print sheets list

Now, let's specify a sheet to work on

  • Select a specific sheet

      # select a sheet
      sheet1 = wb['Sheet1']
    
      print(sheet1)  # print sheet one name
    
  • Or Select the active sheet (i.e. The active sheet that you're opening Or the last active one before closing the spreadsheet)

# select the active sheet
active_sheet = wb.active  # get the active sheet 

print(active_sheet)  # print the active sheet
  • Get a cell value:
cell_a1 = active_sheet['A1'] # Specify a cell, however this not giving the cell value must type .value

print(cell_a1.value)

print(cell_a1.row)  # get cell row number
  • Load the Sheet rows:
# max_row function definition to get all rows in the sheet not more
print(product_list.max_row) # will print the rows number, however you can loop on them using this funtion, and will see this in a moment
  • Create a sheet:
# create a sheet with "personal sheet" name
new_sheet = wb.create_sheet("personal sheet")
print(new_sheet)

# Rename the created sheet to "Per_sheet"
new_sheet.title = "per_sheet"
print(new_sheet)

# Save, Note that you can't save at the same file you must save at a new one.
# also, There's no change will be submitted if you didn't save
wb.save("spreadSheet_sweigart1.xlsx")

Script Example code

After discovering the tool's simplicity, let's type a little deep script, Find "spreadSheet.xlsx" here, The sheet has four columns (Product No, Inventory, Price, Supplier) we need to loop on each line and do the following:

  • Calculation number of products per supplier

  • Calculate the total inventory price value per supplier

  • Write values of total inventory price, and save the file

import openpyxl

# Load the spreadsheet
inv_file = openpyxl.load_workbook("spreadSheet.xlsx")
product_list = inv_file["Sheet1"] # specify a sheet

products_per_supplier = {} # create dictionary var for for loop
total_value_per_supplier = {} # create dictionary var for for loop
products_under_10_inv = {} # create dictionary var for for loop


# For loop, looping on each row one by one 
# we should use range here because of the for loop looping through list and the range convert it to list
# 2 here means to start from row number two to ignore the header, the range starts from 0
# 1 here means to get the last row line, by default range will ignore the last line
for product_row in range(2, product_list.max_row + 1):
    supplier_name = product_list.cell(product_row, 4).value  # Fetch the supplier value of each row, Product_row refers to the row that the for loop stopped on and 4 refers to the column. 
    # without .value function will fetch the cell details information, not its value.
    print(supplier_name)

    inventory = product_list.cell(product_row, 2).value  # Fetch the Inventory value of each row.
    price = product_list.cell(product_row, 3).value # Fetch the price value of each row.
    product_num = int(product_list.cell(product_row, 1).value) # Fetch the Product_num value of each row, and int here to print the values as integers 
    inventory_price = product_list.cell(product_row, 5)  # Select the Cell number 5 "the empty one" for each row to type the result in it. 

    # Calculation number of products per supplier
    if supplier_name in products_per_supplier: # This condition will be false at the first one for each supplier, while The else condition will be True, and will start counting for each supplier and save the result in products_per_supplier var
        current_num_products = products_per_supplier[supplier_name] # set supplier name as a key to products_per_supplier dictionary
        products_per_supplier[supplier_name] = current_num_products + 1 # append 1 on the existing value
        # Also we can do it like the below
        # products_per_supplier[supplier_name] = products_per_supplier[supplier_name] + 1
        # also you can use the following method to set a key to products_per_supplier, it's the same result
        # products_per_supplier.get(supplier_name) Instad of products_per_supplier[supplier_name]
    else:
        print("Adding a new supplier")
        products_per_supplier[supplier_name] = 1
        # products_per_supplier is a dictionary, here we put the dictionary key to be supplier name and the value = 1

    # Calculate total inventory price value per supplier
    if supplier_name in total_value_per_supplier:
        current_total_value = total_value_per_supplier[supplier_name]
        total_value_per_supplier[
            supplier_name] = current_total_value + inventory * price  # append to the existing value
    else:
        # enter the result value into this dictionary, 1st time per supplier
        print("Adding a new supplier value")
        total_value_per_supplier[supplier_name] = inventory * price # supplier name will be the key of dictionary

    # Print out the inventories that have less than 10
    if inventory < 10:
        products_under_10_inv[product_num] = int(inventory) # product_num will be the key of dictionary

    # Write values of total inventory price, and save the file
    # inventory_price equal the empty cell(num 5) to put the result value in it
    inventory_price.value = inventory * price  # use .value here to add to the cell
    #End of For loop

print(products_per_supplier)
print(total_value_per_supplier)
print(products_under_10_inv)

# saving the file, must save in another file
inv_file.save("inventory_with_total_values.xlsx")

References