Convert XLS Files to XLSX Files using Python
Problem:
Sometimes applications will dump out Excel exports as XLS files and this can be an issue for users then looking to use those files as data sources in other upstream processes that they would like to build on top of those exports.
In order to get the files usable for applications such as Power BI as data sources for example, all of those XLS files need to be converted to XLSX files. This can take a lot of time to do manually. You would have to open each file one by one and then save them as an XLSX file.
In order to automate this process, we can use scripting languages such as Python. The script below converts all of the XLS files in an existing folder to XLSX files, also making sure to delete any previously existing XLSX files that were in the folder previously.
What you Need:
- Python
- Excel
Solution:
Defne the python libraries that you have imported needed for your script. These libraries include pyexcel which is aliased as p in our script, os, datetime, and glob.
Define the path of the folder where your excel files are stored.
Print all of the files remaining in the folder in the console and then save the files as XLSX using the save_book_as function from the pyexcel library.
Print all of the files that are XLSX already in the console and then delete those files from the folder.
Script:
import pyexcel as p
import os
from datetime import datetime
import glob
path_of_the_directory= r'C:\Users\prati\OneDrive\Documents\XLS Test'
print("Files and directories in a specified path:")
for f in os.listdir(path_of_the_directory):
if f.endswith('.xlsx'):
print(f)
os.remove(os.path.join(path_of_the_directory,f))
for filename in os.listdir(path_of_the_directory):
f1 = os.path.join(path_of_the_directory,filename)
if os.path.isfile(f1):
print(f1)
p.save_book_as(file_name= f1,
dest_file_name= f1.split('.')[0]+datetime.now().strftime("%A, %d %b %Y")+".xlsx")