Ponder with Pandas — Text to Excel and Feature Engineering
Of all the things we do in machine learning, cleaning data, transforming it, and making it feature-ready takes a lot of time. Extracting data from a database, a CSV file, or an XML file is the easy part, as they all have a schema and many libraries make our lives easy. Plain text has no such schema.
In a series of short fragments, I will share code snippets that I hope make your machine learning tasks easier.
I make each point by implementing a real-world use case. If you have not yet run into a use case I implement, no worries: if and when you encounter a matching one, you can at least come back.
All of this is of course free.
Applies to
Reading Any Text File and Converting It to a Spreadsheet Cell Representation in Microsoft Excel
Benefits:
1. Rapidly convert text to an Excel workbook
2. Lay out text as if it were a structured data set (columns = features)
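To make the second benefit concrete, here is a minimal sketch of the character-per-cell layout using an in-memory string instead of a file. The sample sentence and variable names are my own, chosen only for illustration.

```python
import pandas as pd

# Sample text is an assumption for illustration; any string works.
text = "The quick brown fox"

# One row, one column per character (spaces included).
df = pd.DataFrame([list(text)])

print(df.shape)       # (1, 19)
print(df.iloc[0, 4])  # q
```

Each column can then be treated like a feature, addressed by its character position.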
When to use
When you have loads of text files and want to flatten them into a structure at a character level.
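For the "loads of text files" case, one possible approach is to put each text on its own row. The sketch below is an assumption on my part, not part of the original snippet: `texts_to_frame` is a hypothetical helper, and the strings stand in for file contents you would normally read from disk.

```python
import pandas as pd

def texts_to_frame(texts):
    """Lay out several texts as rows, one character per column.

    `texts` maps a label (for example a file name) to its content.
    Shorter texts are padded with empty strings so all rows share a width.
    """
    width = max((len(t) for t in texts.values()), default=0)
    rows = [list(t) + [""] * (width - len(t)) for t in texts.values()]
    return pd.DataFrame(rows, index=list(texts.keys()))

# Hypothetical sample data standing in for the contents of two files.
samples = {"a.txt": "fox", "b.txt": "lazy dog"}
frame = texts_to_frame(samples)
print(frame.shape)  # (2, 8)
```

Keep in mind that an .xlsx worksheet holds at most 16,384 columns, so very long texts would need to be split before writing them out this way.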
import pandas as pd

# Create a DataFrame from a text file named fox.txt and transpose it.
# Please ensure that the text file is in the same folder as this code, or
# adjust the path as per your needs.
# Read the whole file, put one character per row, then transpose to a single row
with open("fox.txt", "rt") as f:
    df = pd.DataFrame(list(f.read())).transpose()
# Write to an Excel file and voilà, you are done
# (writing .xlsx requires an Excel engine such as openpyxl to be installed)
df.to_excel("some.xlsx", merge_cells=True, index=False)

Disclaimer: All copyrights and trademarks belong to their respective companies and owners. The purpose of this article is educational only, and the views herein are my own.