Using Parts of Speech in Natural Language Processing to Automatically Generate Features from Raw Text

Photo by Chinh Le Duc on Unsplash

Most of us in the machine learning world have, at some point, struggled to find the right features for our models, especially when the inputs or data sources are raw, unstructured text. Customers have handed us documents as potential datasets, leaving us perplexed. The very nature of unstructured text has forced us to revise our work estimates, software design, and implementation, all of which leads to rework.

Processing of text documents and their conversion to features…
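To make the idea in the title concrete, here is a minimal sketch of one common way part-of-speech tags can become numeric features: count how often each tag appears in a document and normalise by the number of tokens. This is not the author's implementation (that lives in the repository mentioned below). The function name `pos_features`, the small tag set, and the hand-tagged sentence are illustrative assumptions; in practice the `(word, tag)` pairs would come from a tagger such as NLTK's `pos_tag`, which emits Penn Treebank tags in this shape.

```python
from collections import Counter


def pos_features(tagged_tokens, tagset):
    """Hypothetical helper: relative frequency of each tag in `tagset`.

    `tagged_tokens` is a list of (word, tag) pairs. Tags outside `tagset`
    still count toward the total but get no feature of their own.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    if total == 0:
        # An empty document yields an all-zero feature vector.
        return {tag: 0.0 for tag in tagset}
    return {tag: counts.get(tag, 0) / total for tag in tagset}


# Hand-tagged example (assumption): written by hand, not produced by a tagger.
tagged = [("Customers", "NNS"), ("sent", "VBD"), ("raw", "JJ"), ("documents", "NNS")]

# Illustrative subset of Penn Treebank tags; the order fixes the vector layout.
TAGSET = ["NN", "NNS", "VB", "VBD", "JJ"]

features = pos_features(tagged, TAGSET)
vector = [features[tag] for tag in TAGSET]
print(vector)  # [0.0, 0.5, 0.0, 0.25, 0.25]
```

Keeping the tag order fixed means every document maps to a vector of the same length, which is what most scikit-learn style models expect as input.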

Photo by Hugo Rocha on Unsplash

If you have landed here directly and cannot follow what's going on, I suggest you read Part 1 first. If the full implementation code is all you are after, you are most welcome to download it from Git.

The code is documented inline, and if you still have questions, please feel free to reach out to me.

Code download: penredink/tds_feature_engineering_pos_tagger

You do not need to know Git commands to work with the files. Just download the zip file and unzip it.
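If you would rather script the unzip step, a minimal sketch using Python's standard `zipfile` module could look like this. It extracts every entry while keeping the archive's folder layout intact. The function name `extract_archive` and the paths in the usage comment are placeholders (assumptions), not names taken from the repository.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Hypothetical helper: extract all entries of zip_path into dest_dir.

    Folder structure inside the archive is preserved. Returns the sorted
    list of entry names so the caller can check what was unpacked.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


# Example usage (placeholder paths, adjust to wherever you saved the download):
# extract_archive("downloaded_repo.zip", "tds_feature_engineering_pos_tagger")
```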

Getting Started

I strongly recommend that you download the zip file and extract it as is. …

Shalabh Bhatnagar

Machine Learning. What else?
