PART 1 — BACKGROUND AND BASICS
Using Parts of Speech in Natural Language Processing to Automatically Generate Features from Raw Text
In the machine learning world, we have all, at some point, struggled to find the right features for our models, especially when the inputs are raw, unstructured text. Customers have given us documents as potential datasets, leaving us perplexed. The very nature of unstructured text has forced us to revise our work estimates, software design, and implementation, all leading to rework.
Processing of text documents and their conversion to features…
I suggest you read Part 1 first if you have landed here directly and cannot make sense of what is going on. If the full implementation code is all you are after, you are most welcome to download it from GitHub.
The code is documented inline. If you still have questions, please feel free to reach out to me.
Code download: penredink/tds_feature_engineering_pos_tagger (github.com)
You do not need to know Git commands to work with the code. Just download the zip file and unzip it.
I strongly recommend that you download the zip file and extract it as is. …
Machine Learning. What else?