Comfort Code: phase 3 of a coding bootcamp
– A Machine that can Learn
I train. I increase my mileage by 10% every week; I do pull-ups and my hands blister, becoming calloused as they heal, only to blister again with the next week’s series of sets. Repetition has slowly become an ally, making me better in body and mind. And so, I think, it is for my computer with Machine Learning. One doesn’t just stuff a bunch of data into a model and magically see the future; the data is split into a training pile and a testing pile.
The training is done on around 80% of the data, leaving the remaining 20% pristine to act as an accurate gauge of how well the Machine Learning (ML) model does at making predictions on unseen data. That’s a great idea in my opinion, implemented by a marvel of human ingenuity: Scikit-Learn.
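That split is a one-liner in Scikit-Learn. A minimal sketch, using a toy dataset chosen here just for illustration:

```python
# Hold out 20% of the data as a pristine test set with scikit-learn's
# train_test_split. The iris dataset is an illustrative stand-in.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# The model never sees the test rows during training, so they act as
# the gauge of performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 120 rows to train on, 30 held out
```

Setting `random_state` makes the split reproducible, so your gauge reads the same every run.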
Like Pandas, some brilliant people devoted their time and energy to making Scikit-Learn (SkLearn) so the rest of us shouldn’t suffer unduly. The packages associated with it are equally amazing in their power and ingenuity of application. Like internal combustion, one can find oneself taking SkLearn for granted. But I like to devote small slivers of time to standing in awe at its complexity and feeling thankful for the bright people who were so generous with their time and charitably made it free for me to access and utilize. It inspires me. Maybe one day I can give something to humankind as useful and intricate.
As phase 3 progressed, I delved deeper into the types of ML models at our disposal. It was an iterative process. First was Logistic Regression, then Decision Trees. From there, my attention turned to Random Forests. I noticed that one of our instructors was harping on the concept of Pipelines. I’m old enough to know when someone is telling you something for your own good. So Pipelines were now part of the ML process. It seems that after two generations of plumbers, we can’t escape piping, be it only virtual.
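The virtual piping looks something like this. A minimal sketch of a Pipeline chaining a scaler into a classifier, so preprocessing and fitting happen as one step; the dataset and step names are illustrative assumptions:

```python
# A Pipeline runs its steps in order: here, scale the features,
# then fit a Logistic Regression on the scaled result.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),                    # preprocessing step
    ("model", LogisticRegression(max_iter=1000)),   # estimator step
])

pipe.fit(X_train, y_train)   # one call fits the whole chain
print(pipe.score(X_test, y_test))
```

The payoff is that the scaler is fit only on the training data, so nothing from the test pile leaks into training.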
Combining the already-powerful Random Forest model with a Pipeline seems like overkill in itself. But sit back, because we’re adding GridSearch to this recipe. GridSearch reveals how much is going on behind the scenes of Machine Learning, as evidenced by the wait time for it to execute. You set all the parameters that might be of use in your model, and your machine implements every possible combination. This can take hours. After it’s completed, it’s only a matter of a few English letters and a couple of characters and you have the best combination of parameters. Of course, if you want, you can stack some models.
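Here is a hedged sketch of that recipe: `GridSearchCV` wrapped around a Random Forest Pipeline, trying every combination in the grid with cross-validation. The grid values are illustrative assumptions, kept tiny so this runs in seconds rather than hours:

```python
# GridSearchCV fits the pipeline once per parameter combination per
# cross-validation fold: 2 x 2 combinations x 5 folds = 20 fits here.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([("rf", RandomForestClassifier(random_state=42))])

# Pipeline parameters use the "stepname__parameter" naming convention.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [3, None],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)   # this is where the waiting happens

# The "few English letters and a couple of characters":
print(search.best_params_)
```

A real grid would list many more values per parameter, and the fit count (and wait time) multiplies accordingly.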
This is where dreams come true. Almost before you can finish the thought, “wouldn’t it be cool if you could feed one model into the next?!”, some intelligent person figured out a way and included it in SkLearn. I stumbled upon an article on Towards Data Science that explained in a clear and easy manner how to make multi-layered models; it’s worth linking here: https://towardsdatascience.com/stacking-made-easy-with-sklearn-e27a0793c92b
This multi-layered model technique takes results from several models and applies them to another set of models, which trains on that information. The method then feeds that into a final Logistic Regression model. Truly powerful stuff, and I nerd out over it to whomever will listen and be (even possibly) interested.
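The idea above can be sketched with SkLearn’s `StackingClassifier`: predictions from several base models are fed to a final Logistic Regression that trains on them. The choice of base models and dataset here is an assumption, not the article’s exact setup:

```python
# Stacking: base models make predictions, and the final_estimator
# (a Logistic Regression) learns from those predictions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("forest", RandomForestClassifier(random_state=42)),
    ],
    # The final layer trains on the base models' outputs.
    final_estimator=LogisticRegression(max_iter=1000),
)

stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```

Because `StackingClassifier` is itself an estimator, you can drop it straight into a Pipeline or a GridSearch like any single model.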
This phase’s project was exactly what I needed to grow as a Data Analyst/Scientist. It afforded me the pleasure of growing my EDA skills, improving upon my function writing, implementing Pipelines and GridSearch, and finally building, if I do say so myself, relatively complex Machine Learning Models.
Mere months ago, understanding and building ML models was only an idea to me. Now I can do it. At present I dream of making Neural Networks. Undoubtedly, I will learn and get to know them too. What a world! Here’s to enjoying the learning process and the fruits of your labor.