
Learn Data Science from Data School 📊

Tuesday Tip #1: Speed up your grid search 🔎

Published about 1 year ago • 1 min read

Hi Reader!

Welcome to the first issue of “Tuesday Tips,” a new series in which I’ll share a data science tip with you every Tuesday!

These tips will come from all over the data science spectrum: Machine Learning, Python, data analysis, NLP, Jupyter, and much more!

I hope they’ll help you learn something new, work more efficiently, or simply feel motivated and inspired ✨


👉 Tip #1: Speed up your hyperparameter search

In supervised Machine Learning, “hyperparameter tuning” is the process of adjusting the settings you choose before training (the hyperparameters) to make your model more effective. For example, if you’re trying to improve your model’s accuracy, you want to find the hyperparameter values that maximize its accuracy score.

One common way to tune your model is through a “grid search,” which means that you define a set of values to try for each hyperparameter, and your model evaluation procedure (like cross-validation) checks every combination of those values to see which one works best.
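Here’s a minimal sketch of a grid search with scikit-learn’s GridSearchCV. The iris dataset and the KNeighborsClassifier model are assumptions chosen only for illustration; the tip itself doesn’t name a dataset or model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data and model (assumptions, not from the tip)
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()

# The values to try for each hyperparameter
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validation evaluates every one of the 30 x 2 = 60 combinations
grid = GridSearchCV(knn, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)

print(grid.best_params_)  # the combination with the highest mean CV accuracy
print(grid.best_score_)   # that combination's mean cross-validated accuracy
```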

Sounds great, right?

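Well, one big problem with grid search is that the number of combinations multiplies with every parameter you add, and each combination is trained and evaluated once per cross-validation fold. So if your model is slow to train or you have a lot of parameters you want to try, this process can take a LONG TIME.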
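To see how quickly the work adds up, you can count the model fits: it’s the product of the number of values for each hyperparameter, times the number of folds. This sketch uses a hypothetical grid for a random forest (the parameter names are real RandomForestClassifier parameters, but the values are arbitrary) and scikit-learn’s ParameterGrid to enumerate the combinations.

```python
from math import prod

from sklearn.model_selection import ParameterGrid

# A hypothetical grid (values chosen only for illustration)
param_grid = {
    "n_estimators": [100, 200, 300, 400, 500],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}
n_folds = 5

# Number of combinations = product of the number of values per hyperparameter
n_combinations = prod(len(values) for values in param_grid.values())
assert n_combinations == len(ParameterGrid(param_grid))

print(n_combinations)            # 5 * 4 * 3 = 60 combinations
print(n_combinations * n_folds)  # 60 * 5 = 300 model fits

# Adding just one more hyperparameter with 3 values triples the work
bigger_grid = {**param_grid, "max_features": ["sqrt", "log2", None]}
print(len(ParameterGrid(bigger_grid)) * n_folds)  # 180 * 5 = 900 model fits
```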

So what’s the solution? I’ve got two for you:

1. If you’re using GridSearchCV in scikit-learn, use the “n_jobs” parameter to turn on parallel processing. Set it to -1 to use all available processors, though be careful about using that setting in a shared computing environment, since it can tie up every core on the machine!

🔗 2-minute demo of parallel processing

2. Also in scikit-learn, replace GridSearchCV with RandomizedSearchCV. Whereas grid search checks every combination of parameters, “randomized search” checks a random sample of those combinations. You specify how many combinations you want to try (using the “n_iter” parameter, based on how much time you have available), and it often finds an “almost best” set of parameters in far less time than grid search!

🔗 5-minute demo of randomized search
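Here’s a minimal sketch of both solutions side by side. The iris dataset, the KNN model, and the grid are assumptions for illustration. On a dataset this small, parallel processing won’t save noticeable time (and its overhead can even slow things down), so treat this as a demonstration of the parameters rather than a benchmark.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data, model, and grid (assumptions, not from the tip)
X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier()
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

# Solution 1: n_jobs=-1 runs the cross-validation fits in parallel on all available cores
grid = GridSearchCV(knn, param_grid, cv=5, scoring="accuracy", n_jobs=-1)
grid.fit(X, y)

# Solution 2: RandomizedSearchCV evaluates only n_iter randomly sampled combinations
# (random_state makes the sample reproducible); n_jobs works here too
rand = RandomizedSearchCV(
    knn, param_grid, n_iter=10, cv=5, scoring="accuracy",
    random_state=1, n_jobs=-1,
)
rand.fit(X, y)

print(len(grid.cv_results_["params"]), grid.best_score_)  # 60 combinations tried
print(len(rand.cv_results_["params"]), rand.best_score_)  # 10 combinations tried
```

Because both searches use the same cross-validation splits and every combination the randomized search tries is also in the full grid, its best score can match but never exceed the grid search’s best score here. The payoff is that it did a sixth of the work.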



If you enjoyed this issue, please forward it to a friend! 📬

See you next Tuesday!

- Kevin

P.S. Shout-out to my long-time pal, Ben Collins, who inspired and encouraged me to start this series. He has been sharing weekly Google Sheets tips for almost 5 years! Check out his site if you want to improve your Sheets skills!


Kevin Markham
