Learn Data Science from Data School 📊

Tuesday Tip #11: Power up your pandas DataFrame 🐼

Published about 1 year ago • 2 min read

Hi Reader!

Back in tip #5, I showed you how to visualize your pandas code with Pandas Tutor.

Today, I’ve got four exciting pandas tools that can help you to:

  1. Speed up your data exploration
  2. Explore your dataset visually
  3. Treat your DataFrame like a spreadsheet
  4. Write pandas code faster with the help of AI

Let’s go! 🚀


👉 Tip #11: 4 tools to improve your pandas workflow

There are tons of free tools designed to improve your pandas workflow, but which ones are worth trying out?

I only considered tools that are being actively developed and maintained, since it’s not worth investing your time in a tool that will quickly become outdated, buggy, or broken.

Here are my top four picks...

1️⃣ ydata-profiling: “One-line Exploratory Data Analysis”

  • Summary: You run one line of code, and it creates an interactive report that makes it easy to examine each variable in your DataFrame. It also visualizes the interactions between variables, and alerts you to possible problems with the dataset. The report can even be exported to HTML!
  • Example: HTML report
  • Installation: pip or conda
  • Notes: It used to be known as pandas-profiling, but was renamed since it now also supports Spark DataFrames.
  • Takeaway: It’s a huge time-saver for getting an overview of a new dataset.
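A minimal sketch of what that "one line" looks like, assuming ydata-profiling is installed (`pip install ydata-profiling`). Because of the rename, the import is now `ydata_profiling` rather than `pandas_profiling`. The sample DataFrame and the helper name `build_profile_report` are made up for illustration, and the import sits inside the function so the snippet still loads where the package is missing:

```python
import pandas as pd

# Hypothetical sample data with a few missing values for the report to flag.
df = pd.DataFrame({
    "age": [23, 35, 31, None, 52],
    "city": ["Boston", "Denver", "Boston", "Austin", None],
    "income": [48_000, 72_000, 65_000, 51_000, 90_000],
})


def build_profile_report(data: pd.DataFrame, html_path: str = "report.html"):
    """Profile `data` with ydata-profiling and save the report as HTML.

    `build_profile_report` is a hypothetical helper; ProfileReport and
    to_file are the library's own calls.
    """
    # Imported lazily so this file can be loaded without ydata-profiling.
    from ydata_profiling import ProfileReport

    profile = ProfileReport(data, title="Example profiling report")
    profile.to_file(html_path)  # writes a standalone, shareable HTML report
    return profile
```

In a notebook you can skip the helper entirely: `ProfileReport(df).to_notebook_iframe()` displays the report inline instead of writing a file.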

2️⃣ PyGWalker: “Turn your pandas DataFrame into a Tableau-style User Interface”

  • Summary: You run one line of code, and it creates a Tableau-like interface for visually exploring your pandas (or Polars) DataFrame. It works within Jupyter, Google Colab, Kaggle Code, VS Code, Streamlit, and more.
  • Example: Kaggle notebook
  • Installation: pip or conda
  • Notes: According to the repository, PyGWalker is pronounced “Pig Walker.”
  • Takeaway: It looks useful if you’re already familiar with Tableau (which I am not!)
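Here's a rough sketch, assuming PyGWalker is installed (`pip install pygwalker`) and the code runs in a notebook environment such as Jupyter. `pyg.walk()` is the entry point shown in the PyGWalker README; the DataFrame and the `open_walker` wrapper are hypothetical:

```python
import pandas as pd

# Hypothetical daily sales data to drag onto the chart shelves.
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=4, freq="D"),
    "product": ["A", "B", "A", "B"],
    "units": [10, 4, 7, 12],
})


def open_walker(data):
    """Open PyGWalker's Tableau-style interface for `data` (pandas or Polars)."""
    import pygwalker as pyg  # the alias used in the PyGWalker README

    # walk() renders the drag-and-drop UI in the notebook's output cell.
    return pyg.walk(data)
```

Day to day you'd just run `pyg.walk(df)` in a cell; the wrapper exists only to keep the import optional.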

3️⃣ Mito: “Edit spreadsheet, generate Python code”

  • Summary: It’s essentially spreadsheet software that you run inside Jupyter. The killer feature is that as you point and click (or write Excel-style formulas) to transform your data, Mito writes the corresponding pandas code for you. You can even create interactive, customizable graphs!
  • Example: Watch the demo video
  • Installation: pip (virtual environment recommended)
  • Notes: Most features are available for free, though a few features are limited to paid plans.
  • Takeaway: It’s designed to help you automate your spreadsheet workflow, though you could also use it to help you learn pandas!
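A minimal sketch, assuming `mitosheet` is installed in your Jupyter environment (`pip install mitosheet`). `mitosheet.sheet()` accepts one or more DataFrames; the order data and the `open_mito` wrapper are illustrative only:

```python
import pandas as pd

# Hypothetical order data to edit in the spreadsheet view.
df = pd.DataFrame({
    "order_id": [101, 102, 103],
    "price": [19.99, 5.49, 12.00],
    "quantity": [2, 10, 1],
})


def open_mito(*frames):
    """Open one or more DataFrames in a Mito spreadsheet inside Jupyter."""
    import mitosheet

    # As you edit in the sheet, Mito writes the equivalent pandas code for you.
    return mitosheet.sheet(*frames)
```

Running `open_mito(df)` (or simply `mitosheet.sheet(df)`) in a notebook cell is enough to start editing.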

4️⃣ Sketch: “AI code-writing assistant for pandas”

  • Summary: You write out what you want to do with a DataFrame, and Sketch writes the pandas code for you! You can also ask it questions about your dataset.
  • Example: Colab notebook or watch the demo video
  • Installation: pip
  • Notes: Sketch shares information about your DataFrame with OpenAI, which improves the relevance of its suggestions.
  • Takeaway: It could help you to speed up your pandas workflow, though it’s important that you double-check the code suggestions (since they are not guaranteed to be correct).
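A rough sketch of both uses, assuming Sketch is installed (`pip install sketch`). Importing `sketch` adds a `.sketch` accessor to pandas DataFrames, with `ask()` for questions and `howto()` for code suggestions. The sales data and the two wrapper functions are hypothetical, and as the Notes bullet says, calling them shares information about the DataFrame with OpenAI:

```python
import pandas as pd

# Hypothetical sales data to ask questions about.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "South"],
    "revenue": [1200, 950, 1800, 700],
})


def ask_sketch(data: pd.DataFrame, question: str):
    """Ask Sketch a plain-English question about `data`."""
    import sketch  # noqa: F401  (imported for its side effect: the .sketch accessor)

    return data.sketch.ask(question)


def howto_sketch(data: pd.DataFrame, task: str):
    """Ask Sketch to suggest pandas code for `task`. Review it before running!"""
    import sketch  # noqa: F401

    return data.sketch.howto(task)
```

Treat whatever `howto()` suggests as a draft: read it, run it on a copy of your data, and check the result.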

What did I miss? Reply and let me know your favorite pandas tool!


If you enjoyed this week’s tip, please forward it to a friend! Takes only a few seconds, and it really helps me out 🙏

See you next Tuesday!

- Kevin

P.S. Reality TV

Did someone awesome forward you this email? Sign up here to receive data science tips every week!

Kevin Markham
