Learn Data Science from Data School 📊

Tuesday Tip #11: Power up your pandas DataFrame 🐼

Published 11 months ago • 2 min read

Hi Reader!

Back in tip #5, I showed you how to visualize your pandas code with Pandas Tutor.

Today, I’ve got four exciting pandas tools that can help you to:

  1. Speed up your data exploration
  2. Explore your dataset visually
  3. Treat your DataFrame like a spreadsheet
  4. Write pandas code faster with the help of AI

Let’s go! 🚀

👉 Tip #11: 4 tools to improve your pandas workflow

There are tons of free tools designed to improve your pandas workflow, but which ones are worth trying out?

I only considered tools that are being actively developed and maintained, since it’s not worth investing your time in a tool that will quickly become outdated, buggy, or broken.

Here are my top four picks...

1️⃣ ydata-profiling: “One-line Exploratory Data Analysis”

  • Summary: You run one line of code, and it creates an interactive report that makes it easy to examine each variable in your DataFrame. It also visualizes the interactions between variables, and alerts you to possible problems with the dataset. The report can even be exported to HTML!
  • Example: HTML report
  • Installation: pip or conda
  • Notes: It used to be known as pandas-profiling, but was renamed since it now also supports Spark DataFrames.
  • Takeaway: It’s a huge time-saver for getting an overview of a new dataset.
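
Here's a rough sketch of that "one line" in practice, wrapped in a small helper. The function name `make_profile_report` and the default file name are my own placeholders, not part of the library. `ProfileReport` and `to_file` are the calls I know from the ydata-profiling docs, but check them against the version you install.

```python
def make_profile_report(df, output_path="profile_report.html"):
    """Write a ydata-profiling HTML report for a pandas DataFrame.

    `make_profile_report` and `output_path` are illustrative names only.
    """
    # Imported inside the function so this snippet still loads
    # if ydata-profiling has not been installed yet.
    from ydata_profiling import ProfileReport

    # The "one line": build the report object from the DataFrame.
    profile = ProfileReport(df, title="Profiling Report")

    # Export the interactive report to a standalone HTML file.
    profile.to_file(output_path)
    return output_path
```

Pass in any DataFrame you've already loaded, then open the resulting HTML file in your browser. Profiling a large dataset can take a while.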

2️⃣ PyGWalker: “Turn your pandas DataFrame into a Tableau-style User Interface”

  • Summary: You run one line of code, and it creates a Tableau-like interface for visually exploring your pandas (or Polars) DataFrame. It works within Jupyter, Google Colab, Kaggle Code, VS Code, Streamlit, and more.
  • Example: Kaggle notebook
  • Installation: pip or conda
  • Notes: According to the repository, PyGWalker is pronounced “Pig Walker.”
  • Takeaway: It looks useful if you’re already familiar with Tableau (which I am not!)
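
As a minimal sketch, the one line looks something like this. The wrapper name `explore_visually` is just a placeholder of mine. `pyg.walk` is the entry point I've seen in the PyGWalker README, and I'm assuming you're running it in a notebook environment that can render the interface.

```python
def explore_visually(df):
    """Open a PyGWalker drag-and-drop interface for a DataFrame.

    `explore_visually` is an illustrative name, not part of PyGWalker.
    """
    # Imported inside the function so this snippet still loads
    # if PyGWalker has not been installed yet.
    import pygwalker as pyg

    # The "one line": hand the DataFrame to PyGWalker.
    return pyg.walk(df)
```

Call `explore_visually(df)` as the last line of a notebook cell, and the interface should appear in the cell output.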

3️⃣ Mito: “Edit spreadsheet, generate Python code”

  • Summary: It’s essentially spreadsheet software that you run inside of Jupyter. The killer feature is that as you point-and-click (or write Excel-style formulas) to transform your data, Mito writes the corresponding pandas code for you. You can even create interactive, customizable graphs!
  • Example: Watch the demo video
  • Installation: pip (virtual environment recommended)
  • Notes: Most features are available for free, though a few features are limited to paid plans.
  • Takeaway: It’s designed to help you automate your spreadsheet workflow, though you could also use it to help you learn pandas!
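
If you want to see what launching it looks like, here's a small sketch. The helper name `open_in_mito` is hypothetical. `mitosheet.sheet` is the call I know for opening a sheet, and this assumes you're working inside Jupyter with Mito installed.

```python
def open_in_mito(df):
    """Open a pandas DataFrame in a Mito spreadsheet inside Jupyter.

    `open_in_mito` is an illustrative name, not part of Mito.
    """
    # Imported inside the function so this snippet still loads
    # if Mito has not been installed yet.
    import mitosheet

    # Pass the DataFrame to Mito, and return whatever `sheet` returns
    # so the notebook can render it if it is a displayable object.
    return mitosheet.sheet(df)
```

From there, the edits you make in the spreadsheet are what Mito turns into pandas code.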

4️⃣ Sketch: “AI code-writing assistant for pandas”

  • Summary: You write out what you want to do with a DataFrame, and Sketch writes the pandas code for you! You can also ask it questions about your dataset.
  • Example: Colab notebook or watch the demo video
  • Installation: pip
  • Notes: Sketch shares information about your DataFrame with OpenAI in order to make its suggestions more relevant.
  • Takeaway: It could help you to speed up your pandas workflow, though it’s important that you double-check the code suggestions (since they are not guaranteed to be correct).
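
Here's a hedged sketch of both uses. The helper `ask_sketch` and its `want_code` flag are names I made up for illustration. The `.sketch.howto` and `.sketch.ask` accessor methods are the ones I've seen in the Sketch README, so confirm them against the version you install.

```python
def ask_sketch(df, request, want_code=False):
    """Send a plain-English request about a DataFrame to Sketch.

    `ask_sketch` and `want_code` are illustrative names only.
    """
    # Importing sketch registers the `.sketch` accessor on pandas DataFrames.
    # It is imported here so this snippet still loads without Sketch installed.
    import sketch  # noqa: F401

    if want_code:
        # Ask for pandas code that performs the requested task.
        return df.sketch.howto(request)

    # Otherwise, ask a question about the dataset itself.
    return df.sketch.ask(request)
```

For example, `ask_sketch(df, "Which columns have missing values?")` asks a question, while adding `want_code=True` requests code instead. Either way, the request involves sending information about your DataFrame to an outside service.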

What did I miss? Reply and let me know your favorite pandas tool!

If you enjoyed this week’s tip, please forward it to a friend! Takes only a few seconds, and it really helps me out 🙏

See you next Tuesday!

- Kevin

P.S. Reality TV

Kevin Markham
