Hi Reader!

Before we get to today's tip, I have three big launch announcements:

1. **My new course, ****Become a Regex Superhero****, launches this Friday!** Watch out for more info and a huge launch discount.

2. **I recently launched ****location-based pricing****.** I'm offering a discount of up to 85% to residents of 160+ countries.

3. **My pal Ben Collins just launched a new course, ****Beginner Apps Script**, which lets you extend the functionality of Google Sheets (and other Google Apps). Ben is the best Sheets teacher I know and is offering 50% off during the launch! *(FYI, this is an affiliate link, which means that I may earn a commission if you sign up using my link.)*

## ๐ Tip #9: Calculate basketball scoring runs

Are you watching March Madness? If so, hit reply and let me know how your bracket is doing ๐

For those who don't know, March Madness is a US college basketball tournament. One term that you'll hear a lot during games is *scoring runs*.

For example, **a team that's on a "12-point scoring run" has scored 12 points without the other team scoring any points.**

So I was wondering: **How could we calculate scoring runs using pandas? ๐ผ**

Let's find out!

## Example scoring data

Let's pretend this was our scoring data. There's one row for each time a team scored points:

In this case, the largest scoring run was when A scored 9 points in a row.

## Identify each scoring run

Now we need to figure out when each scoring run starts!

First, we use the **shift()** method to shift all of the teams down a row, and store those in a column called **previous_team**:

Then, we check if **team** is not equal to **previous_team**, and store the boolean result in a column called **start_of_run**:

Do you see how that works?

By checking whether a given **team** value is equal to the value in the previous row, **we now know when each scoring run starts!**

Finally, we use the **cumsum()** method to assign a **run_id** to each scoring run:

Wait, what just happened?

Any time you do math on a boolean column, **True** gets treated as **1** and **False** gets treated as **0**. Thus by taking the cumulative sum of the **start_of_run** column, the **run_id** increments every time it reaches a **True** value. (Neat, right?)

Shout out to Josh Devlin's excellent blog post, Calculating Streaks in Pandas, for teaching me this exact approach!

## List all scoring runs

Now that each run has been assigned an id, we use a **groupby()** to show the number of points scored by each team during each run:

That's it! Here's the code from today's tip, in case you want to play around with it.

**How else could we analyze this data using pandas?** ๐ค

Have an idea? Hit reply and let me know! ๐ก

**If you enjoyed this week's tip, please forward it to a friend!** Takes only a few seconds, and it really helps me out ๐

See you next Tuesday!

- Kevin

**P.S.** Six people predicted the Final Four correctlyโ

*Did someone awesome forward you this email? **Sign up here to receive data science tips every week**!*