Tuesday Tip #40: Build your DataFrame from multiple files 🏗️


Hi Reader,

In case you missed it, I launched a free, 7-hour pandas course!

800+ students have enrolled, and a few have already earned their certificate of completion 👩‍🎓


🔗 Link of the week

Data Internships

Looking for an internship in Data Science or Analytics? This site curates the latest internship postings and emails them to you each week!


👉 Tip #40: Build a DataFrame from multiple files

Let’s say that your dataset is spread across multiple files, but you want to read the dataset into a single pandas DataFrame.

For example, I have a tiny dataset of stock market data in which each CSV file only includes a single day. Here’s the first day:

Here’s the second day:

And here’s the third day:

You could read each CSV file into its own DataFrame, combine them together, and then delete the original DataFrames, but that would be memory inefficient and require a lot of code.

A better solution is to use Python’s built-in glob module:

You can pass a pattern to the glob() function, including wildcard characters, and it will return a list of all files that match that pattern.

In this case, glob() is looking in the “data” subdirectory for all CSV files whose names start with the word “stocks”, followed by any number of characters:

glob returns filenames in an arbitrary order, which is why we sorted the list using Python’s built-in sorted() function.
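
Here’s a minimal sketch of that step. The file names (stocks1.csv and so on), the column names, and the values are all made up for illustration, and the files are written to a temporary “data” folder so that the snippet runs on its own. The stocks*.csv pattern is an assumption based on the description above:

```python
import tempfile
from glob import glob
from pathlib import Path

# Hypothetical setup: three tiny placeholder CSV files in a temporary "data" folder.
# File names, columns, and values are invented for illustration only.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir()
for day in (1, 2, 3):
    (data_dir / f"stocks{day}.csv").write_text(
        f"Date,Symbol,Close\n2024-01-0{day},AAA,{100 + day}\n2024-01-0{day},BBB,{200 + day}\n"
    )

# "*" is a wildcard. sorted() makes the order predictable,
# because glob() does not guarantee any particular order.
stock_files = sorted(glob(str(data_dir / "stocks*.csv")))
print([Path(f).name for f in stock_files])
```

Keep in mind that sorted() orders the names as strings, so a file like stocks10.csv would sort before stocks2.csv.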

We can then use a generator expression to read each of the files using read_csv() and pass the results to the concat() function, which will concatenate the rows into a single DataFrame:
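
As a rough sketch of that step (again using made-up placeholder files in a temporary folder, with hypothetical names and values), the generator expression is passed directly to concat():

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Hypothetical setup: placeholder files with invented names and values.
data_dir = Path(tempfile.mkdtemp())
for day in (1, 2, 3):
    (data_dir / f"stocks{day}.csv").write_text(
        f"Date,Symbol,Close\n2024-01-0{day},AAA,{100 + day}\n2024-01-0{day},BBB,{200 + day}\n"
    )
stock_files = sorted(glob(str(data_dir / "stocks*.csv")))

# The generator expression yields one DataFrame per file,
# and concat() stacks them on top of each other (row-wise).
stocks = pd.concat(pd.read_csv(file) for file in stock_files)
print(stocks)
```

Each file keeps its own 0-based index, so the combined index here repeats 0, 1 for every file.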

Unfortunately, there are now duplicate values in the index. To avoid that, we can tell the concat() function to ignore the index and instead use the default integer index:
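
A sketch of that fix, using the same kind of made-up placeholder files (names and values are hypothetical). The only change to the concat() call is ignore_index=True:

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Hypothetical setup: placeholder files with invented names and values.
data_dir = Path(tempfile.mkdtemp())
for day in (1, 2, 3):
    (data_dir / f"stocks{day}.csv").write_text(
        f"Date,Symbol,Close\n2024-01-0{day},AAA,{100 + day}\n2024-01-0{day},BBB,{200 + day}\n"
    )
stock_files = sorted(glob(str(data_dir / "stocks*.csv")))

# ignore_index=True discards each file's own index
# and assigns a fresh integer index to the combined DataFrame.
stocks = pd.concat((pd.read_csv(file) for file in stock_files), ignore_index=True)
print(stocks)
```

Note the extra parentheses around the generator expression, which Python requires when the call has more than one argument.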

Pretty cool, right?

Need to build a DataFrame column-wise instead? Use the same code as above, except pass axis='columns' to concat()!
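
Here’s a small sketch of the column-wise case. It assumes two hypothetical files (part1.csv and part2.csv, invented for illustration) that hold different columns for the same rows:

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

# Hypothetical setup: the same two rows, split across two files by column.
data_dir = Path(tempfile.mkdtemp())
(data_dir / "part1.csv").write_text("Date,Symbol\n2024-01-01,AAA\n2024-01-01,BBB\n")
(data_dir / "part2.csv").write_text("Close,Volume\n101,1000\n201,2000\n")
part_files = sorted(glob(str(data_dir / "part*.csv")))

# axis='columns' places the DataFrames side by side, aligning rows on the index.
wide = pd.concat((pd.read_csv(file) for file in part_files), axis="columns")
print(wide)
```

Because the rows are matched by index, this only gives sensible results if every file lists its rows in the same order.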


👋 Until next time

Did you like this week’s tip? Please forward it to a friend or share this link in your favorite Slack team. It really helps me out! 🙌

See you next Tuesday!

- Kevin

P.S. Would you wear pajamas during a Zoom call?

Did someone AWESOME forward you this email? Sign up here to receive Data Science tips every week!
