How to Append, Concat, Join & Merge pandas dataframes

Munia Humaira
5 min readNov 4, 2020

Easiest guideline to append, merge, join & concatenate…!

Photo by Ashkan Forouzani on Unsplash

I don't know about you but when it comes to joining or merging pandas dataframes I used to feel a little nervous. But if you want to be a data scientist, playing with dataframes will be your day-to-day job. And an important role will be played by functions that are used to do this merging activity — append, concat, join and merge. I will try to describe the different approaches using these functions with examples in this article. I will use datasets from this Udemy course that I took a while back. So let's jump onto it.

The basic structures of the methods are as follows —

1. df1.append(df2) — stacks dataframes vertically
2. pd.concat([df1,df2]) — stacks dataframes horizontally or vertically
3. df1.join(df2) — inner, outer, left or right join on indexes
4. pd.merge([df1,df2]) — many join on multiple columns

So here comes the trick you can follow — you don't need to learn all four of them because that will eventually confuse you more. You can learn pd.concat( ) & pd.merge( ) to cover all the bases. Since these two are more effective and used heavily in the production-ready code scripts. Nevertheless, I will show how they work.

Now, imagine we have a restaurant and we keep track of every week who comes to the restaurant and which food they take. Let's load the libraries and the datasets:

It is always a good practice to have a look at the data. Let's see how they look :

Append & Concat :

We see there are two columns containing Customer ID and Food ID. If you want all the weekly sales data to be on one long dataframe, we can simply do append on them like this —

As you can see there are 500 unique indexes because there were 250 rows in each of the dataframe and by assigning the ignore_index and sort parameters True we rearranged the rows. If you want to preserve the original index format you should keep them False.

We can do a similar task using the top-level function pd.concat( ). But with some more useful advantages. Let's see the example first —

Here you can see the difference the axis parameter makes! You can concatenate two or more dataframes vertically or horizontally by specifying axis=0 or 1. Also if you want to keep the index information intact, pass the ignore_index parameter as True.

Now if you want to be a little more playful and want something as a multi-index dataframe, you can use the keys parameter.

So week 1 and week 2 became our outermost index levels. To keep things simple I have added the string week 1 and week 2 as the keys, but you can define different operations to be the keys too. Now if you want to know some sales information from week 1, you can simply use indexing with the tuple of the outer level and index position to grab that info:

Join & Merge:

In this section of the article, I will talk about join and merge operations, which can be used alternately. There are four types of merge operations you can perform on dataframes — inner, outer, left, and right. If you are familiar with SQL then this will be native for you. But if you are not, don't worry I am going to make it seem easier. Let's first see how the .join( ) method works.

Say, we want to have the satisfactory level of the customers who came in week 1 in our imaginary restaurant. Let's see how the satisfactory table looks like:

What .join( ) does is it automatically considers the indices of the two dataframes and joins them based on that. Let's join this dataframe with the week_1 dataframe:

But we can produce this same result with pd.merge( ) function. You can do mainly four types of merging with this function. Firstly, inner join(don't confuse with the .join() method) is the most common type of join one needs almost every day. It returns a dataframe with only those rows that have common characteristics. This is similar to the intersection of two sets.

Now, let’s consider a more generic approach to understand the pd.merge() better -

The above example shows the by default inner merge method which uses the intersection of keys from both frames. If we use left merge method then it will use keys from left frame only. Similarly, if we use right merge method it will use keys from right frame only. And if one wants to use all the keys, the union of keys from both frames, then we have to use the outer merge method. Below are the dataframes for these cases —

So, I have tried to cover the basics of these methods in this article. There are more to read and practice about them which needs another article of their own. However, this one should help someone kick start on their basics. I hope it helps! But it is always best to read more at — Pandas documentation.

--

--

Munia Humaira

Writing mostly about Python, Physics, Optics, and Photonics!