Merging two (or more) pandas DataFrames with different indexes but mostly the same columns without losing information?

  dataframe, merge, pandas, python

I want to merge two dataframes that have completely different indexes and mostly the same columns, without losing any information in the result.

For example, my dataframes have shapes of (32564, 38855) and (32319, 37879). The indexes are completely different, but 37000 of the columns are shared. So I need a result dataframe of shape (64883, 39734), that is, 32564 + 32319 rows and 38855 + 37879 - 37000 columns.

But when I try to merge them with:

rp12 = pd.merge(rp1, rp2, left_index=True, right_index=True, how="outer")

The result rp12 has shape (64883, 76734), because each shared column appears twice, once with an "_x" suffix and once with a "_y" suffix.
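A minimal reproduction of the suffix behaviour, using two small made-up frames in place of rp1 and rp2 (the data, index labels and column names here are assumptions for illustration, not the real frames):

```python
import pandas as pd

# Toy stand-ins for rp1 and rp2 (hypothetical data):
# the indexes are disjoint, and columns "b" and "c" are shared.
rp1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["r1", "r2"])
rp2 = pd.DataFrame({"b": [7, 8], "c": [9, 10], "d": [11, 12]}, index=["r3", "r4"])

# Same call as above: an outer merge on the index.
rp12 = pd.merge(rp1, rp2, left_index=True, right_index=True, how="outer")

# Shared columns are not aligned; each one is kept twice with "_x"/"_y" suffixes.
print(rp12.shape)
print(list(rp12.columns))
```

Here the merged frame has 3 + 3 = 6 columns instead of the 4 distinct ones, which is the same effect as the (64883, 76734) shape on the real data.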

To explain more clearly, I want to merge the two dataframes on the left into one with the shape shown on the right in this example I put together.
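The same idea can be written out as a small sketch. The frames below are made up for illustration (hypothetical labels and values, not the real data), and the wanted result is typed in by hand rather than computed:

```python
import numpy as np
import pandas as pd

# Two hypothetical inputs: disjoint indexes, columns "b" and "c" shared.
rp1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["r1", "r2"])
rp2 = pd.DataFrame({"b": [7, 8], "c": [9, 10], "d": [11, 12]}, index=["r3", "r4"])

# The result I am after, written out by hand:
# all rows from both frames, each distinct column once, NaN where a frame had no such column.
wanted = pd.DataFrame(
    {
        "a": [1, 2, np.nan, np.nan],
        "b": [3, 4, 7, 8],
        "c": [5, 6, 9, 10],
        "d": [np.nan, np.nan, 11, 12],
    },
    index=["r1", "r2", "r3", "r4"],
)

print(wanted.shape)
```

The wanted shape is (2 + 2, 3 + 3 - 2), which follows the same arithmetic as (64883, 39734) for the real frames.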

How can I merge my two pandas DataFrames to get the result DataFrame I need? By the way, the final result will have a shape close to (170000, 40000), so I would prefer to do this with "merge" or "join" for performance reasons.
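One candidate that may give this shape is sketched below, under the assumption that the two indexes really share no labels. It uses pd.concat instead of "merge" or "join": concatenating along rows aligns columns by name and, by default, keeps the union of the columns. The toy frames are hypothetical, and performance on frames of the real size has not been checked here.

```python
import pandas as pd

# Hypothetical small frames: disjoint indexes, columns "b" and "c" shared.
rp1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]}, index=["r1", "r2"])
rp2 = pd.DataFrame({"b": [7, 8], "c": [9, 10], "d": [11, 12]}, index=["r3", "r4"])

# Stack the rows; columns are matched by name (outer join on columns is the default),
# so shared columns are not duplicated and missing cells become NaN.
rp12 = pd.concat([rp1, rp2], axis=0, join="outer")

print(rp12.shape)
print(list(rp12.columns))
```

pd.concat accepts a list, so more than two frames can be passed in one call. If some index labels did overlap, rp1.combine_first(rp2) is another option to consider, since it takes the union of rows and columns and fills gaps in rp1 with values from rp2.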

Thanks and take care.

Source: Python Questions
