Remove duplicate rows from Pandas DataFrame where only some columns have the same value
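Remove duplicate rows based on two columns: pass the column names to drop_duplicates() through the subset parameter, so that rows count as duplicates when only Age and Height match. keep='first' retains the first occurrence of each (Age, Height) pair: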
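import pandas as pd

# Sample data: the (Age, Height) pair (30, 120) repeats, and so does the 'Jane' index label
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])

print("\n -------- Duplicate Rows ----------- \n")
print(df)

# reset_index() moves the row labels into an 'index' column, drop_duplicates() compares
# only Age and Height and keeps the first match, and set_index('index') restores the labels
df1 = df.reset_index().drop_duplicates(subset=['Age', 'Height'], keep='first').set_index('index')

print("\n ------- Unique Rows ------------ \n")
print(df1)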
C:\python\pandas>python example54.py

 -------- Duplicate Rows -----------

          Age  Height  Score State
Jane       30     120    4.6    NY
Jane       40     162    4.6    NY
Aaron      30     120    9.0    FL
Penelope   40     120    3.3    AL
Jaane      30     120    4.0    NY
Nicky      30      72    8.0    TX
Armour     20     120    9.0    FL
Ponting    25      81    3.0    AL

 ------- Unique Rows ------------

          Age  Height  Score State
index
Jane       30     120    4.6    NY
Jane       40     162    4.6    NY
Penelope   40     120    3.3    AL
Nicky      30      72    8.0    TX
Armour     20     120    9.0    FL
Ponting    25      81    3.0    AL

C:\python\pandas>
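In the output above, Aaron and Jaane are removed because their (Age, Height) pair (30, 120) already occurred in the first Jane row. The sketch below is an illustrative extension, not part of the original tutorial: it reuses the same sample data to show duplicated(), which flags the rows that drop_duplicates() would remove, and the other two keep options, 'last' and False. The variable names cols, mask, first, last and unique_only are hypothetical. Because drop_duplicates() compares column values only, calling it directly on df selects the same rows as the reset_index()/set_index() round trip; the row labels are simply kept without the 'index' name.

```python
import pandas as pd

# Same sample data as the tutorial
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])

cols = ['Age', 'Height']  # hypothetical name for the columns that define a duplicate

# Boolean mask: True where the (Age, Height) pair already appeared in an earlier row
mask = df.duplicated(subset=cols, keep='first')
print(mask.tolist())
# [False, False, True, False, True, False, False, False]

# keep='first' directly on df: the duplicate 'Jane' labels do not affect the comparison
first = df.drop_duplicates(subset=cols, keep='first')
print(first.index.tolist())
# ['Jane', 'Jane', 'Penelope', 'Nicky', 'Armour', 'Ponting']

# keep='last': retain the last occurrence of each (Age, Height) pair
last = df.drop_duplicates(subset=cols, keep='last')
print(last.index.tolist())
# ['Jane', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting']

# keep=False: drop every row whose (Age, Height) pair occurs more than once
unique_only = df.drop_duplicates(subset=cols, keep=False)
print(unique_only.index.tolist())
# ['Jane', 'Penelope', 'Nicky', 'Armour', 'Ponting']
```

Choosing between these options depends on which copy should survive: keep='last' is handy when later rows hold newer data, while keep=False leaves only the rows whose (Age, Height) pair is unique.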
2018-10-28T05:09:14+05:30
Amit Arora
Python Programming Tutorial