Remove rows with duplicate indices in Pandas DataFrame
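Remove rows whose index label is duplicated, keeping the first occurrence. The example moves the index into a column named 'index' with reset_index(), drops repeated labels with drop_duplicates(), then restores the index with set_index():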
import pandas as pd

# The index label 'Jane' appears twice
df = pd.DataFrame({'Age': [30, 30, 22, 40, 20, 30, 20, 25],
                   'Height': [165, 165, 120, 80, 162, 72, 124, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky', 'Armour', 'Ponting'])

print("\n -------- Duplicate Rows ----------- \n")
print(df)

# Move the index into a column called 'index', keep the first row for each
# label, then set that column as the index again
df1 = df.reset_index().drop_duplicates(subset='index', keep='first').set_index('index')

print("\n ------- Unique Rows ------------ \n")
print(df1)
C:\python\pandas>python example52.py

 -------- Duplicate Rows -----------

          Age  Height  Score State
Jane       30     165    4.6    NY
Jane       30     165    4.6    NY
Aaron      22     120    9.0    FL
Penelope   40      80    3.3    AL
Jaane      20     162    4.0    NY
Nicky      30      72    8.0    TX
Armour     20     124    9.0    FL
Ponting    25      81    3.0    AL

 ------- Unique Rows ------------

          Age  Height  Score State
index
Jane       30     165    4.6    NY
Aaron      22     120    9.0    FL
Penelope   40      80    3.3    AL
Jaane      20     162    4.0    NY
Nicky      30      72    8.0    TX
Armour     20     124    9.0    FL
Ponting    25      81    3.0    AL

C:\python\pandas>
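
The same result can probably be reached without moving the index into a column. The sketch below is an alternative to the example above, not part of it: it uses a boolean mask from Index.duplicated() on a small made-up frame (the data and the names mask and unique are assumptions chosen for illustration).

```python
import pandas as pd

# Small illustrative frame (not the tutorial's data): 'Jane' appears twice
# in the index, with different values in each row.
df = pd.DataFrame(
    {'Age': [30, 31, 22], 'State': ['NY', 'NJ', 'FL']},
    index=['Jane', 'Jane', 'Aaron'],
)

# Index.duplicated marks every repeat of a label after its first occurrence.
mask = df.index.duplicated(keep='first')

# Invert the mask to keep only the first row for each index label.
unique = df[~mask]
print(unique)
```

With this approach the index is not given the name 'index', so the extra header row seen in the output above should not appear. Passing keep='last' keeps the final occurrence instead, and keep=False marks every row whose label is repeated, which drops all of them.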
2018-11-10T12:39:16+05:30
Amit Arora