Python ProgrammingPython Programming

Remove rows with duplicate indices in Pandas DataFrame

Remove duplicate rows:

import pandas as pd

df = pd.DataFrame({'Age': [30, 30, 22, 40, 20, 30, 20, 25],
                   'Height': [165, 165, 120, 80, 162, 72, 124, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky',
                         'Armour', 'Ponting'])

print("\n -------- Duplicate Rows ----------- \n")
print(df)

df1 = df.reset_index().drop_duplicates(subset='index',
                                       keep='first').set_index('index')

print("\n ------- Unique Rows ------------ \n")
print(df1)



C:\python\pandas>python example52.py
 
 -------- Duplicate Rows -----------
 
          Age  Height  Score State
Jane       30     165    4.6    NY
Jane       30     165    4.6    NY
Aaron      22     120    9.0    FL
Penelope   40      80    3.3    AL
Jaane      20     162    4.0    NY
Nicky      30      72    8.0    TX
Armour     20     124    9.0    FL
Ponting    25      81    3.0    AL
 
 ------- Unique Rows ------------
 
          Age  Height  Score State
index
Jane       30     165    4.6    NY
Aaron      22     120    9.0    FL
Penelope   40      80    3.3    AL
Jaane      20     162    4.0    NY
Nicky      30      72    8.0    TX
Armour     20     124    9.0    FL
Ponting    25      81    3.0    AL
 
C:\python\pandas>