Remove duplicate rows from Pandas DataFrame where only some columns have the same value

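The example below removes rows that repeat the same values in two columns, Age and Height, and keeps the first occurrence of each pair. The other columns and the index labels are ignored in the comparison: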

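import pandas as pd

# Sample data: several rows share the same (Age, Height) pair,
# and the index label 'Jane' appears twice.
df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky',
                         'Armour', 'Ponting'])

# Print the original DataFrame, duplicates included.
print("\n -------- Duplicate Rows ----------- \n")
print(df)

# Compare rows on Age and Height only; keep='first' retains the first
# occurrence of each pair. reset_index() moves the labels into a column
# named 'index', and set_index('index') restores them afterwards, which
# is why the index is titled 'index' in the output.
df1 = df.reset_index().drop_duplicates(subset=['Age','Height'],
                                       keep='first').set_index('index')

print("\n ------- Unique Rows ------------ \n")
print(df1)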

C:\python\pandas>python example54.py
 
 -------- Duplicate Rows -----------
 
          Age  Height  Score State
Jane       30     120    4.6    NY
Jane       40     162    4.6    NY
Aaron      30     120    9.0    FL
Penelope   40     120    3.3    AL
Jaane      30     120    4.0    NY
Nicky      30      72    8.0    TX
Armour     20     120    9.0    FL
Ponting    25      81    3.0    AL
 
 ------- Unique Rows ------------
 
          Age  Height  Score State
index
Jane       30     120    4.6    NY
Jane       40     162    4.6    NY
Penelope   40     120    3.3    AL
Nicky      30      72    8.0    TX
Armour     20     120    9.0    FL
Ponting    25      81    3.0    AL
 
C:\python\pandas>
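
The keep argument controls which of the repeated rows survives. The sketch below reuses the same data to compare the options; the variable names (cols, mask, first_kept, last_kept, none_kept) are only illustrative and are not part of the original example. It also calls drop_duplicates directly on df, since the comparison looks at the columns in subset and not at the index labels.

```python
import pandas as pd

df = pd.DataFrame({'Age': [30, 40, 30, 40, 30, 30, 20, 25],
                   'Height': [120, 162, 120, 120, 120, 72, 120, 81],
                   'Score': [4.6, 4.6, 9.0, 3.3, 4, 8, 9, 3],
                   'State': ['NY', 'NY', 'FL', 'AL', 'NY', 'TX', 'FL', 'AL']},
                  index=['Jane', 'Jane', 'Aaron', 'Penelope', 'Jaane', 'Nicky',
                         'Armour', 'Ponting'])

# Columns that define a duplicate (illustrative name).
cols = ['Age', 'Height']

# Boolean mask: True for each row that repeats an earlier (Age, Height) pair.
mask = df.duplicated(subset=cols, keep='first')

# keep='first' (the default): retain the first row of each pair.
first_kept = df.drop_duplicates(subset=cols)

# keep='last': retain the last row of each pair instead.
last_kept = df.drop_duplicates(subset=cols, keep='last')

# keep=False: drop every row whose pair occurs more than once.
none_kept = df.drop_duplicates(subset=cols, keep=False)

print(mask)
print(first_kept)
print(last_kept)
print(none_kept)
```

Inspecting the mask from duplicated() before dropping anything is a cheap way to check which rows will be removed.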
