Filter duplicated rows in R data.frame
This question already has an answer here:
- Remove duplicated rows using dplyr (4 answers)
I have the data.frame shown below:
    > df2 <- data.frame("studentid" = c(1,1,1,2,2,3,3), "subject" = c("maths", "maths", "english", "maths", "english", "science", "science"), "score" = c(100,90,80,70,60,20,10))
    > df2
      studentid subject score
    1         1   maths   100
    2         1   maths    90
    3         1 english    80
    4         2   maths    70
    5         2 english    60
    6         3 science    20
    7         3 science    10

A few studentids have duplicated values in the column subject (for example, id 1 has two entries for "maths"). I need to keep only the first of the duplicated rows. The expected data.frame is:
      studentid subject score
    1         1   maths   100
    3         1 english    80
    4         2   maths    70
    5         2 english    60
    6         3 science    20

I am not able to do this. Any ideas?
We can either use unique from data.table with the by option, after converting 'df2' to a 'data.table' with setDT(df2):
    library(data.table)
    unique(setDT(df2), by = c("studentid", "subject"))
    #    studentid subject score
    # 1:         1   maths   100
    # 2:         1 english    80
    # 3:         2   maths    70
    # 4:         2 english    60
    # 5:         3 science    20

or distinct from dplyr on 'df2':
    library(dplyr)
    distinct(df2, studentid, subject, .keep_all = TRUE)
    #   studentid subject score
    # 1         1   maths   100
    # 2         1 english    80
    # 3         2   maths    70
    # 4         2 english    60
    # 5         3 science    20

or duplicated from base R:
    df2[!duplicated(df2[1:2]), ]

(Edit: based on suggestions from @David Arenburg.)
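Note that `setDT()` converts `df2` to a data.table by reference, so `df2` itself changes class. If you would rather leave the original data.frame untouched, the following is a minimal sketch (assuming the data.table package is installed; `dt`, `res` and `res_last` are illustrative names) that copies it first with `as.data.table()`. It also shows the `fromLast` argument of data.table's `unique()`, which keeps the last row of each combination instead of the first.

```r
# Assumes the data.table package is installed
library(data.table)

df2 <- data.frame(studentid = c(1, 1, 1, 2, 2, 3, 3),
                  subject   = c("maths", "maths", "english", "maths",
                                "english", "science", "science"),
                  score     = c(100, 90, 80, 70, 60, 20, 10))

# as.data.table() returns a copy, so df2 stays a plain data.frame
dt <- as.data.table(df2)

# Keep the first row of each (studentid, subject) combination
res <- unique(dt, by = c("studentid", "subject"))
print(res)

# fromLast = TRUE keeps the last row of each combination instead
res_last <- unique(dt, by = c("studentid", "subject"), fromLast = TRUE)
print(res_last)
```

In both cases the kept rows stay in their original order.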
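With dplyr, the columns passed to `distinct()` decide what counts as a duplicate. In current dplyr versions, only those columns are returned unless `.keep_all = TRUE` is given. With that flag, the first row of each combination is kept whole. A small sketch, assuming dplyr is installed (`ids_only` and `res` are illustrative names):

```r
# Assumes the dplyr package is installed
library(dplyr)

df2 <- data.frame(studentid = c(1, 1, 1, 2, 2, 3, 3),
                  subject   = c("maths", "maths", "english", "maths",
                                "english", "science", "science"),
                  score     = c(100, 90, 80, 70, 60, 20, 10))

# Without .keep_all, only the named key columns come back
ids_only <- distinct(df2, studentid, subject)
print(ids_only)

# With .keep_all = TRUE, the first row of each combination is kept with all columns
res <- distinct(df2, studentid, subject, .keep_all = TRUE)
print(res)
```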
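The base R version works because `duplicated()` returns FALSE for the first occurrence of each row of the key columns and TRUE for every later repeat, so negating it keeps exactly the first rows. The sketch below selects the key columns by name rather than by position, which is a little more robust if the column order changes. It also rebuilds `df2` as a plain data.frame: if `setDT(df2)` has already been run, `df2` is a data.table and `df2[1:2]` selects rows, not columns. `dup`, `res` and `res_last` are illustrative names.

```r
df2 <- data.frame(studentid = c(1, 1, 1, 2, 2, 3, 3),
                  subject   = c("maths", "maths", "english", "maths",
                                "english", "science", "science"),
                  score     = c(100, 90, 80, 70, 60, 20, 10))

# TRUE marks the second and later occurrences of each (studentid, subject) pair
dup <- duplicated(df2[c("studentid", "subject")])
print(dup)

# Negating the flags keeps the first row of each pair (row names 1, 3, 4, 5, 6)
res <- df2[!dup, ]
print(res)

# fromLast = TRUE scans from the bottom, so the last row of each pair is kept
res_last <- df2[!duplicated(df2[c("studentid", "subject")], fromLast = TRUE), ]
print(res_last)
```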