Filter duplicated rows in R data.frame

April 15, 2012

This question already has an answer here:

Remove duplicated rows using dplyr (4 answers)

I have a data.frame as shown below.

> df2 <- data.frame("studentid" = c(1,1,1,2,2,3,3), "subject" = c("maths", "maths", "english","maths", "english", "science", "science"), "score" = c(100,90,80,70, 60,20,10))
> df2
  studentid subject score
1         1   maths   100
2         1   maths    90
3         1 english    80
4         2   maths    70
5         2 english    60
6         3 science    20
7         3 science    10

A few studentids have duplicated values in the column subject (for example, id 1 has 2 entries for "maths"). I need to keep only the first of the duplicated rows. The expected data.frame is:

  studentid subject score
1         1   maths   100
3         1 english    80
4         2   maths    70
5         2 english    60
6         3 science    20

I am not able to do this. Any ideas?

We can either use unique from data.table with the by option, after converting to a 'data.table' (setDT(df2))

library(data.table)
# setDT() converts df2 to a data.table by reference; by names the key columns
unique(setDT(df2), by = c("studentid", "subject"))
#   studentid subject score
#1:         1   maths   100
#2:         1 english    80
#3:         2   maths    70
#4:         2 english    60
#5:         3 science    20

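Or use distinct from dplyr on 'df2'

library(dplyr)
# In dplyr 0.5.0 and later, .keep_all = TRUE is needed to retain the score column
distinct(df2, studentid, subject, .keep_all = TRUE)
#     studentid subject score
#       (dbl)  (fctr) (dbl)
#1         1   maths   100
#2         1 english    80
#3         2   maths    70
#4         2 english    60
#5         3 science    20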

Or use duplicated from base R

df2[!duplicated(df2[1:2]),]
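
As a minimal sketch of the same base R idea, the key columns can be selected by name rather than by position, which may be safer if the column order ever changes. The variable names key_cols and result are hypothetical and used only for illustration; the data is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question.
df2 <- data.frame(
  studentid = c(1, 1, 1, 2, 2, 3, 3),
  subject   = c("maths", "maths", "english", "maths", "english", "science", "science"),
  score     = c(100, 90, 80, 70, 60, 20, 10)
)

# Columns that together define a "duplicate" (hypothetical helper variable).
key_cols <- c("studentid", "subject")

# duplicated() flags each row whose (studentid, subject) pair already appeared
# in an earlier row, so negating it keeps the first occurrence of each pair.
result <- df2[!duplicated(df2[, key_cols]), ]

print(result)
```

The result should contain the same five rows as the expected data.frame in the question, with the original row names preserved.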

(Edit: based on suggestions from @David Arenburg.)
