bioinformatics - Convert missing values (-9) to NAs in a Plink PED file when reading into R
I have 2 files: pedigree.ped and pedigree.map. These 2 file formats can be used by PLINK.
In case I want to use them in R, I think I must convert them to an R format, e.g. missing values in PLINK are different from missing values in R.
How can I convert these 2 files to use them in R? How can I change the missing values to NA?
Sample of data:
PED file:
1 1 0 0 1.02 A G G 0 0
1 2 0 0 0.51 T G C C A
2 3 1 2 -9 0 0 G T T
...
The first column is id_family, the second is id_individual, the third and fourth are the father and mother of id_individual, the fifth is a quantitative trait (-9: missing value), and the remaining columns are genotypes (SNP alleles). The missing value for all columns is 0, except for the quantitative trait, where it is -9.
MAP file:
1 rs1 0 100000
1 rs2 0 100100
1 rs3 0 100200
The first column is the chromosome ID (1-22, X, Y, or 0 if unplaced), the second is the rs# or SNP identifier, the third is the genetic distance (morgans), and the fourth is the base-pair position (bp units).
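As a minimal sketch, a MAP file with the four columns described above could be read with read.table. The column names (chr, snp, cm, bp) are hypothetical labels chosen for illustration, and the inline text stands in for pedigree.map so that the example is self-contained.

```r
# Inline stand-in for pedigree.map (the three rows from the sample above).
map_text <- "1 rs1 0 100000
1 rs2 0 100100
1 rs3 0 100200"

# Column names are illustrative labels, not part of the MAP format.
# The chromosome is read as character because it may be X or Y.
map <- read.table(text = map_text, header = FALSE,
                  col.names = c("chr", "snp", "cm", "bp"),
                  colClasses = c("character", "character",
                                 "numeric", "numeric"))

print(map)
```

For the real file, the text argument would be replaced by the file name, e.g. read.table("pedigree.map", ...), with the same remaining arguments.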
Assuming the data in the PED file has been read into an R data frame:
> my.dataframe
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 1 1 0 0 1.02 G G 0 0
2 1 2 0 0 0.51 T G C C
3 2 3 1 2 -9.00 0 0 G T T
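One way such a data frame might be produced is sketched below. The inline rows are hypothetical (five leading columns plus two alleles for each of three SNPs) and are not the poster's data. Reading every column as character is an assumption made here to keep allele codes intact.

```r
# Hypothetical PED content standing in for pedigree.ped.
ped_text <- "1 1 0 0 1.02 A A G G 0 0
1 2 0 0 0.51 T T G C C A
2 3 1 2 -9 0 0 G T T T"

# Read everything as character first: with automatic type conversion,
# an allele column containing only T could be turned into logical TRUE.
my.dataframe <- read.table(text = ped_text, header = FALSE,
                           colClasses = "character")

# The fifth column is the quantitative trait, so make it numeric.
my.dataframe[[5]] <- as.numeric(my.dataframe[[5]])

str(my.dataframe)
```

For the real file, text = ped_text would be replaced by the file name "pedigree.ped".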
Now check for invalid/missing values per column and assign NA. For example, take the 5th column:
my.dataframe[my.dataframe[,5] == -9, 5] <- NA
> my.dataframe
  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 1 1 0 0 1.02 G G 0 0
2 1 2 0 0 0.51 T G C C
3 2 3 1 2 NA 0 0 G T T
Similarly, assign NA to the other required entries.
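For the genotype columns, where 0 marks a missing allele, the same idea could be applied to several columns at once. This is a minimal sketch on a small hypothetical data frame (not the poster's data), assuming the alleles are stored as character strings in columns 6 onward.

```r
# Hypothetical data frame in PED layout; alleles are stored as character.
my.dataframe <- data.frame(
  V1 = c(1, 1, 2), V2 = c(1, 2, 3), V3 = c(0, 0, 1), V4 = c(0, 0, 2),
  V5 = c(1.02, 0.51, -9),
  V6 = c("A", "T", "0"), V7 = c("A", "T", "0"),
  V8 = c("G", "G", "G"), V9 = c("G", "C", "T"),
  stringsAsFactors = FALSE
)

# Quantitative trait: -9 means missing. which() returns only the matching
# row numbers, so entries that are already NA do not break the assignment.
my.dataframe[which(my.dataframe$V5 == -9), "V5"] <- NA

# Genotype columns (6 onward): "0" means missing. The ID and parent
# columns are left alone, since 0 there is not an allele code.
geno.cols <- 6:ncol(my.dataframe)
my.dataframe[geno.cols][my.dataframe[geno.cols] == "0"] <- NA

print(my.dataframe)
```

Restricting the replacement to the genotype columns keeps the 0 entries in the father and mother columns unchanged.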
Note: R functions treat NAs in a special way. See the respective function arguments. Related keywords to watch for: na.rm, na.pass, na.fail, na.omit, etc.
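A short sketch of how some of these behave, using illustrative trait values rather than the poster's data:

```r
# Trait values after -9 has been recoded to NA (illustrative values).
trait <- c(1.02, 0.51, NA)

# By default, missing values propagate through summary functions.
default_mean <- mean(trait)

# na.rm = TRUE drops the NAs before computing.
clean_mean <- mean(trait, na.rm = TRUE)

# na.omit() removes rows that contain any NA.
df <- data.frame(id = 1:3, trait = trait)
complete <- na.omit(df)

# na.fail() signals an error if any NA is present.
failed <- inherits(try(na.fail(df), silent = TRUE), "try-error")

print(default_mean)
print(clean_mean)
print(nrow(complete))
print(failed)
```

Many modelling functions, such as lm, accept an na.action argument that takes one of na.omit, na.fail, or na.pass.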