PCAj: Population Structure Prediction System for Japanese | 日本語ページ
EXE (Windows) | JAR (Multi-platform) | Test Data
This application predicts population structure of Japanese
samples using genome-wide SNP genotypes. It creates a 2D
scatterplot of predicted principal components based on the
probabilistic PCA. It also provides posterior probabilities
from which an individual has descended from each of several
Japanese ancestries:
- Hokkaido
- Tohoku
- Kanto-Koshinetsu
- Tokai-Hokuriku
- Kinki
- Kyushu
- Okinawa
based on the result of LDA. The application is based on
SNP markers included in the Illumina HumanHap 550K chip.
However you may use SNP genotype data observed from other
platforms as far as a set of SNPs overlaps with that of the
550K chip. Note that the more SNPs you input, the better
prediction result you get (at least 10,000 SNPs
are recommended).
usage: <genotype_file>
Example: java -jar pca.jar test.txt
The <genotype_file> parameter should be a file that contains
SNP genotype data. The first column of the data is SNP rs_ID
which should be sorted as dictionary order. The second column
is SNP genotypes for all subjects in the sample set without
any separator. The first and second columns should be TAB
separated. Here genotypes should be encoded as 0, 1 and 2
for a homozygote pair of A/T, a heterozygote pair of A/T and C/G
and the other homozygote pair of C/G, respectively. The missing
genotype should be encoded as 9.
Example: AC, AA, CC, TT, AG, GG, TC, NN -> 1, 0, 2, 0, 1, 2, 1, 9
For Windows users, you can drag & drop your data onto the icon of
the execulable file (pca.exe).
If you use this software for publication, please send a note
with the reference or a link. When citing this software, use:
Kumasaka et al. (2010) Establishment of a Standardized System
to Perform Population Structure Analyses with Limited Sample
Size or with Different Sets of SNP Genotypes.
Journal of Human Genetics, 55(8):525-33.
(c) Natsuhiko Kumasaka (kumasaka AT src.riken.jp)
http://kumasakanatsuhiko.jp/projects/popstr/
|