01 February 2010

Searching for Genotypes with SPARQL.

This weekend I noticed that the NCBI has an interface called Genotype Query Form, which is used to query genotypes and generates the following kind of XML output:
<GenoExchange xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ncbi.nlm.nih.gov/SNP/geno" xsi:schemaLocation="http://www.ncbi.nlm.nih.gov/SNP/geno ftp://ftp.ncbi.nlm.nih.gov/snp/specs/genoex_1_4.xsd" dbSNPBuildNo="129">
<Population popId="1409" handle="CSHL-HAPMAP" locPopId="HapMap-CEU">
<popClass self="NOT SPECIFIED" />
</Population>
<Individual indId="170" taxId="9606" sex="F" indGroup="European">
<SourceInfo source="Coriell" sourceType="repository" ncbiPedId="80" pedId="1340" indId="NA07000" maId="0" paId="0" srcIndGroup="Western and Nothern European" />
<SubmitInfo popId="1409" submittedIndId="NA07000" subIndGroup="Western and Northern European" />
</Individual>
<Individual indId="621" taxId="9606" sex="F" indGroup="European">

(...)
<SnpLoc genomicAssembly="36:reference" chrom="1" start="1286927" locType="2" rsOrientToChrom="rev" contigAllele="C" />
<SsInfo ssId="3906671" locSnpId="AL139287.6_22772" ssOrientToRs="fwd">
<ByPop popId="1409" sampleSize="120">
<AlleleFreq allele="A" freq="0.117" />
<AlleleFreq allele="G" freq="0.883" />
<GTypeFreq gtype="A/G" freq="0.233" />
<GTypeFreq gtype="G/G" freq="0.767" />
(...)
<GTypeByInd indId="636" gtype="G/G" />
<GTypeByInd indId="456" gtype="G/G" />
<GTypeByInd indId="536" gtype="G/G" />
</ByPop>
</SsInfo>
<GTypeFreq gtype="A/A" freq="0.380952380952381" />
<GTypeFreq gtype="A/G" freq="0.352380952380952" />
<GTypeFreq gtype="G/G" freq="0.266666666666667" />
</SnpInfo>
<SnpInfo rsId="2765021" observed="A/G">
(...)
I wanted to see how one could query this kind of data with SPARQL. I'm sure that RDF is one of the most inefficient ways to store this kind of data, but I wanted to see what could be extracted from such an RDF store with a semantic query. First, I wrote an XSLT stylesheet transforming <GenoExchange/> to <rdf:RDF/>. The stylesheet is available at http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/genoexch2rdf.xsl.

Transform the data

About 639 HapMap SNPs on chromosome 1 were extracted using the HTML form and saved as XML to the file 'SNPgenotype-100201-1244-3905.xml' (size: 4 MB). The XML was converted to RDF with the xsltproc engine:
xsltproc --stringparam "with-sequence" yes --novalid genoexch2rdf.xsl SNPgenotype-100201-1244-3905.xml > input.rdf
The size of 'input.rdf' (including the flanking sequences of the SNPs) was 20 MB.
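
The stylesheet itself is not reproduced in this post. As a rough illustration of the kind of mapping it performs, here is a minimal Python sketch (not the XSLT, and not necessarily how the stylesheet does it) that turns <Individual> elements into subject/predicate/object tuples with the standard library only. The URI patterns and predicate names are copied from the RDF shown under Result; the function name individuals_to_triples, the trimmed inline sample, and the choice of SourceInfo/@indId as the source of the name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

GENO_NS = "http://www.ncbi.nlm.nih.gov/SNP/geno"
ONT = "http://ontology.lindenb.org/genotypes/"
IND_URI = "http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id="
POP_URI = "http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&pop_id="

# Trimmed copy of the first <Individual> from the NCBI output (assumed sample).
SAMPLE = """<GenoExchange xmlns="http://www.ncbi.nlm.nih.gov/SNP/geno">
<Individual indId="170" taxId="9606" sex="F" indGroup="European">
<SourceInfo source="Coriell" sourceType="repository" indId="NA07000" />
<SubmitInfo popId="1409" submittedIndId="NA07000" />
</Individual>
</GenoExchange>"""


def individuals_to_triples(xml_text):
    """Return (subject, predicate, object) tuples for each <Individual>."""
    ns = {"g": GENO_NS}
    root = ET.fromstring(xml_text)
    triples = []
    for ind in root.findall("g:Individual", ns):
        subject = IND_URI + ind.get("indId")
        triples.append((subject, ONT + "sex", ind.get("sex")))
        source = ind.find("g:SourceInfo", ns)
        if source is not None:
            # Assumption: the repository identifier becomes the name.
            triples.append((subject, ONT + "name", source.get("indId")))
        submit = ind.find("g:SubmitInfo", ns)
        if submit is not None:
            triples.append((subject, ONT + "hasPop", POP_URI + submit.get("popId")))
    return triples


if __name__ == "__main__":
    for triple in individuals_to_triples(SAMPLE):
        print(triple)
```

Each tuple corresponds to one child element of an <Individual> node in the RDF/XML; serializing the tuples is left out to keep the sketch short.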

Result


<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:g="http://www.ncbi.nlm.nih.gov/SNP/geno" xmlns:snp="http://www.ncbi.nlm.nih.gov/SNP/docsum" xmlns="http://ontology.lindenb.org/genotypes/">
<Population rdf:about="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&amp;pop_id=1409">
<handle>CSHL-HAPMAP</handle>
<locPopId>HapMap-CEU</locPopId>
</Population>
<Individual rdf:about="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=170">
<hasPop rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&amp;pop_id=1409"/>
<sex>F</sex>
<name>NA07000</name>
</Individual>
<Individual rdf:about="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=621">
<hasPop rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&amp;pop_id=1409"/>
<sex>F</sex>
<name>NA12875</name>
</Individual>
<Individual rdf:about="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=538">
<hasPop rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&amp;pop_id=1409"/>
<sex>F</sex>
<name>NA12753</name>
(...)
<SNP rdf:about="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=307347">
<het rdf:datatype="http://www.w3.org/2001/XMLSchema#float">0.1</het>
<name>rs307347</name>
<seq5>GGGGATGGCTGCTCCTGGGCCTCAGAAAGATGCAGTCCCATAGACTTCCAGCACGCCCCTCCCCTCCTCGGGCCTTAATTTTGTCCACTGAGAAGATGGTCTCTGAGGCTCTGGGGTTTCCTTCTTGGTCACCAGATATTCTGCGGGCCTTGCCTTCCTGCCCAGATTCGAGCCAGTGGCAAACAGAAGCTGCCAGGAGC</seq5>
<observed>C/T</observed>
<seq3>TCTCAGAGCTGTGGCTGGTGGCTCGGTAACAACAGGAAGGGCAGTGGCTGTGCAGGAGGCAGGCAGCTTGCCAGCCCAGGAAGGTGACCCAGGACACCTCCAGGCCTTTCCCAGGGCAGCCCAACGGCCCAAGGTCAGGGCCGGGCGCGAGGGCGGCCTGAGCACAGAGCACGGGGGCTGACAGCAGGCTGGGGGGCCAG</seq3>
</SNP>
<MapLoc>
<hasSNP rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=307347"/>
<strand>+</strand>
<chrom>1</chrom>
<start rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1320381</start>
<assembly rdf:resource="urn:assembly:Celera:36_3"/>
<type>exact</type>
</MapLoc>
(...)
<Genotype>
<hasIndi rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=465"/>
<hasSNP rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=940550"/>
<allele1>T</allele1>
<allele2>T</allele2>
</Genotype>
<Genotype>
<hasIndi rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=253"/>
<hasSNP rdf:resource="http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=940550"/>
<allele1>T</allele1>
<allele2>T</allele2>
</Genotype>
</rdf:RDF>

Invoking ARQ

export ARQROOT=ARQ-2.5.0
ARQ-2.5.0/bin/arq --data ~/input.rdf --query ~/query01.rq

Dump All



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
SELECT ?s ?p ?o {?s ?p ?o.}

Result

| _:b0 | g:allele2 | "C" |
| _:b0 | g:allele1 | "C" |
| _:b0 | g:hasSNP | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=17160669> |
| _:b0 | g:hasIndi | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=636> |
| _:b0 | rdf:type | g:Genotype |
| _:b1 | g:allele2 | "T" |
| _:b1 | g:allele1 | "C" |
| _:b1 | g:hasSNP | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=17160669> |
| _:b1 | g:hasIndi | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=361> |
| _:b1 | rdf:type | g:Genotype |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=174> | g:name | "NA07048" |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=174> | g:sex | "M" |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=174> | g:hasPop | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&pop_id=1409> |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=174> | rdf:type | g:Individual |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=566> | g:name | "NA12802" |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=566> | g:sex | "F" |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=566> | g:hasPop | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?type=pop&pop_id=1409> |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=566> | rdf:type | g:Individual |
| _:b2 | g:allele2 | "A" |
| _:b2 | g:allele1 | "A" |
| _:b2 | g:hasSNP | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=2765021> |
| _:b2 | g:hasIndi | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=429> |
| _:b2 | rdf:type | g:Genotype |
| _:b3 | g:allele2 | "C" |
| _:b3 | g:allele1 | "C" |
| _:b3 | g:hasSNP | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=17160669> |
| _:b3 | g:hasIndi | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=546> |
| _:b3 | rdf:type | g:Genotype |
| _:b4 | g:allele2 | "T" |
| _:b4 | g:allele1 | "C" |
| _:b4 | g:hasSNP | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=17160669> |
| _:b4 | g:hasIndi | <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=159> |
| _:b4 | rdf:type | g:Genotype |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=621> | g:name | "NA12875" |
| <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ind.cgi?ind_id=621> | g:sex | "F" |
(...)


Select the populations



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
SELECT ?pop
{
?s a g:Population .
?s g:handle ?pop .
}

Result

-----------------
| pop |
=================
| "CSHL-HAPMAP" |
-----------------


List six individuals with their population



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
SELECT ?pop ?indi_name ?good
{
?s a g:Population .
?s g:handle ?pop .
?s2 a g:Individual .
?s2 g:hasPop ?s .
?s2 g:sex ?good .
?s2 g:name ?indi_name
}
limit 6

Result

------------------------------------
| pop | indi_name | good |
====================================
| "CSHL-HAPMAP" | "NA10854" | "F" |
| "CSHL-HAPMAP" | "NA12264" | "M" |
| "CSHL-HAPMAP" | "NA11993" | "F" |
| "CSHL-HAPMAP" | "NA10830" | "M" |
| "CSHL-HAPMAP" | "NA12762" | "M" |
| "CSHL-HAPMAP" | "NA12155" | "M" |
------------------------------------


List the SNPs having a flanking sequence containing 'CACACA'



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>

SELECT ?name ?seq5 ?observed ?seq3
WHERE
{
?s a g:SNP .
?s g:name ?name .
?s g:seq5 ?seq5 .
?s g:seq3 ?seq3 .
?s g:observed ?observed .

FILTER (
fn:contains(fn:upper-case(?seq5), "CACACA") ||
fn:contains(fn:upper-case(?seq3), "CACACA")
)
}

Result

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| name | seq5 | observed | seq3 |
=============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================
| "rs17160669" | "GCCACCGCGCCTGGCCCACAAGCATAACTTTTATAAAAATAATTTACTTTTACAATTAAGCTTAGGAATCACACAGACTCAGGGCTGGCTCATGGCTTCC" | "C/T" | "GGCAAGTTAAACTCTGTACTTAGGCTCGGCGCGTATGAAATGGCTAATTCTAATCAGTGGTGCAATGAAGTAACTCCTCTAAAGAACTTATCGGGCCGGG" |
| "rs2765023" | "ACTTGTAAATTTAGTCAGCATACATAACTAACCAAAACTTCAATATATCTTGAGACCCCCTTGGGGGGCTGTCTCCATAAAAGTGACTTTCCCAGGAGAGTGACTGGATGTGATTGGCCAACACCGTCTTAGCCCGCAGGGGTTCCTGGCGCGGAAGCCTCACGTCCCTCCCCACAGCGAGTTTTCAGAATCCAAAGGCCGTAGGAGAAAGAAGGCTGGCGGTGTTTCCTCTTAGAGGGGAGAAACTCAGCCTGGGTAGGAGACCCAGCCCCACGCAGGGAAAACTGTGCTAACGCTTCC" | "A/G" | "ATGTGCGTGGCAGGTGCGGCGGCGGCGAATACGGTTTGTCCTCGAGCCTAACCCTGTCTGTGTTGGTGTCAGCAGTGGCCCCCCTACCACACACACAGGGTCCCTGGCGTCCCAAGACCACTCCTGGCAGCCCCGCCACTGGCTGCGCCTGGAAGCCGCGTCCTCAGGCCTCGCCTGGCATTTGCTGTCACAGAGGTTGCTTCCTTGGGTCCGTCCGTCCTCGCCCCTCCAGCCTGGGCGCCCCCCCACCCCTGTCTCATTCCCTCCACCACATGCAGCACAGTCCAGGAGGCTGGGGTC" |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Get 12 Heterozygous Genotypes



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>

SELECT ?indi ?snp ?a1 ?a2
WHERE
{
?s a g:Genotype .
?s g:allele1 ?a1 .
?s g:allele2 ?a2 .
?s g:hasIndi ?s2 .
?s2 g:name ?indi .
?s g:hasSNP ?s3 .
?s3 g:name ?snp .
FILTER ( ?a1 != ?a2 )
}
LIMIT 12

Result

----------------------------------------
| indi | snp | a1 | a2 |
========================================
| "NA12056" | "rs17160669" | "C" | "T" |
| "NA12716" | "rs17160669" | "C" | "T" |
| "NA12761" | "rs17160669" | "C" | "T" |
| "NA10839" | "rs2765023" | "A" | "G" |
| "NA12813" | "rs2765023" | "A" | "G" |
| "NA12760" | "rs2765023" | "A" | "G" |
| "NA12865" | "rs17160669" | "C" | "T" |
| "NA07056" | "rs17160669" | "C" | "T" |
| "NA12146" | "rs2765023" | "A" | "G" |
| "NA10860" | "rs2765023" | "A" | "G" |
| "NA10839" | "rs17160669" | "C" | "T" |
| "NA12812" | "rs17160669" | "C" | "T" |
----------------------------------------
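
For readers less familiar with SPARQL, the FILTER ( ?a1 != ?a2 ) clause is just a row-level test on the two alleles. A minimal Python sketch of the same test follows, assuming the genotypes have already been loaded into tuples; the Genotype type and the heterozygous function are names invented for the example, and the five rows are copied from the "Dump All" output above.

```python
from typing import NamedTuple


class Genotype(NamedTuple):
    ind_id: int
    snp: str
    allele1: str
    allele2: str


# Rows taken from the "Dump All" result (blank nodes _:b0 to _:b4).
GENOTYPES = [
    Genotype(636, "rs17160669", "C", "C"),
    Genotype(361, "rs17160669", "C", "T"),
    Genotype(429, "rs2765021", "A", "A"),
    Genotype(546, "rs17160669", "C", "C"),
    Genotype(159, "rs17160669", "C", "T"),
]


def heterozygous(genotypes, limit=None):
    """Keep genotypes whose two alleles differ, like FILTER(?a1 != ?a2)."""
    hits = [g for g in genotypes if g.allele1 != g.allele2]
    return hits if limit is None else hits[:limit]


if __name__ == "__main__":
    for g in heterozygous(GENOTYPES):
        print(g.ind_id, g.snp, g.allele1, g.allele2)
```

One difference worth keeping in mind: the list version truncates in input order, whereas a SPARQL LIMIT without ORDER BY does not guarantee which rows are returned.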


List 12 SNPs on chr1 between 100000 and 500000 on the reference assembly, order by chrom/position



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>

SELECT ?snp ?chrom ?orient ?start
WHERE
{
?s a g:SNP .
?s g:name ?snp .
?s2 a g:MapLoc .
?s2 g:hasSNP ?s .
?s2 g:chrom ?chrom .
?s2 g:chrom "1" .
?s2 g:strand ?orient .
?s2 g:start ?start .
?s2 g:assembly <urn:assembly:reference:36_3> .
FILTER ( ?start > 100000 && ?start< 500000)
}
ORDER BY ?chrom ?start
LIMIT 12

Result

------------------------------------------
| snp | chrom | orient | start |
==========================================
| "rs17009015" | "1" | "-" | 121810 |
| "rs11490937" | "1" | "+" | 222076 |
| "rs12041624" | "1" | "+" | 232164 |
| "rs11514575" | "1" | "-" | 235726 |
| "rs4731490" | "1" | "+" | 311783 |
| "rs4006867" | "1" | "+" | 325493 |
| "rs7462951" | "1" | "-" | 360984 |
| "rs4030300" | "1" | "+" | 392471 |
| "rs4030303" | "1" | "+" | 392552 |
| "rs9661032" | "1" | "-" | 396549 |
| "rs3872250" | "1" | "-" | 400742 |
| "rs3907361" | "1" | "-" | 412985 |
------------------------------------------


List the positions of 10 SNPs on the reference assembly and chr1, print the heterozygosity if it exists and is greater than 0.1



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>

SELECT ?snp ?chrom ?orient ?start ?het
WHERE
{
?s a g:SNP .
?s g:name ?snp .
?s2 a g:MapLoc .
?s2 g:hasSNP ?s .
?s2 g:chrom ?chrom .
?s2 g:chrom "1" .
?s2 g:strand ?orient .
?s2 g:start ?start .
?s2 g:assembly <urn:assembly:reference:36_3> .
OPTIONAL { ?s g:het ?het . FILTER ( ?het > 0.1 ) }
}
LIMIT 10

Result

------------------------------------------------------------------------------------------------
| snp | chrom | orient | start | het |
================================================================================================
| "rs7417504" | "1" | "+" | 555799 | |
| "rs10018120" | "1" | "-" | 241387750 | "0.48"^^<http://www.w3.org/2001/XMLSchema#float> |
| "rs12043546" | "1" | "+" | 224043895 | |
| "rs4023296" | "1" | "-" | 141776514 | |
| "rs1320571" | "1" | "+" | 1110293 | "0.31"^^<http://www.w3.org/2001/XMLSchema#float> |
| "rs1359759" | "1" | "+" | 115826181 | "0.49"^^<http://www.w3.org/2001/XMLSchema#float> |
| "rs7553429" | "1" | "+" | 1080419 | "0.19"^^<http://www.w3.org/2001/XMLSchema#float> |
| "rs4245756" | "1" | "+" | 789325 | |
| "rs3766177" | "1" | "-" | 1471210 | "0.5"^^<http://www.w3.org/2001/XMLSchema#float> |
| "rs9442372" | "1" | "+" | 1008566 | "0.46"^^<http://www.w3.org/2001/XMLSchema#float> |
------------------------------------------------------------------------------------------------
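
The empty cells in the het column come from the way OPTIONAL behaves: when the FILTER sits inside the OPTIONAL block, a SNP whose heterozygosity is missing or fails the test is still returned, only with ?het left unbound. The Python sketch below mimics that left-join behaviour on toy data; the function name left_join_het and the SNP labels snpA, snpB and snpC are hypothetical and do not come from the dataset.

```python
def left_join_het(snps, het_by_snp, threshold=0.1):
    """Return (snp, het) rows; het is None when absent or not above threshold."""
    rows = []
    for snp in snps:
        het = het_by_snp.get(snp)
        # Mirror FILTER inside OPTIONAL: a missing or failing value leaves
        # the column unbound (None) but the row itself is kept.
        if het is None or not het > threshold:
            het = None
        rows.append((snp, het))
    return rows


if __name__ == "__main__":
    # Toy values for illustration only.
    toy_het = {"snpA": 0.48, "snpB": 0.1}
    for row in left_join_het(["snpA", "snpB", "snpC"], toy_het):
        print(row)
```

Moving the FILTER outside the OPTIONAL block would behave differently, because a filter on an unbound variable raises an error and drops the row.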


Print 10 differences between the Reference Assembly and the Celera Assembly



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>

SELECT ?snp ?chrom1 ?orient1 ?start1 ?chrom2 ?orient2 ?start2
WHERE
{
?s a g:SNP .
?s g:name ?snp .

?s2 a g:MapLoc .
?s2 g:hasSNP ?s .
?s2 g:chrom ?chrom1 .
?s2 g:strand ?orient1 .
?s2 g:start ?start1 .
?s2 g:assembly <urn:assembly:Celera:36_3> .

?s3 a g:MapLoc .
?s3 g:hasSNP ?s .
?s3 g:chrom ?chrom2 .
?s3 g:strand ?orient2 .
?s3 g:start ?start2 .
?s3 g:assembly <urn:assembly:reference:36_3> .

}
LIMIT 10

Result

-----------------------------------------------------------------------------
| snp | chrom1 | orient1 | start1 | chrom2 | orient2 | start2 |
=============================================================================
| "rs7553640" | "1" | "-" | 833104 | "1" | "+" | 1751873 |
| "rs3951936" | "9" | "-" | 41330304 | "4" | "+" | 49186295 |
| "rs3951936" | "9" | "-" | 41330304 | "1" | "-" | 142233119 |
| "rs3951936" | "9" | "-" | 41330304 | "1" | "+" | 142038296 |
| "rs3951936" | "9" | "-" | 41330304 | "1" | "-" | 141781399 |
| "rs3951936" | "9" | "-" | 41330304 | "1" | "+" | 141641811 |
| "rs41319344" | "Y" | "+" | 10690990 | "Y" | "-" | 25853159 |
| "rs41319344" | "Y" | "+" | 10690990 | "Y" | "+" | 24928047 |
| "rs41319344" | "Y" | "+" | 10690990 | "1" | "-" | 241194834 |
| "rs10907183" | "1" | "-" | 1511375 | "1" | "+" | 1060980 |
-----------------------------------------------------------------------------


Create a new RDF graph of 10 SNPs having a neighbour at a distance less than 500 bp



Query

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX g: <http://ontology.lindenb.org/genotypes/>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>

CONSTRUCT { ?snp1 g:hasNeighbour ?snp2 . }
WHERE
{
?snp1 a g:SNP .
?snp2 a g:SNP .

?s1 a g:MapLoc .
?s1 g:hasSNP ?snp1 .
?s1 g:chrom ?chrom1 .
?s1 g:strand ?orient1 .
?s1 g:start ?start1 .
?s1 g:assembly <urn:assembly:reference:36_3> .

?s2 a g:MapLoc .
?s2 g:hasSNP ?snp2 .
?s2 g:chrom ?chrom2 .
?s2 g:strand ?orient2 .
?s2 g:start ?start2 .
?s2 g:assembly <urn:assembly:reference:36_3> .

FILTER( (fn:abs(?start1 - ?start2) < 500) && ?chrom1=?chrom2 && ?snp1!=?snp2)

}
LIMIT 10

Result

@prefix : <http://ontology.lindenb.org/genotypes/> .
@prefix g: <http://ontology.lindenb.org/genotypes/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fn: <http://www.w3.org/2005/xpath-functions#> .

<http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=7545812>
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=9970455> .

<http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1043506>
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=12126411> .

<http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6603793>
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=7548693> ;
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=7553066> .

<http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=10907178>
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=10907177> ;
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=11260588> ;
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=11260587> ;
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6701114> ;
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=3737728> ;
:hasNeighbour <http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=9442398> .
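
The query above compares every mapped location with every other one, which is a pairwise join. As a point of comparison, here is a minimal Python sketch of an alternative outside SPARQL: sort the positions per chromosome and only look ahead while the distance stays under the threshold. This is a different technique from the CONSTRUCT query, not what ARQ does internally; the function name close_pairs is an assumption, and the four positions are copied from the chr1 result table earlier in the post.

```python
def close_pairs(locations, max_dist=500):
    """locations: iterable of (snp, chrom, start) on one assembly.

    Returns (snp1, snp2) pairs in both directions, like the CONSTRUCT query,
    for SNPs on the same chromosome less than max_dist apart.
    """
    by_chrom = {}
    for snp, chrom, start in locations:
        by_chrom.setdefault(chrom, []).append((start, snp))

    pairs = []
    for items in by_chrom.values():
        items.sort()
        for i, (start1, snp1) in enumerate(items):
            for start2, snp2 in items[i + 1:]:
                if start2 - start1 >= max_dist:
                    break  # sorted, so every later SNP is too far as well
                if snp1 != snp2:
                    pairs.append((snp1, snp2))
                    pairs.append((snp2, snp1))
    return pairs


# Positions taken from the "SNPs on chr1 between 100000 and 500000" result.
LOCATIONS = [
    ("rs4030300", "1", 392471),
    ("rs4030303", "1", 392552),
    ("rs9661032", "1", 396549),
    ("rs3872250", "1", 400742),
]

if __name__ == "__main__":
    for pair in close_pairs(LOCATIONS):
        print(pair)
```

With these four rows only rs4030300 and rs4030303 are close enough (81 bp apart), so the sketch reports that pair in both directions.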



That's it !
Pierre

2 comments:

Anonymous said...

What RDF store are you using right now? TDB?

I suggest you retry your SPARQL queries after putting the RDF in a Virtuoso instance... see my blog for practical tips on setting it up, though I have yet to work out proper indexing. For that, see this page:

http://docs.openlinksw.com/virtuoso/rdfperformancetuning.html#rdfperfgeneraldbpedia

Pierre Lindenbaum said...

Hi Egon ! :-)

I'm just running my queries on a flat RDF file. I just wanted to play with SPARQL again (before biohackathon2010, which will be focused on the semantic web).
I guess I'll play with Virtuoso next week in Japan, but again, I cannot believe that an RDF store can be used to store a large amount of genotypes.