
= Entrez Gene to Uniprot mapping (ipi package) =

Another Neurocommons package, ipi, is used to cache the International Protein Index (IPI) and extract a mapping from Entrez Gene to Uniprot.

== Usage ==
To extract the mapping:

$ python ipiparse.py ipi.HUMAN.dat.gz entrez_uniprot.n3

where ipi.HUMAN.dat.gz is formatted per:


 * IPI - IPI UniProt Format | International Protein Index | EBI (c) European Bioinformatics Institute 2009


 * ipiparse.findparts ( lines )
 * Extract relevant parts from a sequence of IPI-formatted lines.
 * Returns: an iterator of (gene, uniprot) pairs of strings.
 * Note: In the case of an isoform, uniprot includes an #isoform_NNN fragment appendix. This is something of a kludge.
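The isoform kludge can be illustrated with a small sketch. It assumes that isoform accessions arrive in the dashed form that appears in IPI DR lines (for example O95793-1) and that the fragment is built from the dash suffix. The helper name isoform_fragment_sketch is hypothetical and is not part of ipiparse; the real findparts may derive the fragment differently.

```python
def isoform_fragment_sketch(accession):
    """Split a UniProt accession into (base, identifier).

    Hypothetical helper: assumes isoforms are written as BASE-N,
    and maps them to BASE#isoform_N. Plain accessions pass through.
    """
    base, dash, number = accession.partition("-")
    if dash and number.isdigit():
        return base, "%s#isoform_%s" % (base, number)
    return base, accession


print(isoform_fragment_sketch("O95793-1"))  # ('O95793', 'O95793#isoform_1')
print(isoform_fragment_sketch("O60911"))    # ('O60911', 'O60911')
```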


 * ipiparse.gene2prot ( gene, prot )
 * Format Entrez Gene to Uniprot mapping as an N-Triples line.

>>> gene2prot('1515', 'O60911')
'<http://purl.org/commons/record/ncbi_gene/1515> <http://bio2rdf.org/ns/bio2rdf#xPath> <http://bio2rdf.org/uniprot:O60911> .'

 * The prot param may include an isoform fragment:

>>> gene2prot('6780', 'O95793#isoform_1')
'<http://purl.org/commons/record/ncbi_gene/6780> <http://bio2rdf.org/ns/bio2rdf#xPath> <http://bio2rdf.org/uniprot:O95793#isoform_1> .'

 * Note: hmm... trying to track down where I got that property URI, I see it should perhaps be http://bio2rdf.org/ns/bio2rdf:xPath.
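A minimal sketch of a formatter that reproduces the gene2prot outputs documented above. The URI prefixes are copied from those outputs; the constant names and the function name gene2prot_sketch are assumptions made for illustration, and the actual ipiparse source may be organized differently.

```python
# URI pieces taken from the documented gene2prot output.
# The constant names themselves are hypothetical.
GENE_BASE = "http://purl.org/commons/record/ncbi_gene/"
XPATH_PROPERTY = "http://bio2rdf.org/ns/bio2rdf#xPath"
UNIPROT_BASE = "http://bio2rdf.org/uniprot:"


def gene2prot_sketch(gene, prot):
    """Return one N-Triples line linking an Entrez Gene ID to a UniProt ID."""
    return "<%s%s> <%s> <%s%s> ." % (
        GENE_BASE, gene, XPATH_PROPERTY, UNIPROT_BASE, prot)


print(gene2prot_sketch("1515", "O60911"))
```

Keeping the property URI in a single constant would make it easy to switch to the alternative URI mentioned in the note, should that turn out to be the right one.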


 * ipiparse.isoform_line ( base, uniprot )
 * Format an N-Triples statement relating a protein to one of its isoforms.

>>> isoform_line('O95793', 'O95793#isoform_1')
'<http://bio2rdf.org/uniprot:O95793> <FIXME://example/misc_terms#isoform> <http://bio2rdf.org/uniprot:O95793#isoform_1> .'

 * Note: See bug 210 regarding the FIXME:...#isoform URI.
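A sketch that reproduces the documented isoform_line output, for illustration only. The predicate is the placeholder URI shown in that output, not a settled term; the names ISOFORM_PREDICATE and isoform_line_sketch are hypothetical.

```python
UNIPROT_BASE = "http://bio2rdf.org/uniprot:"
# Placeholder predicate copied from the documented output; still a FIXME.
ISOFORM_PREDICATE = "FIXME://example/misc_terms#isoform"


def isoform_line_sketch(base, uniprot):
    """Return one N-Triples line relating a protein to one of its isoforms."""
    return "<%s%s> <%s> <%s%s> ." % (
        UNIPROT_BASE, base, ISOFORM_PREDICATE, UNIPROT_BASE, uniprot)


print(isoform_line_sketch("O95793", "O95793#isoform_1"))
```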


 * ipiparse.main ( argv )
 * See Usage above.
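The command-line flow described under Usage could be wired together roughly as follows. This is a sketch under assumptions, not the actual main: it takes the pair extractor and the line formatter as parameters (standing in for findparts and gene2prot), writes only the gene-to-protein statements, and leaves out any isoform statements. The name convert_sketch is hypothetical.

```python
import gzip


def convert_sketch(in_path, out_path, findparts, gene2prot):
    """Stream a gzipped IPI file and write one N-Triples line per pair.

    findparts: callable taking an iterable of lines, yielding (gene, prot).
    gene2prot: callable formatting one (gene, prot) pair as a line.
    """
    with gzip.open(in_path, "rt", encoding="ascii", errors="replace") as lines, \
            open(out_path, "w") as out:
        for gene, prot in findparts(lines):
            out.write(gene2prot(gene, prot) + "\n")
```

Reading the archive as a stream avoids decompressing the whole IPI file into memory before parsing.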


 * ipiparse.parseline ( ln )
 * Parse a line to its constituent parts.
 * Returns: a (type, parts) pair where parts is a list of the semicolon-separated parts of the rest of the line.

>>> parseline('DR  UniProtKB/Swiss-Prot; O95793-1; STAU1_HUMAN; M.')
('DR', ['UniProtKB/Swiss-Prot', 'O95793-1', 'STAU1_HUMAN', 'M.'])
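One way to get the documented parseline behaviour is to cut the line-type code off at the first space and split the remainder on semicolons. The sketch below assumes that the type code contains no spaces and that surrounding whitespace on each part should be dropped; the name parseline_sketch is hypothetical and the real implementation may differ.

```python
def parseline_sketch(ln):
    """Split an IPI flat-file line into (type, parts)."""
    # Everything before the first space is the line type, e.g. 'DR'.
    line_type, _, rest = ln.rstrip("\n").partition(" ")
    # The remainder is a semicolon-separated list of fields.
    parts = [part.strip() for part in rest.split(";")]
    return line_type, parts


print(parseline_sketch("DR  UniProtKB/Swiss-Prot; O95793-1; STAU1_HUMAN; M."))
# ('DR', ['UniProtKB/Swiss-Prot', 'O95793-1', 'STAU1_HUMAN', 'M.'])
```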