Pdb-hla-docs/README

= PDB/HLA packages for Neurocommons =

This is a collection of Neurocommons packages that marshal data about HLA molecules from the Protein Data Bank (PDB) and related sources.

Contents

 * Protein Data Bank (pdb package)
 * Selecting PDB structures and caching PDB data
 * keyfetch.py – fetch PDB structure data by keyword
 * Using Jmol to compute contacts between residues
 * Template-based conversion of PDBML to RDF
 * pdbparts.py – Extract from PDB/XML data and convert to RDF via template.
 * streamrdf.py – quick-n-dirty RDF serialization using N-Triples
 * pdb_naming.py – URI name conventions for the pdb bundle
 * chebi_codes.py – mapping from amino acid short names to CHEBI codes
 * pdb_tpl.py – streaming RDF template for PDB structures
 * contacts_tpl.py – streaming RDF template for PDB contact info
 * IMGT/HLA Alignment data (hla package)
 * alignment_parser.py – parse IMGT/HLA alignment data
 * mkalign.py – capture HLA alignment data as RDF
 * Usage
 * refseq.py – HLA reference sequences
 * allele_names.py – allele name utilities
 * Aligning structures and alleles (pdbsc package)
 * pdbmix.py – mix PDB data with HLA via blast
 * Usage
 * blast_report.py – get best alignment matches
 * pdbmix_tpl.py – template for alignment of PDB with IMGT/HLA
 * ncquery.py – a few queries on the neurocommons knowledge base
 * Usage
 * Entrez Gene to Uniprot mapping (ipi package)
 * ipiparse.py – extract mapping from Entrez Gene to Uniprot from IPI file
 * Usage
 * Residue Numbers in PDB: Design/Testing Notes
 * The PDB coordinate system: 1JWS coordinates from Jmol
 * The Sequential Coordinate System: PDB XML for 1JWS
 * Residue Level Nomenclature Mapping
 * Reporting Sequential Coordinates with Jmol PDB coordinates
 * Insertion Codes
 * @@ISSUE: Impossibly High Residue Numbers
 * Structural features from SIFTS (in progress)
 * SIFTS Modules
 * sifts_parse.py – marshal SCOP, CATH, PFAM, and secondary structure
 * sifts_tpl.py – streaming RDF template for PDB feature info
 * Unit tests
 * SIFTS design/test notes
 * SIFTS data in eFamily format: 1K5N
 * A SCOP feature: 1k5nA01
 * Mapping from PDB coordinates: 1k5nB00
 * Template access to SIFTS data
 * Secondary structure: misc. design notes
 * Pfam and mapRegion

See also


 * Project report: Semantic Web Technologies Applied to Interpretation of HLA Structure Variation
 * Source code: pdb directory and others in the packages directory.
 * Bundles in the Neurocommons distribution for data from similar packages
 * ImmPort/PDB in the Neurocommons wiki for earlier design/development notes

Note

Using this pdb package directory for documentation about several other packages (hla, ipi, pdbsc) is a little awkward, but it’s not clear that there’s a cost-effective alternative.

For example, there is another file of testing/debugging notes for coordinate transformations in ../pdbsc/coords.doctest, but Sphinx only picks up files under ./, so it’s left out for now.

Todo List
Note

The generated “... original entry ...” references are broken in many cases because of an indirection via automodule, which the Sphinx todolist plug-in doesn’t handle well.

Todo

Specify results of this template in pdb bundle documentation.

(The original entry is located in pdbsc_pkg.doctest, line 25.)

Colophon
This documentation is maintained as comments within the source code. It’s written using Sphinx markup, which is an enhanced form of reStructuredText markup. This style of markup is fairly readable as plain text in source form and converts to HTML, LaTeX, etc.

The bits that look like interactive Python sessions are actually executable tests, using the doctest module.

You can run them a la:

$ python -m doctest -v XYZ

where XYZ is a .doctest or .py file. Or even better, take a look at the doctest mode walkthrough and use Emacs doctest mode.
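
For readers new to doctest, here is a minimal sketch of what such an executable test looks like inside a .py file. The function three_to_one and its tiny lookup table are hypothetical, made up for illustration; they are not taken from any module in these packages.

```python
import doctest


def three_to_one(code):
    """Map a three-letter amino acid code to its one-letter code.

    Hypothetical helper for illustration only; not part of these packages.
    The lines starting with >>> are run by doctest, and the text after
    each one is the output it expects.

    >>> three_to_one("ALA")
    'A'
    >>> three_to_one("gly")
    'G'
    >>> three_to_one("XYZ")
    Traceback (most recent call last):
        ...
    KeyError: 'XYZ'
    """
    # Deliberately tiny table; a real mapping would cover all residues.
    table = {"ALA": "A", "GLY": "G", "SER": "S"}
    return table[code.upper()]


if __name__ == "__main__":
    # Run every docstring example in this file and print a summary.
    print(doctest.testmod())
```

Saved under a name of your choosing (say aa_codes.py, again hypothetical), the examples can be checked with the same python -m doctest -v command shown above, or by running the file directly.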

To convert this documentation to HTML:

$ make -f doc.mk html

To publish it in the Neurocommons wiki:

$ make publish-doc

or better yet:

$ make WIKILOG="added great stuff and fixed brokenness" publish-doc