Semantic resources project/Meeting notes/2010-01-21

= Attendance =

KT, EW, TD, TC, JAR, AR, DN (phone), PC

= Agenda =

PRO
 * talk to Darren
 * APP representation
 * description of V8: Uniprot import
 * version of OBO
 * isoforms and cleavage products: how are they handled?

= Notes =

PRO Talk with DN

 * create species specific proteins
 * uniprot could represent all gene products from specific locus, but not particular form or mutant or sequence
 * will work with uniprot to see if PRO will allocate IDs for any protein class needed for APP; need human specific and some cross-species cleavage products and proteins
 * AR feedback to DN:
 * have seen from PRO screen shots, would like to see a file with these in.
 * was creation of species non specific isoform, typographical error in grouping ortholog isoforms?
 * next: see file with these IDs for protein and protein fragments to get in sync about current status? any roadblocks? schedule?
 * DN: waiting for a few things:
 * representing type of cleavage products we need is not something PRO has done before, so trying to flesh out structure (screen shots = proposed structure) - still needs feedback.
 * EW: for isoform, there was an ortholog class created cross species for the parent - when it's not cross species ortholog class for isoform for human and mouse - incorrectly paired. the numbering doesn't follow any pattern (minor technical issue to correct).
 * DN is it truly incorrect?
 * AR breaks into discussion on isoforms and orthologs, looking at the file
 * PRO - take human naming as the pairing class.
 * DN: needs a comment on the structure
 * species non specific pairing,
 * species specific gene pairings,
 * isoforms that are subclasses of gene products and also derivesfrom relationship
 * DN - structure -
 * not species specific details -
 * the new part - derivesfrom - and the cleavage product.
 * the 2 derivesfrom relationships are for modeling isoforms that could come from 2 sources.
 * AR - may need to change derivesfrom OR - knows how to do in OWL
 * AR - next, needs the file filled out with uniprot references.
 * DN - needs file of formats of all APP forms and AlzForum, will send what he has now (1-2 cases) can give sequence forms by end of next week.

Complexes

 * PRO still fleshing out structure on complexes, can't give IDs just yet.
 * Test creating our first complex this week or next
 * EW has a candidate.
 * Shoot for end of next week

Version 8

 * TD asks for a description of what was chosen to go into V.8.
 * ortholog gene level of proteins from human and mouse from uniprot
 * where there is some experimental evidence for existence of the protein AND
 * some notion of what the protein is (i.e., did not put hypothetical protein, even if they know it exists, didn't dump)
 * gene level terms
 * all proteins that people would want to refer to b/c they know something about them, beyond that, will add by hand.
 * TD: difficulty loading PRO in OBOedit - DN uses version 1.101, others use version 2.
 * known path or proj timeline to v9
 * or next big import of a larger set of proteins (maybe we end up submitting)?
 * DN: v9 is already produced
 * could add bacterial terms or general under gene terms (phosphorylated terms).
 * TD: re: elaborating process on our side to do batch submissions to PRO - does that interact with plans for v9, or should we go ahead with our plans of assembling and sending as needed without worrying about what's in PRO's pipeline for versioning?
 * DN: no immediate plans to automate batch imports, only requirement is evidence.
 * TD action item - ID proteins in SWAN with uniprot identifiers, can now go back and try to take that list - look at cleavage product
 * TC -> DN - do you have a database of PRO identifiers created to date? how do you create? repository / hand curated ?
 * 2 processes
 * automatic and manual
 * examine uniprot records, using a variety of filters, create PRO terms on a large scale.
 * all will be identified by [DN_x] in attribution.
 * have scripts that process this and create terms, checking and edits, upload, publicize.
 * mapping file created, as well, all loaded into the PRO database (only public).
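
Sketch (not from the meeting): the V8 inclusion criteria DN described above could be written as a simple filter. The `Record` fields, the "hypothetical" name test, and the accessions in any usage are assumptions for illustration only; the actual PRO scripts and filters were not shown.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical, simplified view of a UniProt record."""
    accession: str
    organism: str
    name: str
    experimental_evidence: bool


# V8 drew on human and mouse proteins, per the notes.
V8_ORGANISMS = {"human", "mouse"}


def qualifies_for_v8(rec):
    """Apply the criteria as stated: right species, experimental evidence
    for existence, and some notion of what the protein is."""
    # Assumption: an uncharacterized protein is recognizable by its name.
    characterized = "hypothetical" not in rec.name.lower()
    return (
        rec.organism in V8_ORGANISMS
        and rec.experimental_evidence
        and characterized
    )
```

Anything failing this filter would, per the notes, be added by hand when someone needs it.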

Discussion of PRO Broker

 * Broker + file format
 * work thru issues of file format
 * AR -> DN: thoughts on PRO giving us block of IDs.
 * DN: as long as we start them all with 9, would be fine. (block, assigned.)
 * Needs to be a protocol about when an ID is deprecated, accepted, etc, as well as a software layer that sits between us and PRO.
 * TD added text to the wiki re: his thinking on the "broker" service.
 * Suggested task: either we submit a .obo to PRO, they tell us ids.
 * Or vice versa. They give us ids, we submit a .obo file that uses some of them
 * AR to provide better OBO to OWL translation
 * Task -
 * verify that in our general background of gene resource, we can go from uniprot ID to entrez gene ID.
 * JAR to look at current Entrez Gene Translation and pull UniProt references.
 * Have information from IPI that may be a good starting point, collected by Dan.
 * action item: query against beta.neurocommons.org that takes uniprot and gives us entrez gene.
 * SD: if can't get a PRO ID, say it's a mappable uniprot ID that doesn't exist in the PRO bundle, then we shouldn't in that interface link that statement to the PRO provisional ID?
 * TD: I think we have to.
 * Would presume the broker - way you ask for info about ID assigned is that the ID returned is a HTTP url, when you do a GET on that URL, returns back to the broker.
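
Sketch (not from the meeting): one way the broker layer discussed above might hand out IDs from a reserved block and track when an ID is accepted or deprecated. The `PR:` plus nine digits format, the block size, and the status names and transitions are all assumptions; the only point taken from the discussion is that block IDs start with 9 and that a status protocol is needed.

```python
from enum import Enum


class Status(Enum):
    PROVISIONAL = "provisional"
    ACCEPTED = "accepted"
    DEPRECATED = "deprecated"


# Allowed status changes (hypothetical; the protocol was not decided).
ALLOWED = {
    Status.PROVISIONAL: {Status.ACCEPTED, Status.DEPRECATED},
    Status.ACCEPTED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


class Broker:
    """Hands out IDs from a reserved block and tracks each ID's status."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self.next_serial = 0
        self.status = {}

    def allocate(self):
        if self.next_serial >= self.block_size:
            raise RuntimeError("reserved ID block is exhausted")
        # Assumed format: "PR:" + 9 digits, the first digit always 9.
        term_id = f"PR:9{self.next_serial:08d}"
        self.next_serial += 1
        self.status[term_id] = Status.PROVISIONAL
        return term_id

    def transition(self, term_id, new_status):
        current = self.status[term_id]
        if new_status not in ALLOWED[current]:
            raise ValueError(
                f"{term_id}: {current.value} -> {new_status.value} not allowed"
            )
        self.status[term_id] = new_status
```

Under this sketch a provisional ID can be used in a submitted .obo file right away, and PRO's later decision is recorded with `transition`.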

SWAN conversion

 * series of meetings to implement swan 1.2 and drupal
 * use first results of conversion to populate Drupal
 * meeting beginning of february with NIF to implement.
 * need to have content for 1.2 soon.
 * AR does want to have definitions,
 * TC suggests getting definitions as applied by the annotators of SWAN
 * TD : should interview GW and EW about this.

Antibodies

 * Meeting with Don yesterday (PC and EW):
 * DH says latest form looks good.
 * PC knows he made some inferences about some of the antibodies.
 * there's evidence code in those cases saying that an inference has been made.
 * DH not an expert on heat shock proteins. thinks sequences about the epitopes can be made problematic.
 * PC. proposed solution.
 * this is not scalable, forgetting about automatic detection etc. at least to understand degree of accuracy we can reach, have expert in the field to evaluate (one may not even be enough)
 * TD can bootstrap from AlzForum.
 * Anything that DH has said is OK is represented as AlzForum's claims.
 * epitope descriptions are primary
 * one approach:
 * see protein and list of species, some may work, others may not
 * have to go to company's web site, see what they're claiming and model
 * ... and model all of these things. becomes very tedious.

= Action Items =


 * DN to do 2-3 cleavage products that show Uniprot ids etc - full details and correctness - by Tuesday, Jan 26
 * EW to have full list in 2 weeks - start collating the list now
 * EW to send Darren candidate for creating complex
 * DN / AR - test creating first complex in PRO by end of next week.
 * TD to take relationships between things already represented protein complexes and cleavage products for second submission to PRO - to review 2 weeks from now.
 * JAR: entrez gene translation right now, grab uniprot references
 * AR to provide better OBO to OWL translation.
 * TD to pull together bi-monthly progress update - in the form of brief email digest (next Monday is the first)
 * point people - PRO = Alan; SWAN 1.2 = Paolo; Antibodies = TimD
 * TD do precision recall for automatic disambiguation, take small sample and check them
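
Sketch (not from the meeting): the precision/recall check in the last action item could be computed as below for a small hand-checked sample. Representing each result as a `(mention, identifier)` pair is an assumption about how the disambiguation output would be stored.

```python
def precision_recall(predicted, gold):
    """Compare automatic assignments against a hand-checked sample.

    predicted, gold: iterables of (mention, identifier) pairs.
    Precision = correct pairs / pairs the system produced.
    Recall    = correct pairs / pairs the curators expected.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

A wrong assignment lowers both numbers in this setup: it counts against precision as a produced pair that is not correct, and against recall because the expected pair is missing.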