BioPAX-OBO/Cc-10-03-2009

Style
Suggestion (at least partially based on last time's discussion):

Set time limits to avoid open-ended discussion. Instead, have a brief and focussed exchange of ideas, then refer the matter to a task force that looks into it and reports back at the next meeting.

Topics
Suggestions:

- How can compounds (e.g. molecules) be identified?

- How much can a compound be changed before it turns into something different? (e.g. different foldings, addition of small groups) Should we allow a compound definition to be a special case of another? (e.g. "EGFR-Dimer, phosphorylated" is a special case of "EGFR-Dimer, regardless of phosphorylation") If we phosphorylate a protein, is it still "the same"?

- What is the role of categorizations (e.g. "protein")? Should categories be mutually exclusive?

- What is the role of attributes such as a chemical structure or a sequence? Should we allow patterns (e.g. XGATTACA)?

- What is the role of catalogue references (e.g. "UniProt 1234567")?

- What is the role of parts (aka components)? Does it matter whether parts are covalently bonded? Should we allow multiple competing partitions (e.g. "A and BC" versus "AB and C")? Should we allow patterns (e.g. "AR" with unknown "R")?

- Admin: relations with HCLS

scribe
Michel Dumontier

Minutes
1) How can compounds (e.g. molecules) be identified? By atoms and connectivity; there was consensus that InChI makes a good identifier.
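To make the "atoms and connectivity" criterion concrete, here is a minimal Python sketch, assuming a molecule is given as a list of heavy-atom element symbols plus a list of bonds. The function `canonical_key` and its input format are hypothetical; this is not the InChI algorithm, which also encodes hydrogens, charges, stereochemistry and isotopes and uses its own canonical numbering. The brute-force search over atom orderings is only practical for a handful of atoms.

```python
from itertools import permutations


def canonical_key(elements, bonds):
    """Toy canonical form of a molecular graph built from heavy atoms only.

    elements: element symbols, e.g. ["C", "C", "O"]
    bonds: (i, j, order) tuples indexing into `elements`
    Returns the smallest encoding over all atom orderings, so two inputs
    get the same key exactly when they have the same atoms and the same
    connectivity. Brute force: O(n!) in the number of atoms.
    """
    best = None
    for perm in permutations(range(len(elements))):
        # perm[k] is the original index of the atom placed at position k.
        position = {old: new for new, old in enumerate(perm)}
        atoms = tuple(elements[old] for old in perm)
        edges = tuple(sorted(
            (min(position[i], position[j]), max(position[i], position[j]), order)
            for i, j, order in bonds
        ))
        candidate = (atoms, edges)
        if best is None or candidate < best:
            best = candidate
    return best


ethanol = canonical_key(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])            # C-C-O
ethanol_reordered = canonical_key(["O", "C", "C"], [(0, 1, 1), (1, 2, 1)])  # O-C-C
dimethyl_ether = canonical_key(["C", "O", "C"], [(0, 1, 1), (1, 2, 1)])     # C-O-C

print(ethanol == ethanol_reordered)  # True: same atoms, same connectivity
print(ethanol == dimethyl_ether)     # False: same atoms, different connectivity
```

Ethanol and dimethyl ether have the same heavy atoms but different connectivity, so they get different keys, while listing ethanol's atoms in a different order leaves its key unchanged.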

2) How do we define reactions? For every reaction r, there exists some substrate a and some product p.

We have worked on this idea previously, using roles.
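Written out in first-order logic, and as the corresponding description-logic axiom, the reaction definition might look like this. The names Reaction, hasSubstrate and hasProduct are illustrative placeholders for the substrate and product roles, not terms taken from an existing ontology.

```latex
\forall r\,\bigl(\mathrm{Reaction}(r) \rightarrow \exists a\,\exists p\,(\mathrm{hasSubstrate}(r,a) \wedge \mathrm{hasProduct}(r,p))\bigr)

\mathrm{Reaction} \sqsubseteq \exists\,\mathrm{hasSubstrate}.\top \sqcap \exists\,\mathrm{hasProduct}.\top
```

Both forms only require that some substrate and some product exist; they say nothing about which molecules fill those roles.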

3) How do we define proteins?

Serious disagreement here. PRO is undertaking a massive manual curation effort to create records for proteins with different PTMs / evolutionary histories. Some believe that this can be fully automated, with no need for manual intervention except in the description of qualitative attributes (e.g. fold types, PTM types, etc.). We don't want another ChEBI, with manual curation of chemical structure in what should be a fully automated process. Suggestion: develop an InChI-like descriptor for proteins that considers i) sequence, ii) modifications, and possibly other conformational attributes such as secondary/tertiary structure, fold, etc.
 * PRO (http://pir5.georgetown.edu/wiki/PRO)
 * InChI (http://www.iupac.org/inchi/release102final.html)
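A minimal Python sketch of what such a descriptor could look like, assuming only the components named above: a sequence, a set of modifications and an optional fold. The `PROT/` prefix, the field layout, the class name `ProteinDescriptor` and the example sequence are all invented for illustration; neither InChI nor PRO defines such a format. As with InChI, each string names one fully specified class, so on its own it cannot express patterns such as "phosphorylated at site 1, whatever the state of site 2".

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ProteinDescriptor:
    """Hypothetical InChI-like protein descriptor (illustration only)."""

    sequence: str  # i) amino-acid sequence in one-letter codes
    # ii) modifications as (1-based position, modification type) pairs
    modifications: FrozenSet[Tuple[int, str]] = frozenset()
    fold: Optional[str] = None  # optional, since folds are often unknown

    def __post_init__(self) -> None:
        # Reject modifications that point outside the sequence.
        for position, _ in self.modifications:
            if not 1 <= position <= len(self.sequence):
                raise ValueError(f"modification at {position} is outside the sequence")

    def to_string(self) -> str:
        # Sorting makes the string independent of the order modifications were given in.
        mods = ";".join(f"{pos}:{kind}" for pos, kind in sorted(self.modifications))
        parts = [f"seq={self.sequence}", f"mod={mods}"]
        if self.fold is not None:
            parts.append(f"fold={self.fold}")
        return "PROT/" + "/".join(parts)


# Made-up sequence with phosphorylation on T3 and Y5.
wild_type = ProteinDescriptor("MKTAYIAK", frozenset({(3, "phospho"), (5, "phospho")}))
reordered = ProteinDescriptor("MKTAYIAK", frozenset({(5, "phospho"), (3, "phospho")}))
variant = ProteinDescriptor("MKTVYIAK", frozenset({(3, "phospho"), (5, "phospho")}))  # A4V

print(wild_type.to_string())  # PROT/seq=MKTAYIAK/mod=3:phospho;5:phospho
```

Giving the modifications in a different order yields the same string, while a single-residue change in the sequence yields a different one. Making `fold` optional reflects the point raised in the call that the fold of most proteins is not known.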

Action Items


 * get a better understanding of PRO (all)
 * generate a set of potential descriptors (Michel, Andrea, Elgar)

Skype conference call minutes

[3/10/2009 4:11:58 PM] Andrea Splendiani: http://neurocommons.org/page/Cc-10-03-2009
[3/10/2009 4:12:05 PM] Andrea Splendiani: ^^^^
[3/10/2009 4:12:07 PM] Andrea Splendiani: Agenda
[3/10/2009 4:15:11 PM] *** Conference call, duration 1:11:36 ***
[3/10/2009 4:17:10 PM] Alan Ruttenberg: are we talking sameness of instances?
[3/10/2009 4:17:31 PM] Alan Ruttenberg: Because a class can group any set of instances with shared properties
[3/10/2009 4:18:51 PM] Alan Ruttenberg: atoms and connectivity
[3/10/2009 4:20:03 PM] Alan Ruttenberg: classes: a + b -> c
[3/10/2009 4:20:28 PM] Alan Ruttenberg: forall a, there exists some b that reacts with that a to produce some c
[3/10/2009 4:21:54 PM] Alan Ruttenberg: same number and kind of atom
[3/10/2009 4:21:54 PM] Alan Ruttenberg: same set of bonds
[3/10/2009 4:22:54 PM] Michel Dumontier: forall reaction r, there exists some a and b that react to produce some c
[3/10/2009 4:24:59 PM] Alan Ruttenberg: when you have a glucose and add a phosphorus, you get something of a different class, in most class definitions of small molecules
[3/10/2009 4:25:20 PM] Alan Ruttenberg: For proteins this may or may not be true. You tell me.
[3/10/2009 4:26:58 PM] Alan Ruttenberg: You are focused on the wrong thing. Focus on making statements that are true. Then, if you want to express this in terms of classes, make sure your class definitions are such that the statement stays true.
[3/10/2009 4:27:12 PM] Michel Dumontier: +1
[3/10/2009 4:29:35 PM] Michel Dumontier: InCHI
[3/10/2009 4:29:48 PM] Alan Ruttenberg: https://pir5.georgetown.edu/wiki/PRO
[3/10/2009 4:29:50 PM] Michel Dumontier: is an accurate description of chemical structure
[3/10/2009 4:29:56 PM] Alan Ruttenberg: inchi for proteins too complicated
[3/10/2009 4:30:10 PM] Alan Ruttenberg: for "small molecules" use Chebi
[3/10/2009 4:33:13 PM] Alan Ruttenberg: 1 and not 2, 1 and 2
[3/10/2009 4:33:16 PM] Alan Ruttenberg: lost connection
[3/10/2009 4:33:31 PM] Alan Ruttenberg: your class would be union of those two
[3/10/2009 4:33:50 PM] Alan Ruttenberg: EGFR+1-2, EGFR+1+2
[3/10/2009 4:33:54 PM] Oliver Ruebenacker: e.g. every EGFR that has phospho at site 1, may or may not phospho at site 2?
[3/10/2009 4:34:10 PM] Alan Ruttenberg: unionof(EGFR+1-2, EGFR+1+2)
[3/10/2009 4:34:33 PM] Andrea Splendiani: PRO has mappings to 1244 Uniprot entries
[3/10/2009 4:36:06 PM] Oliver Ruebenacker: PRO database, ontology, or both?
[3/10/2009 4:36:18 PM] Andrea Splendiani: ontology
[3/10/2009 4:36:59 PM] Oliver Ruebenacker: Can you use PRO to define your own IDs?
[3/10/2009 4:40:15 PM] Andrea Splendiani: http://www.berkeleybop.org/ontologies/
[3/10/2009 4:40:59 PM] Alan Ruttenberg: how will you represent fold?
[3/10/2009 4:41:05 PM] Alan Ruttenberg: question for michel
[3/10/2009 4:41:19 PM] Alan Ruttenberg: since proteins can fold in different forms
[3/10/2009 4:42:07 PM] Michel Dumontier: http://www.iupac.org/inchi/release102final.html
[3/10/2009 4:43:08 PM] Alan Ruttenberg: stepping away for a few minutes
[3/10/2009 4:43:11 PM] Michel Dumontier: k
[3/10/2009 4:45:54 PM] Alan Ruttenberg: you are not talking about a specific one.
[3/10/2009 4:45:58 PM] Alan Ruttenberg: you are talking about classes still
[3/10/2009 4:46:15 PM] Alan Ruttenberg: the "specific ones" have distinct momenta and positions
[3/10/2009 4:46:21 PM] Alan Ruttenberg: at least
[3/10/2009 4:47:12 PM] Alan Ruttenberg: never the "same" protein
[3/10/2009 4:47:17 PM] Andrea Splendiani: Definition from PRO: label A voltage-gated potassium channel subunit KCNQ1 that is a translation product of a polymorphic sequence variant of the KCNQ1 gene that has a Ile residue at the position equivalent to Thr-391 in the human sequence UniProtKB:P51787. This residue is located in the cytoplasmic C-terminus.@en
[3/10/2009 4:47:30 PM] Alan Ruttenberg: always "whether thing is in class"
[3/10/2009 4:49:42 PM] Alan Ruttenberg: an inchi would typically be overspecific for proteins
[3/10/2009 4:54:23 PM] Alan Ruttenberg: hard to hear. skype sucks :(
[3/10/2009 4:54:32 PM] Oliver Ruebenacker: we need an ID for all alcohols, right?
[3/10/2009 4:54:51 PM] Oliver Ruebenacker: i can not hear anything right now
[3/10/2009 4:55:01 PM] Andrea Splendiani: If I understood Alan, inchi encodes informations on specific modifications as if they are the only ones.
[3/10/2009 4:55:14 PM] Alan Ruttenberg: can't hear anything either
[3/10/2009 4:55:20 PM] Andrea Splendiani: Me neither.
[3/10/2009 4:55:29 PM] Michel Dumontier: if you want a generalized string:
[3/10/2009 4:55:29 PM] Michel Dumontier: http://en.wikipedia.org/wiki/Simplified_molecular_input_line_entry_specification
[3/10/2009 4:55:31 PM] Andrea Splendiani: I have to go in 5 min. as well.
[3/10/2009 4:55:40 PM] Oliver Ruebenacker: if we could generate an ID for "any alcohol", we could say it's a subclass of it.
[3/10/2009 4:55:47 PM] Michel Dumontier: queries on chemical structure can be captured in smiles strings
[3/10/2009 4:55:50 PM] Oliver Ruebenacker: any tasks to distribute?
[3/10/2009 4:55:50 PM] Alan Ruttenberg: inchi tries to specify all the atoms, bonds, and geometry of a small molecule.
[3/10/2009 4:56:00 PM] Alan Ruttenberg: thus each inchi names a class of molecules
[3/10/2009 4:56:09 PM] Alan Ruttenberg: why a class? Because you can
[3/10/2009 4:56:20 PM] Alan Ruttenberg: have two molecules with different excitation states that have the same inchi
[3/10/2009 4:56:23 PM] Oliver Ruebenacker: if we had an ID for patterns, e.g. R-COH, we could use that
[3/10/2009 4:56:32 PM] Alan Ruttenberg: or two different values for angular momentum
[3/10/2009 4:56:45 PM] Michel Dumontier: yes, R-COH can be represented with SMILES
[3/10/2009 4:56:59 PM] Alan Ruttenberg: You still need to say what the instances of R-COH are
[3/10/2009 4:57:12 PM] Oliver Ruebenacker: Michel, maybe you can develop a toy prototype and present next time?
[3/10/2009 4:57:12 PM] Andrea Splendiani: Is the expressivity required to express the structure of chemical sufficient for proteins ?
[3/10/2009 4:57:17 PM] Michel Dumontier: yes, and this can be automatically determined
[3/10/2009 4:57:26 PM] Michel Dumontier: please refer to my chemical functional group papers
[3/10/2009 4:57:30 PM] Alan Ruttenberg: we don't know the fold of most proteins
[3/10/2009 4:57:31 PM] Alan Ruttenberg: so no
[3/10/2009 4:57:42 PM] Michel Dumontier: the fold is a descriptor
[3/10/2009 4:57:46 PM] Alan Ruttenberg: ?
[3/10/2009 4:57:46 PM] Michel Dumontier: not something that defines identity
[3/10/2009 4:57:51 PM] Alan Ruttenberg: says who
[3/10/2009 4:57:53 PM] Michel Dumontier: it is a qualitative attribute
[3/10/2009 4:57:58 PM] Andrea Splendiani: depends...
[3/10/2009 4:58:02 PM] Alan Ruttenberg: at the instance level proteins with different folds act differently
[3/10/2009 4:58:12 PM] Alan Ruttenberg: at the instance level proteins with different sequences act differently
[3/10/2009 4:58:17 PM] Oliver Ruebenacker: michel, which papers? can you put a link on the wiki?
[3/10/2009 4:58:20 PM] Michel Dumontier: sure
[3/10/2009 4:58:22 PM] Alan Ruttenberg: why is one a descriptor and another an identity criteria
[3/10/2009 4:58:27 PM] Alan Ruttenberg: this is arbitrary
[3/10/2009 4:58:31 PM] Michel Dumontier: no it isn't
[3/10/2009 4:58:32 PM] Alan Ruttenberg: sooner you know it, the better
[3/10/2009 4:58:38 PM] Michel Dumontier: we already made the distinction for small molecules
[3/10/2009 4:58:40 PM] Michel Dumontier: it is the same
[3/10/2009 4:58:44 PM] Michel Dumontier: atoms + connectivity = identity
[3/10/2009 4:58:48 PM] Alan Ruttenberg: that is one distinction for small molecules
[3/10/2009 4:58:54 PM] Michel Dumontier: and proteins ARE molecules!!!!
[3/10/2009 4:58:55 PM] Alan Ruttenberg: you can't talk about how lasers work with inchi
[3/10/2009 4:58:56 PM] Michel Dumontier: lol
[3/10/2009 4:58:59 PM] Andrea Splendiani: It's different for proteins.
[3/10/2009 4:59:03 PM] Alan Ruttenberg: "one distinction"
[3/10/2009 4:59:05 PM] Michel Dumontier: its not different
[3/10/2009 4:59:12 PM] Michel Dumontier: that is an arbitrary distinction
[3/10/2009 4:59:18 PM] Alan Ruttenberg: appropriate for certain types of statements, not for others.
[3/10/2009 4:59:28 PM] Andrea Splendiani: I suspect two proteins could be considered the same if they have a negligible difference in sequence.
[3/10/2009 4:59:35 PM] Alan Ruttenberg: really?
[3/10/2009 4:59:45 PM] Alan Ruttenberg: haven't you heard of snps and disease?
[3/10/2009 4:59:45 PM] Oliver Ruebenacker: what is "negligible"?
[3/10/2009 5:00:01 PM] Michel Dumontier: and SNPs leads to a change in SEQUENCE
[3/10/2009 5:00:02 PM] Oliver Ruebenacker: maybe you can develop a definition for "negligible" and present next time?
[3/10/2009 5:00:12 PM] Andrea Splendiani: That doesn't result in an effect for which they can be considered distinct.
[3/10/2009 5:00:13 PM] Alan Ruttenberg: michel, that's what andrea said
[3/10/2009 5:00:28 PM] Alan Ruttenberg: we "consider"
[3/10/2009 5:00:31 PM] Alan Ruttenberg: proteins don't
[3/10/2009 5:00:34 PM] Andrea Splendiani: I was somehow asking...
[3/10/2009 5:00:41 PM] Michel Dumontier: ?
[3/10/2009 5:00:55 PM] Alan Ruttenberg: I was commenting on andreas phrase "considered distinct."
[3/10/2009 5:01:18 PM] Michel Dumontier: fundamentally, different chemical structure = different entity
[3/10/2009 5:01:28 PM] Andrea Splendiani: Yes.
[3/10/2009 5:01:39 PM] Alan Ruttenberg: but the converse is not true
[3/10/2009 5:01:46 PM] Alan Ruttenberg: same chemical structure != same entity
[3/10/2009 5:02:40 PM] Alan Ruttenberg: not even same "type"
[3/10/2009 5:02:42 PM] Oliver Ruebenacker: do we care about chemical structure for proteins?
[3/10/2009 5:02:45 PM] Alan Ruttenberg: sometimes
[3/10/2009 5:02:48 PM] Andrea Splendiani: See the definition from PRO. There is no reference to a sequence...
[3/10/2009 5:02:49 PM] Michel Dumontier: if they have the same chemical structure, they share this identity attribute
[3/10/2009 5:03:02 PM] Alan Ruttenberg: if you get down far enough in pro they will xref sequence
[3/10/2009 5:03:11 PM] Alan Ruttenberg: what's an attribute?
[3/10/2009 5:03:22 PM] Andrea Splendiani: But it's a canonical one.
[3/10/2009 5:03:27 PM] Alan Ruttenberg: what's canonical?
[3/10/2009 5:03:29 PM] Oliver Ruebenacker: what's the chemical structure of an EGFR dimer? ;)
[3/10/2009 5:03:52 PM] Alan Ruttenberg: as the term is commonly used, many different structures
[3/10/2009 5:04:04 PM] Michel Dumontier: there is no doubt that if you want to specify a class of EGFR dimers that are located in the plasma membrane and have 3 specific phosphorylation sites, you should be able to do this
[3/10/2009 5:04:06 PM] Andrea Splendiani: Ok, maybe I have misinterpreted PRO, have to check.
[3/10/2009 5:04:06 PM] Michel Dumontier: but
[3/10/2009 5:04:07 PM] Alan Ruttenberg: because often the term is used without specification of species
[3/10/2009 5:04:19 PM] Elgar Pichler: from PRO website: "Aim 1. Develop a Protein Evolution (ProEvo) ontology to describe proteins based on evolutionary relationships. In essence, ProEvo will reflect protein families (using sequence or structure similarities) in an ontology framework."
[3/10/2009 5:04:23 PM] Andrea Splendiani: So... I have to leave...
[3/10/2009 5:04:24 PM] Alan Ruttenberg: and there are things that can be said independent of species
[3/10/2009 5:04:27 PM] Alan Ruttenberg: ok.
[3/10/2009 5:04:33 PM] Michel Dumontier: is that entity equivalent to one that is nascently folded from the ER and doesn't have any modifications?
[3/10/2009 5:04:33 PM] Elgar Pichler: sounds like sequence matters
[3/10/2009 5:04:38 PM] Andrea Splendiani: Can we have a quick wrap-up for me ?
[3/10/2009 5:05:01 PM] Oliver Ruebenacker: andrea, want to develop a definition of "negligibly different proteins"?
[3/10/2009 5:05:17 PM] Andrea Splendiani: I suggest we take a view at PRO as an action item for next item, and comment on its implication on identity.
[3/10/2009 5:05:33 PM] Andrea Splendiani: Maybe, I'll first have a look in PRO.
[3/10/2009 5:05:52 PM] Oliver Ruebenacker: so, every one takes a look at PRO?
[3/10/2009 5:06:19 PM] Michel Dumontier: ok, it is an action item then
[3/10/2009 5:06:22 PM] Andrea Splendiani: ok.
[3/10/2009 5:06:26 PM] Oliver Ruebenacker: ok
[3/10/2009 5:06:34 PM] Andrea Splendiani: And possibly address the questions posed by Oliver.
[3/10/2009 5:06:55 PM] Andrea Splendiani: Now I'm sorry but really have to rush! (hope to see notes on the wiki ;) )
[3/10/2009 5:07:06 PM] Michel Dumontier: i am editing the wiki minutes!
[3/10/2009 5:07:07 PM] Alan Ruttenberg: bye
[3/10/2009 5:07:08 PM] Oliver Ruebenacker: andrea, see you later!
[3/10/2009 5:07:10 PM] Andrea Splendiani: ciao!
[3/10/2009 5:07:30 PM] Andrea Splendiani: (will leave comp on)
[3/10/2009 5:26:50 PM] *** Call ended ***
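Oliver's example in the chat ("every EGFR that has phospho at site 1, may or may not phospho at site 2") and Alan's answer, unionof(EGFR+1-2, EGFR+1+2), can be checked on a toy model. This minimal Python sketch assumes an EGFR instance is described only by the set of its phosphorylated sites, with two hypothetical sites, and models each class as a membership predicate. The function names are invented for illustration; in OWL the same construction would be written with owl:unionOf rather than code.

```python
from itertools import product

SITES = (1, 2)  # hypothetical phosphorylation sites on EGFR

# Every possible state, described as the set of phosphorylated sites.
ALL_STATES = [
    frozenset(site for site, on in zip(SITES, flags) if on)
    for flags in product([False, True], repeat=len(SITES))
]


def exact_state(phosphorylated):
    """Class of instances with exactly these sites phosphorylated."""
    target = frozenset(phosphorylated)
    return lambda state: state == target


def union_of(*classes):
    """Class whose members belong to at least one of the given classes."""
    return lambda state: any(member(state) for member in classes)


def phospho_at(site):
    """Class of instances phosphorylated at `site`, whatever the other sites."""
    return lambda state: site in state


egfr_p1_not_p2 = exact_state({1})  # "EGFR+1-2"
egfr_p1_p2 = exact_state({1, 2})   # "EGFR+1+2"
union_class = union_of(egfr_p1_not_p2, egfr_p1_p2)

# Over every possible state, the union has the same members as "phospho at site 1".
same_extension = all(union_class(s) == phospho_at(1)(s) for s in ALL_STATES)
print(same_extension)  # True
```

Because the two exactly specified classes cover every state in which site 1 is phosphorylated, the union and the pattern class have the same members. With more sites, the union would need one term per combination of the unconstrained sites, which is one argument for allowing pattern-style class definitions directly.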