Semantic resources project/Meeting notes/2009-10-1

Notes from October 1 meeting

Taken by: Kaitlin Thaney

Attendees: Jonathan Rees, Paolo Ciccarese, Kaitlin Thaney, Tim Danford, Tim Clark, Elizabeth Wu

Data:

(1) Scraping AlzForum (or not)
 * No need to scrape. EW to send files to TD.

(2) Alan's contact @ Abcam


 * AR: Does have contact, but it is a question of timing.
 * TC: Disagrees; not sure how long it will take to get permission. Thinks it would be useful to just scrape to save time.
 * Would be scraping their data sheets to help fill out the same model we have for AlzForum.
 * JAR: when and how to make the choice between scraping and negotiation.
 * Don Hatfield has also included a bit of Abcam data.
 * AR: Background context on Abcam - they hold their data rather closely; some things on their data sheets they are not going to give us. What we've been negotiating is the release of the neuroscience portion of their database. Originally, what Frank Gibson wanted to give was just the link between protein and antibody, which might be helpful, but not very - how to represent epitopes, etc. He wants to be brought back into the loop soon, and is expecting a show of progress and requirements for what we'd like. He's aware of our case. Best strategy - show examples of some things we'd like to get from him. No guarantee, but worth trying first. Another company to consider - Cell Signaling Technologies - may be more amenable to giving information.

(3) Supplier data questions
 * How do we come up with a new URI for a supplier, look up existing URIs, etc.?
 * PC has been scraping list of suppliers, surprisingly very little modeling involved (contact information, phone numbers). Did notice that not all the information is fresh. Some links are not working any more, point to other companies due to mergers, etc. How to validate?
 * Idea - if goal is to ID the supplier, pointer to the Web site may be best. Location data may not make sense due to multiple offices, changes, etc.
 * EW: thinks name + web site may be most efficient.
 * PC: Only way can work with locations, phone numbers etc. - put up something that allows companies to manage their data - would add visibility etc. Not sure it's in the scope of this project, and will take time to put up system, get buy in, etc.
 * Agreement: Supplier info - name + URL - won't worry about modeling phone numbers etc., AlzForum, as long as scraping is agreeable
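
A minimal sketch of the "name + URL" agreement above, showing one way a supplier URI could be minted from the web site and looked up again later. The `BASE` namespace, the function names, and the in-memory `registry` dict are all hypothetical; the notes do not specify a URI scheme or a store, so treat this as an illustration only.

```python
import re
from urllib.parse import urlsplit

# Hypothetical namespace: the project's real supplier URI base is not given in these notes.
BASE = "http://example.org/supplier/"


def supplier_key(website):
    """Reduce a supplier web site to a normalized host name (lowercase, no 'www.')."""
    if "://" not in website:
        website = "http://" + website  # allow bare host names such as "www.example.com"
    host = (urlsplit(website).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host


def supplier_uri(website):
    """Mint a URI from the normalized host name."""
    key = supplier_key(website)
    if not key:
        raise ValueError("no host name found in %r" % website)
    slug = re.sub(r"[^a-z0-9]+", "-", key).strip("-")
    return BASE + slug


def lookup_or_mint(registry, name, website):
    """Return the existing URI for this web site, or mint and record a new one."""
    key = supplier_key(website)
    if key not in registry:
        registry[key] = {"uri": supplier_uri(website), "name": name}
    return registry[key]["uri"]


if __name__ == "__main__":
    registry = {}
    print(lookup_or_mint(registry, "Example Antibodies", "http://www.example-antibodies.com/contact"))
    print(lookup_or_mint(registry, "Example Antibodies Inc.", "example-antibodies.com"))
```

Keying on the host name rather than the company name means that spelling variants of the same supplier resolve to one URI. It does not handle the merger case PC raised, where an old site now points to a different company; that would still need manual validation.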

(4) Scraping JAX


 * TD able to scrape the mouse model data sheets.
 * Going to circle up with Judy at the end of this week - can we get data sheets internally or should we scrape? Can now get JAX stock number and strain name - useful for modeling purposes, but we want the full data sheets.
 * EW to also send AlzForum table of research models.

(5) URIs:


 * TD: develop programmatic access model for UniProt by protein model - will be useful for mouse models and antibody models
 * JAR: to write up blurb for best practices for OBI for supplier models.
 * TD: talk to JAR about technical specifics for templating (in Python)
 * TD: to come up with replicable SPARQL process for pulling out PubMed IDs given article metadata (EW suggests title + first page #)
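
As a rough illustration of the last item (PubMed IDs from title + first page), the sketch below builds a SPARQL query string in Python. It assumes the article metadata is stored with `dcterms:title`, `bibo:pageStart` and `bibo:pmid`; that vocabulary choice, and the function names, are assumptions and not something decided at the meeting.

```python
def sparql_literal(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def pubmed_id_query(title, first_page):
    """Build a query that returns PubMed IDs for an exact title + first page match.

    Assumes (hypothetically) that articles carry dcterms:title,
    bibo:pageStart and bibo:pmid in the target store.
    """
    title_lit = sparql_literal(title)
    page_lit = sparql_literal(str(first_page))
    return f"""PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT DISTINCT ?article ?pmid WHERE {{
  ?article dcterms:title ?title ;
           bibo:pageStart ?page ;
           bibo:pmid ?pmid .
  FILTER (str(?title) = {title_lit})
  FILTER (str(?page) = {page_lit})
}}"""


if __name__ == "__main__":
    # Made-up title and page number, for illustration only.
    print(pubmed_id_query("An example article title", 123))
```

The match on title is exact and case-sensitive, so titles that differ in punctuation or capitalization between sources would be missed; the first-page filter is there to disambiguate articles that share a title.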

(6) Software Architecture


 * Software Component Diagram (photo to be sent)
 * Installation of Virtuoso holding NC bundles in EC2 instance
 * Not quite sure how other pieces fit in, all the way up to SWAN/SCF integration. What we build needs to be integrated into SCF, but doesn't need to be *first built* in SCF, since it presents its own unique issues
 * Alan to point TD to SPARQL protocol

(7) Contact Information / Use cases


 * TC: suggest TD interview Elizabeth and Gwen to make resources useful to SWAN and SCF users.
 * (1) Judy Blake
   * AR: more interesting work is in the modeling, very little in the scraping. Should not see the scraping data as a blocking point
   * TD - has done some scraping of JAX already.
 * (2) Keith Robison (use case development) - computational biologist from MLMN
   * TD meeting with KR this afternoon (10/1/09)
 * (3) Meeting notes with JHC - MIND and CHDI (use case development)
   * Met with his student and tech.
 * TD to start distilling suggestions: new resources / processes / ways to interact with resources, possible workflows.

(7) Pathway data file


 * EW action item from earlier this month. Just to revisit. Anyone have long term suggestions?
 * AR: Short term suggestions - go through, extract proteins and protein fragments mentioned, can do without parsing the whole thing. Can add to PRO.
 * TD: investigate structure of file, if easy to do, will try to do and post the list.
 * EW: also annotating the paper associated with the pathway file, so will have hypotheses.
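
AR's short-term suggestion (pull out the proteins mentioned without parsing the whole file) could be sketched as a plain text scan. This assumes the pathway file refers to proteins by UniProt accession, which the notes do not state; if it uses names or another identifier scheme, the pattern would have to change. The function name is hypothetical.

```python
import re

# Pattern for UniProt accession numbers (6- and 10-character forms).
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def protein_accessions(text):
    """Return the distinct accession-like tokens in text, in order of first appearance."""
    seen = set()
    ordered = []
    for match in UNIPROT_ACCESSION.finditer(text):
        accession = match.group(0)
        if accession not in seen:
            seen.add(accession)
            ordered.append(accession)
    return ordered


if __name__ == "__main__":
    # Made-up snippet standing in for the pathway file contents.
    sample = "P05067 is cleaved by P56817; fragment of P05067; see A0A024R161."
    for accession in protein_accessions(sample):
        print(accession)
```

Because this only matches the shape of an accession, the resulting list would still need a check against UniProt before anything is added to PRO.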

(8) Draft outline of report
 * KT is on it.

(9) Action Items


 * EW: Provide TD with AlzForum data.
 * EW: Send table of research models (may be all mice) from AlzForum, as well
 * AR: Contact with Abcam - assess how long it will take and next steps.
 * TD: Ping JAR/Alan and develop programmatic access model for Uniprot by protein model
 * TD: talk to JAR about technical specifics for templating (in Python)
 * TD: to come up with replicable SPARQL process for pulling out PubMed IDs given article metadata (EW suggests Title + first page #)
 * TD to ping JAR and Alan for SPARQL protocol
 * PC/ JAR / Alan to work through and review IMGT papers - catalogue what they have
 * TD: run transitive closure on EC2 instance already up
 * TD: get the SCF SPARQL queries from Stephane to set up a test suite, to make sure they remain supported in the future
 * TD: make sure it is on Stephane's queue to set up a development instance of SCF - not just PDonline
 * TD: Establish plan on when to implement, meet requirements, etc. outlining steps between now and when SCF instance runs independently (so to speak) - testing and reliability, plan for updating - how going to work, schedule, etc.
 * JAR: Provide architecture description
 * AR to choose a selection of 3-4 methods, names, text from AlzForum, Wikipedia link from wiki page - and AR will try to abstract out a definition. Use that example as a model for how to move forward.
 * AR and EW to talk more about reasons for needing other definition instead of using AlzForum's
 * KT: Draft of report, due DEC 1

Adjourned