Semantic resources project/Antibody Record Annotator

= Annotation Software =

= Installation Instructions =

== Download the Required Files ==
Before you can run the annotation tool, you’ll need to download the software itself as well as several data files provided by third parties.

=== annotations.jar ===

=== AlzForum Data ===
You’ll need the AlzForum antibody dataset. Since this isn’t a public dataset, it is provided via email.

=== PIR ===
You’ll need to download two PIR files distributed as part of the BioThesaurus project:
 * 1) bioThesaurus.dist_#.#.gz
 * 2) Dictionary

The first of these files can be located on the PIR FTP server,

ftp://ftp.pir.georgetown.edu/databases/iprolink/

where you should download the .gz file of the latest version of the BioThesaurus (version 6.0 as of this writing).

The Dictionary file can be found at:

http://pir.georgetown.edu/pirwww/iprolink/Dictionary.gz

and should be placed in the same folder as the bioThesaurus file above.

=== PRO ===
You’ll also need to download the PRO files:
 * 1) pro.obo
 * 2) uniprotmapping.txt

Both can be downloaded from the PRO FTP server,

ftp://ftp.pir.georgetown.edu/databases/ontology/pro_obo/

These files should also be put in the same folder as the PIR resources, above.

Finally, place the AlzForum antibody data dump in the same folder as the other files above. As noted under AlzForum Data, this file isn’t public, so you’ll need to get it from me directly.
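Because every downloaded file is expected to live in one folder, a quick pre-flight check can save a failed start later. The following is a minimal sketch and not part of annotations.jar: the class name `DownloadCheck` is hypothetical, it assumes the Dictionary file is kept as downloaded (`Dictionary.gz`), it matches the versioned BioThesaurus file with a glob rather than a fixed name, and it skips the AlzForum file because that file’s name isn’t given here.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check, not part of annotations.jar: lists which of the
// downloaded third-party files are missing from a single folder.
class DownloadCheck {
    // Names as downloaded; assumes Dictionary is left compressed as Dictionary.gz.
    static final List<String> EXACT_NAMES = List.of("pro.obo", "uniprotmapping.txt", "Dictionary.gz");
    // The BioThesaurus file name carries a version number, so match it with a glob.
    static final String BIOTHESAURUS_GLOB = "bioThesaurus.dist_*.gz";

    static List<String> missing(Path folder) throws IOException {
        List<String> missing = new ArrayList<>();
        for (String name : EXACT_NAMES) {
            if (!Files.isRegularFile(folder.resolve(name))) {
                missing.add(name);
            }
        }
        // newDirectoryStream(dir, glob) yields only entries whose names match the glob.
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(folder, BIOTHESAURUS_GLOB)) {
            if (!matches.iterator().hasNext()) {
                missing.add(BIOTHESAURUS_GLOB);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> result = missing(folder);
        System.out.println(result.isEmpty() ? "All expected files found." : "Missing: " + result);
    }
}
```

With Java 11 or later you can run it directly from source, for example `java DownloadCheck.java /path/to/folder`.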

== Startup and Installer Window ==
To run the annotation tool, run the annotations.jar file you downloaded above. This is typically done from the command line: navigate to the folder that holds annotations.jar and execute the command:

java -Xmx800m -jar annotations.jar

The “-Xmx800m” option gives the Java virtual machine a maximum of 800 MB of memory. You should give it at least this much, because the annotation tool holds a lot of data in memory; giving it more is fine if your machine has memory to spare.
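To confirm that the memory option took effect, you can run a small stand-alone check with the same flag. This is a hypothetical helper (`HeapCheck` is not part of the annotation tool) built on the standard `Runtime.maxMemory()` call; the 10% tolerance is a choice of this sketch.

```java
// Hypothetical stand-alone check, not part of annotations.jar: prints the
// maximum heap the JVM was started with, so you can confirm -Xmx took effect.
class HeapCheck {
    // 800 MB in the sense used by -Xmx800m: 800 * 1024 * 1024 bytes.
    static final long RECOMMENDED_BYTES = 800L * 1024 * 1024;

    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    // Runtime.maxMemory() can report a little less than the -Xmx value
    // (some garbage collectors leave part of the heap out of the figure),
    // so this check allows a 10% margin below the recommendation.
    static boolean meetsRecommendation(long maxBytes) {
        return maxBytes >= RECOMMENDED_BYTES * 9 / 10;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + toMegabytes(max) + " MB");
        if (!meetsRecommendation(max)) {
            System.out.println("Below the recommended 800 MB; restart with -Xmx800m or more.");
        }
    }
}
```

Run it as `java -Xmx800m HeapCheck.java` (Java 11 or later) to see the heap ceiling the same option produces.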

To the right is a picture of the window you first see when you run the software. This window allows you to select the locations of the files you will need to run the annotator, and will save these selections so that you don’t have to re-locate the same files each time you run the software.

The four files you will need to run the annotator are:
 * 1) pro.obo
 * 2) uniprotmapping.txt
 * 3) antibodies.rdf
 * 4) a “protein index”

You downloaded the first two of these files by following the instructions above. Click the buttons next to those file names at the top of the installer window and choose the location of each file; once you have chosen a file, its path replaces the “Unknown” text.

The last two files, antibodies.rdf and the protein index, must be created from files you downloaded above. Use the two framed areas labeled “antibodies.rdf Creation” and “Protein Index Creation” to create them.

=== antibodies.rdf ===

To create the antibodies.rdf file, use the file choice button to select the AlzForum antibody data file you downloaded above. Once you have selected it, the “Create antibodies.rdf” button is enabled; click it to create antibodies.rdf. When the file has been created, its location is filled in automatically next to the antibodies.rdf button at the top of the window.

=== Protein Index ===
To create the protein index, follow a similar procedure: use the file locator buttons to choose the bioThesaurus and Dictionary files, then click the “Create Protein Index” button. Note that creating the protein index can take a long time (30 to 60 minutes). However, it only needs to be created once and can be reused every time you start the annotation tool.

=== Annotator Name ===
Finally, type the name of the annotator who will use the software into the “Annotator” field. It must be one of the four currently available annotators:
 * 1) tdanford
 * 2) paoloc
 * 3) sudeshna_das
 * 4) alanruttenberg

Once you have entered the locations of all four files, save your choices to a startup file: click the “Create Startup File” button, which creates a file in your current directory containing all the options you have selected.

After you have created a startup file, the “Run Annotation Tool” button will be enabled. Click this button to run the annotation software.

In the future, if you run the software from the same directory, the startup file will be found and loaded automatically; you won’t need to choose or create any files, and you can click the “Run Annotation Tool” button immediately to start the annotation software.
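The installer rules above (four file locations before a startup file can be saved, an annotator name from the list of four, saved choices reloaded on later runs) can be sketched as follows. The startup file format is not documented on this page, so `StartupSketch`, its property keys and the use of `java.util.Properties` are all assumptions for illustration, not the tool’s actual format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of the installer rules described above. The real startup
// file format of annotations.jar is not documented here: the property keys and
// the use of java.util.Properties are assumptions for illustration.
class StartupSketch {
    static final List<String> FILE_KEYS =
            List.of("pro.obo", "uniprotmapping.txt", "antibodies.rdf", "protein.index");
    static final List<String> ANNOTATORS =
            List.of("tdanford", "paoloc", "sudeshna_das", "alanruttenberg");

    // The startup file can be created once all four file locations are filled in.
    static boolean readyToSave(Properties options) {
        for (String key : FILE_KEYS) {
            String location = options.getProperty(key);
            if (location == null || location.isBlank()) {
                return false;
            }
        }
        return true;
    }

    // The "Annotator" field must hold one of the currently available names.
    static boolean isKnownAnnotator(String name) {
        return ANNOTATORS.contains(name);
    }

    static void save(Properties options, Path startupFile) throws IOException {
        try (Writer out = Files.newBufferedWriter(startupFile)) {
            options.store(out, "annotation tool startup options");
        }
    }

    // On a later run from the same directory, loading the file restores the choices.
    static Properties load(Path startupFile) throws IOException {
        Properties options = new Properties();
        try (Reader in = Files.newBufferedReader(startupFile)) {
            options.load(in);
        }
        return options;
    }
}
```

A plain key/value file keeps the saved choices human-readable, which is convenient if you ever need to correct a path by hand.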

= Running the Annotator Software =
The main annotation window contains three panes, indicated in the figure to the right as #1, #2, and #3. Pane #1 is the “data view pane” – this displays a view of the current data item (the current antibody) that is being annotated. Each field for the data item is shown using a gray title followed by black text for the value of the field.

== Data View Pane ==
The primary way to annotate a data item is the “Scan” item in the “Edit” menu. Scan automatically searches every field of the data item and highlights in red any text for which the built-in lookup functions tentatively identify an annotation.
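One way to picture Scan is as a dictionary search over every field of the data item, where each hit becomes an unjudged annotation that records the field, the matched span and the proposed value. This sketch substitutes plain whole-word, case-sensitive matching for the tool’s built-in lookup functions, whose internals aren’t described here; `ScanSketch` and the `Antigen` field name are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a "Scan": look up every field of a data item in a
// term dictionary and record each hit as an unjudged annotation. This uses plain
// whole-word, case-sensitive matching as a stand-in for the real lookup functions.
class ScanSketch {
    record Annotation(String field, int start, int end, String text, String value) {}

    static List<Annotation> scan(Map<String, String> fields, Map<String, String> dictionary) {
        List<Annotation> found = new ArrayList<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            for (Map.Entry<String, String> term : dictionary.entrySet()) {
                // \b on both sides keeps "APP" from matching inside words such as "APPLE".
                Pattern pattern = Pattern.compile("\\b" + Pattern.quote(term.getKey()) + "\\b");
                Matcher m = pattern.matcher(field.getValue());
                while (m.find()) {
                    found.add(new Annotation(field.getKey(), m.start(), m.end(), m.group(), term.getValue()));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Antigen", "synthetic peptide from human APP");
        Map<String, String> dictionary = Map.of("APP", "PRO:00004168");
        scan(fields, dictionary).forEach(System.out::println);
    }
}
```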

Once annotations have been added to the data view pane, the text they annotate is highlighted in red. This means that the annotation has been added but not yet judged. Every annotation must be reviewed manually and judged either correct (‘accepted’) or incorrect (‘rejected’). An annotation can be judged multiple times, by different annotators or by the same one; the system always treats the latest judgment as definitive. Once an annotation has been accepted, its text is shown in blue (the text of rejected annotations is shown in light gray).
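The judgment rule above can be modeled as “the newest judgment wins, regardless of who made it,” with the display color derived from the result. This is a minimal sketch with hypothetical types (`JudgmentSketch`, `Judgment`, `Verdict`, `Status`); it assumes each judgment carries a timestamp, which this page does not specify.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the judgment rule: an annotation may collect many
// judgments, from any annotator, and only the most recent one counts.
class JudgmentSketch {
    enum Verdict { ACCEPTED, REJECTED }
    enum Status { UNJUDGED, ACCEPTED, REJECTED }

    record Judgment(String annotator, Verdict verdict, Instant when) {}

    static Status status(List<Judgment> judgments) {
        return judgments.stream()
                .max(Comparator.comparing(Judgment::when)) // latest judgment is definitive
                .map(j -> j.verdict() == Verdict.ACCEPTED ? Status.ACCEPTED : Status.REJECTED)
                .orElse(Status.UNJUDGED); // no judgments yet
    }

    // Text colors used by the data view pane for each status.
    static String color(Status status) {
        return switch (status) {
            case UNJUDGED -> "red";
            case ACCEPTED -> "blue";
            case REJECTED -> "light gray";
        };
    }
}
```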

To judge an annotation, double-click its text in the data view pane. The complete set of annotations associated with that point in the text then appears in Pane #2, the ‘annotation view pane.’

== Annotation View Pane ==
Pane #2 is the Annotation View Pane, and shows a list of selected annotations. Each item in the list is the value of the annotation, followed by links indicating the actions that can be performed on it. In the figure above, the blue text “APP” on the left has been double-clicked, and the corresponding annotation has appeared on the right. This annotation has the value “PRO:00004168”, the PRO identifier that has been added and accepted as an annotation for the text “APP.”

Possible actions that can be taken for an annotation are:
 * 1) Accept: judges the annotation as “accepted” by the current annotator.  Note that annotations that are currently accepted will not show an additional ‘Accept’ action.
 * 2) Reject: judges the annotation as “rejected” by the current annotator.  Note that annotations that are currently rejected will not show an additional ‘Reject’ action.
 * 3) Expand: displays more information about the annotation, including which field the annotated text is in, exactly which text in that field is annotated, the name of the annotator who created the annotation, and any ‘qualifiers’ which indicate the type and origin of the annotation.   Note that the ‘Expand’ action is not displayed if the current annotation is already expanded.
 * 4) Collapse: returns the annotation to the single line-item state from the expanded state.
 * 5) Match: the Match action allows the user to create a new annotation which indicates the same value as the annotation the match action is performed on.  To use the Match action, highlight (in the data view pane) the text you wish to annotate, and then choose the Match action of the annotation you want to match to that text.  A new annotation will be created on the highlighted text, with qualifiers, annotator name, and annotation value that reflect the original matched annotation.
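The visibility rules in the list above amount to a small function of an annotation’s judgment state and whether it is expanded. In this hypothetical sketch (`ActionSketch` and its types are illustrative names), two behaviors are assumptions not stated on this page: Collapse appears only for an expanded annotation, and Match is always offered. `match` copies the value, annotator name and qualifiers onto the highlighted text, as the Match description says.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of which action links the annotation view pane offers.
// Assumes Collapse is shown only when expanded and Match is always available.
class ActionSketch {
    enum Status { UNJUDGED, ACCEPTED, REJECTED }
    enum Action { ACCEPT, REJECT, EXPAND, COLLAPSE, MATCH }

    record Annotation(String field, String text, String value, String annotator, List<String> qualifiers) {}

    static List<Action> visibleActions(Status status, boolean expanded) {
        List<Action> actions = new ArrayList<>();
        if (status != Status.ACCEPTED) actions.add(Action.ACCEPT); // no Accept if already accepted
        if (status != Status.REJECTED) actions.add(Action.REJECT); // no Reject if already rejected
        actions.add(expanded ? Action.COLLAPSE : Action.EXPAND);   // no Expand if already expanded
        actions.add(Action.MATCH);
        return actions;
    }

    // Match: create a new annotation on the highlighted text that reuses the
    // original's value, annotator name and qualifiers.
    static Annotation match(Annotation original, String field, String highlighted) {
        return new Annotation(field, highlighted, original.value(), original.annotator(), original.qualifiers());
    }
}
```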

To help the annotator make judgments about the data item, additional links may be displayed in the “comment” field shown when an annotation is expanded. Clicking one of these links opens your default web browser at the URL associated with the link. (Currently, built-in links are provided for UniProt identifiers associated with PRO protein annotations.)

Furthermore, the “View” menu contains additional options for viewing information about the data item in your web browser. During antibody annotation, the View menu contains items for browsing both the original manufacturer datasheet and the AlzForum antibody entry. If the URL for one of these resources can’t be determined, its menu item is grayed out. Choosing one of these menu items sends your default web browser to the corresponding online page. (Some manufacturer datasheets are no longer available at the URL stored in the AlzForum dataset, so these pages may return HTTP errors.)
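Both kinds of browser link described above come down to building a URI (or failing to) and handing it to the default browser. This hypothetical helper (`BrowserLinks` is not the tool’s own code) uses the standard `java.awt.Desktop` API; the UniProt entry URL pattern `https://www.uniprot.org/uniprotkb/<accession>` is an assumption based on the current UniProt site, and an empty `Optional` stands for a grayed-out menu item.

```java
import java.awt.Desktop;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper, not the tool's own code. The UniProt entry URL pattern
// is an assumption based on the current public UniProt site.
class BrowserLinks {
    // Link for a UniProt accession shown in an expanded annotation's comment.
    static Optional<URI> uniprotEntry(String accession) {
        if (accession == null || !accession.matches("[A-Za-z0-9]+")) {
            return Optional.empty();
        }
        return Optional.of(URI.create("https://www.uniprot.org/uniprotkb/" + accession));
    }

    // A URL stored in the data item (e.g. a datasheet link). Missing or
    // malformed values give an empty Optional: a grayed-out View menu item.
    static Optional<URI> storedUrl(String raw) {
        if (raw == null || raw.isBlank()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new URI(raw.trim()));
        } catch (URISyntaxException e) {
            return Optional.empty();
        }
    }

    // Send the default web browser to the URL, if the platform allows it.
    static void open(URI uri) throws IOException {
        if (Desktop.isDesktopSupported() && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
            Desktop.getDesktop().browse(uri);
        } else {
            System.out.println("Cannot launch a browser here; open manually: " + uri);
        }
    }
}
```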

== Lookup Dialog ==
In some cases, the automatic “Scan” option will miss annotations. To add an annotation manually, highlight the text you wish to annotate and then click the “Lookup” button at the bottom of the Data View Pane. This will open a dialog box as shown below.

In the “Lookup:” field at the top of the dialog, type in the text you wish to use to search for the corresponding annotation. By default this field is populated with the highlighted text from the data view pane, but this can be changed if necessary (for example, if the text in the data view pane contains a typo or other mistake). Once you’ve entered the text to be looked up in the edit field, choose one or more lookup dictionaries to be used in creating the annotation by selecting one or more checkboxes below. In the example shown above, we are searching for the text “APP” using the protein name lookup function.

Then click the “Lookup” button to send the text to the selected lookup functions. Any annotations they return are listed in the Annotation View Pane.

Notice that annotations added manually have a new action: Add. A manual annotation must be added before it can be accepted or rejected; this prevents a mistaken lookup query from attaching a large number of annotations to a piece of text, all of which would then have to be rejected.
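The manual lookup flow (query text sent to every selected lookup function, results shown as candidates that must be Added before they can be judged) can be sketched like this. `LookupSketch` is hypothetical, lookup functions are modeled as plain `Function<String, List<String>>` values, and merging duplicate values across dictionaries is a design choice of the sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the manual Lookup flow. Each selected lookup function
// maps the query text to candidate annotation values.
class LookupSketch {
    // A candidate starts out pending: it can be Added, but not yet judged.
    static final class Candidate {
        private final String value;
        private boolean added = false;

        Candidate(String value) { this.value = value; }

        String value() { return value; }
        void add() { added = true; }
        boolean canBeJudged() { return added; }
    }

    static List<Candidate> lookup(String query, List<Function<String, List<String>>> selected) {
        // Merge results from all selected functions, dropping duplicate values
        // while keeping the order in which they were first returned.
        Set<String> values = new LinkedHashSet<>();
        for (Function<String, List<String>> lookupFunction : selected) {
            values.addAll(lookupFunction.apply(query));
        }
        List<Candidate> candidates = new ArrayList<>();
        for (String value : values) {
            candidates.add(new Candidate(value));
        }
        return candidates;
    }
}
```

Keeping candidates pending until `add()` is called mirrors the rationale above: a bad query produces a list you can ignore rather than annotations you must reject one by one.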

The Lookup dialog box also has a “Uniprot Search” button. Since so many annotations are expected to be protein annotations, this is a shortcut for finding the correct proteins. Clicking Uniprot Search creates no annotations; it simply takes the text in the edit field and uses it to start a search on the UniProt website in your web browser.
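The Uniprot Search shortcut only needs to turn the lookup text into a search URL for the browser. This is a minimal sketch; `UniprotSearch` is a hypothetical name, and the `https://www.uniprot.org/uniprotkb?query=` pattern is an assumption based on the current UniProt website that may differ from what the tool actually sends.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a "Uniprot Search" shortcut: it only builds a search
// URL for the browser and creates no annotations. The URL pattern is an
// assumption based on the current UniProt website, not taken from the tool.
class UniprotSearch {
    static URI searchUri(String lookupText) {
        // Form-encode the text so spaces and non-ASCII characters are safe in a
        // query string; URLEncoder turns spaces into "+".
        String query = URLEncoder.encode(lookupText.trim(), StandardCharsets.UTF_8);
        return URI.create("https://www.uniprot.org/uniprotkb?query=" + query);
    }

    public static void main(String[] args) {
        String text = args.length > 0 ? String.join(" ", args) : "APP";
        System.out.println(searchUri(text));
    }
}
```

Opening the resulting URI in the default browser could then use `java.awt.Desktop.getDesktop().browse(uri)` on platforms that support it.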

== Term Request Pane ==
If you find a piece of text which is not recognized by an existing lookup method, but should be recognized, you can flag this text as a term request.