Search queries

This section explains how to build search queries. Bear in mind that queries are case sensitive, and this applies to all of the syntax.

To build more advanced queries, you can rely on Lucene's Query Parser Syntax.

Search by string

Retrieve documents containing a specific string. Example: insulin

Use * as search query to retrieve all documents in a project.

Search by document ID (docid)

Each document in your project has a unique document ID (docid). This query retrieves the file matching the given docid. You can also use modifiers such as wildcards.

Example, search for a single document by docid: docid:bXYmSmlclyO01lP1kOLKSA0cyPAT-letter.txt

Example using a wildcard: docid:*-letter.txt or docid:bXYmSmlclyO01lP1kOLKSA0cyPAT*

Search by filename

Retrieve all files matching a filename, optionally using a wildcard.

Example, search for all your PDF files: filename:*.pdf

Example, search for a specific file whose name contains spaces: filename:"My filename has some spaces.md"

Search filenames with spaces, special characters, and even emojis, just because you can! Example: filename:"Grammatikübersicht abc ! 🧡' ㊔.pdf"
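If you generate filename queries from a script, the quoting can be done by a small helper. This is a minimal sketch: quote_value and filename_query are hypothetical helpers, not part of tagtog. It assumes Lucene's classic parsing rules, under which only the backslash and the double quote need escaping inside a quoted value. Use it for exact names like the ones above; wildcard patterns such as *.pdf are written unquoted.

```python
def quote_value(value: str) -> str:
    """Wrap a value in double quotes (hypothetical helper).

    Inside a quoted value only the backslash and the double quote are
    special, so those two characters are escaped with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def filename_query(name: str) -> str:
    """Build a filename: query for an exact file name (hypothetical helper)."""
    return f"filename:{quote_value(name)}"


print(filename_query("My filename has some spaces.md"))
# filename:"My filename has some spaces.md"
print(filename_query("Grammatikübersicht abc ! 🧡' ㊔.pdf"))
```

Spaces, the apostrophe, and emojis pass through untouched; only an embedded double quote or backslash would be escaped.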

Search by document label

Find documents tagged with a specific label and value.

Boolean example: label:isSevere:true

Enum example: label:severity:high

String example: label:name:Lois

Range example: label:number_issues:[10 TO 20]

If you search for a value that contains spaces, surround it with quotes. Example: label:option:"option A"
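The four label value types above follow a regular pattern, which the sketch below captures. label_query is a hypothetical helper, not a tagtog function. It assumes booleans are written in lower case (as in label:isSevere:true, since queries are case sensitive), ranges use the inclusive [low TO high] form, and any string value containing a space is quoted.

```python
def label_query(label: str, value) -> str:
    """Build a label:<name>:<value> query (hypothetical helper).

    - bool values become true/false (lower case)
    - (low, high) tuples become an inclusive range [low TO high]
    - other values are used as-is, quoted if they contain a space
    """
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, tuple):
        low, high = value
        rendered = f"[{low} TO {high}]"
    else:
        rendered = str(value)
        if " " in rendered:
            rendered = f'"{rendered}"'
    return f"label:{label}:{rendered}"


print(label_query("isSevere", True))           # label:isSevere:true
print(label_query("severity", "high"))         # label:severity:high
print(label_query("number_issues", (10, 20)))  # label:number_issues:[10 TO 20]
print(label_query("option", "option A"))       # label:option:"option A"
```

The bool check comes first because in Python bool is a subclass of int.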

Search by entity type

Retrieve all documents containing at least one entity of the given entity type. Example: entity:disease retrieves all documents with at least one entity of the type disease.

If you add a term, e.g. entity:disease:cancer, you find all documents containing at least one entity of that type matching that term.

The following advanced queries work only with the entity type id (e.g. e_1):

count (e.g. count_e_1:[2 TO *]): retrieves documents with at least 2 annotations of the type e_1.
norms_count_uniq (e.g. norms_count_uniq_e_1:[2 TO *]): retrieves documents with at least 2 annotations of the type e_1 that are normalized to different unique names. For example, Rezulin and Romozin (the same diabetes drug sold under different commercial names) would both be normalized to troglitazone, so they count as 1 unique normalized entity, not 2.
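To make the difference between the two counters concrete, here is a small simulation on a hypothetical document. It is an illustration under stated assumptions, not how tagtog computes these fields: it assumes annotations without a normalization are ignored by norms_count_uniq, and it treats * in a range as "no upper bound".

```python
# Hypothetical annotations of entity type e_1 in one document:
# (annotated text, normalization or None if not normalized)
annotations = [
    ("Rezulin", "troglitazone"),
    ("Romozin", "troglitazone"),
]


def count(annots) -> int:
    """Number of annotations (the value compared by count_e_1)."""
    return len(annots)


def norms_count_uniq(annots) -> int:
    """Number of distinct normalizations (compared by norms_count_uniq_e_1).

    Assumption: annotations without a normalization are not counted.
    """
    return len({norm for _, norm in annots if norm is not None})


def in_range(n: int, low: int, high=None) -> bool:
    """Mimic an inclusive [low TO high] range; high=None stands for *."""
    return n >= low and (high is None or n <= high)


print(count(annotations), norms_count_uniq(annotations))  # 2 1
print(in_range(count(annotations), 2))             # True: count_e_1:[2 TO *] matches
print(in_range(norms_count_uniq(annotations), 2))  # False: norms_count_uniq_e_1:[2 TO *] does not
```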

You can also use the entity type id in the basic entity query, e.g. entity:e_1.

Search by normalization

Retrieve all documents containing at least one entity that normalizes to the given normalization. Example: entity:genes:HER2 retrieves all documents with at least one gene entity that normalizes to HER2.

Search by date

Retrieve all documents imported or updated in a given time frame.

created: documents imported in a given time frame. Examples: created:2018, created:2018-03, created:2018-03-06, created:[2013 TO NOW], created:[2016-12 TO 2017-02], and created:[NOW-1DAY TO NOW] (documents imported within the last 24 hours).

updated: documents updated in a given time frame. Examples: updated:2018, updated:2018-03, updated:2018-03-06, updated:[2013 TO NOW], updated:[2016-12 TO 2017-02], and updated:[NOW-1DAY TO NOW] (documents updated within the last 24 hours).
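When date ranges are generated programmatically, a helper like the one below keeps the format consistent with the examples above. date_range_query is hypothetical, not part of tagtog. It assumes Python date objects should be written as YYYY-MM-DD and that strings already in the accepted syntax (such as 2016-12, NOW or NOW-1DAY) are passed through unchanged.

```python
from datetime import date


def date_range_query(field: str, start, end) -> str:
    """Build a created:/updated: range query (hypothetical helper)."""
    if field not in ("created", "updated"):
        raise ValueError(f"unsupported date field: {field}")

    def render(value) -> str:
        # date (and datetime) objects become YYYY-MM-DD; strings such as
        # "2016-12", "NOW" or "NOW-1DAY" are used unchanged.
        if isinstance(value, date):
            return value.strftime("%Y-%m-%d")
        return str(value)

    return f"{field}:[{render(start)} TO {render(end)}]"


print(date_range_query("created", "2016-12", "2017-02"))  # created:[2016-12 TO 2017-02]
print(date_range_query("updated", "NOW-1DAY", "NOW"))     # updated:[NOW-1DAY TO NOW]
print(date_range_query("created", date(2018, 3, 6), "NOW"))  # created:[2018-03-06 TO NOW]
```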

Search by folder

There are three ways to search by folder:

Search by folder index (folder:INDEX): folder indexes (integers) are listed in the project settings JSON. Note the index of the folder you want, then search with folder:INDEX. For example, to search the pool documents (pool is a special folder that always exists), use: folder:0.
Search by folder path (folder:PATH): for example, if your folder is located at pool > level1 > A, write the path Unix-style: folder:pool/level1/A. Leading or trailing slashes are discouraged; they are accepted but ignored.
Search by folder name (folder:NAME): following the previous example, you could simply search for folder:A. If several folders share the same name, the one closest to the root (the pool), i.e. the shallowest in the folder tree, is found. For instance, given the folders pool/level1/A and pool/level1/level2/A, the former is found. Caveat: if several folders with the same name sit at the same depth of the folder tree, one of them is chosen arbitrarily.

Searching by folder name is the easiest method. However, if you have different folders with the same name, you should search by folder index or folder path.

Searching by folder index is the most robust method, because indexes do not change when folders are renamed.

Note that "pool" is always the root folder.

Search confirmed documents

To find confirmed documents, use the query: anncomplete:true.

To find documents that are not confirmed, use the query: anncomplete:false.

Here, a confirmed document is one whose master version of the annotations has been confirmed.

Search which documents a user has confirmed

You can retrieve the documents a given member has confirmed with the query: members_anncomplete:username

You can also retrieve all the documents that have been confirmed by at least one member with the query: members_anncomplete:*

To find the documents confirmed by every user in a set, combine the queries as in this example: members_anncomplete:user1 AND members_anncomplete:user2 AND members_anncomplete:user3

See this tutorial to learn how to rank and review your annotators with the members_anncomplete query.

Note that members_anncomplete searches in member versions only. If you instead want to find documents confirmed in any version (i.e. any member version or master), search for: members_anncomplete:* OR anncomplete:true.
You can also search for the opposite, that is, all documents not yet confirmed in any version, by negating the previous query: -(members_anncomplete:* OR anncomplete:true), or simply: anncomplete:false AND -members_anncomplete:*.

Search which documents a user was assigned to

You can retrieve the documents distributed to a given member with the query: members_assigned:username

You can also retrieve all the documents that have at least one assignee, with the query: members_assigned:*

You can combine the query fields with boolean logic, for example to find all documents allocated to two given users: members_assigned:user-A AND members_assigned:user-C

Wildcard search

To perform a single character wildcard search use ?. Example: entity:gene:P?2649.

To perform a multiple character wildcard search use *. Example: "Kepler-2*", "Kepler-4*c".
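To get a feel for how ? and * behave, Python's fnmatch.fnmatchcase is a close analogy: ? matches exactly one character, * matches any run of characters (including none), and matching is case sensitive. It is only an analogy and not tagtog's matcher; for one thing, fnmatch treats [...] as a character class, while in the search syntax brackets denote ranges. The terms below are made up.

```python
from fnmatch import fnmatchcase

# Made-up terms to match against; note the lower-case "kepler-22".
terms = ["P12649", "PA2649", "P2649", "Kepler-22", "Kepler-442c", "kepler-22"]

# ? stands for exactly one character, so "P2649" (one character short) fails.
print([t for t in terms if fnmatchcase(t, "P?2649")])      # ['P12649', 'PA2649']
# * stands for any number of characters; matching is case sensitive.
print([t for t in terms if fnmatchcase(t, "Kepler-2*")])   # ['Kepler-22']
print([t for t in terms if fnmatchcase(t, "Kepler-4*c")])  # ['Kepler-442c']
```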

Tip: find all documents by just searching for *.

filter:TODO

The special search filter:TODO lists the documents that the logged-in user still has to annotate or review, if any. See Task Distribution and Annotation Flows.

Note that you cannot search the TODO list of other users; the filter is only available for the currently logged-in user.

Fuzzy search

Find similar terms (string-based search) using the Levenshtein distance, also known as edit distance. Append ~ to a single-word term. Example: roam~ also finds terms such as foam.

You can fine-tune the similarity level by appending a number between 0 (less similar) and 1 (more similar). Example: roam~0.8.

The default similarity level is 0.5.
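The Levenshtein distance that fuzzy search builds on is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. Below is the classic dynamic-programming computation, as a sketch for intuition only; how the search engine converts this distance into the 0 to 1 similarity level is engine-specific and not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (classic dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("roam", "foam"))  # 1: substitute r -> f
```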

Proximity search

Find words (string-based search) within a specific distance of each other. Example: "diabetes insulin"~10 searches for documents with the terms diabetes and insulin within 10 words of each other.
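As a rough mental model of "diabetes insulin"~10, the check below asks whether two words occur with at most n other words between them, in either order. This is a simplification for intuition, not the engine's algorithm: Lucene's slop calculation differs in detail (for example, terms appearing in reversed order need a larger value). within_n_words is a hypothetical helper and the sentence is made up.

```python
import re


def within_n_words(text: str, term1: str, term2: str, n: int) -> bool:
    """Simplified proximity check (hypothetical helper, case sensitive)."""
    words = re.findall(r"\w+", text)
    pos1 = [i for i, w in enumerate(words) if w == term1]
    pos2 = [i for i, w in enumerate(words) if w == term2]
    # abs(i - j) - 1 is the number of words between the two occurrences.
    return any(abs(i - j) - 1 <= n for i in pos1 for j in pos2)


text = "Patients with diabetes often need daily insulin injections"
print(within_n_words(text, "diabetes", "insulin", 10))  # True
print(within_n_words(text, "diabetes", "insulin", 2))   # False: 3 words in between
```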

Boolean operators

Search queries can be combined using the operators AND, OR, NOT and -. Some examples:

entity:GGP AND entity:Mutation: searches for documents that contain both GGP entities and Mutation entities.
"type 1 diabetes" OR insulin: searches for documents that contain "type 1 diabetes" or "insulin".
"type 1 diabetes" NOT insulin: searches for documents that contain "type 1 diabetes" but not "insulin". This operator cannot be used with just one term.
-entity:GGP: searches for documents that don't contain mentions of genes, i.e. GGP entities.

Remember to use upper case: AND, OR and NOT.
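When queries are assembled in code, wrapping each group in parentheses avoids precedence surprises once groups are nested or negated. A minimal sketch with hypothetical helpers (all_of, any_of, exclude are not tagtog functions); the operators are emitted in upper case, as the syntax requires.

```python
def _group(operator: str, clauses) -> str:
    """Join clauses with an operator, parenthesizing multi-clause groups."""
    if not clauses:
        raise ValueError("at least one clause is required")
    if len(clauses) == 1:
        return clauses[0]
    return "(" + f" {operator} ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    return _group("AND", clauses)


def any_of(*clauses: str) -> str:
    return _group("OR", clauses)


def exclude(clause: str) -> str:
    """Prefix a clause with - so matching documents are left out."""
    return "-" + clause


print(all_of("entity:GGP", "entity:Mutation"))
# (entity:GGP AND entity:Mutation)
print(exclude(any_of("members_anncomplete:*", "anncomplete:true")))
# -(members_anncomplete:* OR anncomplete:true)
```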

Escaping special characters

The current list of special characters is: + - ! " && || ( ) { } [ ] ^ ~ * ? : \

To search for any of these characters literally, put a \ before the character. For example, to search for PD-L1, use the query: PD\-L1.
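Escaping by hand is error-prone for terms with several special characters, so a small helper can add the backslashes. A sketch, assuming that escaping every single & and | is an acceptable way to neutralize the && and || operators (Lucene accepts a backslash before any character). escape_query is hypothetical, not a tagtog function; note that escaping * and ? turns them into literal characters rather than wildcards.

```python
# Single characters from the list of special characters; && and || are
# covered by escaping each & and | individually.
SPECIAL_CHARS = set('+-!"&|(){}[]^~*?:\\')


def escape_query(term: str) -> str:
    """Put a backslash before every special character (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


print(escape_query("PD-L1"))  # PD\-L1
```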