Querying across free text fields such as report bodies using relational methods (e.g.
like) is computationally intensive, and performance is very poor on larger data sets (> 100,000 records). Bridge provides a way to search using a free text index built on Apache Lucene and integrated into the SDK.
Building a search of the free text index requires a
FullTextEntityManager and query parser. Getting a
FullTextEntityManager is as simple as running
Java::HarbingerSdk::DataUtils.getFullTextEntityManager. To create a search query parser use the
Java::HarbingerSdk::Search::search_query method. Pass in the name of the index to search (e.g.
reportBody) and the search string. This query parser supports common search operators such as boolean syntax.

    # A method to search with a given string input and a default limit of 10
    def search(string, limit = 10)
      begin
        phrase_query = Java::harbinger.sdk.Search::search_query("reportBody", string)
      rescue Java::OrgApacheLuceneQueryParser::ParseException => e
        return "Error while parsing search string"
      end
      # Obtain the entity manager only after parsing succeeds so it is not left open on a parse error
      fm = Java::HarbingerSdk::DataUtils.getFullTextEntityManager
      ftquery = fm.createFullTextQuery(phrase_query)
      # uncomment if you would like the lucene score back with the RadReport object
      # ftquery.setProjection(Java::org.hibernate.search.jpa.FullTextQuery::THIS, Java::org.hibernate.search.jpa.FullTextQuery::SCORE)
      reports = ftquery.setMaxResults(limit).getResultList()
      fm.close()
      return reports
    end

    # Usage: search("pneumonia")
    #        search("pneumonia AND pneumothorax")

Best Practice - When searching from user input, ensure the application can handle failed parsing. The exception thrown for a parser error is Java::OrgApacheLuceneQueryParser::ParseException.
The method above returns a list of RadReport objects. Use the same methods/accessors on these objects as you would on objects loaded through a relational query.
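Some malformed input can be caught before it ever reaches the parser. The following is a minimal sketch, assuming that unbalanced double quotes and parentheses are the most common user mistakes (an assumption, not something the SDK documents). The helper name balanced_query? is hypothetical, it ignores backslash-escaped characters, and it does not replace rescuing the parse exception.

```ruby
# Hypothetical pre-check: true when double quotes and parentheses are balanced.
# Parentheses inside a quoted phrase are ignored. Escaped characters are not handled.
def balanced_query?(input)
  return false if input.count('"').odd?

  depth = 0
  in_phrase = false
  input.each_char do |ch|
    case ch
    when '"'
      in_phrase = !in_phrase
    when "("
      depth += 1 unless in_phrase
    when ")"
      next if in_phrase
      depth -= 1
      return false if depth < 0 # closing parenthesis with no matching opener
    end
  end
  depth.zero?
end

puts balanced_query?('(lung OR nodule) AND chest') # true
puts balanced_query?('"lung nodule')               # false
puts balanced_query?('(lung OR nodule')            # false
```

An application could show a friendly message when this check fails and still rescue the parser exception for everything the check does not cover.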
A search query is broken up into terms and operators. There are two types of terms: single terms and phrases. A single term is a single word such as "lung" or "nodule". A phrase is a group of words surrounded by double quotes such as "lung nodule". Multiple terms can be combined together with boolean operators to form a more complex query.
Tip - The
search method created in the example above assumes the string that is passed in is already formatted in the form required by the search query. You may wish to simplify or hide the required syntax in the application user interface to make it more accessible.
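One way to hide the syntax from users is to collect plain terms in the interface and assemble the query string in code. The sketch below is illustrative only: build_query is a hypothetical helper, not part of the SDK, and it assumes terms should be downcased and multi-word entries treated as phrases.

```ruby
# Hypothetical helper: turns a list of user-entered terms into a query string.
# Multi-word entries are wrapped in double quotes so they are treated as phrases.
def build_query(terms, operator = "AND")
  op = operator.to_s.upcase # operators must be ALL CAPS
  raise ArgumentError, "unsupported operator: #{operator}" unless %w[AND OR].include?(op)

  # Drop stray quotes, trim whitespace, downcase, and skip blank entries
  cleaned = terms.map { |t| t.delete('"').strip.downcase }.reject(&:empty?)
  cleaned.map { |t| t =~ /\s/ ? "\"#{t}\"" : t }.join(" #{op} ")
end

puts build_query(["Lung nodule", "chest"])   # "lung nodule" AND chest
puts build_query(["lung", "nodule"], "or")   # lung OR nodule
```

The resulting string can then be passed to a method such as the search method shown earlier.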
Terms can be combined through logic operators. The search query supports AND, +, OR, NOT and
- as operators (Note: operators must be ALL CAPS).
The OR operator is the default conjunction operator. This means that if there is no operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR. To search for documents that contain either "lung nodule" or just "lung" use the query "lung nodule" lung or "lung nodule" OR lung.
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol
&& can be used in place of the word
AND. To search for documents that contain "lung nodule" and "chest" use the query
"lung nodule" AND "chest".
+ (plus) Operator
The + or required operator ensures that the term after the
+ symbol exists somewhere in the document. To search for documents that must contain "lung" and may contain "chest" use the query +lung chest.
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol
! can be used in place of the word
NOT. To search for documents that contain "lung nodule" but not "chest" use the query
"lung nodule" NOT "chest". Note: The
NOT operator cannot be used with a single term. For example, the following search will return no results:
NOT "lung nodule"
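The set analogies for OR, AND and NOT can be illustrated with plain Ruby arrays. This is a toy model with made-up documents and simple substring matching. It does not use the index and is not how Lucene evaluates queries.

```ruby
# Toy documents keyed by id (invented for illustration only)
docs = {
  1 => "lung nodule seen in the right upper lobe",
  2 => "chest radiograph shows a clear lung",
  3 => "chest wall is unremarkable"
}

# Ids of the toy documents whose text contains the given word or phrase
def matching(docs, text)
  docs.select { |_id, body| body.include?(text) }.keys
end

lung  = matching(docs, "lung")   # [1, 2]
chest = matching(docs, "chest")  # [2, 3]

p lung | chest   # lung OR chest  => union        [1, 2, 3]
p lung & chest   # lung AND chest => intersection [2]
p lung - chest   # lung NOT chest => difference   [1]
```

The difference needs a left-hand set to subtract from, which is one way to see why NOT cannot be used with a single term.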
- (minus) Operator
The - or prohibit operator excludes documents that contain the term after the
- symbol. To search for documents that contain "lung nodule" but not "chest" use the query
"lung nodule" -"chest"
The search query supports using parentheses to group clauses into sub queries. This can be very useful to control the boolean logic of a query. To search for documents that contain either "lung" or "nodule", and also "chest", use the query
(lung OR nodule) AND chest. This eliminates confusion and ensures that chest must exist along with at least one of the terms lung or nodule.
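When queries are assembled in code, always parenthesizing each sub query keeps the boolean logic explicit. A minimal sketch follows. The helper name group is hypothetical, and it assumes the clauses are already valid query fragments.

```ruby
# Hypothetical helper: joins clauses with an operator and wraps the result
# in parentheses so it can be nested safely inside a larger query.
def group(clauses, operator)
  "(#{clauses.join(" #{operator} ")})"
end

inner = group(%w[lung nodule], "OR")   # (lung OR nodule)
query = group([inner, "chest"], "AND")

puts query # ((lung OR nodule) AND chest)
```

The outer pair of parentheses is redundant, but it means a group can be reused inside another group without changing its meaning.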
The search query supports finding words within a specified distance of each other. To perform a proximity search, use the tilde,
~, symbol at the end of a phrase. To search for "nodule" and "lung" within 10 words of each other in a document use the search "nodule lung"~10.
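The proximity syntax described above can be generated from a list of words. The following is a small sketch. The helper proximity_query is hypothetical and only builds the query string.

```ruby
# Hypothetical helper: builds a proximity search from words and a maximum distance.
def proximity_query(words, distance)
  unless distance.is_a?(Integer) && distance >= 0
    raise ArgumentError, "distance must be a non-negative integer"
  end
  raise ArgumentError, "need at least two words" if words.size < 2

  # Proximity applies to a phrase, so the words are wrapped in double quotes
  "\"#{words.join(' ')}\"~#{distance}"
end

puts proximity_query(%w[nodule lung], 10) # "nodule lung"~10
```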
The search query supports single and multiple character wildcard searches within single terms (not within phrase queries). To perform a single character wildcard search use the
? symbol. To perform a multiple character wildcard search use the
* symbol. Note: You cannot use a * or ? symbol as the first character of a search.
- The single character wildcard search looks for terms that match with the single character replaced. For example, to search for "text" or "test" you can use the search te?t
- The multiple character wildcard search looks for 0 or more characters. To search for test, tests or tester, you can use the search test*
- You can also use the wildcard searches in the middle of a term such as te*t
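The wildcard restrictions above are easy to check before a term is sent to the parser. This is a minimal sketch with a hypothetical helper, valid_wildcard_term?, which only encodes the two rules stated in this section: no leading wildcard, and single terms only.

```ruby
# Hypothetical check for a single wildcard term.
def valid_wildcard_term?(term)
  return false if term.empty?
  return false if term.start_with?("*", "?") # wildcard cannot be the first character
  return false if term =~ /\s/               # wildcards apply to single terms, not phrases
  true
end

puts valid_wildcard_term?("te?t")      # true
puts valid_wildcard_term?("test*")     # true
puts valid_wildcard_term?("*est")      # false
puts valid_wildcard_term?("lung nod*") # false
```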
The search query provides the relevance level of matching documents based on the terms found. Boosting allows you to control the relevance of a document by boosting its term. The higher the boost factor, the more relevant the term will be. To boost a term use the caret,
^, symbol with a boost factor (a number) at the end of the term you are searching.
For example, if you are searching for
lung nodule and you want the term "lung" to be more relevant boost it using the
^ symbol along with the boost factor next to the term such as
lung^4 nodule. This will make documents with the term "lung" appear more relevant. You can also boost Phrase Terms as in the example
"lung nodule"^4 "chest".
By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).
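Boosted terms can also be built programmatically. The sketch below uses a hypothetical helper named boost. It enforces the positive boost factor rule and quotes multi-word input so the boost applies to the whole phrase.

```ruby
# Hypothetical helper: appends a boost factor to a term or phrase.
def boost(term, factor)
  unless factor.is_a?(Numeric) && factor > 0
    raise ArgumentError, "boost factor must be positive"
  end

  # Quote multi-word input that is not already quoted so it is boosted as a phrase
  needs_quotes = (term =~ /\s/) && !term.start_with?('"')
  quoted = needs_quotes ? "\"#{term}\"" : term
  "#{quoted}^#{factor}"
end

puts boost("lung", 4)          # lung^4
puts boost("lung nodule", 4)   # "lung nodule"^4
puts boost("chest", 0.2)       # chest^0.2
```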
Escaping special characters
It is possible to escape special characters such as
( using a
\. Note, however, that punctuation is not captured within the index, though it remains part of the document body.
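When user input should be treated as literal text, special characters can be escaped in code before the string is parsed. The following is a rough sketch: escape_query is a hypothetical helper, and the character list is an assumption based on the special characters of the classic Lucene query syntax, so it should be checked against the Lucene version in use.

```ruby
# Characters assumed to be special in the query syntax (verify for your Lucene version)
LUCENE_SPECIAL = /([+\-!(){}\[\]^"~*?:\\&|])/

# Hypothetical helper: prefixes each special character with a backslash.
def escape_query(text)
  text.gsub(LUCENE_SPECIAL) { "\\#{$1}" }
end

puts escape_query("(1+1):2")    # \(1\+1\)\:2
puts escape_query("lung*")      # lung\*
```

Because punctuation is not captured in the index, escaping mainly prevents parse errors rather than enabling searches for the punctuation itself.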
Further details about query parsing are available in the Lucene documentation.
In addition to the
reportBody index, there are other fields that can further refine a search. While all fields are represented within Lucene as strings, they are formatted to take advantage of their values. Below is a list of all the fields, their corresponding schema locations, and a description which includes any special formatting considerations.
|Index Name||Schema Location||Description|
|radExam.relativePatientAge||The age of the patient at the time of the exam (end exam) in years|
Within the SDK there are static methods to help build these indexes into a filter for the search query. They are:
Java::HarbingerSdk::Search::term_range_filter(index_name, start_date_string, stop_date_string) - Filtering a date formatted index between two formatted date strings (YYYYMMDD).
Java::HarbingerSdk::Search::term_values_filter(index_name, [array_of_value_strings]) - Filtering an index based on a list of possible values.
Java::HarbingerSdk::Search::term_filter(index_name, value_string) - Filtering an index based on a single string value.
Java::HarbingerSdk::Search::numeric_range_filter(index_name, start_number, stop_number) - Filtering a number formatted index based on two numbers.
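The date strings expected by the term range filter can be produced with strftime. The sketch below uses only the Ruby standard library. The helper name index_date is hypothetical, and the dates are arbitrary.

```ruby
require "date"

# Hypothetical helper: formats a Date or Time as the YYYYMMDD string used by date indexes.
def index_date(date)
  date.strftime("%Y%m%d")
end

start_date = index_date(Date.new(2019, 12, 31)) # "20191231"
stop_date  = index_date(Date.new(2020, 1, 5))   # "20200105"

# Zero-padded YYYYMMDD strings sort in the same order as the dates they represent,
# which is what allows a string index to be filtered by range.
puts start_date < stop_date # true
```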
Each can be combined into a set of filters with these boolean operators:
Java::HarbingerSdk::Search::and_filters([array_of_filters]) - Join an array of filters with an AND.
Java::HarbingerSdk::Search::or_filters([array_of_filters]) - Join an array of filters with an OR.
These methods can be found in the API documentation under the Search class.
An example that combines these filters with a search of
reportBody, restricted by the
rad_reports.report_event date and a specific procedure:

    entity_manager ||= Java::HarbingerSdk::DataUtils.getEntityManager
    full_text_entity_manager ||= Java::HarbingerSdk::DataUtils.getFullTextEntityManager(entity_manager)

    # This is the standard parsed input terms for search.
    # Terms should be downcased and boolean operators should be all caps
    # ex: "pneumonia AND pneumothorax"
    # bad example: "Pneumonia and pneumothorax"
    search_terms = "pneumonia"

    # Swap out with reportImpression as desired
    begin
      phrase_query = Java::harbinger.sdk.Search::search_query("reportBody", search_terms)
    rescue Java::OrgApacheLuceneQueryParser::ParseException => e
      return "Error while parsing search string"
    end

    # Arbitrary times but formatted as needed for the query
    # (1.year.ago requires ActiveSupport, e.g. inside a Rails application)
    start_date = 1.year.ago.strftime("%Y%m%d")
    stop_date = Time.now.strftime("%Y%m%d")

    # Time range filter on reportEvent
    timefilter1 = Java::harbinger.sdk.Search::term_range_filter("reportEvent", start_date, stop_date)
    # Time range filter on endExam
    timefilter2 = Java::harbinger.sdk.Search::term_range_filter("endExam", start_date, stop_date)

    # Procedure description filter for exact results
    pquery = Java::HarbingerSdkData::Procedure.createQuery(entity_manager)
    descriptions = pquery.where(pquery.in(".description", ["CT CHEST W CONT PULMONARY ARTERIES"])).select(".id").list.to_a.collect(&:to_s)
    procdescfilter = Java::harbinger.sdk.Search::term_values_filter("radExam.procedure.id", descriptions)

    # Build the full text query with the given search query and filters
    ftquery = full_text_entity_manager.createFullTextQuery(phrase_query)
    ftquery.setProjection(Java::org.hibernate.search.jpa.FullTextQuery::THIS, Java::org.hibernate.search.jpa.FullTextQuery::SCORE)
    ftquery.setFilter(Java::harbinger.sdk.Search::and_filters([timefilter1, timefilter2, procdescfilter]))
    ftquery.setMaxResults(100).getResultList().to_a

There is no facility to combine a free text search and a SQL
where clause into a single query at this time.