Querying free text fields such as report bodies with relational methods (e.g. LIKE) is computationally intensive, and performance is very poor on larger data sets (> 100,000 records). Bridge provides a way to search using a free text index built on Apache Lucene and integrated into the SDK.

Basic Use

Building a search of the free text index requires a FullTextEntityManager and a query parser. To get a FullTextEntityManager, call Java::HarbingerSdk::DataUtils.getFullTextEntityManager. To create a search query, use the Java::HarbingerSdk::Search::search_query method, passing in the name of the index to search (e.g. reportBody) and the search string. The query parser supports common search operators such as boolean syntax.

# A method to search with a given string input and a default limit of 10
def search(string, limit = 10)
  fm = Java::HarbingerSdk::DataUtils.getFullTextEntityManager

  begin
    phrase_query = Java::harbinger.sdk.Search::search_query("reportBody", string)
  rescue Java::OrgApacheLuceneQueryParser::ParseException
    return "Error while parsing search string"
  end

  ftquery = fm.createFullTextQuery(phrase_query)
  # Uncomment to get the Lucene score back with each RadReport object
  # (each result then becomes a [RadReport, score] pair instead of a RadReport)
  # ftquery.setProjection(Java::org.hibernate.search.jpa.FullTextQuery::THIS, Java::org.hibernate.search.jpa.FullTextQuery::SCORE)
  ftquery.setMaxResults(limit).getResultList()
ensure
  # Close the entity manager whether or not the search succeeded
  fm.close if fm
end

# Usage: search("pneumonia")
#        search("pneumonia AND pneumothorax")

Best Practice - When searching from user input, ensure an application can handle failed parsing. The exception that will be thrown for a parser error is Java::OrgApacheLuceneQueryParser::ParseException.
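
A cheap pre-check can catch the most common user-input mistakes before the string ever reaches the parser. The sketch below is plain Ruby and does not call the SDK; balanced_query? is a hypothetical helper name, and it only looks for unbalanced double quotes and parentheses. It complements rescuing the ParseException, it does not replace it.

```ruby
# Hypothetical helper (not part of the SDK): returns true when double quotes
# and parentheses in the query are balanced. Backslash-escaped characters
# are skipped, and parentheses inside a quoted phrase are ignored.
def balanced_query?(query)
  depth = 0
  in_phrase = false
  escaped = false

  query.each_char do |ch|
    if escaped
      escaped = false
    elsif ch == "\\"
      escaped = true
    elsif ch == '"'
      in_phrase = !in_phrase
    elsif !in_phrase && ch == "("
      depth += 1
    elsif !in_phrase && ch == ")"
      depth -= 1
      return false if depth < 0
    end
  end

  depth.zero? && !in_phrase
end
```

An application could call this on the raw input and show a friendly message when it returns false, then still wrap the call to search_query in a rescue for everything the pre-check misses.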

The method above returns an ArrayList of RadReport objects. Use the same methods and accessors on these objects as you would on RadReport objects loaded through a relational query.

Search syntax

A search query is broken up into terms and phrases. A term is a single word such as "lung" or "nodule". A phrase is a group of words surrounded by double quotes such as "lung nodule". Multiple terms can be combined together with operators to form a more complex query.

Tip - The search method created in the example above assumes the "string" parameter that is passed in is already formatted in the form required by the search query. You may wish to simplify or hide the required syntax in the application user interface to make it more accessible.
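
One way to hide the syntax is to collect plain terms from the user interface and assemble the query string in code. The following is a minimal sketch in plain Ruby; build_query is a hypothetical helper, not an SDK method, and the choice to strip double quotes and downcase the input is an assumption about what the application wants.

```ruby
# Hypothetical helper (not part of the SDK): turns a list of user-entered
# terms into a query string. Multi-word entries become quoted phrases and
# the entries are joined with an ALL CAPS boolean operator.
def build_query(terms, operator = "AND")
  unless %w[AND OR].include?(operator)
    raise ArgumentError, "operator must be AND or OR"
  end

  cleaned = terms.map { |t| t.to_s.delete('"').strip.downcase }
                 .reject(&:empty?)
  clauses = cleaned.map { |t| t.include?(" ") ? "\"#{t}\"" : t }
  clauses.join(" #{operator} ")
end
```

For example, build_query(["Lung Nodule", "chest"]) produces "lung nodule" AND chest (with the phrase in double quotes), which can then be passed to the search method above.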

Operators

Terms can be combined with boolean operators. The search query supports AND, +, OR, NOT and - as operators (Note: the word operators must be ALL CAPS).

OR Operator

The OR operator is the default conjunction operator. This means that if there is no operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

To search for documents that contain either "lung nodule" or just "lung", use either of these equivalent queries: "lung nodule" lung, or "lung nodule" OR lung.

AND Operator

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND. To search for documents that contain "lung nodule" and "chest" use the query "lung nodule" AND "chest".

+ (plus) Operator

The + or required operator ensures that the term after the + symbol exists somewhere in the document. To search for documents that must contain "lung" and may contain "chest" use the query +lung chest.

NOT Operator

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT. To search for documents that contain "lung nodule" but not "chest" use the query "lung nodule" NOT "chest". Note: The NOT operator cannot be used with a single term. For example, the following search will return no results: NOT "lung nodule"

- (minus) Operator

The - or prohibit operator excludes documents that contain the term after the - symbol. To search for documents that contain "lung nodule" but not "chest" use the query "lung nodule" -"chest"
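
The + and - prefixes lend themselves to building a query from three lists: terms that must appear, terms that may appear, and terms that must not appear. The sketch below is plain Ruby with hypothetical helper names (phrase_or_term, with_prefixes). The guard against a query made only of prohibited terms is an assumption based on the note about NOT above, since a purely negative query has nothing to match against.

```ruby
# Hypothetical helpers (not part of the SDK).

# Wrap multi-word text in double quotes so it is treated as a phrase.
def phrase_or_term(text)
  text.include?(" ") ? "\"#{text}\"" : text
end

# Build a query using the + (required) and - (prohibit) prefixes.
def with_prefixes(required: [], optional: [], prohibited: [])
  if required.empty? && optional.empty?
    raise ArgumentError, "a query needs at least one non-prohibited term"
  end

  parts  = required.map   { |t| "+#{phrase_or_term(t)}" }
  parts += optional.map   { |t| phrase_or_term(t) }
  parts += prohibited.map { |t| "-#{phrase_or_term(t)}" }
  parts.join(" ")
end
```

Calling with_prefixes(required: ["lung"], optional: ["chest"]) yields +lung chest, matching the example given for the + operator.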

Grouping

The search query supports using parentheses to group clauses into sub queries. This can be very useful to control the boolean logic of a query. To search for documents that contain "chest" together with either "lung" or "nodule", use the query (lung OR nodule) AND chest. This eliminates confusion and ensures that chest must exist and that at least one of the terms lung or nodule must exist.
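
Grouping composes naturally when each sub query is built by a small function that always wraps its result in parentheses. This is only a sketch in plain Ruby; any_of and all_of are hypothetical helper names, not SDK methods.

```ruby
# Hypothetical helpers (not part of the SDK): each returns a parenthesised
# group, so groups can be nested without worrying about precedence.
def any_of(*clauses)
  "(#{clauses.join(' OR ')})"
end

def all_of(*clauses)
  "(#{clauses.join(' AND ')})"
end

# "chest" is required, along with at least one of "lung" or "nodule"
query = all_of(any_of("lung", "nodule"), "chest")
```

The resulting string is ((lung OR nodule) AND chest). The outer parentheses are redundant but harmless, and they keep the helpers safe to nest.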

Proximity Searches

The search query supports finding words within a specified distance of each other. To perform a proximity search, use the tilde, ~, symbol at the end of a phrase followed by the maximum distance in words. To search for "lung" and "nodule" within 10 words of each other in a document use the search "lung nodule"~10
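
A proximity clause is just a quoted phrase followed by ~ and a distance, so it is easy to generate. The sketch below is plain Ruby; proximity is a hypothetical helper name, and the validation rules are assumptions about sensible input rather than documented parser limits.

```ruby
# Hypothetical helper (not part of the SDK): builds a proximity clause
# such as "lung nodule"~10 from a list of words and a maximum distance.
def proximity(words, distance)
  unless distance.is_a?(Integer) && distance >= 0
    raise ArgumentError, "distance must be a non-negative integer"
  end
  raise ArgumentError, "need at least two words" if words.size < 2

  "\"#{words.join(' ')}\"~#{distance}"
end
```

For example, proximity(%w[lung nodule], 10) returns the same query string as the example above.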

Wildcard Searches

The search query supports single and multiple character wildcard searches within single terms (not within phrase queries). To perform a single character wildcard search use the ? symbol. To perform a multiple character wildcard search use the * symbol. Note: You cannot use a * or ? symbol as the first character of a search.

Examples
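
The query lun? matches four-letter terms that start with "lun", such as "lung". The query nod* matches "nodule" and "nodules". A wildcard can also appear in the middle of a term, as in pneumo*x, which matches "pneumothorax".

To make the matching rules concrete, the sketch below translates a wildcard term into an equivalent Ruby regular expression and rejects a leading wildcard. It is an illustration only, in plain Ruby with hypothetical helper names; it is not how the index evaluates wildcards.

```ruby
# Hypothetical helpers (not part of the SDK), for illustration only.

# A term may not begin with a wildcard character.
def leading_wildcard?(term)
  term.start_with?("*", "?")
end

# Translate a wildcard term into an anchored Regexp:
# ? matches exactly one character, * matches zero or more characters.
def wildcard_to_regexp(term)
  raise ArgumentError, "term may not start with * or ?" if leading_wildcard?(term)

  pattern = Regexp.escape(term).gsub('\*', ".*").gsub('\?', ".")
  Regexp.new("\\A#{pattern}\\z")
end
```

With these helpers, wildcard_to_regexp("lun?") matches "lung" but not "lungs", while wildcard_to_regexp("nod*") matches both "nodule" and "nodules".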

Rank Boosting

The search query provides the relevance level of matching documents based on the terms found. Boosting allows you to control the relevance of a document by boosting its term. The higher the boost factor, the more relevant the term will be. To boost a term use the caret, ^, symbol with a boost factor (a number) at the end of the term you are searching.

For example, if you are searching for lung nodule and want the term "lung" to be more relevant, boost it by placing the ^ symbol and the boost factor next to the term, as in lung^4 nodule. This makes documents containing the term "lung" appear more relevant. You can also boost phrases, as in "lung nodule"^4 "chest".

By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).
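
Boosted clauses can be generated in the same way as the other operators. The following plain-Ruby sketch uses a hypothetical helper named boost and enforces the rule that the factor must be positive.

```ruby
# Hypothetical helper (not part of the SDK): appends a boost factor to a
# term or phrase. Multi-word text is quoted so the boost applies to the
# whole phrase.
def boost(term, factor)
  raise ArgumentError, "boost factor must be positive" unless factor > 0

  clause = term.include?(" ") ? "\"#{term}\"" : term
  "#{clause}^#{factor}"
end

# "lung" counts four times as much as the unboosted "nodule"
query = [boost("lung", 4), "nodule"].join(" ")
```

Here query is lung^4 nodule. A factor below 1, such as boost("chest", 0.2), lowers the weight of a term instead of raising it.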

Escaping special characters

Operators and other special characters such as ( can be escaped with a backslash (\). Note that punctuation is not captured within the index, although it remains part of the document body.
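
When user input should be treated as literal text rather than query syntax, every special character can be escaped before the string is parsed. The sketch below is plain Ruby; escape_query is a hypothetical helper, and the character list is an assumption taken from the special characters listed in the Lucene query parser syntax documentation, so check it against the Lucene version in use.

```ruby
# Characters assumed to be special to the query parser:
# + - ! ( ) { } [ ] ^ " ~ * ? : \ | & /
LUCENE_SPECIAL = /[+\-!(){}\[\]^"~*?:\\|&\/]/

# Hypothetical helper (not part of the SDK): prefixes each special
# character with a backslash so the parser treats it literally.
def escape_query(text)
  text.gsub(LUCENE_SPECIAL) { |ch| "\\#{ch}" }
end
```

Because punctuation is not captured in the index, escaping mainly prevents parse errors; it does not make the punctuation itself searchable.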

Further details about query parsing are available in the Lucene documentation.

Additional Filtering

In addition to the reportBody index, there are other fields that can further refine a search. While all fields are represented within Lucene as strings, they are formatted to take advantage of their values. Below is a list of all the fields, their corresponding schema locations, and a description that includes any special formatting considerations.

Index Name                                 Schema Location                   Description
reportBody                                 rad_reports.report_body
reportImpression                           rad_reports.report_impression
diagnosis                                  rad_exam_details.diagnosis
gender                                     patients.gender
radExam.resource.modality.id               modalities.id
radExam.resource.modality.modality         modalities.modality
radExam.patientMrn.mrn                     patient_mrns.mrn
radExam.patientMrn.id                      patient_mrns.id
endExam                                    rad_exam_times.end_exam           YYYYMMDD
rad1.id                                    rad_reports.rad1_id
rad1.name                                  employee.name_id
rad2.id                                    rad_reports.rad1_id
rad2.name                                  employee.name_id
rad3.id                                    rad_reports.rad1_id
rad3.name                                  employee.name_id
rad4.id                                    rad_reports.rad1_id
rad4.name                                  employee.name_id
reportEvent                                rad_reports.report_event          YYYYMMDD
patientGender                              patients.gender
radExam.relativePatientAge                                                   The age of the patient at the time of the exam (end exam) in years
radExam.procedure.id                       rad_exams.procedure_id
radExam.procedure.code                     procedures.code
radExam.radExamDepartment.id               rad_exams.rad_exam_department_id
radExam.radExamDepartment.description      rad_exam_departments.description
radExam.resource.id                        rad_exams.resource_id
radExam.site.id                            rad_exams.site_id
radExam.site.site                          sites.site
radExam.siteClass.id                       rad_exams.site_class_id
radExam.siteClass.name                     site_classes.name
radExam.siteClass.patientType.id           site_classes.patient_type_id
radExam.siteClass.patientType.patientType  patient_types.patient_type
reportStatus.id                            rad_reports.report_status_id
reportStatus.universalEventType.id         universal_event_types.id
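
The YYYYMMDD format matters because the index stores dates as strings: in this format, plain string comparison sorts values in chronological order, which is what makes a range over a date field meaningful. The sketch below demonstrates this in plain Ruby. index_date and in_range? are hypothetical helpers, and whether the SDK's range filter includes its end points is not stated here, so the inclusive comparison is an assumption.

```ruby
require "date"

# Hypothetical helper (not part of the SDK): format a Date or Time the way
# the endExam and reportEvent fields are stored.
def index_date(date)
  date.strftime("%Y%m%d")
end

# Illustration of a string range check (assumed inclusive at both ends).
def in_range?(value, start, stop)
  value >= start && value <= stop
end

start_date = index_date(Date.new(2014, 1, 1))
stop_date  = index_date(Date.new(2014, 12, 31))
exam_date  = index_date(Date.new(2014, 3, 5))
```

Here in_range?(exam_date, start_date, stop_date) is true because the string 20140305 sorts between 20140101 and 20141231. A format such as MMDDYYYY would not have this property.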

Within the SDK there are static methods that help build these indexes into a filter for the search query. They include term_range_filter and term_values_filter, both of which are used in the example below.

Filters can be combined into a set with boolean operator methods such as and_filters.

These methods can be found in the API documentation under the Search class.

The following example searches reportBody for a term and filters the results by a date range on both rad_exam_times.end_exam and rad_reports.report_event, and by a specific procedure:

entity_manager ||= Java::HarbingerSdk::DataUtils.getEntityManager
full_text_entity_manager ||= Java::HarbingerSdk::DataUtils.getFullTextEntityManager(entity_manager)

# Standard parsed input for the search: terms should be lowercase and boolean operators ALL CAPS
# ex: "pneumonia AND pneumothorax"
# bad example: "Pneumonia and pneumothorax"
search_terms = "pneumonia"

# Swap out with reportImpression as desired
begin
  phrase_query = Java::harbinger.sdk.Search::search_query("reportBody", search_terms)
rescue Java::OrgApacheLuceneQueryParser::ParseException => e
  return "Error while parsing search string"
end

# Arbitrary dates, formatted as YYYYMMDD as the index requires (1.year.ago needs ActiveSupport)
start_date = 1.year.ago.strftime("%Y%m%d")
stop_date = Time.now.strftime("%Y%m%d")

# Time range filter on reportEvent
timefilter1 = Java::harbinger.sdk.Search::term_range_filter("reportEvent", start_date, stop_date)
# Time range filter on endExam
timefilter2 = Java::harbinger.sdk.Search::term_range_filter("endExam", start_date, stop_date)

# Procedure filter: look up the ids of procedures whose description matches exactly
pquery = Java::HarbingerSdkData::Procedure.createQuery(entity_manager)
procedure_ids = pquery.where(pquery.in(".description",["CT CHEST W CONT PULMONARY ARTERIES"])).select(".id").list.to_a.collect(&:to_s)
procdescfilter = Java::harbinger.sdk.Search::term_values_filter("radExam.procedure.id", procedure_ids)

# Build the full text query with the given search query and filters
ftquery = full_text_entity_manager.createFullTextQuery(phrase_query)
# Project both the RadReport and its Lucene score, so each result is a [RadReport, score] pair
ftquery.setProjection(Java::org.hibernate.search.jpa.FullTextQuery::THIS, Java::org.hibernate.search.jpa.FullTextQuery::SCORE)
ftquery.setFilter(Java::harbinger.sdk.Search::and_filters([timefilter1,timefilter2,procdescfilter]))
reports = ftquery.setMaxResults(100).getResultList().to_a

Limitations

There is no facility to combine a free text search and a SQL where clause into a single query at this time.