The SDK provides a query mechanism and object-relational mapping layer to provide robust access to the platform's clinical data.

Entity Managers

The entity manager is a channel to the database which additionally provides a simple caching layer. An entity manager is required to perform any query. Create and close an entity manager with each request that will perform a query against the database. The simplest way to do this in a Rails application is by building two filters and creating an instance variable called @entity_manager (referred to as @entity_manager throughout this documentation).

An example set of filters to build and close the entity manager in a Rails application controller:

class ApplicationController < ActionController::Base
  # ... other typical stuff
  before_filter :get_entity_manager
  after_filter :close_entity_manager

  def get_entity_manager
    @entity_manager ||= Java::HarbingerSdk::DataUtils.getEntityManager
  end

  def close_entity_manager
    @entity_manager.close if [email protected]_manager.nil?
  end

  # rest of application controller
end

Not every request requires an entity manager, so this technique is not ideal for all applications, but it serves the purpose of this tutorial and works for many common types of applications.

Querying

Queries should be done through the harbinger.sdk.Query library. The query library is a wrapper around the JPA query standard. It provides encapsulation for multiple variables that would need to be tracked independently across multiple locations to perform a typical query in JPA. It also provides a means to reference data in other related models without the burden of manually performing joins and tracking aliases.

To build a query object:

# Building the query object directly
query = Java::HarbingerSdk::Query.new(Java::HarbingerSdkData::RadExam.java_class,@entity_manager)
# Using the method on the desired class
query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)

Using the query object to fetch a set of results using the list method:

results = query.limit(10).list()
results = results.to_a # Makes the returned object behave like a Ruby list

Excluding list, there are 4 types of methods on the query object: filter builders, select builders, getters, and setters.

Filtering (SQL WHERE CLAUSE)

The where clause of a select SQL statement can be very complex, and adding joins only increases the verbosity. Filters are broken down into expressions in the query library. Expressions are built and combined using various methods attached to the query object, which include mathematical operators (equal, greaterThan, lessThan, etc.), string matching (like, ilike), and boolean operators (and, or, not).

Example filter expressions:

# SQL equivalent "accession = '12345'"
acc_equals = query.equal(".accession","12345")

# SQL equivalent "from rad_exams r left outer join patient_mrns pm ON pm.id = r.patient_mrn_id"
mrn_like = query.ilike(".patientMrn.mrn","%987%") 

# Passing a list of expressions (nulls removed automatically) to the
# where method are combined with a SQL "AND"
query.where([acc_equals,patient_mrns]).limit(10).list()

These filters are unlike conventional JPA because instead of finding the property object to represent a field in the database, the query library accepts a string with special syntax to reference the property. Furthermore, there was an implicit join from rad_exams to patient_mrns through the patient_mrn_id foreign key in the rad_exams table.

The special syntax states that dot notation in a string is a property. The details of this are explored further in the Properties and Explicit Joins section below.

There is more syntactic sugar that can be used with the filter builders: whenever you have many filters that share the same operator and need AND between each of them, you can pass in a hash instead of building each filter individually.

filters = query.equal({".site.site" => "VHS", ".accession" => "12345"})
query.where(filters).limit(10).list()

The syntactic sugar can be stretched further: if the only filters will be an equal comparison with 'AND' between them, simply pass a hash directly to the where method.

query.where({".site.site" => "VHS", ".accession" => "12345"}).limit(10).list()

Almost every filter takes either 2 arguments (the left and right side of the operator) or a hash that represents a set of 2 arguments. Further details about the available filter methods are in the SDK documentation.

Properties and Explicit joins (SQL FROM clause)

The SDK special syntax above is a powerful tool to query the database because the data model is normalized. The rules are simple and will work for most query use cases, however, there is a way to circumvent them if necessary (further details below in the Querying section). The rules are:

  1. To be a property it must be a string
  2. To be a property it must have only characters (A-z) and include at least one "."
  3. If it is a property and the string starts with "." then it's implied anything following that "." is part of the base class of the query (ex: exam_query.property(".id") assumes seeking the id field of the RadExam class).
  4. If it is a property and the string does not start with "." then the first alpha string is the name of the class being references as the base class.

Examples of valid property notation:

query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
query.property(".id") # the id field of the rad exam
query.property(".site.site") # the site field of the site record associated with the rad exam
query.property("site.updatedAt") # the updatedAt field of the site

Whenever a property is more than a simple field of the base class, it will be an implicit left outer join. Once that join is performed, you can reference the joined class at the start of the property string as in the above example with site. If the example had referenced site.updatedAt before .site an exception would have been thrown. There are two other cases that can raise an exception:

  1. Referencing something that doesn't exist (e.g.: .fakeField).
  2. Referencing something with an ambiguous path (e.g.: externalSystem.externalSystem after joining multiple classes with containing an externalSystem relationship in the query).

The externalSystem use case is an example of the need to declare explicit joins. Other frequent use cases for explicit joins are to either ensure the order of the joins for performance reasons (join order can have substantial performance impact on a query) or to use a join type other than LEFT OUTER.

Examples of using the join method for explicit joins:

query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
query.join(".siteClass") # joins the site_classes table with a left outer join
query.join(".externalSystem","left") # joins the external_systems table with a left outer join
query.join(".site","inner") # joins the sites table via an inner join
# joins the currentStatuses table with a :right join. This is not going to be 
# useful if you are still selecting the base class
query.join(".currentStatus","right") 

These queries can be combined into one statement using a list of lists:

query.join([".siteClass",[".externalSystem","left"],[".site","inner"],[".currentStatus","right"]])

Classes that have already been joined directly can be referenced in subsequent joins. Joins can also be chained like a property if they share the same join type:

query.join(".siteClass") # initial join
query.join("siteClass.patientType") # referencing the siteClass directly now that it's joined
query.join(".siteClass.patientType") # joins siteClass and patientType with a left outer join

Selecting and Aggregates (SQL SELECT and GROUP BY clause)

Creating a query object also creates a default select clause. That clause is the class of the query.

query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
query.limit(10).list() # returns a list of RadExam Objects
query.first() # returns a RadExam Object

select is the method for overriding the default selection. It takes an expression, property, or root class or a list of any of those. All of these are valid calls to select:

query.select(query.property(".id")) # query.first() will now return an integer
query.select(query.count(".id")) # query.first() will now return the count of rad exam records
query.select(query.root("RadExam")) # query.first() will return a RadExam object
query.select([query.root("RadExam"),".site.site"]) # select both the RadExam object and the site field
query.select([query.count(".id"),".site.site",".currentStatusId"]) # select an aggregate and two properties

When you pass a list to the select method the results coming back are also lists. So for instance if I passed [".id",".siteId"] then I'd get a list of results that would look like this: [[1,1],[2,3],[3,1]...] Where each item in the result list is a list with the first item of that list being id and the second being siteId.

Remember, when using aggregate functions in conjunction with traditional fields you'll need to have those fields in the group by clause. You'd use the group by clause just as you'd expect. It takes either an expression, property, or property string or a list of those. Here is a complete aggregate query as an example:

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
total = query.count(".id")
query.select([total,".currentStatus.universalEventTypeId"]).group(".currentStatus.universalEventTypeId").order(query.desc(total))
totals = query.list()

Ordering (SQL ORDER BY clause)

The order by clause can take 3 types of arguments. The first and most common is a property string with or without a direction:

query.order(".updatedAt") # Orders by the updated_at field in ascending order (assumes asc)
query.order(".updatedAt asc")  # Orders by the updated_at field in ascending order explicitly
query.order(".updatedAt desc") # Orders by the updated_at field in descending order
query.order(".site.site desc") # Complex properties also work and the join will be done automatically

Lists of ordering property strings can also be passed:

query.order([".site.site",".updatedAt desc"])

The third argument possibility is directly passing a JPA Order expression (javax.persistence.criteria.Order) into the order method.

Warning be sure to explicitly join anything in the order clause if it's not referenced in the where clause. Not doing so can give odd results.

Using JPA directly through the Query library

The query library was designed to make common use cases for data access fast and powerful, however it does not cover all scenarios. For certain purposes it makes more sense to bypass the query library and work directly with the underlying JPA objects.

The two primary JPA objects needed are the criteria and the builder.

query.builder # for the builder object
query.criteria # for the criteria object

Those objects can be used by referencing the JPA criteria docs to access special functions like builder.function. The property method can still be used to get fields and pass those into builder functions (e.g.: query.builder.equal(query.property(".accession"),"982374")).

Debugging and Common Mistakes

There are a few tips and common mistakes to be aware of when using the query library. Please read through this section entirely as it will almost certainly save you pain in the future.

Using toSQL to debug

It is not uncommon during development for queries to succeed, but not return the expected results and this can be difficult to debug. One tool to assist in understand the query generated is to use the query.toSQL method to examine the SQL emitted by the library. This will return a string (meant for printing) of the sql being generated by the query library. Although a little difficult to read due to the auto-generated aliases, it can often reveal obvious mistakes in logic or erroneous parameters.

Forgetting the dot

Another common mistake is omitting a leading "." in a method expecting a property which can't handle a string:

query.order("updatedAt desc") # This will throw an exception
query.order(".updatedAt desc") # This is correct

Don't re-use query objects

Query objects are mutable because the underlying JPA criteria object is mutable and that object is used to store information (methods are run on it). This makes it easy to get unexpected results when attempting to re-use a query object:

query = HarbingerTransforms::Models.from_table(model).createQuery(entity_manager)
filters = query.equal(".site.site", "VHS")
count_filters = query.equal(".currentStatus.universalEventType","final")
results = query.where(filters).list()
count = query.where(count_filters).select(query.count(".id")).first()

Frequently this will yield SQL grammar errors associated with generated aliases that don't exist. To prevent these types of errors, each query should be it's own object.

Large queries and heap space

Sometimes you will create queries that will return a large number of results. A query that returns thousands of results at once is likely to eat a large amount of memory and even crash the web server. If you see an error with ran out of heap space as a result of a query then this is likely the result. Sometimes you'll be able to load your query, but you run out of memory while looping through the results. This is likely because within the loop you are making calls on each object that load more information into memory. An example of this would be querying for exams and then running exam.procedure inside your loop. This would then load the procedure object and store it in the exam object. The more calls to such things in your loop the more memory you have to take up for the duration of your loop and beyond if you've stored the results of your query in a variable.

If you intend to display this list to the user then typical pagination using limit and offset will resolve this issue. You'd simply not show so many results to the user at a given time. But what if you have a need to iterate over the entire list? Then you need to break the query down into chunks but still process each item. To do this you could define a function like this one:

def query_each(query,batch_size=25,offset=0,&block)
  result_set = query.offset(offset).limit(batch_size).list().to_a
  if result_set.size > 0
    result_set.each(&block)
    batch_query(query,batch_size,offset+batch_size,&block)
  else
   nil
  end
end

This will run your query in batches made by offset and limit and so long as there are results it will run the given code block on each result. Here is how you'd use it:

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
# This query would give me all the radiology exams in the database and would surely
# die if I just ran list()
query_each(query) do |exam|
  puts exam.accession, exam.procedure.code
end

Query Examples

Combining all of the knowledge described above, this section provides several query examples that you can use to guide your initial querying. All of these queries assume you've created an entity manager that you've stored in the entity_manager variable.

Querying with time

A simple query for finding a field that is between two given timestamps

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.between(".radExamTime.endExam", 2.days.ago.to_time, 1.day.ago.to_time)).list()

Selecting the year, month, and day as separate field for an exam with the id of 12345.

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where({".id" => 12345}).select([query.year(".radExamTime.endExam"),
                                      query.month(".radExamTime.endExam"),
                                      query.day(".radExamTime.endExam"))
exam = query.first()

Selecting all the exams that either began today or have an appointment time of today.

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.or([query.equal(query.date(".radExamTime.beginExam"),Date.today.to_time),
                    query.equal(query.date(".radExamTime.appointment"),Date.today.to_time)]))
exams = query.list()

or what is likely much faster

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.or([query.between(".radExamTime.beginExam",Date.today.to_time,Date.tomorrow.to_time),
                     query.between(".radExamTime.appointment",Date.today.to_time,Date.tomorrow.to_time)]))
exams = query.list()

Querying with likes and regex

Here is how to query for a particular procedure based on a simple insensitive like or a pattern (regex).

query = Java::HarbingerSdkData::Procedure.createQuery(entity_manager)
# starts with CT, ends with MOD1, must have at least one character in between
query.where(query.or([query.ilike(".code","MR%"),
                     query.regex(".code","^CT.+MOD1$")]))
procedures = query.list()

Many equals and a like

Assuming everything is going to be combined with an AND then here is a way to take a lot of equals and a like or two and add them as the where clause. This would be common for something like a search in a ruby controller.

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
equals = query.equals({".site.site" => params[:site],
                       ".procedure.code" => params[:procedure_code],
                       ".patientMrn.mrn" => params[:mrn],
                       ".siteClass.patientType.patientType" => params[:patient_type],
                       ".externalSystemId" => params[:external_system_id].to_i
                      }.delete_if {|k,v| v.blank? or v == 0})
likes = query.ilike({".patientMrn.patient.name" => params[:patient_name],
                     ".accession" => params[:accession]
                    }.delete_if {|k,v| v.blank? })
exams = query.where([equals,likes]).limit(10).list()

Cancelled Exams

Sometimes you want all exams that are not cancelled. Just saying that the universal event type isn't cancelled will exclude any exam with a status that hasn't been mapped to a trip status. So you'd probably want to do something more like this:

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.or([query.notEqual(".currentStatus.universalEventType.eventType","cancelled"),
           query.isNull(".currentStatus.universalEventTypeId")]))
exams = query.limit(10).list()

Aggregate: Count by universal_event_type

Here is a query that will count all the radiology exams in the system and group them by universal event type

query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
total = query.count(".id")
query.select([total,".currentStatus.universalEventTypeId"]).group(".currentStatus.universalEventTypeId").order(query.desc(total))
totals = query.list()

Interacting with ORM objects

If the select clause is not explicitly set, the result of a query is a list of objects. Those objects represent a row in a given table (represented by the class) in the database. Those objects provide access to its fields and relationships through dot notation.

query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
exam = query.first()
exam.getId # returns the the value in the id field
exam.accession # returns the value in the accession field
exam.site # returns the corresponding site object linked to this exam
exam.site.site # returns the site field of the site object
# return the value of the patient_type field in the patient_types table
# linked to this exam through the site_classes table
exam.siteClass.patientType.patientType

This makes traversing the highly normalized structure much easier, however it is extremely important to note that many foreign keys can be null, so it is easily possible to attempt to call an additional method on a null object. To avoid this error, perform a check in advance:

# currentStatus (a required field for a radExam) and universalEventType (not required for externalSystemStatuses).
exam.currentStatus.universalEventType.eventType if exam.currentStatus.universalEventTypeId

When referencing a relationship of an object, the object will query the database to get information about the record that relationship references. Running exam.siteClass performs a query to site_classes using the primary key (very fast). The exam object will cache the result of that query, so there is no need to set the return of exam.siteClass to a variable.

exam.siteClass # A query is run to get the siteClass record
exam.siteClass # A query is NOT run
exam.siteClass.trauma # A query is NOT run since it's a field and that information has already been retrieved
exam.siteClass.patientType # A query is run to get the patientType record

Custom methods have been added to the ORM objects to facilitate returning information more easily, but these methods cannot be referenced/used in a query, only on the ORM object. An example of this is employee.primarySpecialty. This method performs a query to get the employeeSpecialtyMapping that is flagged as primary (a boolean column on the mapping table) associated with the employee. It returns the specialty associated with that employeeSpecialtyMapping. This method simplifies a common use case for a complex relationship model.

It is important to note that while interacting with objects (as opposed to a more primitive data type) is great from a programmers perspective, it has an overhead price in terms of performance and memory usage. It is easy to accidentally instantiate thousands of objects if you run list() without a limit set. The limit and offset functions are important safeguards to prevent these types of runaway behavior. It is strongly recommend to use them frequently, particularly because data across facilities often scales differently and test and development systems frequently contain less data than production systems. Apps ignoring these safeguards can create a hazardous environment on the applications server and are unlikely to pass an application review before deploying an app into production.

Shortcut Methods

While the Query library allows you to build complex queries there are several common query types that can be done without instantiating a new query object. Much of the time you will simply want to get the row associated with a primary key on a given model. The SDK provides a special method for doing this:

# This will give you the exam record with the id of 1
# while the second argument is optional  it is recommended you pass in an entity manager
exam = Java::HarbingerSdkData::RadExam.withId(1,@entity_manager)
# Now you can access the data from that record as described in the "Interacting with ORM Objects"
accession = exam.accession
begin_exam = exam.radExamTime.beginExam

The shortcut methods are described below. Each takes an optional (but recommended) last argument of an entity manager.