The SDK provides a query mechanism and object-relational mapping layer to provide robust access to the platform's clinical data.
Entity Managers¶
The entity manager is a channel to the database which additionally provides a simple caching layer. An entity manager is required to perform any query. Create and close an entity manager with each request that will perform a query against the database. The simplest way to do this in a Rails application is by building two filters and creating an instance variable called @entity_manager
(referred to as @entity_manager
throughout this documentation).
An example set of filters to build and close the entity manager in a Rails application controller:
class ApplicationController < ActionController::Base
# ... other typical stuff
before_filter :get_entity_manager
after_filter :close_entity_manager
def get_entity_manager
@entity_manager ||= Java::HarbingerSdk::DataUtils.getEntityManager
end
def close_entity_manager
@entity_manager.close if [email protected]_manager.nil?
end
# rest of application controller
end
Not every request requires an entity manager, so this technique is not ideal for all applications, but it serves the purpose of this tutorial and works for many common types of applications.
Querying¶
Queries should be done through the harbinger.sdk.Query
library. The query library is a wrapper around the JPA query standard. It provides encapsulation for multiple variables that would need to be tracked independently across multiple locations to perform a typical query in JPA. It also provides a means to reference data in other related models without the burden of manually performing joins and tracking aliases.
To build a query object:
# Building the query object directly
query = Java::HarbingerSdk::Query.new(Java::HarbingerSdkData::RadExam.java_class,@entity_manager)
# Using the method on the desired class
query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
Using the query object to fetch a set of results using the list method:
results = query.limit(10).list()
results = results.to_a # Makes the returned object behave like a Ruby list
Excluding list
, there are 4 types of methods on the query object: filter builders, select builders, getters, and setters.
Filtering (SQL WHERE CLAUSE)¶
The where
clause of a select
SQL statement can be very complex, and adding joins only increases the verbosity. Filters are broken down into expressions in the query library. Expressions are built and combined using various methods attached to the query object, which include mathematical operators (equal
, greaterThan
, lessThan
, etc.), string matching (like
, ilike
), and boolean operators (and
, or
, not
).
Example filter expressions:
# SQL equivalent "accession = '12345'"
acc_equals = query.equal(".accession","12345")
# SQL equivalent "from rad_exams r left outer join patient_mrns pm ON pm.id = r.patient_mrn_id"
mrn_like = query.ilike(".patientMrn.mrn","%987%")
# Passing a list of expressions (nulls removed automatically) to the
# where method are combined with a SQL "AND"
query.where([acc_equals,patient_mrns]).limit(10).list()
These filters are unlike conventional JPA because instead of finding the property object to represent a field in the database, the query library accepts a string with special syntax to reference the property. Furthermore, there was an implicit join from rad_exams
to patient_mrns
through the patient_mrn_id
foreign key in the rad_exams
table.
The special syntax states that dot notation in a string is a property. The details of this are explored further in the Properties and Explicit Joins
section below.
There is more syntactic sugar that can be used with the filter builders: whenever you have many filters that share the same operator and need AND
between each of them, you can pass in a hash instead of building each filter individually.
filters = query.equal({".site.site" => "VHS", ".accession" => "12345"})
query.where(filters).limit(10).list()
The syntactic sugar can be stretched further: if the only filters will be an equal comparison with 'AND' between them, simply pass a hash directly to the where
method.
query.where({".site.site" => "VHS", ".accession" => "12345"}).limit(10).list()
Almost every filter takes either 2 arguments (the left and right side of the operator) or a hash that represents a set of 2 arguments. Further details about the available filter methods are in the SDK documentation.
Properties and Explicit joins (SQL FROM clause)¶
The SDK special syntax above is a powerful tool to query the database because the data model is normalized. The rules are simple and will work for most query use cases, however, there is a way to circumvent them if necessary (further details below in the Querying
section). The rules are:
- To be a property it must be a string
- To be a property it must have only characters (A-z) and include at least one "."
- If it is a property and the string starts with "." then it's implied anything following that "." is part of the base class of the query (ex:
exam_query.property(".id")
assumes seeking theid
field of theRadExam
class). - If it is a property and the string does not start with "." then the first alpha string is the name of the class being references as the base class.
Examples of valid property notation:
query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
query.property(".id") # the id field of the rad exam
query.property(".site.site") # the site field of the site record associated with the rad exam
query.property("site.updatedAt") # the updatedAt field of the site
Whenever a property is more than a simple field of the base class, it will be an implicit left outer join. Once that join is performed, you can reference the joined class at the start of the property string as in the above example with site
. If the example had referenced site.updatedAt
before .site
an exception would have been thrown. There are two other cases that can raise an exception:
- Referencing something that doesn't exist (e.g.:
.fakeField
). - Referencing something with an ambiguous path (e.g.:
externalSystem.externalSystem
after joining multiple classes with containing anexternalSystem
relationship in the query).
The externalSystem
use case is an example of the need to declare explicit joins. Other frequent use cases for explicit joins are to either ensure the order of the joins for performance reasons (join order can have substantial performance impact on a query) or to use a join type other than LEFT OUTER
.
Examples of using the join
method for explicit joins:
query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
query.join(".siteClass") # joins the site_classes table with a left outer join
query.join(".externalSystem","left") # joins the external_systems table with a left outer join
query.join(".site","inner") # joins the sites table via an inner join
# joins the currentStatuses table with a :right join. This is not going to be
# useful if you are still selecting the base class
query.join(".currentStatus","right")
These queries can be combined into one statement using a list of lists:
query.join([".siteClass",[".externalSystem","left"],[".site","inner"],[".currentStatus","right"]])
Classes that have already been joined directly can be referenced in subsequent joins. Joins can also be chained like a property if they share the same join type:
query.join(".siteClass") # initial join
query.join("siteClass.patientType") # referencing the siteClass directly now that it's joined
query.join(".siteClass.patientType") # joins siteClass and patientType with a left outer join
Selecting and Aggregates (SQL SELECT and GROUP BY clause)¶
Creating a query object also creates a default select clause. That clause is the class of the query.
query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
query.limit(10).list() # returns a list of RadExam Objects
query.first() # returns a RadExam Object
select
is the method for overriding the default selection. It takes an expression, property, or root class or a list of any of those. All of these are valid calls to select:
query.select(query.property(".id")) # query.first() will now return an integer
query.select(query.count(".id")) # query.first() will now return the count of rad exam records
query.select(query.root("RadExam")) # query.first() will return a RadExam object
query.select([query.root("RadExam"),".site.site"]) # select both the RadExam object and the site field
query.select([query.count(".id"),".site.site",".currentStatusId"]) # select an aggregate and two properties
When you pass a list to the select method the results coming back are also lists. So for instance if I passed [".id",".siteId"]
then I'd get a list of results that would look like this: [[1,1],[2,3],[3,1]...]
Where each item in the result list is a list with the first item of that list being id
and the second being siteId
.
Remember, when using aggregate functions in conjunction with traditional fields you'll need to have those fields in the group by clause. You'd use the group by clause just as you'd expect. It takes either an expression, property, or property string or a list of those. Here is a complete aggregate query as an example:
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
total = query.count(".id")
query.select([total,".currentStatus.universalEventTypeId"]).group(".currentStatus.universalEventTypeId").order(query.desc(total))
totals = query.list()
Ordering (SQL ORDER BY clause)¶
The order by clause can take 3 types of arguments. The first and most common is a property string with or without a direction:
query.order(".updatedAt") # Orders by the updated_at field in ascending order (assumes asc)
query.order(".updatedAt asc") # Orders by the updated_at field in ascending order explicitly
query.order(".updatedAt desc") # Orders by the updated_at field in descending order
query.order(".site.site desc") # Complex properties also work and the join will be done automatically
Lists of ordering property strings can also be passed:
query.order([".site.site",".updatedAt desc"])
The third argument possibility is directly passing a JPA Order expression (javax.persistence.criteria.Order
) into the order
method.
Warning be sure to explicitly join anything in the order clause if it's not referenced in the where clause. Not doing so can give odd results.
Using JPA directly through the Query library¶
The query library was designed to make common use cases for data access fast and powerful, however it does not cover all scenarios. For certain purposes it makes more sense to bypass the query library and work directly with the underlying JPA objects.
The two primary JPA objects needed are the criteria and the builder.
query.builder # for the builder object
query.criteria # for the criteria object
Those objects can be used by referencing the JPA criteria docs to access special functions like builder.function
. The property
method can still be used to get fields and pass those into builder functions (e.g.: query.builder.equal(query.property(".accession"),"982374")
).
Debugging and Common Mistakes¶
There are a few tips and common mistakes to be aware of when using the query library. Please read through this section entirely as it will almost certainly save you pain in the future.
Using toSQL to debug¶
It is not uncommon during development for queries to succeed, but not return the expected results and this can be difficult to debug. One tool to assist in understand the query generated is to use the query.toSQL
method to examine the SQL emitted by the library. This will return a string (meant for printing) of the sql being generated by the query library. Although a little difficult to read due to the auto-generated aliases, it can often reveal obvious mistakes in logic or erroneous parameters.
Forgetting the dot¶
Another common mistake is omitting a leading "." in a method expecting a property which can't handle a string:
query.order("updatedAt desc") # This will throw an exception
query.order(".updatedAt desc") # This is correct
Don't re-use query objects¶
Query objects are mutable because the underlying JPA criteria object is mutable and that object is used to store information (methods are run on it). This makes it easy to get unexpected results when attempting to re-use a query object:
query = HarbingerTransforms::Models.from_table(model).createQuery(entity_manager)
filters = query.equal(".site.site", "VHS")
count_filters = query.equal(".currentStatus.universalEventType","final")
results = query.where(filters).list()
count = query.where(count_filters).select(query.count(".id")).first()
Frequently this will yield SQL grammar errors associated with generated aliases that don't exist. To prevent these types of errors, each query should be it's own object.
Large queries and heap space¶
Sometimes you will create queries that will return a large number of results. A query that returns thousands of results at once is likely to eat a large amount of memory and even crash the web server. If you see an error with ran out of heap space
as a result of a query then this is likely the result. Sometimes you'll be able to load your query, but you run out of memory while looping through the results. This is likely because within the loop you are making calls on each object that load more information into memory. An example of this would be querying for exams and then running exam.procedure
inside your loop. This would then load the procedure object and store it in the exam object. The more calls to such things in your loop the more memory you have to take up for the duration of your loop and beyond if you've stored the results of your query in a variable.
If you intend to display this list to the user then typical pagination using limit and offset will resolve this issue. You'd simply not show so many results to the user at a given time. But what if you have a need to iterate over the entire list? Then you need to break the query down into chunks but still process each item. To do this you could define a function like this one:
def query_each(query,batch_size=25,offset=0,&block)
result_set = query.offset(offset).limit(batch_size).list().to_a
if result_set.size > 0
result_set.each(&block)
batch_query(query,batch_size,offset+batch_size,&block)
else
nil
end
end
This will run your query in batches made by offset and limit and so long as there are results it will run the given code block on each result. Here is how you'd use it:
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
# This query would give me all the radiology exams in the database and would surely
# die if I just ran list()
query_each(query) do |exam|
puts exam.accession, exam.procedure.code
end
Query Examples¶
Combining all of the knowledge described above, this section provides several query examples that you can use to guide your initial querying. All of these queries assume you've created an entity manager that you've stored in the entity_manager
variable.
Querying with time¶
A simple query for finding a field that is between two given timestamps
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.between(".radExamTime.endExam", 2.days.ago.to_time, 1.day.ago.to_time)).list()
Selecting the year, month, and day as separate field for an exam with the id of 12345.
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where({".id" => 12345}).select([query.year(".radExamTime.endExam"),
query.month(".radExamTime.endExam"),
query.day(".radExamTime.endExam"))
exam = query.first()
Selecting all the exams that either began today or have an appointment time of today.
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.or([query.equal(query.date(".radExamTime.beginExam"),Date.today.to_time),
query.equal(query.date(".radExamTime.appointment"),Date.today.to_time)]))
exams = query.list()
or what is likely much faster
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.or([query.between(".radExamTime.beginExam",Date.today.to_time,Date.tomorrow.to_time),
query.between(".radExamTime.appointment",Date.today.to_time,Date.tomorrow.to_time)]))
exams = query.list()
Querying with likes and regex¶
Here is how to query for a particular procedure based on a simple insensitive like or a pattern (regex).
query = Java::HarbingerSdkData::Procedure.createQuery(entity_manager)
# starts with CT, ends with MOD1, must have at least one character in between
query.where(query.or([query.ilike(".code","MR%"),
query.regex(".code","^CT.+MOD1$")]))
procedures = query.list()
Many equals and a like¶
Assuming everything is going to be combined with an AND then here is a way to take a lot of equals and a like or two and add them as the where clause. This would be common for something like a search in a ruby controller.
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
equals = query.equals({".site.site" => params[:site],
".procedure.code" => params[:procedure_code],
".patientMrn.mrn" => params[:mrn],
".siteClass.patientType.patientType" => params[:patient_type],
".externalSystemId" => params[:external_system_id].to_i
}.delete_if {|k,v| v.blank? or v == 0})
likes = query.ilike({".patientMrn.patient.name" => params[:patient_name],
".accession" => params[:accession]
}.delete_if {|k,v| v.blank? })
exams = query.where([equals,likes]).limit(10).list()
Cancelled Exams¶
Sometimes you want all exams that are not cancelled. Just saying that the universal event type isn't cancelled will exclude any exam with a status that hasn't been mapped to a trip status. So you'd probably want to do something more like this:
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
query.where(query.or([query.notEqual(".currentStatus.universalEventType.eventType","cancelled"),
query.isNull(".currentStatus.universalEventTypeId")]))
exams = query.limit(10).list()
Aggregate: Count by universal_event_type¶
Here is a query that will count all the radiology exams in the system and group them by universal event type
query = Java::HarbingerSdkData::RadExam.createQuery(entity_manager)
total = query.count(".id")
query.select([total,".currentStatus.universalEventTypeId"]).group(".currentStatus.universalEventTypeId").order(query.desc(total))
totals = query.list()
Interacting with ORM objects¶
If the select clause is not explicitly set, the result of a query is a list of objects. Those objects represent a row in a given table (represented by the class) in the database. Those objects provide access to its fields and relationships through dot notation.
query = Java::HarbingerSdkData::RadExam.createQuery(@entity_manager)
exam = query.first()
exam.getId # returns the the value in the id field
exam.accession # returns the value in the accession field
exam.site # returns the corresponding site object linked to this exam
exam.site.site # returns the site field of the site object
# return the value of the patient_type field in the patient_types table
# linked to this exam through the site_classes table
exam.siteClass.patientType.patientType
This makes traversing the highly normalized structure much easier, however it is extremely important to note that many foreign keys can be null, so it is easily possible to attempt to call an additional method on a null object. To avoid this error, perform a check in advance:
# currentStatus (a required field for a radExam) and universalEventType (not required for externalSystemStatuses).
exam.currentStatus.universalEventType.eventType if exam.currentStatus.universalEventTypeId
When referencing a relationship of an object, the object will query the database to get information about the record that relationship references. Running exam.siteClass
performs a query to site_classes
using the primary key (very fast). The exam object will cache the result of that query, so there is no need to set the return of exam.siteClass
to a variable.
exam.siteClass # A query is run to get the siteClass record
exam.siteClass # A query is NOT run
exam.siteClass.trauma # A query is NOT run since it's a field and that information has already been retrieved
exam.siteClass.patientType # A query is run to get the patientType record
Custom methods have been added to the ORM objects to facilitate returning information more easily, but these methods cannot be referenced/used in a query, only on the ORM object. An example of this is employee.primarySpecialty
. This method performs a query to get the employeeSpecialtyMapping
that is flagged as primary
(a boolean column on the mapping table) associated with the employee
. It returns the specialty
associated with that employeeSpecialtyMapping
. This method simplifies a common use case for a complex relationship model.
It is important to note that while interacting with objects (as opposed to a more primitive data type) is great from a programmers perspective, it has an overhead price in terms of performance and memory usage. It is easy to accidentally instantiate thousands of objects if you run list()
without a limit set. The limit
and offset
functions are important safeguards to prevent these types of runaway behavior. It is strongly recommend to use them frequently, particularly because data across facilities often scales differently and test and development systems frequently contain less data than production systems. Apps ignoring these safeguards can create a hazardous environment on the applications server and are unlikely to pass an application review before deploying an app into production.
Shortcut Methods¶
While the Query library allows you to build complex queries there are several common query types that can be done without instantiating a new query object. Much of the time you will simply want to get the row associated with a primary key on a given model. The SDK provides a special method for doing this:
# This will give you the exam record with the id of 1
# while the second argument is optional it is recommended you pass in an entity manager
exam = Java::HarbingerSdkData::RadExam.withId(1,@entity_manager)
# Now you can access the data from that record as described in the "Interacting with ORM Objects"
accession = exam.accession
begin_exam = exam.radExamTime.beginExam
The shortcut methods are described below. Each takes an optional (but recommended) last argument of an entity manager.
withId(id,[entity_manager])
- takes an integer and returns the a single record where the primary key matches the given id.allWithLimit(limit,[entity_manager])
- takes an integer limit returns a list of all the records in the table up to that limit (order is not reliable).rowsWith(hash,[entity_manager])
- takes a key value hash of property to value and will return a list of all records that match the given hash with a where clause consisting ofproperty = value
joined byAND
's.firstWith(hash,[entity_manager])
- takes a key value hash of property to value and will return a list of the first record (order not reliable) that matches the given hash with a where clause consisting ofproperty = value
joined byAND
's.