JASON includes a range of powerful tools for processing and analysing your NMR data; however, sometimes you might want to do something not yet available in JASON. In this series of blog posts, I will demonstrate how to use the “External Command” processing item. This allows you to pass NMR data to an external program, perform some operation(s) on the data, and then return the results to JASON.
Who is this for? Researchers developing new processing techniques, for example new non-uniform sampling reconstruction methods, novel window functions or other signal processing techniques, will find this feature essential. The NMR data can be sent for external processing at any point in the processing chain, even multiple times, giving you complete control over your data.
How does it work? JASON stores data in HDF5 format files. These files are readable by a wide variety of tools, and HDF5 bindings (libraries) are available for all common programming languages, including Python, MATLAB, R and Mathematica, so you can use your favourite language!
The external command processing item generates a slimmed-down HDF5 file, whose path is passed as an argument to your chosen program. Once the external calculations are complete, JASON reads the output back in and continues with the next item in the processing list.
In this post, I will demonstrate how to use the external command processing item with a simple Python script. This script simply inverts the spectrum, but much more sophisticated calculations are possible. The limit is only your imagination!
The Python script we will discuss is shown below. Python can access HDF5 files using the h5py library, which reads the data into NumPy arrays, widely used for scientific computing in Python.
The script begins with three statements that import h5py and NumPy, along with sys from the Python standard library. The HDF5 file sent from JASON is opened as a file handle using the h5py.File() function. For security reasons, JASON sends the NMR data in a file with a random name, so the script reads that filename from its command-line arguments via the sys.argv list.
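A minimal sketch of this opening is shown next. It assumes JASON passes the temporary file path as the first argument after the script name (sys.argv[1]); the helper name open_jason_file is hypothetical and is not part of JASON or h5py.

```python
import sys

import h5py


def open_jason_file(argv):
    """Open the HDF5 file whose path JASON passes on the command line.

    Assumption: the path is argv[1], i.e. the first argument after the
    script name. Adjust the index if your setup passes extra arguments.
    """
    if len(argv) < 2:
        raise SystemExit("usage: python script.py <path-to-hdf5-file>")
    # "r+" opens an existing file for reading and writing without truncating it.
    return h5py.File(argv[1], "r+")


# Only run when an argument is supplied, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open_jason_file(sys.argv) as f:
        print("Opened", f.filename)
```

Using the file as a context manager (the with statement) closes it automatically, even if the processing raises an exception part-way through.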
Once the file is open, the NMR data within it can be accessed using Python’s dictionary syntax. HDF5 files store data in a path-like hierarchy, and these paths are used as the dictionary keys. For example, the real part of the NMR spectrum is found at /JasonDocument/DataPoints/0, while /JasonDocument/DataPoints/1 holds the imaginary part.
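To see which paths a particular file contains, you can walk the hierarchy with h5py's visititems() method. The sketch below also combines the two parts into a single complex array; it assumes both datasets are present and have the same shape, and the helper names are illustrative only.

```python
import h5py

# Dataset paths for the spectrum, as given in this post.
REAL_PATH = "/JasonDocument/DataPoints/0"
IMAG_PATH = "/JasonDocument/DataPoints/1"


def list_datasets(f):
    """Return the absolute path of every dataset in an open HDF5 file."""
    paths = []

    def visit(name, obj):
        # visititems passes paths relative to the starting group, without "/".
        if isinstance(obj, h5py.Dataset):
            paths.append("/" + name)

    f.visititems(visit)
    return paths


def read_complex_spectrum(f):
    """Combine the real and imaginary datasets into one complex NumPy array."""
    real = f[REAL_PATH][...]  # [...] reads the whole dataset into memory
    imag = f[IMAG_PATH][...]
    return real + 1j * imag
```

Printing list_datasets(f) for a file sent by JASON is a quick way to discover the other paths available to your script.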
Our data is now a NumPy array, and all the usual features of Python can be used. In this example we invert the data using NumPy’s negative() function, which changes the sign of each element of an array. Since the size of the data has not changed (the number of data points is the same), we can write the result back to the same path within the HDF5 file by assigning through an ellipsis slice (dataset[...] = ...), which overwrites the existing values in place.
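The difference between rebinding a name and writing through an ellipsis slice can be seen with plain NumPy arrays. An h5py Dataset behaves the same way, except that dset[...] = new_values writes to the file, provided the new values have a compatible shape. The function names here are illustrative only.

```python
import numpy as np


def rebind(arr):
    """Rebinding the local name creates a new array; the caller's array is untouched."""
    arr = np.negative(arr)
    return arr


def write_in_place(arr):
    """Ellipsis assignment overwrites the existing array's values."""
    arr[...] = np.negative(arr)


stored = np.array([1.0, -2.0, 3.0])
rebind(stored)          # `stored` still holds [1, -2, 3]
write_in_place(stored)  # `stored` now holds [-1, 2, -3]
```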
To finish the script, all we need to do is close the file handle, which writes the modified data back to the temporary file. JASON reads the temporary file back in and continues with the remaining items in the processing list. Once processing is complete, the spectrum displayed on the canvas is updated.
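Putting these steps together, a minimal sketch of an invert.py-style script might look like the following. It is not the author's original file: it assumes the path arrives as sys.argv[1], skips the imaginary dataset if it is absent, and uses a with block so the file handle is closed automatically.

```python
"""invert.py: a minimal sketch of the inversion script described above.

Assumptions: JASON passes the temporary HDF5 file path as sys.argv[1], and
the spectrum is stored at the two DataPoints paths quoted in this post.
"""
import sys

import h5py
import numpy as np

DATA_PATHS = (
    "/JasonDocument/DataPoints/0",  # real part
    "/JasonDocument/DataPoints/1",  # imaginary part
)


def invert(path):
    """Negate every data point of the spectrum stored in `path`."""
    with h5py.File(path, "r+") as f:
        for dset_path in DATA_PATHS:
            if dset_path not in f:  # e.g. no imaginary part present
                continue
            dset = f[dset_path]
            # The shape is unchanged, so the values can be overwritten in place.
            dset[...] = np.negative(dset[...])
    # Leaving the `with` block closes the file, flushing the changes to disk.


# Only run when an argument is supplied, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    invert(sys.argv[1])
```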
The processing item has two parameters that need to be set: the command, in this case python, and the argument, in this case the full path to your Python script. Then press the apply button at the bottom of the processing panel, and the script will execute at the appropriate point in the processing list.
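Before wiring a script into JASON, it can help to mimic the call on a stand-in file. The sketch below assumes JASON effectively runs `python /path/to/script.py <temporary file>`, matching the sys.argv[1] assumption above. run_like_jason is a hypothetical helper, and the mock file contains only the two DataPoints datasets, whereas a real JASON file holds much more.

```python
import os
import subprocess
import sys
import tempfile

import h5py

REAL_PATH = "/JasonDocument/DataPoints/0"
IMAG_PATH = "/JasonDocument/DataPoints/1"


def run_like_jason(script_path, real, imag):
    """Run `python <script_path> <file>` on a mock file; return the processed data.

    Hypothetical test helper: the mock file holds only the two DataPoints
    datasets, unlike the much richer file JASON actually writes.
    """
    with tempfile.TemporaryDirectory() as tmp:
        h5_path = os.path.join(tmp, "mock.h5")
        with h5py.File(h5_path, "w") as f:
            f[REAL_PATH] = real  # intermediate groups are created automatically
            f[IMAG_PATH] = imag
        # sys.executable stands in for the "python" command parameter.
        subprocess.run([sys.executable, script_path, h5_path], check=True)
        with h5py.File(h5_path, "r") as f:
            return f[REAL_PATH][...], f[IMAG_PATH][...]
```

For example, run_like_jason("/path/to/invert.py", [1.0, -2.0], [0.0, 0.5]) should return the negated arrays if the script behaves like invert.py; check=True raises an error if the script exits with a non-zero status.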
The figure below shows the spectrum before (left) and after (right) applying the invert.py script. The only difference in the processing of these spectra is the use of invert.py via the external command.
A second, slightly more complex example is shown below. This double.py script appends an inverted copy of the spectrum to the right-hand end of the original data. Because this changes the number of data points, we also need to update some of the associated metadata parameters.
The script itself begins in the same way as the previous example:
We can use NumPy’s append() function to join the original data with the inverted copy. Any function available in NumPy or other libraries can be used, which makes the external command an extremely powerful tool. Since there are now more data points than when we started, we need to update the size of the data. The size is one of the attributes stored in the HDF5 file, and we can read it through the .attrs property provided by h5py. We then use the attrs.modify() method to update it, so that everything works as expected when the data is returned to JASON.
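A sketch of these two steps follows, assuming one-dimensional data (np.append without an axis argument flattens its inputs). The post does not say which object carries the size attribute or what it is called, so both are left to the caller; the helper names are illustrative.

```python
import numpy as np


def append_inverted(data):
    """Return `data` followed by an inverted copy, doubling the number of points.

    Assumes 1D data: without an `axis` argument np.append flattens its inputs.
    """
    return np.append(data, np.negative(data))


def update_size(obj, attr_name, new_size):
    """Replace an existing size attribute with `new_size`; return the old value.

    `obj` is any h5py File, Group or Dataset. attrs.modify() keeps the
    attribute's stored dtype when the attribute already exists.
    """
    old_size = obj.attrs[attr_name]  # raises KeyError if the attribute is absent
    obj.attrs.modify(attr_name, new_size)
    return old_size
```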
Once we have finished processing the data, we need to write it back. This time, since the modified arrays are a different size from those we read, we cannot overwrite them in place: an HDF5 dataset's shape is fixed when it is created, unless it was created as resizable (chunked, with a maximum shape). We therefore delete the appropriate entries in the HDF5 file using the del statement and then recreate them with the new data. Finally, we close the open file handle and the processed data is returned to JASON.
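A minimal sketch of a double.py-style script that follows these steps is given here. It is not the author's original: SIZE_OWNER and SIZE_ATTR are hypothetical placeholders that you would replace after inspecting the attributes of a real file sent by JASON.

```python
"""double.py: a minimal sketch of the append-an-inverted-copy script.

Assumptions: the file path arrives as sys.argv[1]; the data are 1D and live
at the two DataPoints paths quoted in this post. SIZE_OWNER and SIZE_ATTR are
hypothetical placeholders for wherever JASON really stores the point count.
"""
import sys

import h5py
import numpy as np

DATA_PATHS = ("/JasonDocument/DataPoints/0", "/JasonDocument/DataPoints/1")
SIZE_OWNER = "/JasonDocument"  # hypothetical: object carrying the size attribute
SIZE_ATTR = "Size"             # hypothetical attribute name


def double(path):
    """Append an inverted copy of the data and update the size metadata."""
    with h5py.File(path, "r+") as f:
        new_size = None
        for dset_path in DATA_PATHS:
            if dset_path not in f:
                continue
            data = f[dset_path][...]
            doubled = np.append(data, np.negative(data))
            new_size = doubled.size
            # The shape changes, so the dataset cannot be overwritten in place:
            # delete it and create a new one at the same path. Any attributes
            # attached to the old dataset are deleted along with it.
            del f[dset_path]
            f.create_dataset(dset_path, data=doubled)
        if new_size is not None:
            f[SIZE_OWNER].attrs.modify(SIZE_ATTR, new_size)


# Only run when an argument is supplied, so the module can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    double(sys.argv[1])
```

Note that attrs.modify() creates the attribute if it does not already exist, so a wrong placeholder silently adds a new attribute rather than raising an error.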
The figure below shows before (left) and after (right) application of the double.py script. The only difference in the processing is the use of the external command.
The external command processing item is an extremely powerful feature of JASON: it gives you complete flexibility over your data analysis pipeline!
The scripts used in this series of posts are available here and can be used as the basis for your own external commands. In the next part of this series, I will explore using other languages with the external command processing item.