Pipeline

Overview

The pipeline module in NIDB performs analyses at the imaging study level. Data criteria are specified, and if an imaging study meets those criteria, its data is downloaded to the pipeline storage area and the analysis is performed. Pipelines can only be processed in a cluster environment such as Sun Grid Engine, and the NIDB server must have privileges to submit jobs to the cluster.

Analyses are performed and stored in a directory structure similar to that of /nidb/archive; by default the pipeline storage directory is /nidb/pipeline. Because analyses can grow very large, it is best to make the pipeline directory an NFS share on a larger storage server. The directory structure has the format /nidb/pipeline/{subjectid}/{studynum}/{pipelinename}.
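For example, an analysis produced by a pipeline named freesurfer for the first study of a subject would be stored in a path like the following (the subject ID here is purely illustrative):

    /nidb/pipeline/S1234ABC/1/freesurfer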


Adding a new pipeline

Click the menu item Analysis → Pipelines. Click Add Pipeline. Fill out the pipeline details:

New Pipeline Parameters
Title: No spaces or special characters. This will be the directory name under which the analysis is stored for each study. Once the pipeline is created, the name cannot be changed.

Description: Any useful information that distinguishes this pipeline from others.

Concurrent processes: The number of simultaneous jobs submitted to the cluster. This limit applies only to this pipeline; if you specify 10, only 10 jobs for this pipeline will run at any given time. Other pipelines have their own limits, and the total number of running jobs is still governed by the cluster software.

Data download: Criteria for which protocols are copied into the analysis directory. "or" will copy data and run the pipeline if a study contains ANY of the protocol names specified in the data download section. "and" will copy data and run the pipeline only if a study contains ALL of the protocol names specified. "none" copies no data for the pipeline; this is only used for pipelines that depend on other pipelines. There is no point in running an analysis without any data!

Successful files: Criteria for marking an analysis as successful. If all of the files specified are found, the analysis is marked as successful. For example, if your analysis produces a certain image file as the final result, enter that filename (and relative path); if it exists, the analysis will show a green checkmark in the analysis summary.

Pipeline dependency: The pipeline on which this analysis depends. A DTI pipeline is a good example: to perform TBSS, bedpost, or some other analysis, you need the FA map first. You might want to run both bedpost and TBSS on a subject, but that would duplicate the steps required to create the FA map. Instead, create the FA map in one pipeline and have subsequent pipelines use it to perform bedpost or TBSS.

Group(s): Study-level groups on which to run the pipeline. Use this option if you only want the pipeline to run on a pre-established group instead of all subjects that fit the data criteria.

Notes: Offers space for more detail than the Description field.
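As an illustration of the Successful files field, if a pipeline's final output were a file named result.nii.gz inside an analysis subdirectory (a purely hypothetical layout), the field could contain:

    analysis/result.nii.gz

The analysis would then be marked successful only once that file exists.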

Click Add Pipeline Info and the pipeline will be saved. Go back into that pipeline and you'll see the Pipeline specification section. Fill out the data specification and make sure to check Enabled on the right-hand side.

Then fill out the Commands section. You'll see a syntax-highlighted text box with line numbers. Type your bash commands into that box, using the pipeline variables listed on the right-hand side where necessary. Click a pipeline variable to insert it into your script at the cursor position. Line-wrapping can also be turned on or off.
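A minimal sketch of a command script is shown below. The {analysisrootdir} variable and the file names are illustrative assumptions only; substitute the actual pipeline variables offered in the list on that page.

    # Create a working directory, then run FSL motion correction and smoothing
    # on the downloaded data. {analysisrootdir} is a hypothetical pipeline
    # variable standing for the analysis root directory, and rest.nii.gz is
    # just an example input file name.
    mkdir -p {analysisrootdir}/analysis
    mcflirt -in {analysisrootdir}/data/rest.nii.gz -out {analysisrootdir}/analysis/rest_mc
    fslmaths {analysisrootdir}/analysis/rest_mc -s 3 {analysisrootdir}/analysis/rest_mc_smooth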

When done editing the script, click Update Pipeline Definition Only. This updates only the data and command sections, not the title, dependencies, and other details at the top of the page.

Pipeline directory structure

The pipeline will be stored in the /nidb/pipeline/{subjectid}/{studynum}/{pipelinename} directory, which is considered the analysisroot directory. A good place to store raw data is analysisroot/data, and a good place to actually perform the analysis is analysisroot/analysis. An analysisroot/pipeline directory is automatically generated and contains log files and the SGE job file. This keeps raw data, analysis output, and pipeline bookkeeping separate. However, any configuration will work.
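Under that convention, a single analysis directory might look like the following (the subject and study identifiers are illustrative):

    /nidb/pipeline/S1234ABC/1/freesurfer/    (the analysisroot directory)
        data/       raw data copied from the archive
        analysis/   working directory where the script writes its results
        pipeline/   automatically generated logs and the SGE job file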

Updating the pipeline

To change the pipeline details (title, description, dependencies, and so on), edit the appropriate fields and click Update Pipeline Info. To change the data specification or command script, edit those fields and click Update Pipeline Definition Only. Each time either of those buttons is clicked, the pipeline version is incremented.

Testing the pipeline

Your script may work the first time you run it, but chances are it will take several iterations to get it working properly. To test your script, check the Testing checkbox for your pipeline. The next time the pipeline runs, it will process only 10 studies, stop, and turn off testing. You can then check the logs to see whether it processed correctly and, if not, fix the problem and test again.

Checking pipeline results

On the pipeline page, click the icon in the Analyses column, and a list of all analyses will be displayed. Click the icon under the Logs column to view the processing logs.

Log files
{pipelinename}.o{number}: The captured standard output from the script that was run. It will contain any errors or anything not captured in the other log files.
data.log: A log of what data was downloaded.
step{N}.log: Standard output captured for each line in the script. If {NOLOG} appears in a line, that line is not logged.
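A hedged sketch of how {NOLOG} might be used, assuming the token can simply be placed on the line whose output you want excluded from the step logs ({analysisrootdir} is again a hypothetical pipeline variable):

    # Listing every raw file would produce a very large log, so skip logging this line
    find {analysisrootdir}/data -type f {NOLOG}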
