PSICS - the Parallel Stochastic Ion Channel Simulator
Under development
Please let us know of errors, unclear text, or suggestions for improvements.

Standalone core and parallelisation

As described in the model processing section, PSICS separates the core calculations from the rest of the tasks involved in running a model and presenting the results. The core is the only place where the efficiency of the system matters. Indeed, within the core, only a few subroutines do the vast majority of the work. For testing and verification, there is also a parallel implementation of all the calculations in Java, but it is not used under normal circumstances. The main interest in having a standalone implementation of the core calculations is for use on clusters or multiprocessor machines. For use on a single machine, the handover between the Java program and the core calculations (and back) is transparent and does not involve any choices or intervention by the user. This section first explains the division of work between the main program and the core, then describes the input and output formats, and finally shows how it can be used in a parallel environment.

Core tasks

The output of the first model processing stage is a large set of tables of various sorts. These contain a complete specification of the model as it is to be run, including the compartmentalization of the cell, the number of channels on each compartment, the transition rates of the channels evaluated over a range of membrane potentials, any applied stimulation, and the recording configuration. The tables are written to a plain text file with the name "xxx.ppp" (ppp for PSICS Pre-Processed) for an original model file "xxx.xml". This text file contains everything the core needs to perform its calculations.

The internal structure of the core is a little like that of the main program. It reads the model specification, reprocesses it into the best format for the calculation, and then most of the work goes on in a couple of subroutines. The reprocessing steps that are left to the core are those needed for the particular solution method. In particular, it constructs sorted cumulative transition matrices to minimise the number of steps in generating stochastic transitions, and it rearranges the connections table into the right order for the Hines method. As the calculation proceeds, it writes the raw results to a binary file. Once the calculation is complete, it may convert this binary file into a plain text table, depending on control parameters in the specification file. The text output is convenient for visualization and avoids any problems of binary format compatibility between programs, but it also takes a lot of space and is slow to generate. For intensive applications it may be more efficient to make the post-processing tools understand the binary data rather than have PSICS generate plain text.
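To illustrate the first of these steps, the sketch below shows how a sorted cumulative rate table reduces the expected number of comparisons needed to pick a stochastic transition: with the largest rates first, most draws terminate after one or two tests. This is a minimal sketch in Java rather than the PSICS source, and the names are purely illustrative.

import java.util.Arrays;
import java.util.Random;

// Illustrative sketch (not the PSICS source): a cumulative transition-rate
// table sorted in descending order, so that the most probable transitions
// are tested first and the expected number of comparisons per draw is small.
public class SortedCumulativeTable {

    private final double[] cumulative; // partial sums of the sorted rates
    private final int[] target;        // original transition index per entry
    private final double total;

    SortedCumulativeTable(double[] rates) {
        int n = rates.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        // Largest rates first, so that most draws terminate early.
        Arrays.sort(order, (a, b) -> Double.compare(rates[b], rates[a]));

        cumulative = new double[n];
        target = new int[n];
        double sum = 0.;
        for (int i = 0; i < n; i++) {
            sum += rates[order[i]];
            cumulative[i] = sum;
            target[i] = order[i];
        }
        total = sum;
    }

    // Returns the index of the transition chosen by a single uniform draw.
    int sample(Random rng) {
        double u = rng.nextDouble() * total;
        for (int i = 0; i < cumulative.length; i++) {
            if (u <= cumulative[i]) return target[i];
        }
        return target[target.length - 1]; // guard against rounding error
    }
}

In the core, such tables are constructed during the reprocessing step and then reused throughout the calculation.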

Input format

The box below shows the main elements of the xxx.ppp input file. This one corresponds to the standard Rallpack3 model, except that most of the content of the channel and compartment tables has been cut out for conciseness. The comments at the end of lines are included in these files when the main program creates them, but they are purely for reading convenience. The core expects an exact sequence for the data and ignores the text.

These files are not intended to be generated by anything other than PSICS, but it can be convenient to change some of the quantities at the beginning or the end. In particular, the runtime and timestep appear on the first data row, and the points to record from appear at the very end.
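As a minimal sketch of such an edit done programmatically, the Java fragment below rewrites the runtime and timestep on the first data row. It assumes the layout shown in the box below, and the file names are hypothetical.

import java.io.IOException;
import java.nio.file.*;
import java.util.List;

// Minimal sketch: change the runtime and timestep on the first data row of
// a .ppp file, leaving everything else untouched. Assumes the layout shown
// below (format header line, then "runtime, timestep, v0, wf"); the file
// names are hypothetical.
public class TweakPpp {
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("rallpack3.ppp"));

        // Line 0 is the format header; line 1 holds runtime, timestep, v0, wf.
        String[] fields = lines.get(1).split("//")[0].trim().split("\\s+");
        double runtime = 500.0;  // new runtime, ms
        double timestep = 0.05;  // new timestep, ms
        lines.set(1, String.format("  %11.3f %12.6f %12.4f %12.5f        //runtime, timestep, v0, wf",
                runtime, timestep,
                Double.parseDouble(fields[2]), Double.parseDouble(fields[3])));

        Files.write(Paths.get("rallpack3-long.ppp"), lines);
    }
}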

FPSICS-1.0
      250.000     0.100000     -65.0000     -1.00000        //runtime, timestep, v0, wf
 1                                                          //number of command runs
 5                                                          //n channel
HH_K
 0 0 94 1 1                                                 //numeric id, stochastic threshold, nV, n complex, alt form id
    0.0200000     -77.0000     -80.0000      1.50000        //gBase, erev, vMin, deltaV
 1 5 8 0 1 1 0 1 2 2 1 2 3 3 2 3 4 4 3                      //ninstances, nstate, ntransition, (from, to)*ntransition
      0.00000      0.00000      0.00000      0.00000      1.00000 //s0   s1   s2   s3   s4
     -80.0000    0.0894255      0.00000     0.150779      0.00000    0.0670691      0.00000     0.301558      0.00000    0.0447127      0.00000     0.452336      0.00000    0.0223564      0.00000     0.603115      0.00000 //v, (fwd, rev) for each transition
     -78.5000    0.0990979      0.00000     0.147978      0.00000    0.0743234      0.00000     0.295956      0.00000    0.0495489      0.00000     0.443934      0.00000    0.0247745      0.00000     0.591912      0.00000
 ...
 ...
      59.5000      4.58005      0.00000    0.0263657      0.00000      3.43504      0.00000    0.0527314      0.00000      2.29002      0.00000    0.0790971      0.00000      1.14501      0.00000     0.105463      0.00000
HH_K-mc
 1 0 94 1 0                                                 //numeric id, stochastic threshold, nV, n complex, alt form id
    0.0200000     -77.0000     -80.0000      1.50000        //gBase, erev, vMin, deltaV
 4 2 2 0 1 1 0                                              //ninstances, nstate, ntransition, (from, to)*ntransition
      0.00000      1.00000                                  //c   o
     -80.0000    0.0223564      0.00000     0.150779      0.00000
     -78.5000    0.0247745      0.00000     0.147978      0.00000
...
...
      58.0000      1.13001      0.00000    0.0268647      0.00000
      59.5000      1.14501      0.00000    0.0263657      0.00000
HH_Na
 2 0 94 1 3                                                 //numeric id, stochastic threshold, nV, n complex, alt form id
    0.0200000      50.0000     -80.0000      1.50000        //gBase, erev, vMin, deltaV
 1 8 20 0 1 1 0 1 2 2 1 2 3 3 2 0 4 4 0 1 5 5 1 2 6 6 2 3 7 7 3 4 5 5 4 5 6 6 5 6 7 7 6 //ninstances, nstate, ntransition, (from, to)*ntransition
      0.00000      0.00000      0.00000      0.00000      0.00000      0.00000      0.00000      1.00000 //s0   s1   s2   s3   s4   s5   s6   s7
     -80.0000     0.223888      0.00000      9.20390      0.00000     0.149259      0.00000      18.4078      0.00000    0.0746294      0.00000      27.6117      0.00000     0.148190      0.00000    0.0109869      0.00000     0.148190      0.00000    0.0109869      0.00000     0.148190      0.00000    0.0109869      0.00000     0.148190      0.00000    0.0109869      0.00000     0.223888      0.00000      9.20390      0.00000     0.149259      0.00000      18.4078      0.00000    0.0746294      0.00000      27.6117      0.00000
...
...
...
leak
 4 0 0 0 -1                                                 //numeric id, stochastic threshold, nV, n complex, alt form id
  1.00000e-05     -65.0000      0.00000      0.00000        //gBase, erev, vMin, deltaV
 1002                                                       //n compartments
 0 1 3 0 565 4 40 2 1883                                    //index, ncon, npop, (chantype, nchan)*npop
     1      786.184      0.00000    0.0156923     0.499500      0.00000      0.00000 //neighbours, conductances to neighbours, v, capacitance, x, y, z
 1 2 3 0 1130 4 78 2 3766
     0 2      786.184      786.184      0.00000    0.0313845     0.499500      0.00000      0.00000
 2 2 3 0 1129 4 78 2 3767
     1 3      786.184      786.184      0.00000    0.0313845      1.49850      0.00000      0.00000
 3 2 3 0 1130 4 79 2 3765
     2 4      786.184      786.184      0.00000    0.0313845      2.49750      0.00000      0.00000
 ...
 ...
 999 2 3 0 1130 4 78 2 3766
     998 1000      786.184      786.184      0.00000    0.0313845      997.502      0.00000      0.00000
 1000 2 3 0 1130 4 79 2 3766
     999 1001      786.184      786.184      0.00000    0.0313845      998.501      0.00000      0.00000
 1001 1 3 0 564 4 39 2 1883
     1000      786.184      0.00000    0.0156923      999.500      0.00000      0.00000
 1 1                                                        //n clamp, n recorder
null
 0 0 1                                                      //target, type, n profile      (current clamp)
      100.000 0 0 0                                         //start val, nnoise, seed, ntvt,
null
 1001 2                                                     //target, type     for voltage recorder at p1
END OF MODEL SPECIFICATION
                                                            //end marker - must start with 'END'


Output format

The beginning of the output text file for the Rallpack3 model is shown below. It simply contains rows with the time followed by the voltages at any current clamps or voltage recorders, and the currents at any voltage clamps. The units in all output files are milliseconds, millivolts and picoamps.

	#time  cc00000 cc00001
    0.000000       -65.00000       -65.00000
   0.1000000E-01   -62.84897       -65.61226
   0.2000000E-01   -63.45548       -66.18185
   0.3000000E-01   -62.75234       -66.71355
   0.4000000E-01   -63.29189       -67.21165
   0.5000000E-01   -62.92438       -67.67982
   0.6000000E-01   -63.40361       -68.12132
   0.7000000E-01   -63.18750       -68.53887
   0.8000000E-01   -63.61474       -68.93489
   0.9000000E-01   -63.48351       -69.31135
   0.9999999E-01   -63.86657       -69.67000
   0.1100000       -63.78883       -70.01225
   0.1200000       -64.13442       -70.33936

Where the specification requires multiple realizations of the model, as, for example, in the mean-variance model, the successive results are lined up as new columns in a single table so that each row contains results for all the runs. The time appears only once, at the start of each row.
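A minimal sketch of reading this text format back in, assuming whitespace-separated columns and a single '#' header line as in the sample above (the file name is hypothetical):

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: read a PSICS text output table into one double[] per row.
// Column 0 is time (ms); the remaining columns are mV (or pA for voltage
// clamps). The file name is hypothetical.
public class ReadTextOutput {
    public static void main(String[] args) throws IOException {
        List<double[]> rows = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("rallpack3.out"))) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            String[] parts = line.split("\\s+");
            double[] row = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                row[i] = Double.parseDouble(parts[i]); // handles E notation
            }
            rows.add(row);
        }
        System.out.printf("%d rows, %d columns%n", rows.size(), rows.get(0).length);
    }
}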

The binary output file is written in the order that the results are collected. It begins with a set of integers: the number of runs, the number of steps in each run, the number of clamps of each type (voltage, current, conductance), and the indexes of the points at which each clamp is applied. Thereafter it contains the same information as a single-run text file. If there are multiple runs, these follow one after another. All the floating point quantities are stored as floats (rather than doubles). The utilities section includes links to sample Yorick code for reading the binary format.
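The exact byte-level layout (byte order and any Fortran record markers) depends on how the core was built, so the sketch below should be read only as an illustration of the structure just described: it assumes a plain stream of little-endian 32-bit values with no record markers, and that each row holds the time followed by one value per clamp, as in the text file. Treat the sample Yorick readers as authoritative.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.*;

// Sketch of reading the binary output under the assumptions stated above;
// the file name is hypothetical and the real layout may differ.
public class ReadBinaryOutput {
    public static void main(String[] args) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(Paths.get("rallpack3.bin")));
        buf.order(ByteOrder.LITTLE_ENDIAN);

        int nRuns = buf.getInt();   // number of runs
        int nSteps = buf.getInt();  // number of steps in each run
        int nVC = buf.getInt();     // voltage clamps
        int nCC = buf.getInt();     // current clamps
        int nGC = buf.getInt();     // conductance clamps
        int nClamp = nVC + nCC + nGC;
        int[] points = new int[nClamp];
        for (int i = 0; i < nClamp; i++) points[i] = buf.getInt();

        // Per run: nSteps rows of (time, then one value per clamp), as floats.
        float[][][] data = new float[nRuns][nSteps][1 + nClamp];
        for (int r = 0; r < nRuns; r++) {
            for (int s = 0; s < nSteps; s++) {
                for (int c = 0; c <= nClamp; c++) {
                    data[r][s][c] = buf.getFloat();
                }
            }
        }
    }
}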

Parallel and multiprocessor environments

PSICS implements what is known as "embarrassing" parallelism, which is generally the simplest form of parallelism and, where it is applicable, the most reliable way to get almost linear improvements in performance with increasing numbers of processors. For many applications, a single realization of a stochastic calculation is not of much use: what is needed is multiple independent realizations, so that the statistical properties of the system can be computed. The embarrassingly parallel approach is simply to run the different instances on different processors. The great benefit of this approach is that no change to the internal code is required. All the parallelism can be implemented in external tools that allocate tasks to processors and gather the results together.

This form of parallelism contrasts with lower-level parallelism in which a single calculation is split across multiple processing units. For such a system to work efficiently, careful attention must be paid to the way the units communicate, so that one part of the calculation is not held up waiting for the others. On the plus side, if all the programming issues can be resolved, such systems may show supra-linear speedup, where a calculation runs more than n times as fast on an n-processor machine as it does on a single processor. This occurs when the program is too big to fit in the cache of a single processor, so that much of the time on a single-processor machine is spent waiting for data from main memory rather than actually doing the calculation. By spreading the task over multiple processors, more cache is available and more of the data stays in it, cutting down cache misses and using more of the available calculation cycles. Under good conditions the benefits from fewer cache misses actually outweigh the parallel communication costs and a supra-linear speedup is seen. However, since a typical PSICS model already fits entirely in the cache of a single processor (see the memory footprint section under System architecture), there is no possibility of supra-linear speedup for PSICS, and the embarrassingly parallel solution provides the most efficient use of resources in the majority of cases.

Principles of operation

The parallel tools operate on the text-based model specification that is the input to the core calculation and are entirely independent of the original XML model specification. They read the specification file and generate a set of new specification files according to the number of processors available. Each of the new specification files comprises part of the total task (usually a certain number of the total runs required). These new files are distributed to the separate nodes, which are then instructed to run the PSICS core on them as normal. When the calculations are finished, the resulting binary files are gathered together and merged into single files in the formats described above, just as if all the calculations had been performed on one node.
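A minimal sketch of the partitioning step, assuming the run count sits on the third line of the file as in the Rallpack3 example above (the file names are hypothetical, and this illustrates the idea rather than the actual PSICS tooling):

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: copy the specification once per node, changing only the number
// of runs each copy asks for. Assumes the run count is on the third line,
// as in the Rallpack3 example; file names are hypothetical.
public class SplitSpec {
    public static void main(String[] args) throws IOException {
        int totalRuns = 100;
        int nNodes = 8;
        List<String> lines = Files.readAllLines(Paths.get("model.ppp"));

        for (int node = 0; node < nNodes; node++) {
            // Equal shares, with the remainder spread over the first nodes.
            int runs = totalRuns / nNodes + (node < totalRuns % nNodes ? 1 : 0);
            List<String> copy = new ArrayList<>(lines);
            copy.set(2, String.format(" %d                                                          //number of command runs", runs));
            Files.write(Paths.get("model-node" + node + ".ppp"), copy);
        }
    }
}

The distribution, execution, and merging steps then depend on the cluster environment, which is what the practicalities section below is intended to cover.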

Practicalities

TBD. To come when it's written...
