Basic Subtomogram Averaging with I3¶
Author: Dustin Reed Morado
Setting up the I3 Project Directory¶
The first thing to do is to set your current working directory to the folder containing all of your extracted particles and either old I3 position data files (pos) or new I3 transform data files (trf). Then we will create our I3 project directory, and inside that directory we will create folders for our maps, project definitions, tilt angles describing our particles’ missing wedges, and transforms:
mkdir i3 # Make our project directory
mkdir i3/defs # Make our project definitions directory
mkdir i3/maps # Make our project maps directory
mkdir i3/tlt # Make our project missing wedge directory
mkdir i3/trf # Make our project initial transforms directory
…Or simply:
mkdir -p i3/{defs,maps,tlt,trf}
Now we change into our newly created project directory with cd i3. Now we need to copy a template parameter file (mraparam.sh) and a template protomo tilt angle file (template.tlt) that describes our data’s missing wedge into our project directory. These are included in the example folder of this reference package:
cp ~/Downloads/i3_guides/examples/mraparam.sh . # Parameter file
cp ~/Downloads/i3_guides/examples/template.tlt . # Missing Wedge
Finally, to finish setting up the basics of our I3 project directory, edit the parameter file and the tilt angle file to suit your current project. The parameter file documents what each parameter does, and while most of the given values must be changed, they provide meaningful starting points for the values you will probably want to use.
Filling the project directories¶
The next step before we start running the program is to fill the maps, definitions, tilt, and transforms directories we created above. You will find it easiest to start with the maps directory and from there we can use loops in the Bash shell to quickly populate the other directories.
Maps directory¶
I3 is very selective when it comes to the names of the maps, and shorter names seem to give the least trouble. It is therefore useful to create symbolic links, with new short names, to the extracted particles in the directory above our I3 project directory. First change into the maps directory; the following Bash shell loop then creates the links:
jliu@keemun i3/maps $ i=1; for j in ../../*.mrc
do
    # The link target is the particle file ${j}; ${i} is only the counter
    ln -sv "${j}" "p$(printf "%05d" ${i}).mrc"
    i=$((i+1))
done
This creates symbolic links named “p00001.mrc, p00002.mrc, …” that point to your subtomograms, whatever they are named.
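Before moving on, it can be worth confirming that every link actually resolves to a file. The following is a minimal sketch and not part of I3; the function name check_links is hypothetical. It reports any symbolic link in a directory whose target does not exist:

```shell
# Report dangling symbolic links in a directory (default: current directory).
# Prints each broken link and returns non-zero if any were found.
check_links() {
    local dir="${1:-.}" bad=0 f
    for f in "${dir}"/*; do
        # -L: the entry is a symbolic link; ! -e: its target does not exist
        if [ -L "${f}" ] && [ ! -e "${f}" ]; then
            echo "broken link: ${f}"
            bad=1
        fi
    done
    return ${bad}
}
```

Called as check_links . from inside i3/maps, no output and a zero exit status mean that every link points to an existing file.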
Tilt angles directory¶
Next, the missing wedge for each map is described using our tilt angle files. The simplest way to do this is to use the template we copied to our project directory to describe the missing wedge for every map and particle. To do this we again create symbolic links, this time to our template file, one for each map that we just created in our maps directory. Change into the tlt directory and the following loop will accomplish that:
jliu@keemun i3/tlt $ for i in ../maps/*.mrc
do
    ln -sv ../template.tlt $(basename ${i} .mrc).tlt
done
This creates symbolic links “p00001.tlt, p00002.tlt, …” to template.tlt.
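Since every map needs its own tilt angle file, a quick cross-check between the two directories can catch a map that was added later. This is a small sketch under the naming scheme used above; missing_tlt is a hypothetical helper, not an I3 command:

```shell
# List maps in a maps directory that lack a matching tilt angle file.
# Usage: missing_tlt <maps directory> <tlt directory>
missing_tlt() {
    local maps="$1" tlt="$2" m name missing=0
    for m in "${maps}"/*.mrc; do
        # Skip the literal pattern left behind when no maps match the glob
        [ -e "${m}" ] || [ -L "${m}" ] || continue
        name=$(basename "${m}" .mrc)
        if [ ! -e "${tlt}/${name}.tlt" ]; then
            echo "no tilt file for ${name}"
            missing=1
        fi
    done
    return ${missing}
}
```

From inside i3/tlt it would be called as missing_tlt ../maps . and prints nothing when every map has a tilt angle file.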
Transforms directory¶
The transforms directory can be the most challenging to fill, because there are several possible situations depending on your particular project:
- Particle coordinates as a single point per subtomogram center; no orientation
- Particle coordinates as two points per subtomogram; describes orientation
- Old I3 transform as a pos file; describes inverse orientation
- New I3 transform as a trf file from a previous run; describes orientation
In the first case we can create the most basic transform file for each map. Refer to the I3 tutorial PDF to understand what each field of the transform file describes. After changing into the transforms directory, the following Bash shell loop can be used to create these files:
jliu@keemun i3/trf $ i=1; for j in ../../*.mrc
do
    eval $(i3stat -sh -o ${j})
    tx=$(((ox + nx) / 2))
    ty=$(((oy + ny) / 2))
    tz=$(((oz + nz) / 2))
    fmt_i=$(printf "%05d" ${i})
    echo -n "p${fmt_i} ${tx} ${ty} ${tz} " > p${fmt_i}.trf
    echo "0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0" >> p${fmt_i}.trf
    i=$((i+1))
done
In the second case it is easiest to use the programs that take these positions and convert them into old I3 transform pos files, and then go forward from the third case. In the third case we can convert the pos files to new I3 transforms using the i3assist library. To do this, go to the directory with the original maps and position files, start the IPython console using the command ipython from the shell, and use the following commands:
import glob
import os.path

import i3assist

for posfile in sorted(glob.glob("*.pos")):
    # Write the transform next to the position file, with a .trf extension
    trffile = os.path.splitext(posfile)[0] + '.trf'
    pl = i3assist.PositionList()
    tl = i3assist.TransformList()
    pl.from_file(posfile)
    pos = pl[0]  # use the first position in the file
    pos_trf = pos.to_trf()
    tl.transforms = [pos_trf]
    tl.to_file(trffile)
Then we need to change directories back to the I3 project transforms directory and run the following Bash loop to insert the proper first four fields into the converted transform files and give these files the correct names:
jliu@keemun i3/trf $ i=1; for j in ../../*.mrc
do
    trffile=${j/%mrc/trf}
    fmt_i=$(printf "%05d" ${i})
    eval $(i3stat -sh -o ${j})
    tx=$(((ox + nx) / 2))
    ty=$(((oy + ny) / 2))
    tz=$(((oz + nz) / 2))
    # Overwrite the first four fields with the map name and center coordinates
    awk -v s="p${fmt_i}" -v tx=${tx} -v ty=${ty} -v tz=${tz} '
        { $1 = s; $2 = tx; $3 = ty; $4 = tz; print }' ${trffile} > p${fmt_i}.trf
    i=$((i+1))
done
We can then safely delete the temporary transform files we created with i3assist.
jliu@keemun i3/trf $ rm ../../*.trf
For the last case we just need to rename the transform files to match the same naming convention as our maps. The loop to do this is simply the one we used for filling the maps directory:
jliu@keemun i3/trf $ i=1; for j in ../../*.trf
do
    # The link target is the transform file ${j}; ${i} is only the counter
    ln -sv "${j}" "p$(printf "%05d" ${i}).trf"
    i=$((i+1))
done
Definitions directory¶
The last step in filling in our project directories is the definitions directory, which contains just two files, maps and sets. Refer to the I3 tutorial PDF to see the format of these files. With all of the other directories already set up, generating these two files is simple using the following loop:
jliu@keemun i3/defs $ touch maps sets; for i in ../maps/*.mrc
do
    echo "../maps $(basename ${i}) ../tlt/$(basename ${i} .mrc).tlt" >> maps
    echo "$(basename ${i}) $(basename ${i} .mrc)" >> sets
done
We are now done with setup and can move on to actually processing our project, which is extremely simple.
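Because the definitions loop appends to maps and sets with >>, running it a second time would duplicate every entry. A simple count comparison catches that, as well as maps added after the files were written. This is only a sketch; check_defs is a hypothetical helper and assumes the directory layout created in this guide:

```shell
# Compare the number of maps with the number of lines in defs/maps and defs/sets.
# Usage: check_defs <project directory>   (default: the parent directory)
check_defs() {
    local root="${1:-..}" n_maps n_def n_set
    # $(( ... )) strips the padding some wc implementations add
    n_maps=$(( $(find "${root}/maps" -maxdepth 1 -name '*.mrc' | wc -l) ))
    n_def=$(( $(wc -l < "${root}/defs/maps") ))
    n_set=$(( $(wc -l < "${root}/defs/sets") ))
    echo "maps: ${n_maps}  defs/maps: ${n_def}  defs/sets: ${n_set}"
    # Succeed only when all three counts agree
    [ "${n_maps}" -eq "${n_def}" ] && [ "${n_maps}" -eq "${n_set}" ]
}
```

From inside i3/defs, check_defs .. prints the three counts and returns a non-zero status when they differ; in that case delete maps and sets and rerun the loop.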
Running the First Cycle¶
With everything in place, the first cycle of processing uses four shell scripts that combine and abstract the smaller building-block programs of I3 into sensible processing units based on the road map of basic subtomogram averaging and classification.
i3mrainitial.sh # Produces the initial global average, reference and masks
i3mramsacls.sh 0 # Runs the actual alignment and classification.
i3cp.sh 0 # Copies selected class averages to select folder for alignment
i3mraselect.sh 0 # Aligns selected class averages to make the final alignment
i3mrainitial¶
After running i3mrainitial you will have a directory cycle-000 in your project folder. In this folder you will have the initial I3 database; the global average of the subtomograms, based on the transforms given by the transform files in the transforms directory; the masked and filtered reference that will be used in the subsequent alignment step, along with the Fourier transform of this file; and finally the binary mask that will be used in the subsequent classification step. There may also be montages of the reference and versions of all maps that have been rotated about the X-axis to visualize the maps perpendicular to the Z-axis. These can be useful in some cases, but the same views are also available in IMOD’s slicer window. Whether or not these files are created is determined by the parameters you set in your parameter file.
Troubleshooting¶
This is the command most likely to fail when running I3, because this is when the I3 database is first created. The error messages are not particularly helpful, but the following suggestions may help.
When restarting a run that failed, delete the cycle-000 directory, make whatever corrections are necessary, and then rerun the command: rm -rf cycle-000 && i3mrainitial.sh.
i3external errors are often caused by your maps having filenames that are too long. If you have followed this guide that should not be the case, and the error instead means you have more maps in the project than I3 can handle; the limit can be anywhere from 1,000 to 10,000. Try splitting your data into multiple I3 projects, or refer to the intermediate or expert guides for how to manually add maps to the database.
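To judge which of the two causes is more plausible, it helps to know how many maps the project holds and how long the longest map name is. The following is a small sketch using only find and awk; map_summary is a hypothetical helper, not an I3 command:

```shell
# Print the number of maps in a directory and the longest map file name.
# Usage: map_summary <maps directory>   (default: maps)
map_summary() {
    local dir="${1:-maps}"
    find "${dir}" -maxdepth 1 -name '*.mrc' | awk -F/ '
        # $NF is the file name without its leading directories
        { n++; if (length($NF) > max) { max = length($NF); longest = $NF } }
        END { printf "maps: %d\nlongest name: %s (%d characters)\n", n, longest, max }'
}
```

Run from the project directory as map_summary maps. A count in the thousands points toward the map limit, while a long name points toward the naming problem.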
i3boximport errors are due to the second to fourth fields of your transform file being incorrect. Again, if you have followed this guide your particle center coordinates should be correct. However, if you created the transform files yourself, double check that your volume’s coordinates and origin correspond correctly with the centers in your transforms. You can do this using the I3 command: i3stat -o <Your subtomogram filename>.
i3dataset errors are the most difficult to debug. They signal that the shifts and rotations describing a subtomogram’s orientation and position are invalid, duplicated elsewhere in the transform file, or incorrectly formatted. Again, following this guide should prevent these problems, but if you created your own transform files, make sure that each line in the transform file has 16 fields, that no two lines have the same transformed coordinates, and that the last nine fields are all values between -1 and 1.
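Two of these checks are easy to automate with awk. The sketch below is not an I3 tool, and check_trf is a hypothetical name. It assumes the last nine fields hold a rotation matrix, whose entries must lie between -1 and 1, and it does not look for duplicated lines:

```shell
# Check transform files for the field count and the range of the last nine fields.
# Usage: check_trf <file.trf> [more files...]
check_trf() {
    awk '
        # Every line must have exactly 16 whitespace-separated fields
        NF != 16 {
            printf "%s:%d: expected 16 fields, found %d\n", FILENAME, FNR, NF
            bad = 1
            next
        }
        {
            # Fields 8 to 16 are assumed to be rotation matrix entries
            for (k = 8; k <= 16; k++)
                if ($k < -1 || $k > 1) {
                    printf "%s:%d: field %d out of range: %s\n", FILENAME, FNR, k, $k
                    bad = 1
                }
        }
        END { exit bad }
    ' "$@"
}
```

From inside i3/trf, check_trf *.trf prints one line per problem and returns a non-zero status if any file fails a check.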
i3mramsacls¶
After running i3mramsacls you will have many new files in the cycle-000 directory. However, there are just a few that as a beginner you should look at before starting the next program.
The first is a montage of the calculated factors in the SVD (Singular Value Decomposition) processing of the dataset. These factors reduce the dimension of the dataset to the most variable regions of interest and this variance is used to cluster the data into classes using HAC (Hierarchical Ascendant Clustering).
The second is the set of class averages produced after the clustering. Here you are looking to make sure that the classes correspond to true variation and heterogeneity in the data, and not to artifacts such as the missing wedge, simple variations in noise, or junk such as gold and debris near the particles. You will want to focus especially on the class averages that have been selected for alignment in the last step of the first cycle.
Troubleshooting¶
Errors in this stage of processing are uncommon. If you do get an error, it will almost certainly come from a mistake in the parameter file. Make sure that the alignment parameters are sane and, most frequently the cause, that the class averages requested were actually calculated.
Rerunning this step to fix specific errors is beyond the scope of a beginner tutorial, and for more information on how to handle these situations efficiently, please refer to the intermediate and expert guides.
i3cp¶
This command only copies the selected class averages to a new select folder, where they will be aligned in the next step. There’s nothing to check at this step; just move on to the last script.
Troubleshooting¶
The only error at this stage occurs if you selected a class for which class averages were not generated. Edit your parameter file, making sure that the selected class does exist.
To rerun the command, find the most recently generated directory in the cycle-000 directory and delete it (it should have the name <…>-000-sel):
jliu@keemun i3 $ ls -ltr cycle-000 # Find the most recently created directory
jliu@keemun i3 $ rm -rf cycle-000/<...>-000-sel #replace <...> as appropriate
jliu@keemun i3 $ i3cp.sh 0 # rerun command
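If the listing is long, the newest select directory can also be picked out directly. This is a minimal sketch that assumes only what is stated above, namely that the directory name ends in -sel; latest_sel is a hypothetical helper:

```shell
# Print the most recently modified *-sel entry inside a cycle directory.
# Usage: latest_sel <cycle directory>
latest_sel() {
    local cycle_dir="$1"
    # ls -t sorts newest first; -d lists the directories themselves, not their contents
    ls -td "${cycle_dir}"/*-sel 2>/dev/null | head -n 1
}
```

For example, latest_sel cycle-000 prints the path to inspect before deleting it by hand with rm -rf as shown above. It prints nothing if no such directory exists.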
i3mraselect¶
After running this last stage of the first cycle, you will finally have an aligned average to inspect. If you specified FSC (Fourier Shell Correlation) masks in your parameter file, you will also have even and odd half averages and the corresponding FSC data and graph in PostScript format. Note that the reported resolution is not gold-standard and can easily overestimate the true resolution of your data.
You have now finished your first cycle. The next step is to create and run another cycle, and we repeat this until our structure converges, either by visual inspection or in terms of resolution.
Troubleshooting¶
Errors in this stage are also uncommon, as with i3mramsacls. If you encounter trouble here, refer to the suggestions for that section.
Running the Second and Subsequent Cycles¶
With our first cycle complete, the remaining cycles follow nearly the same steps. The only difference is the first step: i3mrainitial.sh is now replaced with i3mranext.sh 0 1, where 0 is our old cycle number and 1 is the new cycle we are now calculating. Here we will not create the global average (since it would be the same as the previous cycle’s aligned average), and we will not create the new cycle’s database from scratch; instead it is copied from the previous cycle and appended to with new information on the current cycle. Again, we will be left with the reference for this cycle and the classification mask that will be used in the next stage.
i3mranext¶
Before running this command be sure to edit and update your parameter file to take into account the refinement achieved in the first cycle. Do not worry about losing the parameters used in the first cycle, as a copy of the parameter file used for that cycle exists in the cycle-000 directory.
Troubleshooting¶
Errors are also uncommon at this stage. Any errors you do experience should point to mistakes in your parameter file, specifically the filtering, masking, and location of the reference, as well as the masks created for classification.
The rest of the cycle¶
Now that you have a directory cycle-001 you can repeat the last three stages of the first cycle. Namely:
jliu@keemun i3 $ i3mramsacls.sh 1
jliu@keemun i3 $ i3cp.sh 1
jliu@keemun i3 $ i3mraselect.sh 1
Conclusion¶
Again, we repeat the above steps for as many cycles as desired. If we know beforehand that we want to run multiple cycles in succession, I3 supports a Bash shell environment variable, I3PARAM, that defaults to mraparam.sh but can be set to another value, which lets us use multiple parameter files written in advance.
For example, if we want to run 10 cycles without manually checking each stage and each cycle, we can write a parameter file for each cycle, say mraparam_00.sh, mraparam_01.sh, ... mraparam_09.sh, and then run the following loop:
for i in {0..9}
do
    fmt_i=$(printf "%02d" ${i})
    export I3PARAM="mraparam_${fmt_i}.sh"
    if [[ ${i} -eq 0 ]]
    then
        i3mrainitial.sh
    else
        i3mranext.sh $((i-1)) ${i}
    fi
    i3mramsacls.sh ${i}
    i3cp.sh ${i}
    i3mraselect.sh ${i}
done
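A loop like this runs unattended, so a parameter file missing halfway through would only show up hours later. As a hedged sketch, the hypothetical helper check_params below confirms that every file in the mraparam_NN.sh naming scheme used above exists before the first cycle starts:

```shell
# Verify that mraparam_NN.sh exists for every cycle number from first to last.
# Usage: check_params <first cycle> <last cycle>
check_params() {
    local first="$1" last="$2" i f missing=0
    i=${first}
    while [ "${i}" -le "${last}" ]; do
        # Same two-digit zero padding as the loop that sets I3PARAM
        f=$(printf 'mraparam_%02d.sh' "${i}")
        if [ ! -f "${f}" ]; then
            echo "missing parameter file: ${f}"
            missing=1
        fi
        i=$((i + 1))
    done
    return ${missing}
}
```

Placing check_params 0 9 || exit 1 on the line before the loop stops the script early when any of the ten files is absent.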