=================================== Basic Subtomogram Averaging with I3 =================================== :Author: Dustin Reed Morado Setting up the I3 Project Directory =================================== The first thing to do is the set your current working directory to the folder containing all of your extracted particles and old I3 position data files () or new I3 transform data files (). Then we will create our I3 project directory, and inside of that directory we will create a folder for our maps, project definitions, tilt angles describing our particle’s missing wedges, and transforms: #. ``mkdir i3 # Make our project directory`` #. ``mkdir i3/defs # Make our project definitions directory`` #. ``mkdir i3/maps # Make our project maps directory`` #. ``mkdir i3/tlt # Make our project missing wedge directory`` #. ``mkdir i3/trf # Make our project initial transforms directory`` #. …Or simply: ``mkdir -p i3/{defs,maps,tlt,trf}`` Now we change into our newly created project directory with ``cd i3``. Now we need to copy a template parameter file (``mraparam.sh``) and a template protomo tilt angle file (``template.tlt``) that describes our data’s missing wedge into our project directory. These are included in the example folder of this reference package: 7. ``cp ~/Downloads/i3_guides/examples/mraparam.sh . # Parameter file`` 8. ``cp ~/Downloads/i3_guides/examples/template.tlt . # Missing Wedge`` Finally to finish setting the basics of our I3 project directory; edit the parameter file and the tilt angle file to suit the needs of your current project. The parameter file is well documented in explaining what each of the parameters does and while most of the given values must be changed, they provide a meaningfull starting points of the values that you probably want to use for your project. Filling the project directories ------------------------------- The next step before we start running the program is to fill the maps, definitions, tilt, and transforms directories we created above. You will find it easiest to start with the maps directory and from there we can use loops in the Bash shell to quickly populate the other directories. Maps directory ~~~~~~~~~~~~~~ I3 is very selective when it comes to the names of the maps. Shorter names seem to give the least amount of trouble. Therefore it is useful to create symbolic links to the extracted particles in the directory above our I3 project directory with new names to keep the program happy. First, obviously change into the maps directory and the following Bash shell loop does exactly that: .. code:: bash jliu@keemun i3/maps $ i=1; for j in ../../*.mrc do ln -sv ${i} p$(printf "%05d" ${i}).mrc i=$((i+1)) done This creates symbolic links from whatever your subtomograms are named to “``p00001.mrc, p00002.mrc,`` …” Tilt angles directory ~~~~~~~~~~~~~~~~~~~~~ Next, the missing wedge for each map is described using our tilt angle files. The most simple and straightforward way to do this is to using the template we copied to our project directory to describe the missing wedge for each map and particle. To do this we again create symbolic links to our template file for each map that we just created in our maps directory. Again, change into the tlt directory and the following loop will accomplish that: .. code:: bash jliu@keemun i3/tlt $ for i in ../maps/*.mrc do ln -sv ../template.tlt $(basename ${i} .mrc).tlt done This creates symbolic links “``p00001.tlt, p00002.tlt,`` …” to ``template.tlt``. Transforms directory ~~~~~~~~~~~~~~~~~~~~ The transforms directory can be the most challenging to fill, there are many possible situations based on your particular project: #. Particle coordinates as a single point per subtomogram center; no orientation #. Particle coordinates as two points per subtomogram; describes orientation #. Old I3 transform as a pos file; describes inverse orientation #. New I3 transform as a trf file from a previous run; describes orientation In the first case we can create the most basic transform file for each map. Refer to the I3 tutorial PDF to understand what each field of the transform file describes. After changing into the transforms directory the following Bash shell can be used to create these files: .. code:: bash jliu@keemun i3/trf $ i=1; for j in ../../*.mrc do eval $(i3stat -sh -o ${j}) tx=$(((ox + nx) / 2)) ty=$(((oy + ny) / 2)) tz=$(((oz + nz) / 2)) fmt_i=$(printf "%05d" ${i}) echo -n "p${fmt_i} ${tx} ${ty} ${tz} " > p${fmt_i}.trf echo "0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0" >> p${fmt_i}.trf i=$((i+1)) done In the second case it is easiest to use the programs that take these positions and convert them into Old I3 transform pos files and go forward from the third case. From here we can convert these to New I3 transforms using the i3assist library. To do this we go to the directory with the original maps and position files and start the IPython console using the command ``ipython`` from the shell and use the following commands: .. code:: python import i3assist import glob import os.path for posfile in sorted(glob.glob("*.pos")): trffile = os.path.splitext(posfile)[0] + '.trf' pl = i3assist.PositionList() tl = i3assist.TransformList() pl.from_file(posfile) pos = pl[0] pos_trf = pos.to_trf() tl.transforms = [pos_trf] tl.to_file(trffile) Then we need to change directories back to the I3 project transforms directory and run the following Bash script to correctly insert the proper first four fields into the convert transform files and give these files the correct names: .. code:: bash jliu@keemun i3/trf $ i=i; for j in ../../*.mrc do trffile=${j/%mrc/trf} fmt_i=$(printf "%05d" ${i}) eval $(i3stat -sh -o ${j}) tx=$(((ox + nx) / 2)) ty=$(((oy + ny) / 2)) tz=$(((oz + nz) / 2)) awk -v s="p${fmt_i}" -v tx=${tx} -v ty=${ty} -v tz=${tz} ' { $1 = s; $2 = tx; $3 = ty; $4 = tz; print }' ${trffile} > p${fmt_i}.trf i=$((i+1)) done We can then safely delete the temporary transform files we created with i3assist. .. code:: bash jliu@keemun i3/trf $ rm ../../*.trf For the last case we just need to rename the transform files to match the same naming convention as our maps. The loop to do this is simply the one we used for filling the maps directory: .. code:: bash jliu@keemun i3/trf $ i=1; for j in ../../*.trf do ln -sv ${i} p$(printf "%05d" ${i}).trf i=$((i+1)) done Definitions directory ~~~~~~~~~~~~~~~~~~~~~ The last step in filling in our project directories is the definitions directory, which just contains two files ``maps`` and ``sets``. Refer to the I3 tutorial PDF to see the format of these files, but with all of the other directories already setup generating these two files is simple using the following loop: .. code:: bash jliu@keemun i3/defs $ touch maps sets; for i in ../maps/*.mrc do echo "../maps $(basename ${i}) ../tlt/$(basename ${i} .mrc).tlt" >> maps echo "$(basename ${i}) $(basename ${i} .mrc)" >> sets done And we are done with setup, and can continue to actually processing our project, which is extremely simple. Running the First Cycle ======================= With everything in place; the first cycle of processing utilizes four shell scripts that combine and abstract the smaller building block programs of I3 into sensible processing units based on the road map of basic subtomogram averaging and classification. #. ``i3mrainitial.sh # Produces the initial global average, reference and masks`` #. ``i3mramsacls.sh 0 # Runs the actual alignment and classification.`` #. ``i3cp.sh 0 # Copies selected class averages to select folder for alignemnt`` #. ``i3mraselect.sh 0 # Aligns selected class averages to make the final alignment`` i3mrainitial ------------ After running i3mrainitial you will now have a directory ``cycle-000`` in your project folder. In this folder you will have the initial I3 database, the global average of the subtomograms based on the transforms given by the transform files in the transform directory, the masked and filtered reference that will be used in the subsequent alignment step along with the Fourier transform of this file, and finally the binary mask that will be used in the subsequent classification step. There may also be montages of the reference and versions of all maps that have been rotated about the X-axis to visualize the maps perpendicular to the Z-axis, which while useful in some cases, can also be visualized using IMOD’s slicer window. Whether or not these files are created are based on the parameters you set in your parameter file. Troubleshooting ~~~~~~~~~~~~~~~ This is the most likely command to fail in running I3, due to the fact that this is when the I3 database is first created. The error messages are also not particularly helpful but the following suggestions may help. When restarting a run that failed delete the ``cycle-000`` directory make whatever corrections necessary and then rerun the command: ``rm -rf cycle-000 && i3mrainitial.sh``. - ``i3external`` errors are often caused by your maps having too long of a filename or if you have followed this guide that should not be the case, and therefore means you have more maps in project than I3 can handle which can be anywhere from 1,000 to 10,000. Try splitting your data into multiple I3 projects, or refer to the intermediate or expert guides for how to manually add maps to the database. - ``i3boximport`` errors are due to the second to fourth fields of of your transform file being incorrect. Again if you have followed this guide your particle center coordinates should be correct. However, if you created the transform files yourself, double check that your volumes coordinates and origin correspond correctly with the centers in your transforms. You can do this using the I3 command: ``i3stat -o ``. - ``i3dataset`` errors are the most difficult to debug. They signal that the shifts and rotations describing a subtomograms orientation and position are invalid, duplicated elsewhere in the transform file, or incorrectly formated. Again following this guide should prevent these problems, but if you created your own transform files, make sure that each line in the transform file has 16 fields; that lines have the same transformed coordinates, and that the last nine fields are all values between 0 and 1. i3mramsacls ----------- After running i3mramsacls you will have many new files in the ``cycle-000`` directory. However, there are just a few that as a beginner you should look at before starting the next program. The first is a montage of the calculated factors in the SVD (Singular Value Decomposition) processing of the dataset. These factors reduce the dimension of the dataset to the most variable regions of interest and this variance is used to cluster the data into classes using HAC (Hierarchical Ascendant Clustering). The second of these are the class averages produced after the clustering. Here you are looking to make sure that the classes correspond to true variation and heterogeneity in the data, and not artifacts such as the missing wedge, simple variations in noise, and junk such as gold and debris nearby particles. You will want to especially focus on the class averages that have been selected for aligning class averages in the last step of the processing of the first cycle. Troubleshooting ~~~~~~~~~~~~~~~ Errors in this stage of processing are uncommon. However if you have any errors, they will almost certainly come from a mistake in the parameter file. Make sure that alignment parameters are sane, and most frequently make sure that the class averages requested were actually calculated. Rerunning this step to fix specific errors is beyond the scope of a beginner tutorial, and for more information on how to handle these situations efficiently, please refer to the intermediate and expert guides. i3cp ---- After running this command the only thing done is copying the selected class averages to a new select folder to be aligned in the next step. There’s nothing to check at this step, just move quickly on to the last script. Troubleshooting ~~~~~~~~~~~~~~~ The only error in this stage is if you selected a class for which class averages were not generated. Edit your parameter file making sure that the class selected does exist. To rerun the command, find the most recent generated directory in the ``cycle-000`` directory and delete it (it should have the name <…>-000-sel): .. code:: bash jliu@keemun i3 $ ls -ltr cycle-000 # Find the most recently created directory jliu@keemun i3 $ rm -rf cycle-000/<...>-000-sel #replace <...> as appropriate jliu@keemun i3 $ i3cp.sh 0 # rerun command i3mraselect ----------- After running this last stage of the first cycle, you will finally have an aligned average to inspect. Optionally if you specified FSC (Fourier Shell Correlation) masks in your parameter file, you will also have even and odd half averages and the corresponding FSC data and graph in postscript format. Note that this resolution reported is not gold-standard and can easily overestimate the true resolution of your data. You have now finished your first cycle and the next step is to create and run another cycle and we will repeat this until our structure converges by visual inspection or in terms of resolution. Troubleshooting ~~~~~~~~~~~~~~~ Errors in this stage are also uncommon similar to i3mramsacls and if you encounter trouble here refer to the suggestions for that section. Running the Second and Subsequent Cycles ======================================== With our first cycle complete, basically all of the steps are all the same. The only difference is the first step which originally was ``i3mrainitial.sh`` is now replaced with ``i3mranext.sh 0 1`` where the 0 represents our old cycle number and 1 represents the new cycle we are now calculating. Here we will not create the global average (since it would be the same as the previous cycle’s aligned average), and we will not create the new cycle’s database from scratch, but instead copy it from the previous cycle and appended to with new information on the current cycle. Again, we will be left with the reference for this cycle and the classification mask that will be used in the next stage. i3mranext --------- Before running this command be sure to edit and update your parameter file to take into account the refinement achieved in the first cycle. Do not worry about losing the parameters used in the first cycle as a copy of the parameter file used for that cycle exists in the ``cycle-000`` directory. Troubleshooting ~~~~~~~~~~~~~~~ You should also not experience any common errors at this stage. Any errors you do experience should point to errors in your parameter files, specifically the filtering, masking, and location of the reference, as well as the masks created for classification. The rest of the cycle --------------------- Now that you have a directory ``cycle-001`` you can repeat the last three stages of the first cycle. Namely: #. ``jliu@keemun i3 $ i3mramsacls.sh 1`` #. ``jliu@keemun i3 $ i3cp.sh 1`` #. ``jliu@keemun i3 $ i3mraselect.sh 1`` Conclusion ========== Again, we repeat the above steps for as many cycles as desired. If we know beforehand that we want to run multiple cycles in succession, I3 supports an Bash shell environment variable ``I3PARAM`` that defaults to ``mraparam.sh``, but we can set to another value to support using multiple parameter files written at once. For example if we want to run 10 cycles without manually checking each stage and each cycle we can write a parameter file for each cycle, say .. code:: bash mraparam_00.sh, mraparam_01.sh, ... mraparam_09.sh and then we run the following loop: .. code:: bash for i in {0..9} do fmt_i=$(printf "%02d" ${i}) export I3PARAM="mraparam_${fmt_i}.sh" if [[ ${i} -eq 0 ]] then i3mrainitial.sh else i3mranext.sh $((i-1)) ${i} fi i3mramsacls.sh ${i} i3cp.sh ${i} i3mraselect.sh ${i} done