PyTom: Align subtomograms

## Overview

Once you computed and stored the subtomograms, you can average them in order to get a rough estimate of the macromolecule. To obtain a higher resolution average you have to refine the orientations and translations of all subtomograms ('alignment'). The alignment process can become time-consuming, depending on the number of subtomograms, their size, and the power of your computer setup (number of CPUs). Hence, subtomogram alignment is parallelized and typically run on a computer cluster.

## Subtomogram Alignment using align.py

The script bin/align.py performs iterative quasi-expectation maximization alignment using constrained correlation. The rotational sampling is performed in real space. A function call looks like this:

 mpirun --hostfile "pathToYourHostfile" -c "numberOfCPUs" pytom PathToPytom/bin/align.py -j alignment.xml 

The XML file alignment.xml contains all information required for subtomogram alignment:

• Particle List. A particle list in XML format that points to all subtomograms. Alternatively, you can also start with directory of EM, MRC or CCP4 files that contains all subtomograms. The advantage of a particle list over plain files is that alignment can re-use previously determined alignment parameters for refinement. Starting from plain files does not allow that and will generate an XML with default orienations and translation (all zero).
• An initial reference. The reference may be an EM, MRC or CCP4 file. The reference can be the preleminary average from the roughly-aligned particles, an existing density or a density generated from a a PDB file. (For PDB files, you can use the pytom.basic.files.pdb2em function to obtain densities).
• Mask. The mask may be spherical or non-symmetrical. Default masks are spherical in PyTom. However, to focus alignment on a particular region, you may use a mask of aribtrary shape that will be rotated during alignment. Can be generated using PyTom following these instructions.
• Sampling Angles. There are two different modes for sampling different orientations: global and local. In Global Sampling and exhaustive orientation sampling is performed. Lists of angles are pre-computed in PyTom and can be selected. Global sampling is typically only needed at early stages of the alignment and fine sampling would cause enormous computational resources. Therefore, the Local Search is the preferred option when the orientations of the subtomograms have been determined to some degree of accuracy. For explanation of the sampling we refer to Fig. 1. Note that the specified Angular increment will be overruled by the increment determined by the adaptive sampling if this option is chosen. If you are not sure what increment to use for the alignment, you can estimate the increment with the angleFromResolution function. Type:
 ipytom from pytom.basic.resolution import angleFromResolution print angleFromResolution(30,250)  Here, 30 is a hypothetical resolution in Angstrom and 250 is a particle diameter in Angstrom. • Sample Information. Information on your macromolecule of interest. The missing wedge is specified by two positive parameters: angle to the left and to the right of 90 degrees. • Score. Different scoring functions can be used for alignment: (Non-Normalized) Cross-Correlation function (XCF), Normalized Cross-Correlation function (NXCF), Fast Local Correlation Function (FLCF), Mutual Correlation Function (MCF). The FLCF is most extensively tested. • Peak Prior. In some cases impose a prior for the translations: The correlation function is multiplied with a Gaussian mask with the chosen parameters. • Adaptive Parameters. Angular sampling and resolution of the reference can be adjusted to the resolution of the reference as described in (Hrabe et al, 2012): \Delta \theta = \frac{f}{  Diameter_p \times Resolution(FSC_{Cutoff})}and Lowpass = Resolution(FSC_{Cutoff}) \times (1 + Offset). • Bandpass. A bandpass filter can be specified for filtering the reference. Note that the lowpass will be overruled by the value determined by the adaptive sampling if this option is chosen. • Iterations. Number of iterations is specified here as well as the destination directory for the output files. Moreover, it can be chosen to bin (downscale) the data to make computations faster (on the expense of obtainable resolution). Fig. 1. Angular scanning. The angles Psi (Z1) and Theta (X1) scanned around the old \Psi and \Theta, whereby the old values act as a ‘pole’. Here the Angular Increment \Delta\Theta is 3 degrees and the Number of Shells N_{shell} is 2. That means the vector ( \Psi, \Theta) is rotated by \Delta\Theta=3 two times (the visible rings). The increment along \Psi is chosen as \Delta\Psi=\frac{N_{shell}}{sin(\Delta\theta*i_{shell})}, which accounts for the higher density of angles near the pole. The third Euler angle \Phi (Z2) is scanned with a constant increment \Delta \Phi=\Delta \Theta from \Phi_{min}=\Phi_{old}-N_{shell}*\Delta\Theta to \Phi_{max}=\Phi_{old}+N_{shell}*\Delta\Theta. This is a sample XML file specifying the alignment job:  <ExpectationMaximisationJob> <Description Destination=./results/" NumberIterations="10" ResultClassification="0.1" Binning="1" FSCCriterion="0.5" AdaptiveResolution="True" AdaptiveOffset="0.1" AngleFactor="0.5"> <!--Destination - result folder ; NumberIterations - number of alignment rounds; ResultClassification - Classification of correlation peak; Binning - Binning/Downscale used in alignment ; FSCCriterion - Criterion used for resolution adaptive alignment;AdaptiveOffset - resolution offset used for adaptive alignment; AngleFactor - factor used for resolution adaptive sampling--> <ParticleList Path=""> <Particle Filename="particle_0.em"><!-- A subtomogram--> <Rotation X="77.3204393293" Paradigm="ZXZ" Z1="-59.9796818569" Z2="79.3052103398"/><!-- current alignment rotation--> <Shift Y="0.0" X="0.0" Z="0.0"/><!-- current alignment shift--> <PickPosition Origin="./tomogram.em" Y="263.0" Z="67.0" X="191.0"/><!-- position in origin volume--> <Score Type="FLCFScore" Value="0.546716570854"><!-- last correlation value used--> <PeakPrior Smooth="-1.0" Radius="0.0" Filename=""/><!--peak prior used in last alignment process--> </Score> <WedgeInfo Smooth="0.0" Angle1="0.0" CutoffRadius="0.0" TiltAxis="custom" Angle2="0.0"><!--Information about missing wedge--> <Rotation Z1="0.0" Z2="0.0" X="0.0"/> </WedgeInfo> <Class Name="0"/><!--Class this subtomogram belongs to--> </Particle> <Particle Filename="particle_1.em"> <Rotation X="77.4490791121" Paradigm="ZXZ" Z1="-53.600247606" Z2="-135.235600887"/> <Shift Y="0.0" X="0.0" Z="0.0"/> <PickPosition Origin="./tomogram.em" Y="92.0" Z="36.0" X="436.0"/> <Score Type="FLCFScore" Value="0.546538472176"> <PeakPrior Smooth="-1.0" Radius="0.0" Filename=""/> </Score> <WedgeInfo Smooth="0.0" Angle1="0.0" CutoffRadius="0.0" TiltAxis="custom" Angle2="0.0"> <Rotation Z1="0.0" Z2="0.0" X="0.0"/> </WedgeInfo> <Class Name="0"/> </Particle> ... </ParticleList> <Reference PreWedge=" File=reference.em" Weighting=""><!-- Initial reference used --> <ParticleList Path="/"/> </Reference> <Mask Filename="mask.em" Binning="1" isSphere="True"/><!-- Mask used during alignment --> <Score Type="FLCFScore" Value="-10000000000.0"> <PeakPrior Smooth="-1.0" Radius="-1.0" Filename=""/> </Score> <Angles Type="FromEMFile" File="angles_19.95_1944.em"/><!--Angular sampling strategy --> <Preprocessing> <Bandpass LowestFrequency="0.0" Smooth="0.0" HighestFrequency="10.0"/> <!-- Bandpassfilter used, in pixels--> </Preprocessing> <Symmetry X="0.0" Z2="0.0" NFold="1"/><!-- Symmetry information (C-X) supported currently) <SampleInformation PixelSize="4.7" ParticleDiameter="250.0> </Description> </ExpectationMaximisationJob>  Please note that this XML file will not work due to the ... in the particle list. ## Create an Alignment job in the terminal with alignJob.py The alignJob.py script allows you to set up alignment through the terminal instead using the web-browser.  NAME alignJob.py DESCRIPTION Create an EXMX alignment job. OPTIONS -p, --particleList Particle list : xml file storing information to all subvolumes (Is optional: No; Requires arguments: Yes) -r, --reference Reference : the alignment reference (Is optional: No; Requires arguments: Yes) -m, --mask Mask : a mask (Is optional: No; Requires arguments: Yes) --wedge1 Wedge : first tilt angle. Must be 90-tilt! (Is optional: No; Requires arguments: Yes) --wedge2 Wedge : second tilt angle. Must be 90-tilt! (Is optional: No; Requires arguments: Yes) --angleShells # angle shells used for angular refinement. (Is optional: No; Requires arguments: Yes) --angleIncrement Angular increment for refinement. (Is optional: No; Requires arguments: Yes) --symmetry Symmetry : specify n-fold symmetry (n) (Is optional: Yes; Requires arguments: Yes) --symmetryAngleZ Symmetry axis tilt around Z axis (Is optional: Yes; Requires arguments: Yes) --symmetryAngleX Symmetry axis tilt around X axis (Is optional: Yes; Requires arguments: Yes) -l, --lowestFrequency Highest frequency band used : in pixels (Is optional: No; Requires arguments: Yes) -h, --highestFrequency Lowest frequency band used : in pixels (Is optional: No; Requires arguments: Yes) -d, --destination Destination : destination directory (Is optional: No; Requires arguments: Yes) -n, --numberIterations Number of iterations (Is optional: No; Requires arguments: Yes) -b, --binning Perform binning (downscale) of subvolumes: 1: no binning, 2: 2 pixels = 1, 3: 3 = 1 ... (Is optional: No; Requires arguments: Yes) --pixelSize Pixelsize in Angstrom (Is optional: No; Requires arguments: Yes) --particleDiameter Particle diameter in Angstrom (Is optional: No; Requires arguments: Yes) -j, --jobName Specify job.xml filename (Is optional: No; Requires arguments: Yes) -h, --help Help. (Is optional: Yes; Requires arguments: No)  Please note that not all options available through the web interface are available in the terminal. However, the resulting job.xml files can be manipulated to modify specific properties. ### Results of alignment • The aligned average named 1,2,3, .... (iteration number) • The aligned average named 1,2,3, .... (iteration number) filtered to the current resolution in Angstrom (1-ANGXYZ.em) • The average with inverted gray values (1-INV.em) • The sum of the missing wedges used for weighting the sum of all particles • The job description (XML file) used for the current alignment run • An alignment list per iteration storing the alignment job parameters for each particle including the results. You can convert any AlignmentList file to a particle list by running the terminal and typing ipytom from pytom.basic.structures import ParticleList pl = ParticleList() pl.fromAlignmentList('AlignmentList-1.xml') pl.toXMLFile('alignedParticles.xml') 
• Information about the current alignment progress from iteration to iteration in html format

## Alignment of the tutorial data

Alignment of the previously localized and roughly classified particles is shown in the alignment directory. Here, you will find
• the list of all particles allParticles.xml with their rough orientations determined by localization
• a script createParticleList.py showing you how to create allParticles.xml from the previous MCO-EXMX classification step
• job.xml is a example job XML showing how a ready alignment job should look like and job.sh is the executeable with the respective call
• reference.em and mask.em are respective calls to the alignment reference and alignment mask
• Alignment results will be written into the results directory (Please make sure it exists.)
You can go ahead and create your own job files with the tutorial data or modify copies of the job.xml yourself to see differences in results and get a feeling for parameters.

## Further modifications

For further runs with modified parameters you can either load the job.xml file into the user interface, or modify the XML file on disk. It's really easy to set up a new job for the same particles. You can modify all parameters by using a standard text editor. However, remember to update the destination where results will be written to. You might overwrite old results otherwise.

## Setting up an alignment job using the web server

Using the Pytomserver all input files for running the alingment can be created.