PyTom: Localize Macromolecules by template matching

Overview

Features in a tomogram that resemble a structural 'template' can be localized automatically by 'template matching'. In this approach, a 3D template is correlated with a given tomogram, and the possible rotations and translations are sampled exhaustively using the algorithm described in Förster et al., Meth. Enzymol. 483:215-43 (2010).
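To make the idea of exhaustive sampling concrete, here is a minimal 2D toy sketch in plain Python. It is not PyTom's implementation: PyTom works on 3D volumes, uses angle lists, and computes correlations far more efficiently. All names (rotate90, ncc, match) and the toy data are assumptions made for illustration, and only the four 90-degree rotations are sampled.

```python
import math

def rotate90(t):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(t)
    return [[t[c][n - 1 - r] for c in range(n)] for r in range(n)]

def ncc(patch, template):
    """Normalized cross-correlation coefficient of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def match(image, template):
    """Try every translation and every 90-degree rotation; return the best hit."""
    n = len(template)
    best = (-2.0, None, None)  # (score, (row, col), rotation index)
    rotated = template
    for rot in range(4):
        for r in range(len(image) - n + 1):
            for c in range(len(image[0]) - n + 1):
                patch = [row[c:c + n] for row in image[r:r + n]]
                score = ncc(patch, rotated)
                if score > best[0]:
                    best = (score, (r, c), rot)
        rotated = rotate90(rotated)
    return best

# Toy data: an L-shaped template and an image containing a rotated copy of it.
template = [[1, 0],
            [1, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 0, 0, 0]]

score, position, rotation = match(image, template)
print(score, position, rotation)
```

The cost grows with the number of sampled rotations times the number of positions, which is why the real search is distributed over many CPUs.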

Tutorial files

You will find sample localization scripts in the tutorial repository in the RibosFromLysate/localization directory. It contains symbolic links to the tomogram, reference, and mask, plus an example submit file. The referenced tomogram must be reconstructed first because it is too big to download! Please note that the submit file starting the localization is specifically designed for our cluster. You will find a more generic openmpi command in the following paragraphs. Scripts for determining potential macromolecules (extract.sh) and for fitting a Gaussian into the score histogram to estimate the number of macromolecules (plotFit.sh) are available, too. Do not hesitate to modify the scripts after you have run them at least once. You can always fall back to the original script by typing

git checkout -- TheFileYouChanged

Detect putative particles using localization.py

The script bin/localization.py allows computing the constrained local correlation of a tomogram and a reference in a local area described by a mask. The script supports MPI, which allows running the script on large computer clusters. Here is a call of the script:

mpirun --hostfile "pathToYourHostfile" -c "numberOfCPUs" pytom localization.py job.xml 2 2 2

The trailing 2 2 2 specifies that the tomogram will be split into 2 parts along each of the x, y, and z dimensions, so 8 subcubes are distributed during localization and merged when finished. This splitting of the volume does not change the results. However, please keep in mind that a split which makes a subvolume smaller than the reference along any dimension will fail! All parameters that influence the result are specified in the XML file job.xml.
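The splitting constraint can be checked before submitting a job. The following is a rough pre-flight sketch, assuming subvolume sizes are obtained by plain integer division without overlap (PyTom's actual splitting may differ); the function names and the example dimensions are hypothetical.

```python
def subvolume_sizes(volume_size, splits):
    """Approximate subvolume size along x, y, z (integer division, no overlap assumed)."""
    return tuple(v // s for v, s in zip(volume_size, splits))

def check_split(volume_size, reference_size, splits):
    """Return (number of subcubes, subvolume sizes, whether the split looks safe)."""
    sizes = subvolume_sizes(volume_size, splits)
    n_subcubes = splits[0] * splits[1] * splits[2]
    # The split is unsafe if a subvolume is smaller than the reference along any axis.
    ok = all(s >= r for s, r in zip(sizes, reference_size))
    return n_subcubes, sizes, ok

# Hypothetical tomogram of 512 x 512 x 128 voxels and a 64^3 reference.
good = check_split((512, 512, 128), (64, 64, 64), (2, 2, 2))
bad = check_split((512, 512, 128), (64, 64, 64), (2, 2, 4))
print(good)
print(bad)
```

In this example the 2 2 2 split yields 8 subcubes that are still at least as large as the reference, whereas splitting z into 4 parts leaves only 32 voxels along z and would be rejected by the check.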

This is a sample XML file specifying the localization job:

<JobDescription Destination="./results/">
<Volume Filename="./tomogram.em"/>
<Reference Weighting="" File="./reference.em"/>
<Mask Filename="./mask.em" Binning="1" isSphere="True"/>
<WedgeInfo Angle1="30" Angle2="30" CutoffRadius="0.0" TiltAxis="custom">
<Rotation Z1="0.0" Z2="0.0" X="0.0"/>
</WedgeInfo>
<Angles Type="FromEMFile" File="angles_12.85_7112.em">
<RefinementParameters Shells="6.0" Increment="10.0"/>
<OldRotations/>
</Angles>
<Score Type="FLCFScore" Value="-100000000">
<DistanceFunction Deviation="0.0" Mean="0.0" Filename=""/>
</Score>
</JobDescription>
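Before starting a long run it can help to read the job file back and confirm which files and settings it refers to. The sketch below uses only Python's standard xml.etree.ElementTree module and the sample XML shown above; it does not use PyTom's own job classes, and the summary dictionary is just an illustrative layout.

```python
import xml.etree.ElementTree as ET

# The sample job description from above, embedded as a string for a self-contained demo.
JOB_XML = """
<JobDescription Destination="./results/">
<Volume Filename="./tomogram.em"/>
<Reference Weighting="" File="./reference.em"/>
<Mask Filename="./mask.em" Binning="1" isSphere="True"/>
<WedgeInfo Angle1="30" Angle2="30" CutoffRadius="0.0" TiltAxis="custom">
<Rotation Z1="0.0" Z2="0.0" X="0.0"/>
</WedgeInfo>
<Angles Type="FromEMFile" File="angles_12.85_7112.em">
<RefinementParameters Shells="6.0" Increment="10.0"/>
<OldRotations/>
</Angles>
<Score Type="FLCFScore" Value="-100000000">
<DistanceFunction Deviation="0.0" Mean="0.0" Filename=""/>
</Score>
</JobDescription>
"""

root = ET.fromstring(JOB_XML)
wedge = root.find("WedgeInfo")

# Collect the entries that determine the localization result.
summary = {
    "destination": root.get("Destination"),
    "volume": root.find("Volume").get("Filename"),
    "reference": root.find("Reference").get("File"),
    "mask": root.find("Mask").get("Filename"),
    "wedge_angles": (float(wedge.get("Angle1")), float(wedge.get("Angle2"))),
    "angle_list": root.find("Angles").get("File"),
    "score_type": root.find("Score").get("Type"),
}

for key, value in summary.items():
    print(key, "=", value)
```

To inspect a file on disk instead, ET.parse("job.xml").getroot() can replace the ET.fromstring call.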

Localization job with the UI

 

Create a Localization job in the terminal with localizationJob.py

The localizationJob.py script allows you to set up a localization job in the terminal instead of using the web browser.

NAME
    localizationJob.py
DESCRIPTION
    Create a localization job.
OPTIONS
    -v, --volume		Volume : the big volume (Is optional: No; Requires arguments: Yes)
    -r, --reference		Reference : the molecule searched (Is optional: No; Requires arguments: Yes)
    -m, --mask			Mask : a mask  (Is optional: No; Requires arguments: Yes)
    --wedge1    		Wedge : first tilt angle. Must be 90-tilt! (Is optional: No; Requires arguments: Yes)
    --wedge2    		Wedge : second tilt angle.  Must be 90-tilt! (Is optional: No; Requires arguments: Yes)
    -a, --angles		Angles : name of angle list. Either : 
                                    angles_50_100.em
                                    angles_38.53_256.em
                                    angles_35.76_320.em
                                    angles_25.25_980.em
                                    angles_19.95_1944.em
                                    angles_18_3040.em    
                                    angles_12.85_7112.em    
                                    angles_11_15192.em    
                                    angles_07_45123.em
                                    angles_3_553680.em
                                     (Is optional: No; Requires arguments: Yes)
    -d, --destination	Destination : destination directory (Is optional: No; Requires arguments: Yes)
    -b, --band    		Lowpass filter : band - in pixels (Is optional: No; Requires arguments: Yes)
    --splitX    		Into how many parts do you want to split volume (X dimension) (Is optional: No; Requires arguments: Yes)
    --splitY    		Into how many parts do you want to split volume (Y dimension) (Is optional: No; Requires arguments: Yes)
    --splitZ    		Into how many parts do you want to split volume (Z dimension) (Is optional: No; Requires arguments: Yes)
    -j, --jobName		Specify job.xml filename (Is optional: No; Requires arguments: Yes)
    -h, --help    		Help. (Is optional: Yes; Requires arguments: No)
  			

The result of this script will be a job.xml and a job.sh file.
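As an illustration of how the options fit together, the following sketch assembles a localizationJob.py command line in Python using only the flags listed in the help text above. Several things are assumptions: the "Must be 90-tilt" note is read here as "pass 90 degrees minus the maximum tilt angle", the script is assumed to be started through pytom like the other scripts on this page, PathToPyTom is a placeholder, and the file names, band value, and tilt range are made up.

```python
import shlex

def wedge_angle(max_tilt_deg):
    """Wedge angle as 90 degrees minus the absolute tilt angle (assumed reading of the help text)."""
    return 90.0 - abs(max_tilt_deg)

# Hypothetical tilt series recorded from -60 to +60 degrees.
wedge1 = f"{wedge_angle(-60):g}"
wedge2 = f"{wedge_angle(60):g}"

cmd = [
    "pytom", "PathToPyTom/bin/localizationJob.py",  # placeholder path
    "-v", "tomogram.em",            # the big volume
    "-r", "reference.em",           # the molecule searched
    "-m", "mask.em",                # the mask
    "--wedge1", wedge1,
    "--wedge2", wedge2,
    "-a", "angles_12.85_7112.em",   # one of the angle lists from the help text
    "-d", "./results/",             # destination directory
    "-b", "20",                     # lowpass band in pixels (made-up value)
    "--splitX", "2", "--splitY", "2", "--splitZ", "2",
    "-j", "job.xml",
]

# Print a shell-safe version of the command for copy and paste.
command_line = " ".join(shlex.quote(part) for part in cmd)
print(command_line)
```

With a symmetric tilt range of 60 degrees, both wedge options come out as 30, which matches the Angle1 and Angle2 values in the sample job.xml.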

Extracting positions and orientations of candidates using extractCandidates.py

The correlation volume and corresponding orientations generated above need to be interpreted. The script bin/extractCandidates.py simply determines the peaks of the correlation volume and the corresponding orientations that are all stored in a particle list (xml file).

pytom "PathToPyTom"/bin/extractCandidates.py -j job.xml -n 100 -s 6 -r scores.em -o angles.em -p pl.xml -m -t pathToNewParticles

You need to specify the correlation (score) volume in which the peaks are located with -r scores.em and the volume holding the best angle indexes with -o angles.em . The script then generates a particle list -p pl.xml (and a MOTL pl.xml_MOTL.em if -m is specified) in the current directory. The particle list will contain -n 100 particles. The cut-out radius around each peak is -s 6 . In the resulting particle list, all particles will have a prefix determined by the -t option. Please note that you can include the complete (absolute) path to the particles here, too. -g sets the minimum distance from the volume edges; peaks closer to an edge than this distance are not considered as potential candidates.
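The roles of -n, -s and -g can be illustrated with a small greedy peak picker. This is a simplified sketch in plain Python, not the code of extractCandidates.py: it assumes that voxels within the cut-out radius of an accepted peak are excluded from further picking and that voxels closer to the border than the edge distance are ignored. The function name, the dictionary representation of the score volume, and the toy values are hypothetical.

```python
from itertools import product

def extract_peaks(scores, shape, num_peaks, radius, edge=0):
    """Greedily pick the highest-scoring voxels from a dict {(x, y, z): score}."""
    def inside(p):
        # Keep only voxels at least `edge` voxels away from every border.
        return all(edge <= c < n - edge for c, n in zip(p, shape))

    remaining = {p: s for p, s in scores.items() if inside(p)}
    peaks = []
    while remaining and len(peaks) < num_peaks:
        best = max(remaining, key=remaining.get)
        peaks.append((best, remaining[best]))
        # Cut out everything within `radius` of the accepted peak.
        remaining = {p: s for p, s in remaining.items()
                     if sum((a - b) ** 2 for a, b in zip(p, best)) > radius ** 2}
    return peaks

# Toy score volume: flat background with a few bright voxels.
shape = (10, 10, 10)
scores = {p: 0.0 for p in product(range(10), repeat=3)}
scores[(5, 5, 5)] = 0.9    # true peak
scores[(5, 6, 5)] = 0.8    # shoulder of the same peak, inside the cut-out radius
scores[(2, 2, 2)] = 0.7    # second peak
scores[(0, 5, 5)] = 0.95   # on the border, ignored because edge=1

peaks = extract_peaks(scores, shape, num_peaks=2, radius=2, edge=1)
print(peaks)
```

Because peaks are picked in order of decreasing score, the resulting list is sorted from the best to the worst candidate.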

Estimate the number of particles in tomogram

To estimate the approximate number of molecules in the tomogram statistically, run plotGaussianFit.py. A Gaussian is fitted to the histogram of the score values. Hits scoring more than 1 sigma below the fitted mean should be regarded as not significant.
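Once a mean and sigma are available from the fit, the 1-sigma rule reduces to simple arithmetic. The sketch below shows only that last step; it does not perform the Gaussian fit itself. The score values, mean, and sigma are made up, the function names are hypothetical, and the scores are assumed to be sorted from highest to lowest.

```python
def significance_cutoff(mean, sigma, n_sigma=1.0):
    """Score below which hits are regarded as not significant."""
    return mean - n_sigma * sigma

def count_significant(scores, cutoff):
    """Number of candidates whose score is at or above the cutoff."""
    return sum(1 for s in scores if s >= cutoff)

# Made-up candidate scores, sorted in descending order.
scores = [0.42, 0.40, 0.37, 0.35, 0.31, 0.22, 0.20, 0.18]

# Made-up fit result for the Gaussian peak.
mean, sigma = 0.38, 0.04

cutoff = significance_cutoff(mean, sigma)
n_particles = count_significant(scores, cutoff)
print(cutoff, n_particles)
```

With a score-sorted list, the resulting count is the number of leading entries to keep when trimming the particle list.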

NAME
    plotGaussianFit.py
DESCRIPTION
    Do the Gaussian fitting on the found particle list.
OPTIONS
    -f, --file    Particle list after extracting candidates. (Is optional: No; Requires arguments: Yes)
    -n, --numberBins    Number of bins of histogram. Default is 10. (Is optional: Yes; Requires arguments: Yes)
    -p, --gaussianPeak    The correspondent index of the Gaussian peak. (Is optional: No; Requires arguments: Yes)
    -c, --nuberParticles    Number of particles up to CCC value. (Is optional: No; Requires arguments: Yes)
    -h, --help    Help. (Is optional: Yes; Requires arguments: No)
AUTHORS
    Yuxiang Chen 
    		


The above plot showcases a fitted Gaussian (dashed green line) in the histogram (solid red line) of all scores determined during localization.

To adjust your particle list to the determined score value, simply erase all particles with a score lower than the estimate. You can do that either by manually deleting those particles from the XML file using a text editor, or in the ipytom terminal with the following commands:

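from pytom.basic.structures import ParticleList
pl = ParticleList()
pl.fromXMLFile('pl.xml')  # load the particle list written by extractCandidates.py
newPl = pl[0:252]  # keep the first 252 particles
newPl.toXMLFile('pl_first252.xml')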