DD.. _tutorial:

Tutorial: 1,3-Disilacyclopentane

Introduction

For the purposes of a guided tutorial to using the GUI-less, standalone Autofit, in this case prog_A_v15.py, we will use some previously published data taken at the Pate lab to illustrate the techniques and capabilities of Autofit.

Specifically, we will analyze the spectrum of 1,3-disilacyclopentane to find the parent species and its silicon and carbon isotopologues using Autofit. This molecule makes an excellent candidate for learning Autofit, as the parent species spectrum is sufficiently intense and the number of triples for Autofit to check due to spectral density is relatively low. In addition, the ab initio predictions are fairly accurate, allowing for small search windows and thus quick searches.

If you would like to read specifics regarding this molecule, a paper has been published this year regarding the FT microwave data presented here, as well as IR data taken by the Durig group at UKMC.

Citation: G. A. Guirgis, J. J. Klaassen, B. H. Pate, N. A. Seifert, I. D. Darkhalil, B. S. Deodhar, J. K. Wyatt, H. W. Dukes, M. Kruger, J. R. Durig; J. Mol. Struct. 1049, 400-408 (2013).

Required Stuff

For the purposes of following along with this tutorial, you will need these supplementary files:

  • For this exercise, we will use prog_A_vX.py, where X is at least 15. If you don’t have this, go back to the Getting Started with Autofit section and follow those instructions.
  • Experimental data; this data was taken in 2012 in the Pate lab using the 6-18 GHz spectrometer at UVa. More information can be found in the paper cited above.
  • Optimized geometry; this geometry was optimized in GAUSSIAN 09 using the MP2/6-311++g(d,p) geometry. It is in the XYZ format, which is standard for any Autofit geometry input. This geometry is required for scaling rotational constants for the purpose of finding isotopologues.

Step 1: Input

Start Autofit by executing the script inside of your bash shell:

python prog_A_v15.py

If all loads properly, you should get a dialog box that looks like this:

_images/fig1.png

Notice there are two choices. The first, entitled Normal Species, is the Autofit routine for the purpose of finding a species or a set of species using a single autofit scan. You can kind of consider this “Classic” mode. The other is for when you have an experimentally fit parent species and you want autofit to find its isotopologues. In reality, if you run a normal species autofit session, the program will ask you at the end if you want to search for isotopologues for a given species, so we’ll just go ahead and click Normal Species.

Now, go back to the bash shell. You should get a message that says:

Would you like to use an input file? (1/0)

During an Autofit run (later on in the setup stage), the program will generate an input file. If you have an input file, say from a botched run from earlier that you want to retry, hit 1. For the purposes of this tutorial, type 0 and hit ENTER.

Next, a dialog box will appear asking you to enter a job name. Give it a unique name, as Autofit will create a folder with this job name and will not work unless the folder is unique. If you want to use an already used job name, close out of the Python script and delete the folder and rerun the script. It should let you use that job name again.

Next, the primary input screen will appear. Most of these are self-explanatory, but below you can find an image with all the necessary things filled out. The values in this image are taken from the ab initio results given from a single point calculation of the geometry downloadable above.

_images/fig2.png

Some notes about this image:

  • The values in parenthesis next to the row descriptors (e.g. the (9769.6221 MHz) next to A) are default values. If you leave the entry box for this parameter empty, the program will default to it. Currently the program is defaulted to a particular conformer of hexanal, so make sure to fill in everything!
  • As stated in Introduction to Autofit, distortion is treated as a static set of parameters, so you can feel free to add predicted distortion constants from a frequency calculation, or just set them to 0 (typical practice is just to set them to 0, but of course it depends on your confidence in the predicted quantities)
  • Dipole values are only used to rank transitions based on intensity (coupled with temperature) so if you have a molecule with one experimentally-detectable dipole direction, then it will work fine to set the dipole to 1 and the others to 0.
  • The number of processors entry tells the system how many cores you want to run your search on. This count includes hyperthreading, so for example if you have a 4-core Intel Core i7 series CPU with hyperthreading, you can set this from 1-8.
  • The upper intensity limit for a full spectrum scan doesn’t matter too much (it might be important when you have a dense spectrum and you want to constrain your peakpick), so I set it to default. However, it is important to fix a good lower intensity limit, so Autofit doesn’t end up fitting noise. A good setting for this spectrum is 1 microvolt, or 0.001.

After everything is filled out and you hit OK, the program will now ask you to import a spectrum file. Point it to the experimental spectrum downloadable above, and hit OK. Autofit will seem like it’s not doing anything, but in fact it is generating a peakpick file from the data automatically, using the intensity bounds you input in the previous screen. This peakpick is done using a cubic spline, which gives a frequency resolution better than the point spacing of the FT. Splining the data will tend to improve your overall RMS errors on fit species.

Step 2: The Art of Picking Triples

2A: Picking a Set of Fit & Checked Transitions

Once it finishes splining, another box will popup that looks like this:

_images/fig3.png

If you choose Automatic scoring, Autofit will sort through the most intense transitions and rankk sets of triples of these transitions by their linear independence in fitting A, B & C. This can be advantageous in a lot of instances, especially with dense spectra where the number of available and resolvable transitions is high. However, for the purposes of this tutorial, we will choose Manual selection in order to illustrate picking a good triple.

Once you select Manual Selection, you should see a list of transitions that looks like this:

_images/fig4.png

Each transition has its own entry, and the format of each line is as follows:

(-log10(intensity), frequency, J' Ka' Kc', J'' Ka'' Kc'', uncertainty)

Now, for the purposes of picking a triple, you can try different choices at this point. Notice that for this molecule the spectrum is purely B-type, so in general the transition dependence on A, B and C is pretty good. But consider the case where you have a pure a-type spectrum. Here, the dependence on B & C is almost exclusive, especially at low J – the dependence of the transition frequencies on A is quite poor. However, for a good fit, this kind of behavior in the Autofit results file as a fit with a set of well-determined B & C values and an A value that is inherently high in error, so the A value is not trustworthy at this point. That being said, in the case of a pure A-type spectrum, it will be best to pick a set of transitions that maximize our potential determination of A, e.g. choosing a transition with a Ka > 1.

Back to disilacyclopentane, in the world of b-type transitions: If you’re not feeling particularly creative, you can try this set of transitions:

trans_1: ('-6.0732', '13738.2722', ' 3 1 3', ' 2 0 2', '130.1830')

trans_2: ('-6.2248', '12356.1686', ' 3 0 3', ' 2 1 2', '112.8215')

trans_3: ('-6.3855', '10210.2399', ' 2 1 2', ' 1 0 1', '105.7633')

After you hit OK, there should be a new dialog box that outputs the linear independence of the three chosen transitions. Usually you want the magnitude of this number to be as high as possible, but for a low-J b-type spectrum like this we probably won’t be able to do much better than a linear dependence of 10 or so. For the choices above, you should get a dependence of around 6.25 (absolute value) – this will be fine for the purposes of this tutorial. Hit Continue, or you can choose to pick another set of three peaks and see how the dependence value changes as you alter the pick list.

Next, you will get a popup that allows you to choose how you want to determine your search windows:

_images/fig5.png

The choices here are rather straightforward:

  • User-defined, same for each: Autofit will check experimental transitions within a window of +/- N MHz for each of the three transitions, where N is the window value you set. This is useful for spectral searches with only one dipole component, where all the transitions are roughly dependent on A, B & C in equal magnitudes
  • User defined, different for each: Autofit will check experimental transitions within different sized, user-inputted search windows for each of the three transitions. This is useful when you are trying to fit mix spectra, such as if you are confident you are close on B & C for a set of A-type transitions, but unclear on A and want to use a B-type transition to fit A.
  • Three times SPCAT uncertainty: Brute-force method. It will take the uncertainties for each line that SPCAT outputs and multiply them by three. There’s generally not any good reason to do this, but it’s there in case you need a brute force method of picking a window.

For the purposes of this tutorial (assuming you picked the set of lines we’ve listed above), we will go ahead and set a singular window for all three transitions. Notice that their uncertainties are pretty close (130 MHz, 112 MHz and 105 MHz respectively), so we can set it to something static and be pretty confident we’re doing a fair search. At this point, you can enter any value you please. The spectrum is sufficiently sparse that even a ridiculously large window, such as 1 GHz, will result in a short calculation. This tutorial will use a window of 200 MHz since the ab initio predictions are actually quite good, but you will get similar results no matter what you choose.

Once you choose a window (150 MHz should give you about 3000 triples to check), the program will now pop up the following dialog box:

_images/fig6.png

Here, Autofit is asking how you want to choose your set of check transitions. Basically, Autofit will fit A, B and C to the triple you gave it earlier, and then will go ahead and check the goodness of fit of these “3-line fit” constants and score them based on how many of the chosen additional transitions are within a rejection tolerance of the predictions. It is advantageous to pick a set of additional transitions you are confident will also be resolvable in the spectrum. For instance, if you chose a set of low Ka A/B-type transitions, it would probably be a good idea to include similar transitions of different Js for this additional set. If you pick a number of transitions that are too weak to detect, then Autofit will drop that fit’s score and sometimes it will throw the overall RMS error of the fit off. That being said, typically if you get a good A, B & C from the fit triple you selected, it won’t matter too much what the additional set contains.

For the purposes of this tutorial, you can again choose whatever you please (using the Top 10 should be fine), or you can choose this set of additional transitions:

('-6.0900', '15210.9317', ' 2 2 1', ' 1 1 0')
('-6.1355', '16453.0888', ' 2 2 0', ' 1 1 1')
('-6.8785', ' 7699.1834', ' 2 0 2', ' 1 1 1')
('-6.9041', ' 6355.2929', ' 1 1 1', ' 0 0 0')

Once these are selected and you hit OK, the program will start. You can tell this by going back to the terminal, and the last line written should be the # of triples (e.g. 2916 if you chose all the default choices above, with a window of 200 MHz).

2B: Analyzing the Results

When the program is complete, Autofit will pop up a new window with the results that looks like this:

_images/fig7.png

We will now zoom into the top three results (they are the only fits below 100 kHz RMS error) and describe the organization of the results:

00 score =  04 Const = 4411.063 2817.562 1905.547 average omc = 0.0371415709585  avg w/out peaks over edge = 0.0371415709585 avg w/ inten penalty = 0.111424712876
01 score =  04 Const = 4414.226 2851.558 1921.563 average omc = 0.0415715716715  avg w/out peaks over edge = 0.0415715716715 avg w/ inten penalty = 0.124714715015
02 score =  04 Const = 4417.685 2887.059 1938.21 average omc = 0.0433490722999  avg w/out peaks over edge = 0.0433490722999 avg w/ inten penalty = 0.1300472169

The score is an internal count of how many of the additional transitions you picked were fit and not rejected due to high observed-minus-calculated (OMC) errors. Since the three fits above have scores of 4, this means all 4 of the additional transitions were accepted into the final fits. Therefore, the first fit has seven lines for a total error of 37 kHz RMS error.

Notice there are three errors listed; currently (as of v15) the program sorts the final output by the OMC error with intensity penalty (the far right entry). This error accounts for differences in relative intensity versus predicted intensities and the experimental intensities. For instance, a triple that fits three experimental lines that are predicted to be roughly equal intensity, but in the spectrum are vastly different in intensity (say larger than a factor of two or three) will end up having a larger RMS error with the intensity penalty enabled. This functionality is still a bit experimental, so sometimes you will find fits that have good OMCs without the penalty half-way down the list since the intensity penalty increases the RMS error considerably. However, in general the best fits by average OMC will be near the top, despite whether they have high OMCs with the intensity penalty or not.

Returning to these fits, it is unclear which fit corresponds to the parent species of 1,3-disilacyclopentane, since all three have decent RMS errors. A quick comparison of these fit constants to the experimental results will quickly reveal that the first two in the list above correspond to isotopologues, and the third is the parent species. You can also tell this by the fact that the parent species has the largest rotational constants (corresponding to the smaller mass relative to the heavier isotopologues).

2C: Refining a Fit

When you hit OK, the program will now ask you if you want to refine any of the results by allowing distortion to vary. To illustrate this, let’s give it a shot on our brand new 1,3-disilacyclopentane experimental constants. Hit “Yes!” and then select fit #3 (02) from the list. After a few applications of the OK button, the program will then ask you if you want to add additional transitions into the fit. If you’d like to add a few transitions, go ahead – otherwise continue to the next screen.

The program will then ask you which distortion you want to float. You can try as many as you’d like, but it should be sufficient just to fit DJ and DJK. After you select the appropriate distortion constants (the program defaults to the Watson A-reduction!), it will ask you to input an OMC threshold. Basically, if any of the lines that you ask it to fit have an OMC greater than this threshold, then SPFIT won’t fit them. 100 kHz (0.1 MHz) should be fine.

Once you hit OK, it will then output the SPFIT results for the input fit. Fitting just DJ and adding one additional transition to the fit (the 414-303 transition), the 43 kHz fit above reduces to just 3.1 kHz. Not bad!

If you aren’t satisfied with this fit, hit OK and the program will then ask you if you want to try to add some additional transitions / remove distortion, and you can continue through the above process as many times as you’d like.

NOTE: If you want to work manually on these SPFIT results, Autofit automatically writes the input files to the job directory as refitX.par/fit/etc. Once you are happy with the results and continue on, it will move them automatically to the <job name>/refits folder.

Step 3: Fitting Isotopologues

3A: Search Setup

Now that we have the parent species in the bag, it is time to shift our focus to finding the isotopologues. After you finished refining your normal species fit, the program should pop up the dialog box asking:

Would you like to perform a search for singly-substituted isotopologues?

If you’re interesting in doing so, hit Yes. You will then see the input dialog box from earlier pop up. In this box, enter in the experimental rotational constants (and maybe even the new distortion you fit in the previous section) for 1,3-disilacyclopentane here, not the ab initio constants. It might also be prudent to lower the lower intensity cutoff slightly, from 1 microvolt (0.001) to just above another factor of two in dynamic range. Looking at the average noise level in the experiment, a lower cutoff of approximately 0.00065 mV (650 nV) should be sufficient.

When you hit OK, you will be presented with another dialog box:

_images/fig8.png

This might seem a little confusing at first, but it’s not once you get the hang of the idea and it’s actually quite useful. What this input does, in essence, is set a cutoff for what transitions will be selectable for a fit triple for the isotopologue search based on the experimental intensities for lines in the spectrum. For instance, if we set this number to N, then only lines with predicted intensity (set by fixing a conversion factor between predicted SPCAT intensities and experimental intensities) stronger than N times the lower intensity limit (in this case, 650 nV), will be selectable. Therefore, if we wanted to search for 13Cs, we will want to search for lines that are at least, say, 300:1 in the parent species (if we assume 650 nV is our noise floor, and we want to find isotopologue transitions that are at least 3:1, and 13C isotopologues are 1% intensity of the parent species). Therefore, you will want to input a factor of at least 300. Since we are dealing purely with 13C and 29Si/30Si (which are around 3-5% of the parent species), a factor of 100 will be safe and sufficient.

NOTE: As it stands in v15, this routine of intensity thresholding is not perfect and is still experimental. You can be 100% safe by setting a threshold of 1 and then figure out manually what would be good transitions to choose for isotopologue searches, but setting a factor of 50 or so is often just as safe. Choosing a high tolerance will not remove your ability to find isotopologues, but you will need to make sure you aren’t choosing transitions that are not resolvable for the isotopologues.

If you pick 100, you will see only 5 lines are remaining to choose from. It is likely there are more transitions that meet this criteria, but as the NOTE above says it isn’t always perfect. However, these 5 lines are sufficient for the purposes of this exercise, so go ahead and pick three transitions to fit (shouldn’t matter which you choose).

The program will now ask you to input the coordinates file for your geometry – point it to the geometry downloadable at the beginning of this tutorial. It will use this geometry to scale predicted isotopologue rotational constants and improve the precision of the guess constants.

Once it loads, Autofit will then ask you what atoms to search for. Choose carbon and silicon. Note that in this molecule, there is a plane of symmetry down the middle of the ring – so both silicon positions are equivalent, as are the two methylene groups attached to each other. Autofit searches by atom, so these searches will be doubled but should give exactly the same results.

For the search window, a small window will do fine – maybe even less than the ~50 MHz it suggests. For the purpose of this tutorial, I put 25 MHz. For the checked transition list, go ahead and select the entire list (there will be duplicates with the fit triple, but the program will remove these duplicates automatically).

Once that is completed, Autofit will begin its search for each isotopologue. It will go through each atom before it outputs any results, so you can watch the progress in the terminal (it will print the # of triples for each isotopologue run).

3B: Analysis

Once all the searches are done, it will sequentially show you the outputs for each isotopologue search. For instance, the top result is for 13C1 (the labelling is based on the ordering in the geometry file). The top fit is:

00 score =  02 Const = 4343.347 2877.062 1920.859 average omc = 0.00173015558175  avg w/out peaks over edge = 0.00173015558175 avg w/ inten penalty = 0.00346031116351

Comparing this to the literature results for this molecule, this fit corresponds to the 13C(beta) position.

NOTE: If you want to take a look at these results separately, Autofit creates new folders in the /<job name> root for each of the isotopologues. You can find files entitled best100_...txt that are the summarized outputs that are printed on screen by Autofit.

If this is ok, hit the OK button. Autofit will ask you if you want to refine the fit; no need, so hit “No, I’m done with this isotopologue!” and the next isotopologue search will pop up. This search is actually for the carbon symmetric to the first search, so the top result will be identical to the one above.

For the remaining autofit searches, the top fits found using the settings above are as follows:

Search #3 (13C-3): 00 score =  02 Const = 4337.47 2887.119 1922.694 average omc = 0.0118402122775  avg w/out peaks over edge = 0.0118402122775 avg w/ inten penalty = 0.0118402122775
Search #4 (29Si-1): 00 score =  02 Const = 4414.215 2851.553 1921.569 average omc = 0.00246013645665  avg w/out peaks over edge = 0.00246013645665 avg w/ inten penalty = 0.0049202729133
Search #5 (30Si-1): 00 score =  02 Const = 4411.051 2817.557 1905.553 average omc = 0.00442512654354  avg w/out peaks over edge = 0.00442512654354 avg w/ inten penalty = 0.0132753796306
Search #6 (29Si-2): Identical to Fit #4
Search #7 (30Si-2): Identical to Fit #5

Comparing these results to the literature values, they match up exactly with the isotopologues assigned in the published work. Just by pushing a few buttons and waiting a couple of minutes, we have assigned not only the parent species of 1,3-disilacyclopentane, but all of its heavy atom isotopologues!

Closing Comments & Tips

The techniques taught above will apply to effectively any problem you throw at Autofit. However, not everything is as simple and easy to discover as 1,3-disilacyclopentane. Denser spectra and weaker species, especially those in a mixture, can often be difficult to pin down quickly with Autofit. However, as any graduate student or postdoctoral researcher in the Pate group from the last two years or so can tell you that Autofit is essential for dealing with these kinds of broadband spectra efficiently. At times, fitting by hand can’t be beaten, especially when visual patterns are easy to ascertain. But when dealing with weak species in spectra with 5000-10000+ lines, it reaches the point where most of the success in fitting by hand arises from pure luck.

If you feel comfortable with the program, but still can’t get good results with your own projects, check out the tips in the Tips section.