Author: Henry Rzepa

A ROR Persistent Identifier for the WATOC organisation – helping to make scientific connections.
Science frequently works by people making connections between related (or even apparently unrelated) concepts or data. There are many ways of helping people make these connections – attending a conference or seminar, searching journals for published articles and nowadays also searching for data are just a few examples. For about 20 years now, one technology which has been helping to enable such discoveries is the “Persistent IDentifier” or PID. PIDs are unique labels which can be attached to a (scientific) object such as a journal article, a dataset or a researcher. The PIDs for the first two examples have become better known as DOIs (digital object identifiers); the last is known as an ORCID. The PID is registered with a registration authority. Two of the oldest and best known authorities are CrossRef for journal articles, funders (etc) and DataCite, who specialise in citable identifiers for data. The registration process includes creating and adding a metadata record to the PID; the record is then indexed and can be used to search for the objects. The terms of these metadata records are carefully controlled to use specified and standardised vocabularies to describe the objects (one current initiative in chemistry in this area is described here[cite]10.1515/pac-2021-2009[/cite]).

The PID “ecosystem” is constantly expanding and a recent addition is the ROR registration authority. This issues PIDs for research organisations, so that one can then easily associate a scientific object with the organisation where the research was conducted. The initial focus for ROR PIDs was the traditional forms of organisation such as universities and company research labs. Here I describe how a rather different type of organisation came to have its own ROR, the “World Association of Theoretical and Computational Chemists” or WATOC. The aims of WATOC are primarily to hold triennial congresses to promote scientific exchange and to help researchers make those connections through presentations, posters and numerous coffee breaks!

Last July, the proposal for creating a ROR for WATOC was accepted by its decision making body and can now be announced as https://ror.org/04rp40h82, where 04rp40h82 is the unique WATOC identifier. The prefix https://ror.org/ is called the “resolver”, which in turn allows access to the associated metadata record via an API. That record in turn includes a link to the organisation, similar to links to journal articles as specified by a DOI.
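The split between resolver prefix and identifier can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the API base URL used below is an assumption that should be checked against the current ROR documentation before use.

```python
RESOLVER = "https://ror.org/"
# Assumed base URL for the ROR REST API (not stated in the post); verify before use.
API_BASE = "https://api.ror.org/v2/organizations/"


def split_ror(url: str) -> tuple[str, str]:
    """Split a ROR URL into (resolver prefix, unique identifier)."""
    if not url.startswith(RESOLVER):
        raise ValueError(f"not a ROR URL: {url}")
    identifier = url[len(RESOLVER):].strip("/")
    if not identifier:
        raise ValueError("ROR URL has no identifier part")
    return RESOLVER, identifier


def ror_api_url(url: str) -> str:
    """Build the (assumed) API address of the metadata record for a ROR URL."""
    _, identifier = split_ror(url)
    return API_BASE + identifier


print(split_ror("https://ror.org/04rp40h82"))
print(ror_api_url("https://ror.org/04rp40h82"))
```

Fetching the second URL with any HTTP client should then return the metadata record, provided the assumed API base is correct.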

It is now time to show some examples of how the WATOC ROR can actually be used.
1. One outcome of the last WATOC Congress, held in 2022 in Vancouver, is the production of a themed peer-reviewed issue of the Canadian Journal of Chemistry, created by inviting speakers to submit an article corresponding to their presentation. Armed with the WATOC ROR, the publisher was approached to ask if this identifier could be included in the metadata record for each accepted article. This was agreed and in due course the identifier will be added to the Crossref metadata record for each article in this special issue. When this happens, the articles can be searched for using e.g. https://api.crossref.org/works?filter=ror-id:04rp40h82 Because creation of a metadata record is actually part of the complex journal production workflow, this will not occur until the journal has updated its procedures to do this, which may take a little while yet. Invoking that search would then retrieve all published articles associated (at least in part) with WATOC activities.
2. The link https://api.crossref.org/works?filter=ror-id:04rp40h82 is actually part of the CrossRef API (application programming interface) and so can now be used to construct complex programmatic queries which include the WATOC ROR, for deployment in e.g. AI applications.[cite]10.1145/3447772[/cite] Although not derived from the CrossRef API, I can show here some similar uses of metadata for the construction of so-called Knowledge Graphs [cite]10.1145/3447772[/cite], which can be thought of as visual representations of connections between scientific objects, organisations and other types of entity to which a registered PID has been assigned.
  1. This knowledge graph was created using SciFinder by specifying a person (myself in this case) and any conferences they have been associated with. However, in the past the capture of conference attendance was a rather hit and miss process and so the record is very incomplete. It is the expectation that metadata associated with ROR PIDs will help make these records more complete and hence useful. ROR is also fully open and hence its use is less restricted than the proprietary SciFinder system.
  2. I cannot resist also adding this one. The metadata record now contains named concepts, this one being “transition states” which I have been associated with in the past.
  3. As of today, the WATOC ROR has not propagated to any CrossRef metadata records and so I cannot yet show any knowledge graphs with nodes based on WATOC.
3. The ROR PID can also be used for inclusion in metadata records describing datasets. This is one such search, now of the DataCite metadata store:
  https://commons.datacite.org/doi.org?query=((contributors.affiliation.affiliationIdentifier:*04rp40h82)+AND+(contributors.affiliation.affiliationIdentifierScheme:ROR))+OR+((creators.affiliation.affiliationIdentifier:*04rp40h82)+AND+(creators.affiliation.affiliationIdentifierScheme:ROR))
  Note the somewhat more complex logic being used, in part because a dataset can be “created” by a named person but also can be “contributed to” and one should really search for both possibilities.
4. One can also combine two different identifiers, namely an organisational ROR and a researcher ORCID into a single query:
  https://commons.datacite.org/?query=((creators.affiliation.affiliationIdentifier:*04rp40h82)+OR+(contributors.affiliation.affiliationIdentifier:*04rp40h82))+AND+(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)
  There are many more combinations of searches that can be constructed using other types of identifiers.[cite]10.1002/mrc.5186[/cite]
5. Further in the future, one might expect that metadata records from e.g. both CrossRef and DataCite could be combined to create knowledge graphs by combining information based on both journal articles and published FAIR datasets. Currently, CrossRef does not identify PIDs for datasets that might be cited in an article bibliography as explicit data, but that too may be coming in the near future.[cite]10.3789/niso-rp-36-2020[/cite]
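The search URLs quoted in the list above can also be assembled programmatically, which makes it easier to swap in a different ROR or ORCID. The following is a minimal sketch: the function names are hypothetical, and the field names and URL layouts are simply copied from the links shown above rather than taken from the CrossRef or DataCite documentation.

```python
WATOC_ROR = "04rp40h82"


def crossref_ror_url(ror_id: str) -> str:
    """Crossref works search filtered by ROR, in the form quoted in the text."""
    return f"https://api.crossref.org/works?filter=ror-id:{ror_id}"


def datacite_affiliation_query(ror_id: str) -> str:
    """Match datasets whose contributors OR creators carry the given ROR."""
    clauses = [
        f"(({role}.affiliation.affiliationIdentifier:*{ror_id})"
        f" AND ({role}.affiliation.affiliationIdentifierScheme:ROR))"
        for role in ("contributors", "creators")
    ]
    return " OR ".join(clauses)


def datacite_ror_orcid_query(ror_id: str, orcid: str) -> str:
    """Combine an organisational ROR with a researcher ORCID in one query."""
    return (
        f"((creators.affiliation.affiliationIdentifier:*{ror_id})"
        f" OR (contributors.affiliation.affiliationIdentifier:*{ror_id}))"
        f" AND (contributors.nameIdentifiers.nameIdentifier:*{orcid})"
    )


def commons_url(base: str, query: str) -> str:
    """Append the query to a DataCite Commons base URL, writing spaces as '+'."""
    return base + "?query=" + query.replace(" ", "+")


print(crossref_ror_url(WATOC_ROR))
print(commons_url("https://commons.datacite.org/doi.org",
                  datacite_affiliation_query(WATOC_ROR)))
print(commons_url("https://commons.datacite.org/",
                  datacite_ror_orcid_query(WATOC_ROR, "0000-0002-8635-8390")))
```

Keeping the query logic separate from the URL encoding means the same query strings could be reused against another front end, should the field names carry over.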
Way back in January 1994, WATOC was one of the very first chemical-science based organisations to have its own web page. Now it is leading the way in acquiring and deploying its very own persistent identifier in the form of a ROR. One might hope that many more such organisations acquire one soon.

The DOI for this post is 10.14469/hpc/12363
March 9, 2023
Determining absolute configuration: Cylindricine.
Nature has produced most natural molecules as chiral objects, which means the molecule can come in two enantiomeric forms, each being the mirror image of the other. When a natural product is synthesised in a laboratory, a chiral synthesis means just one form is made, which is then compared with the natural product to see if it matches. Just such a process was followed in the recent synthesis of cylindricine, a marine alkaloid[cite]10.1021/acs.orglett.2c02004[/cite] featured on the ACS molecule-of-the-week site. The authors noted that the absolute configuration of cylindricine as isolated naturally had remained unassigned, and as it happens one way of measuring the properties of the individual enantiomer – its optical rotation – had not been determined. So in part, the purpose of this synthesis was to determine the absolute configuration of this molecule. Here I explore this process.

There are several different procedures for finding the absolute configuration of a molecule.
1. By synthesis from a starting material, itself presumed of known absolute configuration – in this example, a molecule[cite]10.1002/chem.200600420[/cite] which had been previously assigned an absolute configuration. The presumption then is that all the transformations made to this molecule have stereochemically certain and predictable outcomes and of course that the configuration of the starting material in this process was not in any doubt. Ultimately, this chain of inferences should be traceable all the way back to D-(+)-glyceraldehyde. These inference chains can involve multiple groups working at different times.
2. Alternative methods can be used as an independent check on the first method above, which depend only on the properties of the target molecule itself and not on any inference chain. One such is the “gold standard”, introduced in 1951[cite]10.1038/168271a0[/cite] and using X-ray crystallography. This method is quite common nowadays, but it does require a suitable crystal for measurement.
3. The so-called chiroptical properties of the target molecule can also be subjected to computational prediction, a method first introduced in 1937[cite]10.1063/1.1750060[/cite] using the optical rotation as the measure and based on linearly polarized light at a specific wavelength (normally corresponding to 589nm). As was found in 1937, this can be quite a fragile method, depending very much on the actual conformations of the molecule. Rigid molecules are more predictable than flexible ones. Cylindricine itself has a number of conformations or orientations of the various substituents and it then becomes a question of finding the most stable of these, in terms of their overall contributing populations.
4. A more recent method is the technique known as electronic circular dichroism (ECD), which uses circularly rather than linearly polarised light, across a range of wavelengths from ~200 up to ~800nm.
5. An even more recent chiroptical method is VCD or Vibrational Circular Dichroism. This spectroscopic technique detects differences in attenuation of left and right circularly polarized light passing through a sample. It is an extension of circular dichroism spectroscopy into the infrared ranges.
Any or all of methods 3-5 could be used to independently check on the results inferred in procedure 1. Here I report the results of such an attempted verification.

The start point is an attempt to find the most stable conformation of cylindricine. Here I am using a conformational tool called GMMX, part of the Gaussview suite. Loading the molecule as drawn above, six rotatable bonds are automatically identified and the program systematically rotates about all of these in turn using a molecular mechanics force field to compute an energy. This field includes so-called dispersion or van der Waals attractions. I used the MMFF94 force field, with its origins in the pharmaceutical industries and reasonably suitable for a natural product. The lowest energy conformation obtained is shown below, but it should be noted that there are 36 further conformations within 3.5 kcal/mol of the lowest. This conformer was chosen for the chiroptical calculations described in 3-5 above. Of course, more thoroughly all the conformers with a population of at least 1% should be included in this process for a more comprehensive analysis.

To get an inkling of why this conformer might be the lowest in energy, inspect the model below (click on the image to get a 3D rotatable model). It shows the so-called NCI (non-covalent-interactions), which are mostly composed of hydrogen bond and dispersion stabilisations. Each little blue/green isosurface is one of these – and the more of them there are – the more stable the conformer.

For this conformer, the calculated optical rotation emerges as -34° at 589nm (FAIR Data DOI: 10.14469/hpc/12231). The reported value is -8.5°. You might think that the agreement is poor, but such calculations are only reasonably clear-cut for large values of the rotations! Clearly, this calculation provides some supporting evidence that the assignment of absolute configuration is correct. The take home message is not the value of the rotation but its sign, where calculation and measurement agree. The next step would be to perform a full conformational average over all 37 conformations!
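A full conformational average of this kind is usually a Boltzmann-weighted mean of the per-conformer rotations. The sketch below shows the arithmetic only. The three conformers and all values other than the -34° for the lowest conformer are made up for illustration, the 1% cut-off mirrors the population threshold mentioned earlier, and the function names are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def boltzmann_populations(rel_energies, temperature=298.15):
    """Fractional populations from relative energies given in kcal/mol."""
    lowest = min(rel_energies)
    weights = [math.exp(-(e - lowest) / (R_KCAL * temperature))
               for e in rel_energies]
    total = sum(weights)
    return [w / total for w in weights]


def weighted_average(values, populations, threshold=0.0):
    """Population-weighted mean over conformers at or above a population cut-off."""
    kept = [(v, p) for v, p in zip(values, populations) if p >= threshold]
    norm = sum(p for _, p in kept)  # renormalise over the conformers kept
    return sum(v * p for v, p in kept) / norm


# Hypothetical data: only the first rotation (-34) is taken from the text.
energies = [0.0, 0.5, 3.5]        # kcal/mol above the lowest conformer
rotations = [-34.0, -10.0, 20.0]  # degrees at 589 nm
pops = boltzmann_populations(energies)
print([round(p, 4) for p in pops])
print(round(weighted_average(rotations, pops, threshold=0.01), 1))
```

In practice the weights would come from free energies computed at the same level as the rotations, rather than from the molecular mechanics energies used for the initial search.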

The calculated ECD spectrum is shown below. It only shows a weak negative feature at ~220nm and strong evidence requires features at >280nm to be clear cut. This result suggests that recording this spectrum is not recommended.

The VCD spectrum is shown below. This does show strong features in both the C-H stretching region and the 1500-800 wavenumber region and would be a good diagnostic. Recording it would indirectly also reveal whether the conformer chosen above is likely to be correct or not.

So the above provides a start point for a more comprehensive and independent method for verifying the absolute configuration. Total synthesis using a starting material of known configuration is, it has to be said, normally pretty reliable, but there are rare examples where a mistake was made in the assignment of such a precursor, which was then corrected by VCD assignment.[cite]10.1039/b913295c[/cite]

This blog has DOI: 10.14469/hpc/12233
February 1, 2023
A look at (one of) the dyes used in the Bayeux tapestry.

I have previously looked at the pigments used to colour the Book of Kells, which dates from around 800 AD and which contained arsenic sulfide as the yellow colourant. The Bayeux tapestry is a later embroidery dating probably from around 1077 and here the colours are based entirely on mordanted natural dyes. These are generally acknowledged to be blue woad (principal component indigo), red madder (principal component alizarin) and the less well-known yellow weld, which comes from the plant Reseda luteola and the principal component of which is luteolin.[cite]10.1016/j.dyepig.2022.110798[/cite]

Luteolin has an interesting chemical history. It was first purified in 1829, in the dawn of organic chemistry, and its formula C₁₅H₁₀O₆ established by 1864. A. G. Perkin, the son of the William Perkin who discovered the dye mauveine, then provided the chemical structure[cite]10.1039/CT8966900799[/cite] in 1896. This latter article is well worth a modern read, since it beautifully illustrates how the art of structure determination was conducted in the days before crystallography and NMR.

Perkin obtains his structure by comparing luteolin to the then-known quercetin, concluding that the former must also contain an aromatic hydroxy group “ortho” to the carbonyl group, as in quercetin. The key experimental evidence was that alkylation of luteolin with iodoethane only produces a triethoxy derivative of luteolin, with “one hydroxy group resisting ethylation“. It was by then established, by four different sets of researchers, that hydroxy groups adjacent to the carbonyl in e.g. quercetin or alizarin resisted alkylation. The structure of luteolin was established (see e.g. 10.5517/cc798yq) by combining various such observations, a method (and skill) that has largely lapsed nowadays.

A modern take on this selective alkylation might be to compute e.g. the wavefunction (ωB97XD/Def2-TZVPP/SCRF=water) of luteolin to inspect the energies of the orbitals associated with alkylation of the hydroxyl group, using the energy of the nucleophilic lone pair oxygen orbital (FAIR DOI: 10.14469/hpc/12185) as an indicator. The least stable such orbital (highest energy) is normally an indicator of the most nucleophilic electron pair. In this case, the highest (most reactive) such orbital is the one adjacent to the carbonyl group, which thereby reveals a mystery, since it is this very hydroxyl that resists alkylation! A transition state approach to this might be needed to resolve the mystery, factoring in perhaps steric effects etc.

[Figure: the four oxygen lone pair orbitals of luteolin, with energies -0.6951 au, -0.7132 au, -0.7169 au and -0.7205 au.]

The calculated UV-Vis spectrum is shown below, showing the peak at ~300 nm responsible for the intense yellow colour (300-400 nm).

The strongest oscillator contribution to the transition is shown below.

[Figure: the LUMO and HOMO orbitals.]

So here I have cast a little more light on this relatively unknown natural yellow dye, that was used for many centuries to colour woollen materials.

January 3, 2023
Molecules of the year -2022. A closer look at the Megalo-Cavitands.

In the previous post, I discussed how data associated with two of the candidates for molecules of the year – 2022 could be retrieved and then used to inspect their three dimensional structures. Here I focus on the ultra large cavitands recently reported[cite]10.1002/anie.202209885[/cite]. As I noted, these have an associated data coordinate archive published on Zenodo (DOI: 10.5281/zenodo.6953961) although this is not cited in the article itself.

Shown below are the coordinates of the A4-T molecule containing C₇₀, the first being optimized at the PM6 level and the second at the PM7 level. The most obvious difference is that all the close C-H…H-C contacts of the host molecule shrink from between ~4Å and 2.6Å at the PM6 geometry down to about 2.1Å for PM7, a contraction of at least 0.5Å. Also, the gap between the host and the guest reduces from around 4.2Å to 3.45Å (a distance typical of π-π stacking by the way), a significant reduction of ~0.75Å. Click on the two images below to view this model.

The difference in the dispersion terms for these two geometries emerges as 36.6 kcal/mol lower for the PM7 optimised geometry compared to the original PM6 geometry, a significant stabilisation. FAIR data is at DOI: 10.14469/hpc/12022 if you want to analyse the cavity sizes further.

Shown below is the NCI (non-covalent-interaction) surface, computed at the PM7 geometry and using the MNDO wavefunction. This illustrates the stabilisations occurring from the non-covalent density (takes a little while to load).

This post has DOI: 10.14469/hpc/12027

December 15, 2022
Molecules of the year -2022. Data issues!
The list of molecules of the year is out now at C&E News (but you have to have an account to view the list, unlike previous years).^♣ These three caught my eye:
1. Electron in a cube: Synthesis and characterization of perfluorocubane as an electron acceptor,[cite]10.1126/science.abq0516[/cite]. I have already written about this system and will not discuss it further, except to note this one topped the poll!
2. Vernier template synthesis of molecular knots[cite]10.1126/science.abm9247[/cite]
3. Megalo-Cavitands: Synthesis of Acridane[4]arenes and Formation of Large, Deep Cavitands for Selective C70 Uptake[cite]10.1002/anie.202209885[/cite]
The last two are examples of large three-dimensional molecules with unusual properties. The second is an example of a trefoil-of-trefoils, called a triskelion knot and I was very keen to get hold of its coordinates so that I could inspect the knotting. I thought I might summarise here the hierarchical procedures one might try for acquiring such data.
- The most modern method of acquiring data associated with an article is to inspect the citation list at the end of the article. The trend encouraged by the FAIR data principles suggests that if such data has an associated DOI (as indeed the article itself does), then this DOI should be cited in the citations just like articles themselves. This concept is also known as treating data as a first class citizen of the scholarly processes. In this case no data was associated with the 81 citations listed at 10.1126/science.abm9247
- The prevalent method since ~1996 has been to next download any ESI. That is linked here. I cannot help but note that the PDF format is not one optimised for data, but it's better than nothing. This PDF has 114 pages, and one eventually finds the following on p 103: structures and corresponding energies uploaded to the Github database (https://github.com/kjhstenlid/AshbridgeVernier2022/). Github is known as a software repository, but its use as a data repository is unusual. Thus no DOI is assigned to this data (which would explain why it's not listed in the article citations). Here one learns from the readme that it contains Molecular knot structures in cif-file format for the Verner and Sheild knots.
- To get this data one has to pretend it is code, and download the ZIP code archive. The CIF file found there however gives a fatal error when trying to load into a CIF viewer such as Mercury: Reading cell from Cif failed, could not retrieve ‘_cell_length+a’. The CIF is clearly not generated from a crystallographic analysis program but a modelling program and is clearly invalid as a CIF.
- One now has to fall back on seeing if the CIF file can be rescued using a text editor. This is non-trivial but about 10 minutes of editing finally produces a file that can be viewed.
- Here is the 3D structure (click on the image to view).
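Before reaching for the text editor, a quick scan can show which unit-cell entries a viewer will look for and not find. This is a minimal sketch, not the procedure actually used here: it assumes the six standard CIF unit-cell data names, the function name is hypothetical, and the sample fragment is invented rather than taken from the repository.

```python
REQUIRED_CELL_TAGS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
)


def missing_cell_tags(cif_text: str) -> list[str]:
    """Return the unit-cell data names that never start a line of the CIF text."""
    present = set()
    for line in cif_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("_cell_"):
            present.add(parts[0].lower())
    return [tag for tag in REQUIRED_CELL_TAGS if tag not in present]


# Invented fragment for illustration; not the file from the repository.
sample = """data_knot
_cell_length_a 10.0
_cell_length_b 10.0
_cell_angle_alpha 90
"""
print(missing_cell_tags(sample))
```

A check like this only reports absent or misspelled data names. It says nothing about whether the atom coordinates that follow are themselves well formed.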
Now for the Megalo-Cavitands (or not). Just as above, one ends up in a 49-page PDF file looking for coordinates. There one gets pictures of PM6-computed models starting on p 28, but alas apparently no associated coordinates.^†

So no 3D models to show here then (sorry, clicking on the image above will not produce them^†).

My concluding remark is that when an interesting molecule is considered for inclusion in e.g. the molecules of the year – 2022, the availability of full and FAIR data describing its properties should be one of the essential criteria for selection.

^‡I note the method used to generate these coordinates (PM6) is perhaps not ideal; it contains no dispersion attraction terms, which are probably important if modelling host-guest complexation. The PM7 method which does is far better for this sort of thing! This highlights the importance of providing data, in this case 3D coordinates. It would be interesting to recompute the dimensions of these molecules using a method that does allow for dispersion attractions to be included. For just such an example, see here.
^† I have contacted the authors of [cite]10.1002/anie.202209885[/cite] and it turns out a reference to a Data repository submission was omitted from the article. The data is at DOI: 10.5281/zenodo.6953961 and I will report separately on my analysis of the effect of replacing PM6 with PM7.
^♣See this open letter about changes at C&EN.

This post has DOI: 10.14469/hpc/12028
December 13, 2022
Gaseous carbon: The energetics of two forms of tetracarbon, C4 and a challenge!

The topic of dicarbon, C₂, has been discussed here for a few years now. It undoubtedly would be a gas! This aspect of the species came to the fore recently[cite]10.1039/D2CP01214F[/cite] when further experiments on a potential chemical precursor of dicarbon, the zwitterion X(+)-C≡C(-), showed that different variants of X(+), such as not only X=PhI(+), but also e.g. X=dibenzothiophenium(+) appeared to generate a gaseous species, which could be trapped as “C₂” in a solvent-free connected flask experiment.

Part of the mystery is that C₂ itself is an extremely high energy species, its dimerisation to C₄ being around 107 kcal/mol exoenergic in free energy. Now, earlier calculations[cite]10.1039/D1CP02056K[/cite] had revealed that the reaction of the precursor PhI(+)-C≡C(-) with itself can occur on a relatively low energy pathway which avoids the very high energy of C₂. The IRC for this reaction is shown below, showing a modest barrier and a very exothermic reaction to the species PhI(+)-CCCC(-) and IPh.

Here I bring your attention to an odd feature on the IRC, in the region of -5. In this region, effectively “free C₄” is formed (at an energy some 60 kcal/mol lower than the reactants and 167 kcal/mol lower than two molecules of free C₂), but this species is immediately trapped by a PhI to form the final products with a further decrease in energy of ~20 kcal/mol. Suppose however, in a molecular dynamics sense, some proportion of this “C₄” could take a different trajectory and free itself at this point, hence escaping being trapped by PhI? This reaction would then generate what again is presumably a gaseous C₄.

Here I explore what might happen next, to answer the question of whether linear C₄ is stable, or will it convert into something else? The scheme below shows some of the possible pathways, leading to the bicyclic form which I have previously discussed extensively in terms of its stabilising aromaticity. Calculations are at the CCSD(T)/Def2-TZVPPD gas phase level, allowing biradicals to form (FAIR Data DOI: 10.14469/hpc/11956).

You can see that C₄ is in a modest thermal well, with a free energy barrier to cyclisation of ~22 kcal/mol. So generated at relatively low energies, it might retain its linear structure, whereas at room temperatures or higher, it will probably end up as the bicyclic aromatic species.

The key calculation might be that dimerisation reaction shown above. Would molecular dynamics show that a proportion of the reaction allows the escape of C₄? Would that be temperature/pressure dependent? I am not about to try these calculations, but offers of doing so gladly accepted! But that does not necessarily solve the mystery of this reaction, alluded to above.[cite]10.1039/D2CP01214F[/cite] Is the trapped gaseous species C₂ itself, C₄ in some form, or indeed something else entirely?

This post has DOI: 10.14469/hpc/11959

November 29, 2022
Derek Lowe asks “What’s a Journal For?” – Knowledge graphs?

What’s a Journal For? This debate has been raging ever since preprint servers were introduced as far back as 1991! Indeed, during my recent submission of a journal article, one of the questions asked was whether the article was already deposited in such a preprint server (in a positive sense, and not one excluding further submission progress). Since my previous comment on this theme was made more than three years ago, I thought I might update it.

I might start with the observation that some think the concept of a journal really comprises three separate components (up to eight have been suggested): the story or narrative being told, the data on which that story is based and the citations or bibliography which set the context of the story. The latter two components have both developed their own publishing models: the data is held in a repository and accompanied by rich metadata which provides at least some of the context, and citations have their own model. Article metadata also includes its own citations helping to place the data into a wider context or “bigger picture” as expressed by a knowledge graph,[cite]10.1016/j.patter.2020.100180[/cite] which even CAS Scifinder will now reveal based on your specific search!^‡. Such metadata is also now generally a component of the overall metadata associated with journal articles. The data component is being accompanied by extensive work to enhance the accompanying metadata models[cite]10.1515/pac-2021-2009[/cite] and we might expect rapid progress to be made here in the near future.

So again to ask “what’s a journal for” if two of its essential components have their own publishing models? The story will always have an important role to play and peer review of that story^† will always be an important aspect of the journal – indeed perhaps the most important aspect. So should we focus our attention on this? I noted that in 2017, a brave new experiment claiming “Open access • Publication charge free • Public peer review • Wikipedia-integrated” of which public peer review was an important component, has accumulated relatively few publications since. I also noted an article[cite]10.1073/pnas.1709586114[/cite] in which the reviewers (but not their reviews) are clearly indicated. This concept too has not made much headway. Will things change in the future? Some think that they have to, or the entire concept of scientific publishing will indeed fragment into many different models and no longer fully serve its purpose.

^‡I cannot resist including my own knowledge graph here, which reveals nicely the impacts of some of the work I have been associated with, as represented by the fans on the outside of the central graph.

^†Although a major component of many peer reviews has the focus on the data and (missing) citations.

October 21, 2022


Nitroaryls- A less-toxic alternative reagent for ozonolysis: modelling the final step to form carbonyls.

Sometimes you come across a reaction which is so simple in concept that you wonder why it took so long to be accomplished in practice. In this case, it is replacing toxic ozone O₃, as used to fragment an alkene into two carbonyl compounds (“ozonolysis”), by a relatively non-toxic simple nitro-group based reagent, ArNO₂, in which the central atom of ozone is substituted by an N-aryl group. As reported by Derek Lowe, two groups have published[cite]10.1021/jacs.2c05648[/cite], [cite]10.1038/s41586-022-05211-0[/cite] details of such a reaction (Ar = 4-cyano or 3-CF₃,5-NO₂). But there are (at least) two tricks: the first is to use photo-excitation using purple LEDs (390nm light) to activate the nitro group. The second is to establish the best aryl substituents to use for achieving maximum yields of the carbonyl compounds and the best conditions for achieving the cyclo-reversion reaction, shown below as TS1. That step requires heating the cyclo-adduct up to ~80° in (aqueous) acetonitrile for anywhere between 1-48 hours. Here I take a computational look at that last step, the premise being that if such a model is available for this mechanism, it could in principle be used to optimise the conditions for the process.

The proposed mechanism for the workup in aqueous acetonitrile[cite]10.1038/s41586-022-05211-0[/cite] is shown below, involving TS1 (a thermal pericyclic cycloreversion reaction), TS2 and TS3 involving intervention of either two or three water molecules to produce the carbonyl compounds and an aryl hydroxylamine (which might of itself be a valuable product). It was also mooted[cite]10.1038/s41586-022-05211-0[/cite] that an alternative mechanism might involve extrusion of an aryl nitrene instead of a cycloreversion (shown as TS4). The calculations use the following method: (U)ωB97XD/Def2-TZVPP/SCRF=acetonitrile. The FAIR data DOI for them is 10.14469/hpc/11269.

Since the workup occurs at up to ~80° in aqueous acetonitrile,[cite]10.1038/s41586-022-05211-0[/cite] the activation free energy that would allow this must be <~25 kcal/mol.

The first model is a simple closed shell cyclo-reversion, solvated only by the model continuum, giving a barrier (for ethene as substrate) which is a little on the high side for a relatively facile thermal reaction.
At this level, the nitrene extrusion reaction identifies as a second order saddle-point with a very high energy, eliminating it from possibility for the mechanism.
Allowing the wavefunction to have some biradical character (TS1 has <S²> before annihilation 0.5534, after 0.0858; a pure biradical for which singlet and triplet states are equal in energy would have a value of 1.00) lowers the energy by a modest 2.5 kcal/mol in this model, but producing a somewhat more realistic free energy barrier.
Adding 2H₂O to the model allows TS2 and TS3 to be directly compared to TS1. The barrier drops a further 3.0 or 4.3 kcal/mol respectively for 2 or 3 waters, and also clearly indicates that TS1 is the rate-limiting step. The barrier corresponds to a reaction which is reasonably fast at ambient or slightly elevated temperatures.

Model	ΔG^‡ TS1	ΔG^‡ TS2	ΔG^‡ TS3
Reactants	0
Closed shell ionic	30.0	–
“TS4”	73.9	–
+biradical	27.5	–
+biradical + 2H2O	24.5	13.7	9.2
+biradical + 3H2O	23.2	12.6	-1.5
Products + 3H2O	-20.4
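To relate the TS1 barriers in the table to reaction times, the Eyring equation can be used to convert a free energy of activation into a first-order rate constant and half-life. The sketch below is illustrative only: it assumes a transmission coefficient of 1, takes ~80° to mean 80 °C (353.15 K), and uses hypothetical function names.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 0.0019872041  # gas constant, kcal/(mol K)


def eyring_rate(dg_kcal: float, temperature: float) -> float:
    """First-order rate constant in s^-1, assuming a transmission coefficient of 1."""
    return (K_B * temperature / H) * math.exp(-dg_kcal / (R_KCAL * temperature))


def half_life_hours(dg_kcal: float, temperature: float) -> float:
    """Half-life in hours of a first-order process with the given barrier."""
    return math.log(2) / eyring_rate(dg_kcal, temperature) / 3600.0


T = 353.15  # assumed: ~80 degrees Celsius, as used for the workup

# TS1 free energy barriers (kcal/mol) taken from the table above.
barriers = {
    "Closed shell ionic": 30.0,
    "+biradical": 27.5,
    "+biradical + 2H2O": 24.5,
    "+biradical + 3H2O": 23.2,
}
for label, dg in barriers.items():
    print(f"{label:20s} {dg:5.1f} kcal/mol  t1/2 = {half_life_hours(dg, T):.3g} h")
```

Because the rate depends exponentially on the barrier, a change of only 1-2 kcal/mol shifts the half-life several-fold, which is why the choice of model matters so much here.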

The results here could be used for e.g. computational exploration of how variation in the aromatic group might affect the barrier for cycloreversion.^‡ Ideally, a version of this reaction which might operate at much lower temperatures would enhance this alternative to using ozone.

^‡ The ΔG^‡ value for p-CN.3H₂O is lower (22.1 kcal/mol vs 23.3 kcal/mol), suggesting it proceeds rather more quickly than the m-CF₃,NO₂ version.

This post has DOI: 10.14469/hpc/11319

October 8, 2022

A new type of bispericyclic reaction: Cyclopropanone + butadiene.

The term bispericyclic reaction was famously coined by Caramella et al. in 2002[cite]10.1021/ja016622h[/cite] to describe the unusual features of the apparently innocuous dimerisation of cyclopentadiene. It shows features of two paths for different pericyclic reactions, comprising a 2+4 cycloaddition in the early stages, but evolving into a (degenerate) pair of [3,3] sigmatropic reactions in the latter stages. Houk (who also uses the term ambimodal) has in recent years extended the number of examples of such pericyclic sequences to trispericyclic[cite]10.1021/jacs.8b12674[/cite] (see here) and even an ambimodal tetrapericyclic reaction, as reported at the recent WATOC event. Here I show an example of a new type of bispericyclic reaction, comprising a 2+4 cycloaddition combined with an electrocyclic ring opening.

The reaction starts with cyclopropanone. This species is reported to undergo cycloaddition reactions[cite]10.1002/0470023449.ch23[/cite] with e.g. furan; below is shown the scheme with butadiene.

Calculations at the ωB97XD/Def2-TZVPP/SCRF=DCM level were undertaken (FAIR data DOI 10.14469/hpc/11246). Because it has been suggested that the “intermediate” ylid might have some biradical character, this was modelled using a spin-unrestricted (UHF) method and a starting guess for the density matrix generated with the Gaussian keyword guess(mix), which uses a linear combination of the HOMO and LUMO to destroy α-β and spatial symmetries. At the located transition state, the spin expectation value <S²> = 0.4003, and it reaches a maximum of <S²> = 0.7002 along the path, so this system does indeed have some biradical character (the value would be 0.000 for a closed shell system, and 1.000 for a pure biradical). The IRC is shown below.

<S²> is non-zero only in the IRC range -2.5 to +4.5; along the rest of the path <S²> = 0.0. The central region of the path (IRC ~0 to +4.5) shows very small gradients, and can be characterised as the region corresponding to the ylid above acting as a “hidden intermediate”.

The central region also coincides with an increase in the dipole moment at the start of the reaction and a decrease at the end, indicating that the system also has a fair amount of zwitterionic character, as implied by the ylid structure shown above. The dip in dipole moment at around IRC = 2, however, corresponds to biradical character partially replacing the charge-separated ionic mode.
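One way to read the spin expectation values quoted above is through a simple two-state picture, in which the spin-unrestricted wavefunction is a mixture of a pure singlet (S(S+1) = 0) and a pure triplet (S(S+1) = 2). In that picture, which is an assumption of this sketch and ignores any higher spin states, the triplet weight is half the spin expectation value, so the value of 1.000 for a pure biradical corresponds to an equal mixture. The function name is hypothetical.

```python
def triplet_weight(s_squared):
    """Triplet weight in a two-state singlet/triplet mixture.

    Assumes <S^2> = w_singlet * 0 + w_triplet * 2 with the two weights
    summing to 1, i.e. no contamination from higher spin states.
    """
    if not 0.0 <= s_squared <= 2.0:
        raise ValueError("two-state model requires 0 <= <S^2> <= 2")
    return s_squared / 2.0


# Values quoted in the text, plus the two limiting cases.
cases = {
    "closed shell": 0.0,
    "transition state": 0.4003,
    "maximum along IRC": 0.7002,
    "pure biradical": 1.0,
}
for label, s2 in cases.items():
    print(f"{label}: <S^2> = {s2:.4f}, triplet weight = {triplet_weight(s2):.3f}")
```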

The IRC animation below shows that electrocyclic ring opening (with disrotation) occurs in the early stages of the reaction, but no explicit ylid intermediate is formed. Instead, the reaction continues without pause to a cycloaddition to reach completion.

One interesting question is what impact the partial biradical character has on this outcome. Shown below are the IRC responses for a potential in which the wavefunction has <S²> = 0.0 across the entire region.

The calculation reveals an entirely different, but still bispericyclic, reaction. It involves the same electrocyclic ring opening to the ylid shown above, but this is followed by a 2+2 cycloaddition via the oxygen rather than the carbon of the ylid. The dipole moment no longer shows the biradical-induced dip in the middle, but rises to a full maximum instead.

So here we see a sequential bispericyclic reaction in which the product of an initial electrocyclic pericyclic reaction is so reactive that it immediately reacts with a diene to form the final product, in an overall concerted but asynchronous reaction. Modelling this seems to be very method-dependent, and the apparently correct product is only obtained if an open-shell biradical wavefunction is used. Are there more examples?

This post has DOI: 10.14469/hpc/11247

September 30, 2022

Examples of inverted or hemispherical carbon?

In previously asking what the largest angle subtended at four-coordinate carbon might be, I noted that as the angle increases beyond 180°, the carbon becomes inverted, or hemispherical (all four ligands in one hemisphere). So what does a search for this situation reveal in the CSD? The query can be formulated as below, in which the distance from the centroid of the four ligands to the central carbon is specified to be in e.g. the range 0.8 to 1.1Å. For tetrahedral carbon surrounded by four carbon ligands, the value would be close to zero, so any value larger than say 0.8Å is worth inspecting.
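The geometric criterion itself is easy to reproduce outside the CSD software. The sketch below computes the distance from a central atom to the centroid of its four ligands for two sets of coordinates. Both sets are idealised, made-up geometries chosen for illustration (a regular tetrahedron with 1.54 Å bonds, and a propellane-like bridgehead with all four neighbours in one hemisphere); they are not taken from any crystal structure, and the function name is hypothetical.

```python
import math


def centroid_distance(center, ligands):
    """Distance (Å) from the central atom to the centroid of its ligands."""
    n = len(ligands)
    centroid = [sum(p[i] for p in ligands) / n for i in range(3)]
    return math.dist(center, centroid)


# Idealised tetrahedral carbon at the origin, C-C = 1.54 Å (illustrative).
a = 1.54 / math.sqrt(3)
TETRAHEDRAL = [(a, a, a), (a, -a, -a), (-a, a, -a), (-a, -a, a)]

# Idealised inverted carbon at the origin: three neighbours on a ring at
# z = 0.8 Å and a fourth on the axis at z = 1.6 Å (illustrative values).
r = 1.29
h = r * math.sqrt(3) / 2
INVERTED = [(r, 0.0, 0.8), (-r / 2, h, 0.8), (-r / 2, -h, 0.8), (0.0, 0.0, 1.6)]

origin = (0.0, 0.0, 0.0)
for name, ligands in (("tetrahedral", TETRAHEDRAL), ("inverted", INVERTED)):
    print(f"{name}: centroid distance = {centroid_distance(origin, ligands):.2f} Å")
```

For the tetrahedral case the centroid coincides with the carbon, while the hemispherical arrangement places it well away from the carbon and inside the 0.8 to 1.1 Å search window.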

Many of the 101 hits are false positives for inverted carbon (by inspection), but five turn out to be propellanes and eight contain the unusual motif shown below:

Here I give one example of each. SADHUA[cite]10.1002/hlca.19880710827[/cite] is a crystalline [1.1.1]propellane in which the “central” bond length is a normal-looking 1.558Å. In fact there is positive (experimental) difference electron density on both “exo” ends of this bond and negative difference density in the “endo” region, suggesting the bond is indeed unusual (FAIR DOI: 10.14469/hpc/11159).

One example of the other motif is SEWZID[cite]10.1021/om00118a008[/cite], where the four ligands to the inverted carbon comprise two C-C bonds and two apparent C-Fe bonds of length 2.04Å. A typical C-Fe bond length is in the region of 1.8Å, so these are longish C-Fe bonds. Indeed, their Wiberg bond orders emerge as ~0.3, so they would not normally count as “bonds”. Nonetheless, they are indexed as such in the CSD! This highlights an interesting aspect of how to construct a searchable crystal structure database. You have to make a decision on whether any pair of atoms is “bonded” or not. And the decision for bonds with orders <1 can be particularly difficult, especially if calculations of these properties are not part of your assignment toolkit.

So we might conclude that inverted or hemispherical four-coordinate carbon is a rare beast; all the more surprising that the best known examples, the [1.1.1]propellanes, are so stable! Apart from the metallocarbons, one of which is illustrated above, are there any others?

September 15, 2022