Tag: Technical communication

  • Re-inventing the anatomy of a research article.

    The traditional structure of the research article has been honed and perfected for over 350 years by its custodians, the publishers of scientific journals. Nowadays, for some journals at least, it might be viewed as much a profit centre as the perfected mechanism for scientific communication. Here I take a look at the components of such articles to try to envisage their future, with the focus on molecules and chemistry.

    The formula which is mostly adopted by authors when they sit down to describe their chemical discoveries is more or less as follows:

    1. An introduction, setting the scene for the unfolding narrative
    2. Results. This is where much of the data from which the narrative is derived is introduced. Such data can be presented in the form of:
      • Tables
      • Figures and schemes
      • Numerical and logical data embedded in narrative text
    3. Discussion, where the models constructed from the data are illustrated and new inferences presented. Very often categories 2 and 3 are conflated into one single narrative.
    4. Conclusions, where everything is brought together to describe the essential aspects of the new science.
    5. Bibliography, where previous articles pertinent to the narrative are listed.

    In the last decade or so, the management of research data has developed as a field of its own, with three phases:

    6. setting out a data management plan at the start of the project, often a set of aspirations together with putative actions,
    7. the day-to-day management of the data as it emerges in the form of an electronic laboratory notebook (ELN),
    8. the publication of selected data from the ELN into a repository, together with the registration of metadata describing the properties of the data.

    In the latter category, item 8 can be said to be a game-changer, a true disruptive influence on the entire process. The key aspect is that it constitutes independent publication of data to sit alongside the object constructed from 1-5. More disruption emerges from the open citations project, whereby category 5 above can be released by publishers to adopt its own separate existence. So now we see that of the five essential anatomic components of a research article, two are already starting to achieve their own independence. Clearly the re-invention of the anatomy of the research article is well under way already.

    Next I take a look at what sorts of object might be found in category 8, drawing very much on our own experience of implementing 7 and 8 over the last twelve years or so. I start by observing that in 2 above, figures are perhaps the object most in need of disruptive re-invention. In the 1980s, authors were much taken by the introduction of colour as a means of conveying information within a figure more clearly; although the significant costs then had to be borne directly by these authors (and with a few journals this persists to this day). By the early 1990s, the introduction of the Web[cite]10.1039/C39940001907[/cite] offered new opportunities not only of colour but of an extra dimension (or at least the illusion of one) by means of introducing interactivity for three-dimensional models. Some examples resulting from combining figures from category 2 with 8 above are listed in the table below.

    Examples of re-invented data objects from category 2
    Example | Object title | Object DOI | Article DOI
    1 | Figure 9. Catalytic cycle involving one amine …etc. | 10.14469/hpc/1854 | 10.1039/C7SC03595K
    2 | FAIR Data Figure. Mechanistic insights into boron-catalysed direct amidation reactions | 10.14469/hpc/4919 | 10.1039/C7SC03595K
    3 | FAIR Data table. Computed relative reaction free energies (kcal/mol-1) of Obtusallene derived oxonium and chloronium cations | 10.14469/hpc/1248 | 10.1021/acs.joc.6b02008
    4 | (raw) NMR data for Epimeric Face-Selective Oxidations … | 10.14469/hpc/1267 | 10.1021/acs.joc.6b02008
    5 | Bibliography | 10.14469/hpc/1116 | 10.1021/acs.joc.6b02008

    Example 1 illustrates how a figure from category 2 above can be augmented with active hyperlinks specifying the DOI of the data in category 8 from which the figure is derived, thus creating a direct and contextual connection between the research article and the research data it is based upon. These links are embedded only in the Acrobat (PDF) version of the article as part of the production process undertaken by the journal publisher. Download Figure 9 from the link here and try it for yourself or try the entire article from the journal, where more figures are so enhanced.

    Example 2 takes this one stage further. The hyperlinks in the published figure in example 1 were embedded in software capable of resolving them, namely a PDF viewer. But that is all that this software allows. By relocating the hyperlink into a Web browser instead, one can add further functionality in the form of JavaScripts, perhaps better described as workflows (supported by browsers but not supported by Acrobat). There are three such workflows in example 2.

    • The first uses an image map to associate a region of the figure with a data object defined by a DOI.
    • The second interrogates the metadata specifically associated with the DOI (the same DOIs that are seen in the figure itself) to see if there is any so-called ORE metadata available (ORE= Object Re-use and Exchange). If there is, it uses this information to retrieve the data itself and pass it through to
    • the third workflow represented by a set of JavaScripts known as JSmol. These interpret the data received and construct an interactive visual 3D molecular model representing the retrieved data.

    All this additional workflowed activity is implemented in a data repository. It is not impossible that it could also be implemented at the journal publisher end of things, but it is an action that would have to be supported by multiple publishers. Arguably this sort of enhancement is far better suited and more easily implemented by a specialised data publisher, i.e. a data repository.
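
    The first of the three workflows above can be sketched in a few lines. This is a minimal illustration in Python rather than the JavaScript actually used in the repository; the region coordinates and the assignment of DOIs to regions are invented for the purpose, and only the https://doi.org/ resolver prefix is a known fact.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """A rectangular hot-spot on the figure, in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int
    doi: str


# Hypothetical layout: both the rectangles and the DOIs placed in them
# are stand-ins chosen for illustration, not the real image map.
FIGURE_REGIONS: List[Region] = [
    Region(0, 0, 200, 150, "10.14469/hpc/4606"),
    Region(200, 0, 400, 150, "10.14469/hpc/4598"),
]


def doi_at(x: int, y: int, regions: List[Region] = FIGURE_REGIONS) -> Optional[str]:
    """Return the DOI whose region contains the point (x, y), if any."""
    for region in regions:
        if region.left <= x < region.right and region.top <= y < region.bottom:
            return region.doi
    return None


def landing_page(doi: str) -> str:
    """Build the resolver URL that leads to the landing page of a DOI."""
    return f"https://doi.org/{doi}"
```

    A click at (50, 50) would map to the first region; the second workflow would then start from landing_page(doi_at(50, 50)) and inspect the metadata for an ORE locator.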

    Example 3 does the same thing for a table.

    Example 4 enhances in a different manner. Conventionally NMR data is added to the supporting information file associated with a journal article, but such data is already heavily processed and interpreted. The raw instrumental data is never submitted to the journal and is pretty much only ever available by direct request from the original researchers (at least if the request is made whilst the original researchers are still contactable!). The data repository provides a new mechanism for making such raw instrumental (and indeed computational) data an integral part of the scientific process.

    Example 5 shows how a bibliography can be linked to a secondary bibliography (citations 35 and 36 in this example in the narrative article) and perhaps in the future to Open Citations semantic searches for further cross references.

    So by deconstructing the components of the standard scientific article, re-assembling some of them in a better-suited environment and then linking the two sets of components to each other, one can start to re-invent the genre and hopefully add more tools for researchers to use to benefit their basic research processes. The scope for innovation seems considerable. The issue of course is (a) whether publishers see this as a viable business model or instead wish to protect their current model of the research article and (b) whether authors wish to undertake the learning curve and additional effort to go in this direction. As I have noted before, the current model is deficient in various ways; I do not think it can continue without significant reinvention for much longer. And I have to ask: if reinvention does emerge, will science be the prime beneficiary?

  • Harnessing FAIR data: A suggested useful persistent identifier (PID) for quantum chemical calculations.

    Harnessing FAIR data is an event being held in London on September 3rd; no doubt all the speakers will espouse its virtues and speculate about how to realize its potential.♥ Admirable aspirations indeed. Capturing hearts and minds also needs lots of real life applications! Whilst assembling a forthcoming post on this blog, I realized I might have one nice application which also pushes the envelope a bit further, in a manner that I describe below.

    The post I refer to above is about using quantum chemical calculations to chart possible mechanistic pathways for the reaction between a carboxylic acid and an amine to form an amide. The FAIR data for the entire project is collected at DOI: 10.14469/hpc/4598. Part of what makes it FAIR is the metadata not only collected about this data but also formally registered with the DataCite agency. Registration in turn enables Finding; it is this aspect I want to demonstrate here.

    The metadata for the above DOI includes information such as:

    1. The ORCID persistent identifier (PID) for the creator of the data (in this instance myself)
    2. Date stamps for the original creation date and subsequent modifications.
    3. A rights declaration, in this case the CC0 license which describes how the data can be re-used.
    4. Related identifiers, in this case describing members of this collection.
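
    The ORCID in item 1 is more than an opaque string: its final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, so a harvester can reject mistyped identifiers before searching on them. A minimal sketch follows; the function names are my own for illustration.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, character set and check digit of an ORCID."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact.isascii():
        return False
    if not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]
```

    The identifier used in the searches further down, 0000-0002-8635-8390, passes this test, whereas changing its last digit makes it fail.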

    The data itself is held in the members of the collection, each of which is described by a more specific set of metadata in addition to the more general types in the above list (e.g. 10.14469/hpc/4606).

    1. One important additional metadata descriptor is the ORE locator (Object Re-use and Exchange, itself almost a synonym for FAIR). This allows a machine to deduce a direct path to the data file itself, and hence to retrieve it automatically if desired. It is important to note that the DOI itself (i.e. 10.14469/hpc/4606) points only to the “landing page” for the dataset, and does not necessarily describe the direct path to any specific file in the dataset. The ORE path can be used with e.g. software such as JSmol to directly load a molecule based only on its DOI. You can see an example of this here.
    2. Each molecule-based dataset contains additional specific metadata relating to the molecule itself. For example this is how the InChIKey, an identifier specific to that molecule, is expressed in metadata:
      <subject subjectScheme="inchikey" schemeURI="http://www.inchi-trust.org/">PVXKWVPAMVWJSQ-UHFFFAOYSA-N</subject>
      The advantage of expressing the metadata in this way is that a general search of the type:
      https://commons.datacite.org/doi.org?query=subjects.subjectScheme:inchikey+AND+subjects.subject:CZABGBRSHXZJCF-UHFFFAOYSA-N
      can be used to track down any molecule with metadata corresponding to the InChIKey given in the query.
    3. Here is more metadata, introduced in this blog. It relates to the (computed) value of the Gibbs energy (the energy unit is in Hartree†), as returned by the Gaussian program;
      <subject subjectScheme="Gibbs_Energy" schemeURI="https://goldbook.iupac.org/html/G/G02629.html" valueURI="http://gaussian.com/thermo/">-649.732417</subject>
      I here argue that it represents a unique identifier for a molecule calculation using the quantum mechanical procedures implemented in e.g. Gaussian. This identifier is different from the InChIkey, in that it can be truncated to provide different levels of information.

      • At the coarsest level, a search of the type
        https://commons.datacite.org/doi.org?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:\-649.*
        should reveal all molecules with the same number of atoms and electrons whose Gibbs energy has been calculated, but not necessarily with the same InChI (i.e. they may be isomers, or transition states, etc). This level might be useful for revealing most (not necessarily all‡) molecules involved in say a reaction mechanism. It should also be insensitive to the program system used, since most quantum codes will return a value for the Gibbs energy if the same procedures have been used (i.e. DFT method, basis set, solvation model and dispersion correction) accurate to probably 0.01 Hartree.
      • The top level of precision however is high enough to almost certainly relate to a specific molecule and probably using a specific program;
        https://commons.datacite.org/doi.org?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:\-649.732417
      • The searcher can experiment with different levels of precision to narrow or broaden the search.
      • I would also address the issue (before someone asks) of why I have used the Gibbs energy rather than the Total energy. Put simply, the Gibbs energy is far more useful in a chemical context. It can be used to relate the relative Gibbs energies of different isomers of the same molecule to e.g. the equilibrium constant that might be measured. Or the difference in Gibbs energies between a reactant and a transition state can be used to derive the free energy activation barrier for a reaction. The total energy is not so useful in such contexts, although of course it too could be added as a subject in the metadata above if a real use for it is found.
    4. The searcher can also use Boolean combinations of metadata, such as specifying both the InChIKey and the Gibbs Energy, along with say the ORCID of the person who may have published the data;
      https://commons.datacite.org/doi.org?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:\-649.*+AND+subjects.subjectScheme:inchikey+AND+subjects.subject:CZABGBRSHXZJCF-UHFFFAOYSA-N+AND+contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390♥
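
    The searches in the list above all follow one pattern, which can be assembled programmatically. The sketch below only builds the query strings in the form quoted here; it does not contact DataCite, and the helper names (subject_clause, truncate_energy, and so on) are hypothetical conveniences rather than part of any DataCite client.

```python
from typing import Optional

# Endpoint exactly as quoted in the text; it may change over time.
BASE_URL = "https://commons.datacite.org/doi.org"


def escape_value(value: str) -> str:
    """Prefix a leading minus sign with a backslash, as the quoted queries do."""
    return "\\" + value if value.startswith("-") else value


def subject_clause(scheme: str, value: str) -> str:
    """Match one subject element by its scheme and its value."""
    return (f"subjects.subjectScheme:{scheme}"
            f"+AND+subjects.subject:{escape_value(value)}")


def orcid_clause(orcid: str) -> str:
    """Match a contributor by ORCID, with the leading wildcard used in the text."""
    return f"contributors.nameIdentifiers.nameIdentifier:*{orcid}"


def truncate_energy(energy: str, decimals: Optional[int] = None) -> str:
    """Keep `decimals` digits after the point and append a wildcard.

    decimals=None returns the full-precision value with no wildcard.
    """
    if decimals is None:
        return energy
    whole, _, fraction = energy.partition(".")
    return f"{whole}.{fraction[:decimals]}*"


def search_url(*clauses: str) -> str:
    """Join clauses with AND into a complete search URL."""
    return f"{BASE_URL}?query=" + "+AND+".join(clauses)
```

    For example, search_url(subject_clause("Gibbs_Energy", truncate_energy("-649.732417", 0))) reproduces the coarsest search, and raising the second argument of truncate_energy narrows it one decimal place at a time.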

    I have tried to show above how FAIR data implies some form of rich (registered) metadata. And how the metadata can be used to Find (the F in FAIR) data with very specific properties, thus Harnessing FAIR data.
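
    To make concrete why the Gibbs energy was chosen as the searchable quantity, here is a minimal sketch of the two uses mentioned earlier: an equilibrium constant from the Gibbs energies of two isomers, and an activation barrier (with an Eyring rate constant) from a reactant and a transition state. It assumes energies in Hartree as reported in the metadata, a temperature of 298.15 K unless stated, and a transmission coefficient of one; any second energy value supplied alongside -649.732417 would be hypothetical.

```python
import math

HARTREE_TO_KJ_PER_MOL = 2625.4996   # conversion factor
R = 8.314462618e-3                  # gas constant, kJ/(mol K)
K_B = 1.380649e-23                  # Boltzmann constant, J/K
H = 6.62607015e-34                  # Planck constant, J s


def equilibrium_constant(g_a: float, g_b: float, temperature: float = 298.15) -> float:
    """K for A <=> B from the Gibbs energies (Hartree) of A and B."""
    delta_g = (g_b - g_a) * HARTREE_TO_KJ_PER_MOL
    return math.exp(-delta_g / (R * temperature))


def activation_barrier(g_reactant: float, g_ts: float) -> float:
    """Free energy barrier in kJ/mol from reactant and transition state (Hartree)."""
    return (g_ts - g_reactant) * HARTREE_TO_KJ_PER_MOL


def eyring_rate(barrier_kj_per_mol: float, temperature: float = 298.15) -> float:
    """First-order rate constant (1/s) from the Eyring equation."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_per_mol / (R * temperature))
```

    A difference of only 0.03 Hartree between reactant and transition state already corresponds to a barrier of about 79 kJ/mol, which is why a search truncated at the first or second decimal place still groups chemically related species together.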


    †It is a current limitation of the V4.1 DataCite schema that there appears no way to specify the data type of the subject, including any units.

    ‡In theory, a range query of the type:
    https://commons.datacite.org/doi.org?query=subjects.subjectScheme:Gibbs_energy+AND+subjects.subject:[\-649.1 TO \-649.8] should be more specific, but I have not yet gotten it to work, probably because the lack of data-typing means it is not recognised as a range of numeric values.
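
    Until such range queries work on the server, one workaround is to retrieve the coarse wildcard matches and apply the numeric range on the client. The sketch below assumes the subject elements have already been obtained as a bare XML fragment like the ones shown above (real DataCite records carry an XML namespace, which this illustration ignores), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def gibbs_energies(xml_fragment: str) -> List[float]:
    """Extract numeric Gibbs energies from a run of <subject> elements."""
    # Wrap the fragment so that several sibling elements parse as one document.
    root = ET.fromstring(f"<subjects>{xml_fragment}</subjects>")
    values = []
    for subject in root.iter("subject"):
        if subject.get("subjectScheme") != "Gibbs_Energy":
            continue
        try:
            values.append(float(subject.text))
        except (TypeError, ValueError):
            continue  # untyped metadata: skip anything that is not a number
    return values


def in_range(values: List[float], low: float, high: float) -> List[float]:
    """Keep the values lying between low and high inclusive."""
    return [v for v in values if low <= v <= high]
```

    Applied to the two subject elements quoted in this post, gibbs_energies returns only the value -649.732417, which in_range then keeps for the interval -649.8 to -649.1.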

    ♥Implicit in this search is the grouping
    https://commons.datacite.org/doi.org?query=(subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:\-649.*) + (subjects.subjectScheme:inchikey+AND+subjects.subject:CZABGBRSHXZJCF-UHFFFAOYSA-N)+AND+contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390
    Currently however DataCite do not correctly honour this form of grouping.

    ♥Video of the speakers and the panel session at the end is now available.

  • PIDapalooza 2018. A conference like no other!

    Another occasional conference report (day 1). So why is one about “persistent identifiers” important, and particularly to the chemistry domain?

    The PID most familiar to most chemists is the DOI (digital object identifier). In fact there are many; some 60 types have been collected by ORCID (themselves purveyors of researcher identifiers). They sometimes even have different names; in life sciences they tend to be known instead as accession numbers. One theme common to many (probably not all) is that they represent sources of metadata about the object being identified. This is further information which allows you (or a machine) to decide if acquiring the full object is worthwhile. So in no particular order, here are some of the things I learnt today.

    1. Mark Hahnel noted the recent launch of the Dimensions resource which links research data with other research activities; I have not yet had a chance to learn its capabilities, but it seems an interesting alternative to other stalwarts such as Google Scholar.

      You can try this example: https://app.dimensions.ai/discover/publication?search_text=10.6084&search_type=kws&full_search=true which retrieves articles in which the data repository with prefix 10.6084 (Figshare) is cited. Try also the prefix 10.14469 which is the Imperial College repository.

    2. Andy Mabbett talked about the deployment and use of persistent identifiers (the Q numbers) in Wikidata, which increasingly underpin the basis for the various flavours of Wikipedia. He also noted their use of some 50 different identifiers.
    3. Johanna McEntyre noted some 5M published articles in life sciences which reference 1M+ ORCID identifiers, easily the domain with the fastest uptake of this type. Also noted was the new FREYA project; aiming to connect open identifiers for discovery, access and use of research resources.
    4. Tom Gillespie talked about RRIDs, or Research Resource Identifiers. These cover hardware, including instruments, with around 6000 RRIDs systematized so far. They argue this area promotes both the A and I of FAIR (accessible and inter-operable). Of course A and I mean many things to many people.
    5. Several other presentations talked about the finer detail of metadata, such as sub-classifications into e.g. descriptive/admin/technical, but I did rather miss demos showing how search queries of such fine-grained metadata could be constructed.

    Apart from the presentations themselves, PIDapalooza is unusual for some other activities. Thus you could go get your PIDnails done, with a selection of 8 or so tasteful logos to choose from. There will be tattoos tomorrow (this is a conference for younger people after all). I may grab a photo or two to provide evidence!

     

  • Data-free research data management? Not an oxymoron.

    I occasionally post about "RDM" (research data management), an activity that has recently become a formalised essential part of the research processes. I say recently formalised, since researchers have of course kept research notebooks recording their activities and their data since the dawn of science, but not always in an open and transparent manner. The desirability of doing so was revealed by the 2009 "Climategate" events. In the UK, Climategate was apparently the catalyst which persuaded the funding councils (such as the EPSRC, the Royal Society, etc) to formulate policies which required all their funded researchers to adopt the principles of RDM by May 2015 and in their future researches. An early career researcher here, anxious to conform to the funding body instructions, sent me an email a few days ago asking about one aspect of RDM which got me thinking.

    The question related to the divide between data as a separate research object (and which therefore has to be managed), and data as an inseparable part of the article narrative, which is of course ostensibly managed by the journal publication processes. Such data may often be the description of a process rather than simply tables of numbers or graphs. In chemistry it may include chemical names and chemical terms as part of an experimental procedure. For one nice illustration of such embedded data, go look at the chemical tagger page. Here the data is blending with the semantics, and the two are not easily separated. So, when such separation is not easily achieved, should the specific processes required by RDM as illustrated in the five bullet points below actually be followed?

    1. Specify a data management plan to be followed, as for example points 2-5 below.
    2. Decide upon a location for your data, separated into one for "live" or working data (the purpose simply being to ensure it is properly backed up) and the other for a sub-set of formally "published data" which has to be available for at least ten years after its publication.
    3. Use 2 to gather metadata (see 6-14 below) and in return get a DOI representing the location of the combined metadata + data, from a suitable registration authority such as DataCite.
    4. Quote this DOI(s) in the article describing the results of analysing the data and presenting hypotheses, and conversely once the article itself is allocated its own DOI from a registration authority such as CrossRef, update the metadata in item 3 so as to achieve a bidirectional link between the data and its narrative (and we assume that DataCite and CrossRef will also increasingly exchange the metadata they each hold about the items).
    5. Add both the data and the article DOIs to any institutional CRIS or current research information system (parenthetically, I regard this last stipulation as rather redundant if items 3 and 4 are working effectively, but it's a good interim measure whilst the overall system matures).

    So, should step 2 be included if the data itself is inextricably intertwined with the narrative and cannot be separated? The slightly surprising advice I would suggest is yes! And the answer is that it IS possible to generate metadata (data about the, possibly entwined, data) which CAN be processed in such a step. What forms would such metadata take?

    6. Identification of the researcher(s) involved. This would nowadays take the form of an ORCID (Open Researcher and Contributor ID).
    7. Identification of the hosting institution where the data has been produced. There is currently no equivalent to an ORCID for institutions, but it is very likely to come in the future.
    8. A date stamp formalising when the (meta)data is actually deposited.
    9. A title for the project being described. Here we see a blurring between the narrative/article and the data; a title is the shortest possible description of the narrative/article, and it may also apply to the data object(s) or it could have its own title.
    10. A slightly fuller abstract of the project being described. Here we see further blurring between the narrative and data objects.
    11. One can include "related identifiers", in particular the DOIs of any other relevant articles that might have been published which may expand the context of the data, and also the DOIs of any other relevant datasets which may have been allocated in step 2 above.
    12. It is also beneficial to include "chemical identifiers". These can take the form of InChI strings and InChI keys, which allow discretely defined molecular objects which were the object of the research to be tracked and which relate to both the narrative and any other data objects.
    13. If specific software has been used to analyse data, it too can be included as a "related identifier" (e.g. [cite]10.5281/zenodo.19272[/cite]).
    14. Potentially at least, if a well-defined instrument has been involved, it too could be included with its own "related identifier". With both 13-14, other issues may need addressing, such as versioning etc, but this no doubt will be sorted in due course.
    15. etc.

    So items such as 6-14 can be collected and sent to e.g. DataCite with a DOI received in return as part of item 2 of the RDM processes. No "pure" data need be involved, only metadata. Nonetheless such metadata can only increase the visibility and discoverability of the research, as illustrated in how such metadata can be searched for the components described above.
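
    A metadata-only deposition of this kind can be pictured as a small record that is checked before it is sent for registration. The sketch below is an assumption-laden illustration: the class, its field names and the payload keys are hypothetical and only loosely echo DataCite terminology, the title, abstract and date are placeholders, and the InChIKey test checks only the 14-10-1 letter layout of a standard key.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Shape of a standard InChIKey: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


@dataclass
class MetadataRecord:
    """Metadata describing a project, with no data files attached."""
    creator_orcids: List[str]
    title: str
    abstract: str
    date_deposited: str  # ISO 8601 date string
    related_dois: List[str] = field(default_factory=list)
    inchikeys: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List the reasons why this record should not yet be registered."""
        found = []
        if not self.creator_orcids:
            found.append("no creator ORCID")
        if not self.title.strip():
            found.append("empty title")
        for key in self.inchikeys:
            if not INCHIKEY_PATTERN.match(key):
                found.append(f"malformed InChIKey: {key}")
        return found

    def to_payload(self) -> Dict[str, object]:
        """Flatten the record into a dictionary (hypothetical key names)."""
        return {
            "creators": list(self.creator_orcids),
            "title": self.title,
            "abstract": self.abstract,
            "date": self.date_deposited,
            "relatedIdentifiers": list(self.related_dois),
            "chemicalIdentifiers": list(self.inchikeys),
        }


record = MetadataRecord(
    creator_orcids=["0000-0002-8635-8390"],
    title="Hypothetical project title",
    abstract="Hypothetical abstract; the data stays inside the narrative.",
    date_deposited="2016-01-01",  # placeholder date
    related_dois=["10.5281/zenodo.19272"],
    inchikeys=["CULPUXIDFLIQBT-UHFFFAOYSA-N"],
)
```

    Here record.problems() returns an empty list, so the payload could be handed on to whatever registration step the repository provides.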

  • Collaborative FAIR data sharing.

    I want to describe a recent attempt by a group of collaborators to share the research data associated with their just published article.[cite]10.1021/jacs.5b13070[/cite]

    I am here introducing things in a hierarchical form (i.e. not necessarily the serial order in which actions were taken).

    1. The data repository selected for the data sharing is described by (re3data) doi: 10.17616/R3K64N[cite]10.17616/R3K64N[/cite]
    2. A collaborative project collection was established on this repository (doi: 10.14469/hpc/244[cite]10.14469/hpc/244[/cite]). This data collection has some of the following attributes:
    3. Its metadata is sent here: https://search.datacite.org/ui?&q=10.14469/hpc/244 where it can be queried for other details.
    4. The project collaborators are all identified by their ORCID, used to obtain further individual information about the researchers. This information is also propagated to the metadata sent to DataCite.
    5. In the section labelled associated DOIs there is a link to the recently published peer-reviewed article, which itself cites the data via doi: 10.14469/hpc/244 and which thus establishes a bidirectional link between the article and its data.
    6. Also in the associated DOIs section are other DOIs (to two figures and two tables) held in a separate location (one example: doi: 10.14469/hpc/332[cite]10.14469/hpc/332[/cite]), which illustrates the original type of data sharing we started about 10 years ago. This form has been variously called a "WEO" or Web-enhanced object (by the ACS) or interactivity boxes (RSC, etc). In such WEOs, we wrap the data into an interactive visual appearance using Jmol or JSmol software. The data itself is directly available to the reader using the Jmol export functions (right mouse click in the visual window).

       

      • In this specific example the WEO has been assigned its DOI using the repository noted above.[cite]10.17616/R3K64N[/cite] 
      • We have in the past also used Figshare[cite]10.17616/R3PK5R[/cite] for this purpose, see e.g. 10.6084/m9.figshare.1181739
      • The WEO can itself reference a more complete set of data used to create the visual appearance, for example data that allows the wavefunction of the molecule to be computed, doi: 10.6084/m9.figshare.2581987.v1[cite]10.6084/m9.figshare.2581987.v1[/cite] In this instance this is held on the Figshare[cite]10.17616/R3PK5R[/cite] repository.
    7. The collection has another section labelled Members. These are individual datasets associated with the collection and held on the SAME repository as the collection itself. In this case, there are five such members, two of which are listed below:

       

      1. 10.14469/hpc/281[cite]10.14469/hpc/281[/cite] contains a variety of other data such as outputs from an IRC (intrinsic reaction coordinate), energy profile diagrams and ZIP archives of other calculations.
      2. 10.14469/hpc/272[cite]10.14469/hpc/272[/cite] itself contains five members, one of which is e.g.

         

        • 10.14469/hpc/267[cite]10.14469/hpc/267[/cite] which contains a ZIP archive with NMR data (see here for how this might be packaged in the future) and a file for a GPC (chromatography) instrument.
        • This last item also contains a new section labelled Metadata, which includes e.g. the InChI key and InChI string for the molecule whose properties are reported.

    If this mode of presenting data seems a little more complex than a single monolithic PDF file, it's because it's designed for:

    1. collaboration between scientists, potentially at different locations and institutions.
    2. attribution of provenance/credit for the individual items (via ORCID).
    3. separate date stamping by the various contributors.
    4. providing bi-directional links between data and publications.
    5. holding what we call FAIR (findable, accessible, interoperable and reusable) data, rather than just data encapsulated in a PDF file.
    6. collecting, storing and sending metadata for aggregation in a formal way, i.e. to DataCite using a formal schema to render the metadata properly searchable.
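
    The bi-directional linking in item 4 is easy to verify once the related identifiers of each object have been harvested. The sketch below assumes they have been collected into a simple table mapping each DOI to the DOIs it points to; the table contents are illustrative and are not a statement of what the registered metadata actually holds.

```python
from typing import Dict, List, Set, Tuple

Links = Dict[str, Set[str]]


def is_bidirectional(links: Links, a: str, b: str) -> bool:
    """True when a points to b and b points back to a."""
    return b in links.get(a, set()) and a in links.get(b, set())


def one_way_links(links: Links) -> List[Tuple[str, str]]:
    """List every (source, target) pair that has no link back."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if source not in links.get(target, set())
    )


# Illustrative table only: the article cites the collection and vice versa,
# while the figure object is shown (hypothetically) without a link back.
example_links: Links = {
    "10.1021/jacs.5b13070": {"10.14469/hpc/244"},
    "10.14469/hpc/244": {"10.1021/jacs.5b13070", "10.14469/hpc/332"},
}
```

    With this table, the article and the collection form a bi-directional pair, and one_way_links reports the single link that still needs its counterpart.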

    Thus 10.14469/hpc/244 represents our most complex attempt yet at such collaborative FAIR data sharing with multiple contributors. The tools for packaging many of the datasets are still quite limited (see again here) and the design is still being optimised (call it α). When the repository[cite]10.17616/R3K64N[/cite] has been more extensively tested, we intend to make it available as open source for others to experiment with. And of course, when this happens the source code too will have its own DOI!


    A refactoring of the Figshare site in December 2015 meant that the DOI no longer points directly to the WEO, and you have to follow a manually inserted link on that page to see it.

  • Global initiatives in research data management and discovery: searching metadata.

    The upcoming ACS national meeting in San Diego has a CINF (chemical information division) session entitled "Global initiatives in research data management and discovery". I have highlighted here just one slide from my contribution to this session, which addresses the discovery aspect of the session.

    Data, if you think about it, is rarely discoverable other than by intimate association with a narrative or journal article. Even then, the standard procedure is to identify the article itself as being of interest, and then to dig out the "supporting information", which normally takes the form of a single paginated PDF document. If you are truly lucky, you might also get a CIF file (for crystal structures). But such data has little life of its own outside of its parent, the article. Put another way, it has no metadata it can call its own (metadata is data about an object, in this case research data). An alternative is to try to find the data by searching conventional databases such as CAS, Beilstein/Reaxys or CSD, and there of course the searches can be very precise. But someone has to pay the bills for such accessibility.

    We are now starting to see quite different solutions to finding data (the F in FAIR data, the other letters representing accessibility, interoperability and re-usability). These solutions depend on metadata being a part of the solution from the outset, rather than any afterthought produced as a commercial solution. The collection of metadata is part of the overall process called RDM, or research data management, perhaps even the most important part of it. In exchange for identifying metadata about one's data, one gets back a "receipt" in the form of a persistent identifier for the data, more commonly known as a DOI. The agency that issues the DOI also undertakes to look after the donated metadata, and to make it searchable. The table below shows eight searches of such metadata, one example of how to acquire statistics relating to the usage of the data and one search of how to find repositories containing the data.

    Search queries enabled by the use of metadata in data publication
    # | Search query* | Instances retrieved:
    1 | http://search.datacite.org/ui?q=alternateIdentifier:InChIKey\:* | InChI key
    2 | http://search.datacite.org/ui?q=alternateIdentifier:InChI\:* | InChI identifier
    3 | http://search.datacite.org/ui?q=alternateIdentifier:InChIKey:CULPUXIDFLIQBT-UHFFFAOYSA-N | InChI key CULPUXIDFLIQBT-UHFFFAOYSA-N
    4 | http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+alternateIdentifier:InChIKey\:* | ORCID 0000-0002-8635-8390 AND (boolean) InChI key.
    5 | http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+alternateIdentifier:InChI\:InChI=1S\/C9H11N5O3* | ORCID 0000-0002-8635-8390 AND (boolean) InChI string 1S/C9H11N5O3 with the * wildcard.
    6 | http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469 | Has content media for Publisher 10.14469 (Imperial College)
    7 | http://search.datacite.org/ui?q=format:chemical/x-* | Data format type chemical/x-*
    8 | http://search.datacite.org/api?&q=prefix:10.14469&fq=alternateIdentifier:InChIKey\:*&fl=doi,title,alternateIdentifier&wt=json&rows=15 or http://api.labs.datacite.org/works?q=prefix:10.14469+AND+alternateIdentifier:InChIKey\:* | First 15 hits in JSON format, batch query mode
    9 | http://stats.datacite.org/?fq=datacentre_facet:"BL.IMPERIAL – Imperial College London" | Resolution statistics for publisher 10.14469 (Imperial College) per month
    10 | http://service.re3data.org/search?query=&subjects[]=31 Chemistry | Research data repository search for Chemistry (135 hits)

    In this instance the three MIME media types are chemical/x-wavefunction, chemical/x-gaussian-checkpoint and chemical/x-gaussian-log. See[cite]10.1021/ci9803233[/cite] for chemical MIME (multipurpose internet media extensions).


    Anyone familiar with the standard ways of finding data (CAS, CSD, Reaxys) will appreciate that the above does not yet have the finesse to find e.g. sub-structures of chemical structures, synthetic procedures or molecular properties. My including it here is primarily to show some of the potential such systems have, and to remark particularly that the batch query capability of this infrastructure could indeed be used in the future to construct much more sophisticated systems. Oh, and to the end-user at least, the searches shown above do not require institutional licenses to use. Both the data and its metadata are free, mostly with a CC0 or CC BY 3.0 license for re-use (the R of FAIR).
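
    As a small illustration of the batch query mode (search 8 in the table), the parameters can be assembled with the Python standard library. This is a sketch only: the endpoint is the one quoted in the table and may no longer respond, the function name is hypothetical, and urlencode percent-encodes characters such as the colon and backslash, which a server would decode back to the form shown in the table.

```python
from typing import List
from urllib.parse import urlencode

# Endpoint as quoted in the table; treat its availability as an assumption.
SEARCH_API = "http://search.datacite.org/api"


def batch_query_url(prefix: str, identifier_filter: str,
                    fields: List[str], rows: int = 15) -> str:
    """Build a batch query returning selected fields as JSON."""
    params = {
        "q": f"prefix:{prefix}",       # restrict to one DOI prefix
        "fq": identifier_filter,        # filter query on the identifiers
        "fl": ",".join(fields),         # fields to return for each hit
        "wt": "json",                   # response format
        "rows": rows,                   # number of hits per batch
    }
    return f"{SEARCH_API}?{urlencode(params)}"


url = batch_query_url(
    "10.14469",
    "alternateIdentifier:InChIKey\\:*",
    ["doi", "title", "alternateIdentifier"],
)
```

    Changing the rows argument, or looping over several prefixes, is all that is needed to turn the single query into a batch harvest.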

    If more of interest related to this topic emerges at the ACS session,  I will report back here.