Tag: Academic publishing

  • Journal innovations – the next step is augmented reality?

    In the previous post, I noted that a chemistry publisher is about to repeat an earlier experiment in serving pre-prints of journal articles. It would be fair to suggest that following the first great period of journal innovation, the boom in rapid publication “camera-ready” articles in the 1960s, the next period of rapid innovation started around 1994 driven by the uptake of the World-Wide-Web. The CLIC project[cite]10.1080/13614579509516846[/cite] aimed to embed additional data-based components into the online presentation of the journal Chem Communications, taking the form of pop-up interactive 3D molecular models and spectra. The Internet Journal of Chemistry was designed from scratch to take advantage of this new medium.[cite]10.1080/00987913.2000.10764578[/cite] Here I take a look at one recent experiment in innovation which incorporates “augmented reality”.[cite]10.1055/s-0035-1562579[/cite]

    The title is interesting: “Combination of Enabling Technologies to Improve and Describe the Stereoselectivity of Wolff–Staudinger Cascade Reaction“. One of these technologies relates to “microwave-assisted flow generation of primary ketenes by thermal decomposition of α-diazoketones at high temperature”, but the journal presentation itself attempts the “faster interpretation of computed data via a new web-based molecular viewer, which takes advantage from Augmented Reality (AR) technology“. To access this component directly, go to the link https://leyscigateway.ch.cam.ac.uk/staudinger/ It is not incorporated into the journal infrastructures as the CLIC project attempted, but is perhaps closer to the model I noted in the previous post of supporting (FAIR)  data associated with the article and hosted separately from the journal.

    What happens next depends rather on the Web browser you are using. With many browsers and tablets, a conventional 3D molecular presentation appears; there is no button present where the red arrow points. You will find out this is because “Augmented Reality is not available in your browser, as the getUserMedia() API is not supported

    AR0

    Some browsers (the latest Opera, FireFox, Chrome) do support this feature, and a new AR button appears. Selecting this now layers the video from the device camera onto the 3D molecular model; the molecule now floats in the scene captured by the camera (which in the case below is the room I am sitting in). After a few seconds you are urged to “point the camera towards the AR marker”. The supporting information contains such AR markers as a navigation aid for the 3D coordinates contained there. An example is:

    AR-markers

    If this marker is now brought into the camera view (by printing it, sic) and holding it in front of the camera image, the marker resolves into further data relevant to the molecule of interest, layered into the existing scene of the room and the molecule. For the marker above, it resolves to a reaction energy profile which reveals where the specific molecule sits energetically in terms of the overall reaction.

    AR

    This layering of “heads up” molecular data into a scene comprising a 3D molecular model and the human viewer of that molecule captured in video is what defines the concept of “augmented reality” (the data being the augmentation, rather than the human). 

    Having now tried it out, I was left wondering whether this truly was a great advance in enabling technology for chemistry journals. The role of the camera seems primarily to capture the AR markers contained in the supporting information; the presence of the reader in the video image apparently inspecting the molecule could be regarded as a distraction. The AR markers (QR codes) are merely visual representations of a URL, which in the form of a DOI (as used in this blog) to locate data is rather more familiar to most readers. The DOI, by the way, carries further information in the form of metadata, and which when sent to e.g. DataCite, enables the data to be found. Does the data need to be layered onto the molecule (and apparently floating in front of the reader) to become usable? Could it instead be placed in a pop-up or separate window of its own (as the 1994 CLIC project achieved)? Do the AR markers enable the data to be FAIR? One can Find the data (albeit only by reading and printing the supporting information) and view it in the AR scene, but is it Accessible (can one access the underlying numerical data?) or Interoperable (place it into another program) or Re-usable?

    As with all enabling technologies, one has to always ask if that technology helps or hinders. Or is the principle of KISS (keep it simple) sometimes better? It is however good to see research groups experimenting with these themes and meanwhile readers can judge for themselves whether “heads up” AR augmentation of the data describing research is indeed the next big thing.

  • Chemistry preprint servers (revisited).

    This week the ACS announced its intention to establish a “ChemRxiv preprint server to promote early research sharing“. This was first tried quite a few years ago, following the example of especially the physicists. As I recollect the experiment lasted about a year, attracted few submissions and even fewer of high quality. Will the concept succeed this time, in particular as promoted by a commercial publisher rather than a community of scientists (as was the original physicists model)?

    The RSC (itself a highly successful commercial publisher) has picked up on this and run its own commentary. You will find quotes from yours truly there, along with Peter Murray-Rust, a long time ardent promoter of community driven open science. One interesting aspect is that the ACS runs around 50 journals, and the decision on whether each will accept preprints for publication will (shortly = next few weeks) be made by the individual editors. I wonder if the eventual list of those supporting the project will bring any surprises (bets on J. Am. Chem. Soc. preprints anyone)?

    But I want to pick up on the declared aspiration “to promote early research sharing“. Here I couple research sharing with data sharing. If you share your research, you should also share the data resulting from that research. We are now entering a new era of data sharing (in part as a result of mandation by various funding bodies) and so one has to ask whether a pre-print server will encourage people to create and share FAIR data (data which is findable, accessible, inter-operable and re-usable) as a model to replace the current one of “supporting information” held in enormous PDF files (mostly unFAIR on at least three counts). This question is indeed posed in the RSC commentary. What I would like to see happen are projects such as that described here, which create what were described as “first class research objects”, and which I think amply fulfil the criteria of being FAIR. So, will ChemRxiv preprint servers help promote such FAIR data sharing as part of early research sharing? We will find out soon.

    The ACS supports OA (Open Access) sharing of articles, provided the authors pay (or arrange payment of) the appropriate APC or article processing charge. These charges are complex, being subject to various discounts (for example if you as an author are an ACS member or not) but are generally not insignificant (> $1000). I wondered whether preprints might be subject to an APC, and so I asked the ACS. The response was “we don’t anticipate any submission or usages fees at this time“. I think that means free at point of submission, and free at point of readership “at this time“.

    Finally, let me now summarise as I understand the current family of “research publications”:

    1. The preprint
    2. The final author version as submitted to a journal
    3. The “version of record” (VoR) as published by the journal
    4. Any FAIR published data associated with the article

    All four of these are attempts at “research sharing”. Each may be located in a different location, and each may have its own DOI. And of course we cannot easily know how much overlap there is between each of them. Thus, how might 1-3 differ in terms of the story or “narrative” of scientific claims? Does 4 agree or support 1-3? Does 4 agree with perhaps data subsets contained in 1-3? If keeping abreast of the current research literature is a challenge, imagine having to cope with/reconcile up to four versions of each “publication”! 

    Lots of food for thought here. We have not heard the last of these themes. 

     

  • Data-free research data management? Not an oxymoron.

    I occasionally post about "RDM" (research data management), an activity that has recently become a formalised essential part of the research processes. I say recently formalised, since researchers have of course kept research notebooks recording their activities and their data since the dawn of science, but not always in an open and transparent manner. The desirability of doing so was revealed by the 2009 "Climategate" events. In the UK, Climategate was apparently the catalyst which persuaded the funding councils (such as the EPSRC, the Royal Society, etc) to formulate policies which required all their funded researchers to adopt the principles of RDM by May 2015 and in their future researches. An early career researcher here, anxious to conform to the funding body instructions, sent me an email a few days ago asking about one aspect of RDM which got me thinking.

    The question related to the divide between data as a separate research object (and which therefore has to be managed), and data as an inseparable part of the article narrative, which is of course ostensibly managed by the journal publication processes. Such data may often be the description of a process rather than simply tables of numbers or graphs. In chemistry it may include chemical names and chemical terms as part of an experimental procedure. For one nice illustration of such embedded data, go look at the chemical tagger page. Here the data is blending with the semantics, and the two are not easily separated. So, when such separation is not easily achieved, should the specific processes required by RDM as illustrated in the five bullet points below actually be followed?

    1. Specify a data management plan to be followed, as for example points 2-5 below.
    2. Decide upon a location for your data, separated into one for "live" or working data (the purpose simply being to ensure it is properly backed up) and the other for a sub-set of formally "published data" which has to be available for at least ten years after its publication.
    3. Use 2 to gather metadata (see 6-14 below) and in return get a DOI representing the location of the combined metadata + data, from a suitable registration authority such as DataCite.
    4. Quote this DOI(s) in the article describing the results of analysing the data and presenting hypotheses, and conversely once the article itself is allocated its own DOI from a registration authority such as CrossRef, update the metadata in item 3 so as to achieve a bidirectional link between the data and its narrative (and we assume that DataCite and CrossRef will also increasingly exchange the metadata they each hold about the items).
    5. Add both the data and the article DOIs to any institutional CRIS or current research information system (parenthetically, I regard this last stipulation as rather redundant if items 3 and 4 are working effectively, but its a good interim measure whilst the overall system matures).

    So, should step 2 be included if the data itself is inextricably intertwined with the narrative and cannot be separated? The slightly surprising advice I would suggest is yes! And the answer is that it IS possible to generate metadata (data about the, possibly entwined, data) which CAN be processed in such a step. What forms would such metadata take?

    1. Identification of the researcher(s) involved. This would nowadays take the form of an ORCID (Open Research and Collaborator Identifier).
    2. Identification of the hosting institution where the data has been produced. There is currently no equivalent to an ORCID for institutions, but it is very likely to come in the future.
    3. A date stamp formalising when the (meta)data is actually deposited.
    4. A title for the project being described. Here we see a blurring between the narrative/article and the data; a title is the shortest possible description of the narrative/article, and it may also apply to the data object(s) or it could have its own title.
    5. A slightly fuller abstract of the project being described. Here we see further blurring between the narrative and data objects.
    6. One can include "related identifiers", in particular the DOIs of any other relevant articles that might have been published which may expand the context of the data, and also the DOIs of any other relevant datasets which may have been allocated in step 2 above.
    7. It is also beneficial to include "chemical identifiers". These can take the form of InChI strings and InChI keys, which allow discretely defined molecular objects which were the object of the research to be tracked and which relate to both the narrative and any other data objects.
    8. If specific software has been used to analyse data, it too can be included as a "related identifier" (e.g. [cite]10.5281/zenodo.19272[/cite]
    9. Potentially at least, if a well-defined instrument has been involved, it too could be included with its own "related identifier". With both 13-14, other issues may need addressing, such as versioning etc, but this no doubt will be sorted in due course.
    10. etc.

    So items such as 6-14 can be collected and sent to e.g. DataCite with a DOI received in return as part of item 2 of the RDM processes. No "pure" data need be involved, only metadata. Nonetheless such metadata can only increase the visibility and discoverability of the research, as illustrated in how such metadata can be searched for the components described above.

  • Collaborative FAIR data sharing.

    I want to describe a recent attempt by a group of collaborators to share the research data associated with their just published article.[cite]10.1021/jacs.5b13070[/cite]

    I am here introducing things in a hierarchical form (i.e. not necessarily the serial order in which actions were taken).

    1. The data repository selected for the data sharing is described by (m3data) doi: 10.17616/R3K64N[cite]10.17616/R3K64N[/cite]
    2. A collaborative project collection was established on this repository (doi: 10.14469/hpc/244[cite]10.14469/hpc/244[/cite]). This data collection has some of the following attributes:
    3. Its metadata is sent here: https://search.datacite.org/ui?&q=10.14469/hpc/244 where it can be queried for other details.
    4. The project collaborators are all identified by their ORCID, used to obtain further individual information about the researchers. This information is also propagated to the metadata sent to DataCite.
    5. In the section labelled associated DOIs there is a link to the recently published peer-reviewed article, which itself cites the data via doi: 10.14469/hpc/244 and which thus establishes a bidirectional link between the article and its data.
    6. Also in the associated DOIs section are other DOIs (to two figures and two tables) held in a separate location. One example: doi: 10.14469/hpc/332[cite]10.14469/hpc/332[/cite]) which illustrates the original type of data sharing we started about 10 years ago. This form has been variously called a "WEO" or Web-enhanced object (by the ACS) or interactivity boxes (RSC, etc). In such WEOs, we wrap the data into an interactive visual appearance using Jmol or JSmol software. The data itself is directly available to the reader using the Jmol export functions (right mouse click in the visual window).

       

      • In this specific example the WEO has been assigned its DOI using the repository noted above.[cite]10.17616/R3K64N[/cite] 
      • We have in the past also used Figshare[cite]10.17616/R3PK5R[/cite]) for this purpose, see e.g. 10.6084/m9.figshare.1181739
      • The WEO itself can itself reference a more complete set of data used to create the visual appearance, for example data that allows the wavefunction of the molecule to be computed,  doi: 10.6084/m9.figshare.2581987.v1[cite]10.6084/m9.figshare.2581987.v1[/cite] In this instance this is held on the Figshare[cite]10.17616/R3PK5R[/cite] repository.
    7. The collection has another section labelled Members. These are individual datasets associated with the collection and held on the SAME repository as the collection itself. In this case, there are five such members, two of which are listed below:

       

      1. 10.14469/hpc/281[cite]10.14469/hpc/281[/cite] contains a variety of other data such as outputs from an IRC (intrinsic reaction coordinate), energy profile diagrams and ZIP archives of other calculations.
      2. 10.14469/hpc/272[cite]10.14469/hpc/272[/cite] itself contains five members, one of which is e.g.

         

        • 10.14469/hpc/267[cite]10.14469/hpc/267[/cite] which contains a ZIP archive with NMR data (see here for how this might be packaged in the future) and a file for a GPC (chromatography) instrument.
        • This last item also contains a new section labelled Metadata, which includes e.g. the InChI key and InChI string for the molecule whose properties are reported.

    If this mode of presenting data seems a little more complex than a single monolithic PDF file, its because its designed for:

    1. collaboration between scientists, potentially at different locations and institutions.
    2. attribution of provenance/credit for the individual items (via ORCID).
    3. separate date stamping by the various contributors.
    4. providing bi-directional links between data and publications.
    5. holding what we call FAIR (findable, accessible, interoperable and reusable) data, rather than just data encapsulated in a PDF file.
    6. Collecting, storing and sending metadata for aggregation in a formal way, i.e. to DataCite using a formal schema to render the metadata properly searchable.

    Thus 10.14469/hpc/244 represents our most complex attempt yet at such collaborative FAIR data sharing with multiple contributors. The tools for packaging many of the datasets are still quite limited (see again here) and the design is still being optimised (call it α). When the repository[cite]10.17616/R3K64N[/cite] has been more extensively tested, we intend to make it available as open source for others to experiment with. And of course, when this happens the source code too will have its own DOI!


    A refactoring of the Figshare site in December 2015 meant that the DOI no longer points directly to the WEO, and you have to follow a manually inserted link on that page to see it.

  • Publishing embargoes.

    Publishing embargoes seem a relatively new phenomenon, probably starting in areas of science when the data produced for a scientific article was considered more valuable than the narrative of that article. However, the concept of the embargo seems to be spreading to cover other aspects of publishing, and I came across one recently which appears to take such embargoes into new and uncharted territory.

    One example (there are many others) of embargoes continuing to operate in the era of open science and open data relates to crystallographically derived coordinates for macromolecules. Biomolecular structures are allowed to be embargoed for a maximum of one year before becoming openly available or "released" (considered a friendlier term than embargo). A more recent phenomenon is of embargoes on press releases which may be prepared by authors and or publishers to accompany the appearance of any article considered especially newsworthy. The publisher will then request that the press release is only released to coincide with the actual publication time and date of the article itself. Both of these types of embargo are more or less accepted by both parties. But in the last five years or so, new types of embargo have been introduced and it is these I want to discuss here.

    1. The self-archive or "green open access" version of an article, in the form of the last author version of an accepted manuscript prior to copy-editing and other operations by a publisher. Such Green OA versions are now a mandatory requirement from funders (in the UK), arising from the need to conduct a "REF" or research excellence framework assessment of all (UK) universities every seven years or so. In order to allow assessors and funding councils unencumbered access to these research outputs, the authors must self-archive their publications in a suitable institutional repository. In general therefore, there should always exist two versions of any scientific paper authored within these guidelines, the AV (author version) and VoR (Version of Record, held by the publisher, and carrying the guarantee of peer review). Publishers now embargo author versions until the VoR version has been published, and sometimes even up to 18 months beyond this period.
    2. The "supporting information" or SI embargo. This is closely related to the crystallographic data embargo noted above, but it applies in general to most other data and information associated with an article. Until very recently, most SI was in fact handled by the publisher themselves, and so it was released at the same time as the article. Since it is becoming more common to deposit data and SI in a separate repository, some publishers mandate that the release dates of this material must not precede the article itself. Deposition of such data has also become a mandatory requirement from (UK) funders since May 2015, and I have blogged about such "research data management" often here. In effect, both the scientific article and the data supporting it achieve their own DOIs or persistent digital identifiers, allowing easy and independent access to either the article OR its data. In fact, assigning such a DOI has a more subtle effect; creating a DOI means that metadata describing the object is also created and then aggregated by the agency issuing the DOI such as CrossRef and DataCite. Importantly, one should note that SI which is handled purely by the publisher will not have its own separate DOI and it will not have its own metadata. The data metadata for example can include the DOI for the article, and vice versa. I have shown examples of the utility of such metadata for data in an earlier post.
    3. So now we come to the most recent embargo, which has surfaced since around May 2015, as increasingly data has become a first class object in its own right with its own DOI and importantly its own metadata. There is now evidence that some publishers are requesting that this very metadata about data is also subjected to an embargo, not to be released before the article which makes use of that data is itself released. So data can be deposited in "dark form" prior to a publication, but the metadata (which carries the date stamp and provenance for the deposition) may have to be "dark" or embargoed. Actually, this is not yet very common; for example I asked the Royal Society of Chemistry what their policy was, with the reply "the Royal Society of Chemistry wouldn’t require metadata about the data files to be embargoed".

    We live in an era where the very careers of reseachers can be determined by their claim to priority about scientific discoveries. The date stamps for priority continue to be largely controlled and issued by publishers and some may decide that it will be in their business interests to extend their control to data. Perhaps they may even wish to control all aspects of publication including the data and its metadata, acting as self-proclaimed research facilitators.

    At this moment, this has not happened; both data and its metadata can remain open and FAIR. Which is where I think we should go in the future in the interests of open science itself.

  • Global initiatives in research data management and discovery: searching metadata.

    The upcoming ACS national meeting in San Diego has a CINF (chemical information division) session entitled "Global initiatives in research data management and discovery". I have highlighted here just one slide from my contribution to this session, which addresses the discovery aspect of the session.

    Data, if you think about it, is rarely discoverable other than by intimate association with a narrative or journal article. Even then, the standard procedure is to identify the article itself as being of interest, and then digging out the "supporting information", which normally takes the form of a single paginated PDF document. If you are truly lucky, you might also get a CIF file (for crystal structures). But such data has little life of its own outside of its parent, the article. Put another way, it has no metadata it can call its own (metadata is data about an object, in this case research data). An alternative is to try to find the data by searching conventional databases such as CAS,  Beilstein/Reaxys or CSD, and there of course the searches can be very precise. But (someone) has to pay the bills for such accessibility.

    We are now starting to see quite different solutions to finding data (the F in FAIR data, the other letters representing accessibility, interoperability and re-usability). These solutions depend on metadata being a part of the solution from the outset, rather than any afterthought produced as a commercial solution. The collection of metadata is part of the overall process called RDM, or research data management, perhaps even the most important part of it. In exchange for identifying metadata about one's data, one gets back a "receipt" in the form of a persistent identifier for the data, more commonly known as a DOI. The agency that issues the DOI also undertakes to look after the donated metadata, and to make it searchable. The table below shows eight searches of such metadata, one example of how to acquire statistics relating to the usage of the data and one search of how to find repositories containing the data.

    Search queries enabled by the use of metadata in data publication
    # Search query* Instances retrieved:
    1 http://search.datacite.org/ui?q=alternateIdentifier:InChIKey\:*  InChI identifier
    2 http://search.datacite.org/ui?q=alternateIdentifier:InChI\:*  InChI key 
    3 http://search.datacite.org/ui?q=alternateIdentifier:InChIKey:CULPUXIDFLIQBT-UHFFFAOYSA-N InChI key CULPUXIDFLIQBT-UHFFFAOYSA-N 
    4 http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+alternateIdentifier:InChIKey\:* ORCID 0000-0002-8635-8390 AND (boolean) InChI key.
    5 http://search.datacite.org/ui?q=ORCID:0000-0002-8635-8390+alternateIdentifier:InChI\:InChI=1S\/C9H11N5O3* ORCID 0000-0002-8635-8390 AND (boolean) + InChI string 1S/C9H11N5O3 with the * wild.
    6 http://search.datacite.org/ui?q=has_media:true&fq=prefix:10.14469 Has content media for Publisher 10.14469 (Imperial College)
    7 http://search.datacite.org/ui?q=format:chemical/x-* Data format type chemical/x-* 
    8 http://search.datacite.org/api?&q=prefix:10.14469& fq=alternateIdentifier:InChIKey\:*& fl=doi,title,alternateIdentifier& wt=json&rows=15
    http://api.labs.datacite.org/works?q=prefix:10.14469+AND+alternateIdentifier:InChIKey\:*
    First 15 hits in JSON format, batch query mode
    9 http://stats.datacite.org/?fq=datacentre_facet:"BL.IMPERIAL – Imperial College London" resolution statistics for publisher 10.14469 (Imperial College) per month
    10 http://service.re3data.org/search?query=&subjects[]=31 Chemistry Research data repository search for Chemistry (135 hits)

    In this instance the three MIME media types are chemical/x-wavefunction, chemical/x-gaussian-checkpoint and chemical/x-gaussian-log. See[cite]10.1021/ci9803233[/cite] for chemical MIME (multipurpose internet media extensions).


    Anyone familiar with the standard ways of finding data (CAS, CSD, Reaxys) will appreciate that the above does not yet have the finesse to find eg sub-structures of chemical structures, synthetic procedures or molecular properties. My including it here is primarily to show some of the potential such systems have, and to remark particularly that the batch query capability of this infrastructure could indeed be used in the future to construct much more sophisticated systems.  Oh, and to the end-user at least, the searches shown above do not require institutional licenses to use. Both the data and its metadata is free, mostly with a CC0 or CC BY 3.0 license for re-use (the R of FAIR).

    If more of interest related to this topic emerges at the ACS session,  I will report back here.

  • LEARN Workshop: Embedding Research Data as part of the research cycle

    I attended the first (of a proposed five) workshops organised by LEARN (an EU-funded project that aims to ...Raise awareness in research data management (RDM) issues & research policy) on Friday. Here I give some quick bullet points relating to things that caught my attention and or interest. The program (and Twitter feed) can be found at https://learnrdm.wordpress.com where other's comments can also be seen. 

    • Henry Oldenburg, founder member and first secretary of the Royal Society, was the first Open Scientist.
    • About 100 people attended the workshop. Of these ~3-5 identified themselves as researchers creating data, and the rest comprised research data managers, administrators, librarians, publishers (but see below) etc. Many were new to their posts.
    • Not publishing scientific data should become recognised as scientific malpractice.
    • Central libraries should pro-actively disperse their knowledge to data scientists in departments.
    • If a scientist is concerned that openly publishing their data might give advantage to their competitors, they are urged to counteract this by "being cleverer than the others". 
    • The three great bastions of open science are (a) Open Data, (b) Open access articles and (c) doing science openly. Examples of this third category include open notebook science (ONS), a form notably pioneered by Jean-Claude Bradley. One attribute of ONS was noted as no insider knowledge.
    • Learned societies should endow medals for Open Science.
    • (Some) publishers are reinventing themselves as Research Facilitators.

    The plenaries are all well worth dipping into (certainly the video and in some cases all the slides are scheduled to appear).

    If you are a researcher (undergraduate students, PGs, PDRAs, early career researchers and academics) you should immediately track down your local evangelist/expert in RDM and ask what the local infrastructures are (or will be shortly built). 

  • Single Figure (nano)publications, reddit AMAs and other new approaches to research reporting

    I recently received two emails each with a subject line new approaches to research reporting. The traditional 350 year-old model of the (scientific) journal is undergoing upheavals at the moment with the introduction of APCs (article processing charges), a refereeing crisis and much more. Some argue that brand new thinking is now required. Here are two such innovations (and I leave you to judge whether that last word should have an appended ?).

    To set the scene for the first, I will quote the abstract: “The single figure publication is a novel, efficient format by which to communicate scholarly advances. It will serve as a forerunner of the nano-publication, a modular unit of information critical for machine-driven data aggregation and knowledge integration[cite]10.12688/f1000research.6742.1[/cite] The kernel of this suggestion is (again I quote) “We offer the idea of the micro-publication unit, the single figure publication (SFP), to provide scholars with a real-world, manageable method to inform research.” I was struck by the overlap between this suggestion and the one you may find on many of the posts on this blog, where what I refer to as FAIR Data is assigned a digital object identifier (DOI) and included in the citation lists at the end of the post. The key phrase in the above abstract is machine-driven data aggregation and knowledge, although the article does not really go into any mechanisms for easily achieving this. It is my argument that the act of assigning a DOI carries with it the association that there is machine searchable metadata which can be retrieved and used for the aggregation and knowledge mining. The authors of this article, Do and Mobley, advocate adoption of nanopublications defined by inclusion of just a single figure (notably, not a table of results!) and some accompanying context which they claim would reduce the unit of publication to a more tractable size. This does raise the question of whether science needs more publications (in chemistry alone there are said to be more than a million published each year) or whether we should instead be concentrating our efforts on improving the data side of things by increasing its semantic content and formalising its structures, its preservation and curation. I certainly argue that far too little effort has been poured into these latter activities. You only have to look at the typical SI (supporting information) associated with many chemistry articles to realise that in many cases they are still hardly fit for purpose. There is one concept introduced by Do and Mobley that also deserves mention. Their nanopublications are structured to be read by machines, not people. They will therefore not be refereed by people (my inference). They do not really discuss how else the quality will be assessed, but of course if you treat their nanopublication as essentially FAIR data, then it does become possible to develop methods of machine refereeing.

    The second email alerted me to an article[cite]10.15200/winn.143871.12809[/cite] in the Winnower, a forum that offers a bridge between “traditional scholarly publishing tools to traditional and non-traditional scholarly outputs—because scholarly communication doesn’t just happen in scholarly journals“. Here, the concept of scholarly communication is extended to the New Reddit Journal of Science and introduces the concept pioneered by reddit of the AMA, or “ask me anything” environment. I occasionally publish some of the posts on this blog to the Winnower, receiving in return the increasingly ubiquitous DOI. I have also occasionally quoted these DOIs in articles submitted to conventional chemistry journals. What we see now is the propagation of a Winnower DOI on to e.g. https://www.reddit.com/r/science/ where anyone can post a question related to the original research reporting. I must state that I do have some reservations about this. Whilst it is likely that the majority of traditional scholarly reporting is likely to receive no AMAs (just as a very high proportion of research articles attract few if any citations in other articles over a period of decades), it is also likely that the quality of posted AMAs may turn out to be very low. At which point the original researcher has to make a judgement as to whether to devote any of their increasingly precious and fragmented time to answering them. And if few if any answers are posted in response to an AMA, the system seems unlikely to flourish.

    But what we see here are two serious attempts to develop new approaches to research reporting, and not doubt others will emerge. To quote Yogi Berra, the future is not what it used to be.


    Anyone can also post to this blog to ask similar questions. But note that associating an ORCID with such comments is highly recommended. I do not think that reddit currently supports ORCID, but  I would argue if the intent is serious, it certainly should.