Thesaurus Update Mechanism Overview

Astronomy and astrophysics are fast-changing fields in which recent discoveries (such as dark matter/energy and exoplanets) encourage new research. One of the primary purposes of the ADS is to provide a discovery portal where all the literature relevant to research in astronomy and astrophysics is properly represented and easily accessed. Scientists and librarians alike use the ADS knowing that it is complete and updated on a daily basis.

Similarly, discoveries in astronomy and astrophysics can define new terms that quickly get added to the domain’s vocabulary, which also means that the UAT needs a mechanism to allow these terms to be identified, validated and added to the thesaurus relatively quickly. To do this, the UAT is split between:

  • a public system that provides users with the ability to browse the UAT, as well as to suggest new terms or changes to existing terms; and
  • a private system that allows additions and modifications suggested by users to be evaluated in context and a decision on the users’ suggestions to be made by those qualified to judge their merits.

In particular, the ‘public system’ of the UAT consists of a website whose main feature is a thesaurus browser (more specifically, a viewer capable of displaying the SKOS thesaurus and allowing users to perform searches on it). For each term, the thesaurus browser provides information such as:

  • whether the term is related to a broader concept or is itself a broader concept for other terms
  • whether the term has any equivalent terms
  • any ‘scope notes’ that provide details about how the term is used for classification
  • what the term’s provenance (i.e., history) within the UAT has been (e.g., was it added through a user suggestion?)
  • translations of the term into other languages

Although the UAT is expected to be the primary entity within the thesaurus browser, other related controlled vocabularies may also be displayed alongside the UAT. For example, the thesaurus browser may display a taxonomy of astronomical facilities along with their equipment so that a user could search both controlled vocabularies simultaneously, such as for ‘Arecibo radio telescope’. In this example, the UAT may show that this radio telescope is related to the broader category of ‘telescopes, radio’ while the facility taxonomy shows the equipment located at Arecibo.

Furthermore, the UAT’s thesaurus browser provides a feature (i.e., a button or form) whereby a user can suggest a modification to the information for the term presently displayed or suggest an entirely new term to be added to the UAT. When used, this feature presents a form that allows the user to provide support for their idea, such as by providing ‘literary warrant’ citations that show the new or modified term’s usage in context. The feature also allows the user to provide contact information in case further follow-up is needed. Once the user has completed the form sufficiently, its content is transferred to the private system for evaluation.

In contrast to the public system, which is designed to allow browsing and searching by the public at large, the private system is designed as a decision-support system for evaluating suggestions. In particular, the private system uses both the information submitted by the user via the form and information pulled from other sources (such as the ADS or the arXiv dataset) to generate a ‘decision form’ that provides context for decision-making. In addition to the information provided by the user, the decision form might include:

  • links to the articles cited by the user as justification for the term addition/change;
  • a weighted impact score for the cited articles to indicate their relative importance within the literature
  • internal statistical information about the user, such as:
    • whether the user has submitted prior suggestions
    • the result of those suggestions (e.g., a ratio showing number of accepted submissions to overall submissions)
  • an assessment showing the general ‘trustworthiness’ of the user who submitted the suggestion, such as their affiliation to a particular university or facility and/or their overall citation impact score
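As an illustration only, the items above could be gathered into a small record like the one sketched below. The class names, field names and the handling of users with no prior submissions are all assumptions made for this sketch; the description above does not prescribe a data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubmitterStats:
    """Hypothetical per-user statistics kept by the private system."""
    total_submissions: int = 0
    accepted_submissions: int = 0

    def acceptance_ratio(self) -> Optional[float]:
        # A first-time submitter has no history, so no ratio is reported.
        if self.total_submissions == 0:
            return None
        return self.accepted_submissions / self.total_submissions


@dataclass
class DecisionForm:
    """Hypothetical container for the context shown to a UAT editor."""
    term: str
    suggestion_type: str  # assumed values: "add" or "change"
    cited_article_links: List[str] = field(default_factory=list)
    impact_score: Optional[float] = None  # how this is weighted is not specified
    submitter: SubmitterStats = field(default_factory=SubmitterStats)

    def summary(self) -> dict:
        # Flatten the form into the fields an editor would see at a glance.
        return {
            "term": self.term,
            "suggestion_type": self.suggestion_type,
            "cited_articles": len(self.cited_article_links),
            "impact_score": self.impact_score,
            "has_prior_suggestions": self.submitter.total_submissions > 0,
            "acceptance_ratio": self.submitter.acceptance_ratio(),
        }
```

Reporting no ratio for a first-time submitter, rather than zero, avoids presenting a new user as one whose suggestions have all been declined.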

Once created, the decision form is sent to a “UAT editor”, who is responsible for evaluating the suggestion and determining whether it should be implemented in the UAT. The UAT editor is typically an astronomer or astrophysicist who is knowledgeable about a particular area of their scientific domain and can therefore make informed decisions regarding the implementation of the term in the UAT. The decision form may be sent to the UAT editor via one of the following mechanisms:

  • manually: whereby all decision forms are sent to a designated editor first, and the designated editor then assigns them to the appropriate UAT editor; or
  • automatically: whereby each decision form is evaluated based on (a) its origin in the UAT and (b) the suggestion being made by the user (i.e., add a new term at that location or change an existing term), and a sub-system then references a list of UAT editors to find the editor with knowledge of the domain corresponding to that portion of the UAT.
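The automatic mechanism could be sketched roughly as follows. The branch names, editor identifiers and the fallback to a designated editor when no branch matches are assumptions for illustration. This sketch routes on the suggestion's location in the UAT only and ignores the type of suggestion.

```python
from typing import Dict, Sequence

# Hypothetical list of UAT editors keyed by the branch they cover.
EDITORS_BY_BRANCH: Dict[str, str] = {
    "cosmology": "editor-b",
    "exoplanet astronomy": "editor-a",
}

# Assumed fallback: the designated editor from the manual mechanism.
DESIGNATED_EDITOR = "designated-editor"


def assign_editor(
    branch_path: Sequence[str],
    editors: Dict[str, str] = EDITORS_BY_BRANCH,
    fallback: str = DESIGNATED_EDITOR,
) -> str:
    """Return the editor for the most specific matching branch.

    branch_path lists the terms from the top of the UAT down to the
    location of the suggestion, e.g. ["Cosmology", "Dark energy"].
    """
    # Walk from the most specific term back towards the root.
    for name in reversed(branch_path):
        editor = editors.get(name.lower())
        if editor is not None:
            return editor
    return fallback
```

For example, `assign_editor(["Cosmology", "Dark energy"])` finds no editor listed for the narrower term and falls back to the one listed for its parent branch.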

In addition to the information added to convert the suggestion into the decision form, the private system provides the UAT editor with a set of options (which may be in the form of buttons, drop-down boxes or similar) to either implement or decline the suggestion. If the UAT editor chooses to implement the suggestion, the private system converts the decision form to a SKOS update package, which contains RDF/SKOS markup sufficient to update the UAT in accordance with the user’s suggestion. On the other hand, if the UAT editor chooses to decline the suggestion, the editor is provided with a list of reasons from which they can choose; the chosen reason forms the basis of a pre-written email sent by the private system to the user, informing them that their suggestion has not been implemented and why.
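To give a sense of what the markup in a SKOS update package for a newly accepted term might contain, the sketch below emits a small Turtle fragment using standard SKOS properties. The function names and the example.org URIs used to exercise it are hypothetical, and a real package would likely be produced with an RDF library rather than string formatting.

```python
from typing import Optional

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def _literal(text: str, lang: str = "en") -> str:
    """Quote a string as a language-tagged Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def new_concept_turtle(
    concept_uri: str,
    pref_label: str,
    broader_uri: str,
    scope_note: Optional[str] = None,
) -> str:
    """Build Turtle that adds one concept beneath an existing broader concept."""
    lines = [
        SKOS_PREFIX,
        "",
        f"<{concept_uri}> a skos:Concept ;",
        f"    skos:prefLabel {_literal(pref_label)} ;",
    ]
    if scope_note:
        lines.append(f"    skos:scopeNote {_literal(scope_note)} ;")
    # The last statement closes the concept description with a full stop.
    lines.append(f"    skos:broader <{broader_uri}> .")
    return "\n".join(lines)
```

A change to an existing term would need a different package (for example, statements to remove as well as statements to add), which this sketch does not attempt.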

In the case where the UAT editor has chosen to implement the user’s suggestion and an RDF/SKOS update package has been generated, the private system also does the following:

  • it updates the UAT published in the thesaurus browser to show the user’s suggestion, incrementing the version identifier to indicate that a new version has just been published (e.g., 2.1.3.6p);
  • it marks the version identifier with an indicator (e.g., the ‘p’ in the identifier above) to show that the change made was ‘provisional’ and has not been made final; and
  • it alerts the UAT editors of the change (e.g., via email) and stores the SKOS update package for further review.
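A version identifier of the form shown above (four dot-separated numbers followed by a status letter) could be handled as in the sketch below. The text does not say which of the four numbers is incremented for a provisional change; incrementing the last one is an assumption made here, as are all of the names.

```python
import re
from typing import NamedTuple, Tuple

# Four dot-separated integers followed by 'p' (provisional) or 'f' (final).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)([pf])")


class UatVersion(NamedTuple):
    numbers: Tuple[int, int, int, int]
    status: str  # "p" or "f"

    def __str__(self) -> str:
        return ".".join(str(n) for n in self.numbers) + self.status


def parse_version(text: str) -> UatVersion:
    """Parse an identifier such as '2.1.3.6p' into its parts."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a UAT version identifier: {text!r}")
    *nums, status = match.groups()
    a, b, c, d = (int(n) for n in nums)
    return UatVersion((a, b, c, d), status)


def next_provisional(version: UatVersion) -> UatVersion:
    """Assumed rule: bump the last number and mark the result provisional."""
    a, b, c, d = version.numbers
    return UatVersion((a, b, c, d + 1), "p")
```

Under this assumed rule, a provisional change published on top of 2.1.3.5p would carry the identifier 2.1.3.6p.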

The private system marks changes made by the UAT editor(s) as provisional because fast-moving developments in astronomy and astrophysics are likely to render parts of the UAT less clear or usable than is considered ideal by taxonomy and thesaurus standards. This is by design, as the UAT editors who choose to implement user suggestions are concerned with coverage of their knowledge domain, not necessarily with higher-level issues of thesaurus usability. Therefore, one of the UAT editors, referred to here as the ‘editor-in-chief’ (who is likely to be a librarian or someone with thesaurus-related training), will review the UAT holistically at regular intervals to do the following:

  • re-review the UAT editors’ decisions to adopt user suggestions into the UAT and render those decisions final;
  • identify terms and branches of the UAT that could or should be reviewed by other UAT editors to improve usability and clarity (e.g., a term appearing in too many areas, or a branch that is too ‘heavy’ with terms to be useful); and
  • implement changes to the UAT’s structure to improve usability and findability.

It is worth noting that the private system, as described above, may not itself provide the ability for the editor-in-chief to complete the tasks above. However, the private system will provide a way to export the UAT in its provisional state to a format (i.e., SKOS) that can be imported into another application in which the editor-in-chief can complete them, such as Data Harmony’s MAIstro Thesaurus software suite or a similar application.

Upon completion of the above, the version of the UAT will be considered as ‘finalized’ for that period and the private system does the following:

  • it updates the status of each suggestion accepted by both the editors and the editor-in-chief from ‘provisional’ to ‘final’
  • it generates an ‘audit trail’ for the term(s) modified by the suggestion to show:
    • the identity of the user who made the original suggestion
    • the editor (and/or editor-in-chief) who accepted the user’s suggestion
    • the dates of submission and acceptance
  • it updates statistics related to the user, such as to indicate the number of accepted suggestions;
  • it generates SKOS update packages to update the UAT and then implements these updates, publishing the latest version of the UAT with a version identifier indicating that this version is considered final (e.g., 2.4.0.0f, where ‘f’ indicates the final version)
  • it updates the version of the UAT displayed in the browser within the public system to the latest finalized version; and
  • it publishes a SKOS version of the finalized UAT to a website in order that others can download this version.

In this way, the private and public systems together ensure that the latest version of the UAT is always available for browsing and that the last finalized version of the UAT is available for use through its SKOS file. In addition, the thesaurus browser may provide a means (e.g., a button, switch or similar) to toggle between the latest provisional version of the UAT and the last finalized version. This may allow an interested user to identify the latest provisional changes made to the UAT that reflect new developments, research or discoveries in astronomy and/or astrophysics.

Although the process whereby a user suggestion makes its way through the private system is described above with regard to the UAT, a similar process could be implemented with any related vocabulary that appears in the thesaurus browser. The major difference between the operation of the UAT and other controlled vocabularies is the particular process by which suggestions are evaluated and/or implemented. For example, if the facilities supervisor of the Arecibo Radio Telescope noticed that a piece of equipment at their facility was missing, he or she could submit a user suggestion to add it to the taxonomy. The public system would forward the supervisor’s suggestion to whoever is identified in the private system as being responsible for the facilities taxonomy, and that person would then evaluate, update and republish the facilities taxonomy to the public system, where it would be available in the thesaurus browser.

Finally, it is worth noting that the statistics kept by the private system relating to user suggestions can serve not only as a proxy for the user’s ‘trustworthiness’ but also as a means of identifying prospective new UAT editors. For example, a user who makes numerous accepted suggestions in a particular area of the UAT is likely to have good knowledge of that domain, which could be leveraged if he/she were an editor. Once a user reaches a certain threshold of accepted suggestions (e.g., 10 accepted suggestions), the private system may flag that user to the editor-in-chief as a potential new editor. As a result, the UAT can help identify new editors who could replace existing editors in case the latter cannot maintain their editorial duties, such as due to research/teaching duties or retirement.
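The threshold check described above could be as simple as the following sketch. Counting accepted suggestions per user and per area of the UAT, and the function and parameter names, are assumptions for illustration. The default of 10 comes from the example in the text.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def prospective_editors(
    accepted: Iterable[Tuple[str, str]],
    threshold: int = 10,
) -> List[Tuple[str, str]]:
    """Return (user, area) pairs with at least `threshold` accepted suggestions.

    `accepted` holds one (user, area) entry per accepted suggestion.
    """
    counts = Counter(accepted)
    return sorted(pair for pair, n in counts.items() if n >= threshold)
```

Each pair returned would then be flagged to the editor-in-chief, who decides whether to invite that user to become an editor for the area in question.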