AnIML Workshop "Analytical Information Markup Language (AnIML)"
at PittCon 2006, March 16, 2006 in Orlando, FL
Abstract: Many critical decisions in manufacturing and engineering depend on reliable chemical knowledge about materials and chemical reactions. Often this information comes directly through data from instruments in chemical analysis laboratories. However, the interchange and storage of analytical chemistry data has long been hampered by multiple, incompatible data formats. In 1990, manufacturers of analytical instrumentation, through their trade organization, the Analytical Instrument Association (AIA), sponsored the development of standards to interchange mass spectral and chromatography data across vendor platforms. Known as the ANalytical Data Interchange (ANDI) standards, these have been implemented in numerous commercial software products and are currently maintained by ASTM Subcommittee E13.15 on Analytical Data. The JCAMP-DX (Joint Committee on Atomic and Molecular Physics Data Exchange) standards supported by IUPAC serve similar purposes in other areas of spectroscopy. Both of these data interchange schemes utilize pre-web-based technologies that rely, for the most part, upon stand-alone applications to carry out the interchange. While they can interchange data from instrument to instrument, data interchange from instrument to application (e.g., importing data from an instrument into an Excel spreadsheet), from application to application, or from application to and from databases is not as well supported. Modern laboratory management concepts such as electronic laboratory notebooks require simple, common mechanisms for interchanging data between instruments, applications, and databases.
Following the pioneering work at NIST with the Spectroscopy Markup Language (SpectroML) and at Galactic Industries Corporation (now part of Thermo Electron Corporation) with GAML (the General Analytical Markup Language), the AnIML project was begun in ASTM Committee E13 on Molecular Spectroscopy and Chromatography to provide a standard, web-aware mechanism for interchanging and storing spectroscopy and chromatography data based on XML (Extensible Markup Language). Markup languages are used in information technology to describe complex aspects of entities. For example, the Hypertext Markup Language (HTML, the language of the World Wide Web) describes how entities on a computer screen are to be laid out and displayed by incorporating descriptive tags with each data entity. XML is similar, but more general, in that it is used to describe data--delineating not only what the data are, but how they can be used, displayed, converted, etc. HTML describes how data look, while XML describes what data are. Unlike HTML, there are no predefined tags in XML. The tags are made up for each application and are formally defined in the XML application's schema or document type definition. Many people have likely used XML without knowing it, because XML is heavily used on the web to enable e-commerce, among other things.
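As a minimal sketch of the point that XML tags are made up for each application, the Python fragment below parses a small document with the standard-library ElementTree module. The tag and attribute names (measurement, sample, point, wavelength, absorbance) are invented for illustration and are not taken from the AnIML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are invented for this example
# and are NOT part of AnIML or any other published schema.
DOC = """
<measurement technique="UV/Vis">
  <sample id="S-001">caffeine in water</sample>
  <point wavelength="270" absorbance="0.42"/>
  <point wavelength="275" absorbance="0.51"/>
</measurement>
"""

def read_points(xml_text):
    """Return (wavelength, absorbance) pairs from the hypothetical document."""
    root = ET.fromstring(xml_text)
    return [
        (float(p.get("wavelength")), float(p.get("absorbance")))
        for p in root.iter("point")
    ]

root = ET.fromstring(DOC)
print(root.tag, root.get("technique"))  # the tags say what the data are
print(root.find("sample").text)
print(read_points(DOC))
```

Nothing in the parser knows what a "point" is; the meaning comes from the application's own schema, which is the role the AnIML core schema is described as playing.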
Over the past three years, the ASTM E13.15 Subcommittee and the IUPAC Subcommittee on Electronic Data Standards have worked together to flesh out the design of AnIML. To create commonality between the data from individual analytical techniques so that key applications like generic data viewers can function, AnIML has been designed around a core schema that describes the data and their representations, a metadata schema that describes how the ancillary information about the data is represented, and a series of XML instance documents that delineate the terms used by each technique for its "scientific metadata." Instance documents instead of schemas are employed for each technique to permit extensions. Each technique document contains a standardized portion that will be balloted through the appropriate standards organization, but may be augmented by extensions added by vendors, user organizations, and end users. To ensure that the information in the instance documents is complete and valid, tools are being developed that permit both syntactic and semantic checking. AnIML has been designed to describe not only the analytical data and their associated "scientific metadata," but also to ensure the integrity of the data through the use of digital signatures and to provide the data tracking, verification, and validation necessary for use in regulated industries.
This Workshop will describe AnIML--how it works, how it is being developed, what it looks like, and how it can be used. Additional information can be found on the AnIML website: animl.sourceforge.net
- 1:30 pm A Brief Introduction to XML and AnIML GARY W KRAMER, NIST (Paper 2230-1)
- 1:50 pm Requirements for a New Analytical Data Standard MARK F BEAN, GSK (Paper 2230-2)
- 2:10 pm Architecture of the Analytical Information Markup Language (AnIML) BURKHARD A SCHAEFER, BSSN (Paper 2230-3)
- 2:40 pm Flexible Standardization with AnIML Technique Definitions MAREN FIEGE, Waters GmbH (Paper 2230-4)
- 3:10 pm The AnIML Data Model PETER J LINSTROM, NIST (Paper 2230-5)
- 3:45 pm Analytical Instrument Control Using XML-Based Web Service ALEX MUTIN, Shimadzu Scientific Instruments (Paper 2230-6)
- 4:05 pm Long Term Storage of Chromatographic Data: AnIML, TNF, Viewers, and Plenty of Challenges! MARK MULLINS, Agilent Technologies (Paper 2230-7)
- 4:35 pm AnIML in Regulated Environments MAREN FIEGE, Waters Corporation (Paper 2230-8)
- 4:55 pm The Path to the New ASTM AnIML Standard DAVID P MARTINSEN, ACS (Paper 2230-9)
- 5:15 pm Closing Comments - Gary W Kramer
Paper 2230-1: A BRIEF INTRODUCTION TO XML AND ANIML
Gary W. Kramer, National Institute of Standards and Technology, Bldg. 227; Rm. A-163 100 Bureau Drive, Gaithersburg, MD 20899-8394
XML (Extensible Markup Language) is a meta-language for describing markup languages that, in turn, are used to describe the structure and relationships between entities in a document. XML provides a standard way for marking up documents through the use of delimiting tags that label entities and create structures. Today, the term XML also refers to a series of related languages and technologies for dealing with structured documents, such as XSL, Extensible Stylesheet Language; XLink, XML Linking Language; XPointer, XML Pointer Language; namespaces, concepts for dealing with multiple tag sets; DSig, algorithms for implementing digital signatures; and XQuery, the XML query language.
Markup languages for information in specific application domains can be created using XML. The Analytical Information Markup Language (AnIML) is being created to facilitate the interchange and archiving of chromatography and molecular spectroscopy data and metadata. ASTM Subcommittee E13.15 and the IUPAC Subcommittee on Electronic Data Standards have worked together to flesh out AnIML around a core schema that describes the data and their representations, a metadata schema that details how the ancillary information about the data is represented, and a series of instance documents that delineate the terms used by each technique for its “scientific metadata.” Each technique document contains a standardized portion that will be balloted through the appropriate standards organization, but may be augmented by vendors, user organizations, and end users. To ensure that the information in AnIML documents is complete and valid, both syntactic and semantic checking tools are being developed, digital signatures are incorporated to ensure data integrity, and audit trails provide the data tracking, verification, and validation necessary for use in regulated industries.
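The integrity idea mentioned above can be illustrated with a small Python sketch. It is not a digital signature and not the AnIML mechanism: it only computes a SHA-256 digest over the canonical (C14N 2.0) form of a document, which is the kind of value a signature scheme would then sign with a private key. The element names are hypothetical, and `xml.etree.ElementTree.canonicalize` requires Python 3.8 or later.

```python
import hashlib
import xml.etree.ElementTree as ET

def content_digest(xml_text):
    """SHA-256 of the canonical (C14N 2.0) form of an XML document."""
    # Canonicalization sorts attributes and normalizes empty-element syntax,
    # so harmless re-serialization does not change the digest.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical element names, for illustration only.
original = '<result unit="AU" value="0.42"/>'
reformatted = '<result value="0.42" unit="AU"></result>'
tampered = '<result unit="AU" value="0.43"/>'

print(content_digest(original) == content_digest(reformatted))  # same content
print(content_digest(original) == content_digest(tampered))     # value changed
```

Hashing the canonical form rather than the raw bytes means that a file rewritten by another tool with different attribute order still verifies, while any change to the data itself does not.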
Paper 2230-2: REQUIREMENTS FOR A NEW ANALYTICAL DATA STANDARD
Mark F. Bean, GlaxoSmithKline, UP12-210, PO Box 5089, Collegeville, PA 19426
Data standards serve multiple functions. In the past, the focus was on a format for exchanging information between data systems. More recently, the pharmaceutical sector has pushed to preserve data for long periods (30-60 years) to meet FDA requirements, so data standards must also serve as long-term data repositories that can outlive the vendor software. Finally, in the future we may hope for vendor-independent processing or viewing of analytical instrument data.
Some of the required properties that have been identified so far include: flexible; strongly constrained; simple to understand; extensible; long-lived; not only quickly machine readable but also human readable; capable of being verified and validated; capable of handling complex analysis contexts (metadata); capable of being stored in or restored from databases; convertible from prior standards (especially ANDI and JCAMP); independent of hardware, operating system, vendor, and software; capable of encoding raw or processed data.
One of the aspects that make the task of creating analytical information standards difficult is the constant evolution of analytical techniques. As a result, it is important that technique-constrained software be able to read its technique sections of the standard without failing when it encounters extensions.
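The requirement that a technique-specific reader must not fail on extensions can be sketched in Python as follows. The element name Parameter, the attribute name, and the set of known parameter names are all hypothetical; the sketch only shows the general pattern of reading what is understood and preserving the rest.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the parameter names a UV/Vis-only reader was written to understand.
KNOWN_PARAMETERS = {"Wavelength", "Absorbance"}

def read_parameters(xml_text):
    """Split parameters into those the reader knows and those it merely preserves."""
    known, extensions = {}, {}
    for param in ET.fromstring(xml_text).iter("Parameter"):
        name = param.get("name")
        # Unknown names are kept aside instead of raising an error.
        target = known if name in KNOWN_PARAMETERS else extensions
        target[name] = param.text
    return known, extensions

# Hypothetical document containing one vendor-added parameter.
DOC = """
<Technique name="UV/Vis">
  <Parameter name="Wavelength">270</Parameter>
  <Parameter name="Absorbance">0.42</Parameter>
  <Parameter name="VendorLampHours">1312</Parameter>
</Technique>
"""

known, extensions = read_parameters(DOC)
print(known)
print(extensions)
```

Keeping unrecognized parameters in a separate mapping, rather than discarding them, lets the same reader write the file back out without losing vendor or user additions.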
This talk will provide insight into how AnIML requirements have been gathered and offer a forum for further contributions from the audience.
Paper 2230-3: ARCHITECTURE OF THE ANALYTICAL INFORMATION MARKUP LANGUAGE (ANIML)
Burkhard A. Schaefer, BSSN, Postfach 411145, Mainz 55068, Germany
The Analytical Information Markup Language (AnIML) is a standardization effort of the E13.15 Sub-Committee of the American Society for Testing and Materials. AnIML provides an XML-based format for analytical data. It is suitable for many different analytical measurement techniques.
AnIML consists of a generic data container that permits the storage of arbitrary analytical data. This includes multi-dimensional data, name-value pairs, and hierarchies. The concept of Technique Definitions permits the formal specification of constraints for using this data container. This way, a definition can prescribe how the data for specific measurement techniques should be captured in the data file.
To address changing requirements, AnIML supports an extension concept that allows vendors or end users to specify additional data that should be stored for a measurement technique. These extensions can also be formally documented so that they do not break compatibility with existing software.
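A rough Python sketch of how a technique definition might constrain a generic container, and how a documented extension might widen those constraints, is given below. The dictionary layout, field names, and the `check` function are assumptions made for illustration; they do not reproduce the AnIML Technique Definition format.

```python
# Hypothetical technique definition: field name -> (expected type, required?).
UV_VIS_DEFINITION = {
    "Wavelength": (float, True),
    "Absorbance": (float, True),
    "Solvent": (str, False),
}

# Hypothetical vendor extension, documented in the same form.
VENDOR_EXTENSION = {"LampHours": (int, False)}

def check(record, *definitions):
    """Return a list of problems found when checking record against definitions."""
    allowed = {}
    for definition in definitions:
        allowed.update(definition)
    problems = []
    # Every required field must be present.
    for name, (_kind, required) in allowed.items():
        if required and name not in record:
            problems.append(f"missing required field: {name}")
    # Every present field must be documented and of the expected type.
    for name, value in record.items():
        if name not in allowed:
            problems.append(f"undocumented field: {name}")
        elif not isinstance(value, allowed[name][0]):
            problems.append(f"wrong type for field: {name}")
    return problems

record = {"Wavelength": 270.0, "Absorbance": 0.42, "LampHours": 1312}
print(check(record, UV_VIS_DEFINITION))                    # extension not declared
print(check(record, UV_VIS_DEFINITION, VENDOR_EXTENSION))  # extension declared
```

The same record is flagged when checked against the base definition alone and accepted once the extension is supplied, which is the sense in which a formally documented extension need not break existing checks.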
This paper will present a short introduction to AnIML and describe its architectural fundamentals. It will demonstrate how AnIML can be used to record data from everyday analytical experiments in a laboratory environment. It will also describe how workflows consisting of multiple experiments can be documented. In addition, AnIML features related to its application in regulated environments, including digital signatures and audit trail functionality, will be briefly mentioned.
(Presentation not currently available)
Paper 2230-4: FLEXIBLE STANDARDIZATION WITH ANIML TECHNIQUE DEFINITIONS
Maren Fiege, Waters GmbH, Europaallee 27-29, Frechen 50226, Germany
Analytical data only becomes meaningful if put in context by information about where it came from, how it was acquired and processed, etc. Thus, it is necessary to agree upon common terms, i.e., standard data dictionaries for the diverse analytical techniques. On the other hand, there needs to be enough flexibility in a standard to accommodate new techniques and special needs. Every instrument manufacturer, software vendor, company, or even user has their own parameters and pieces of information they want to store with their data. All these have to be accommodated without breaking the standard, so that standard applications can still read the data.
This workshop will show how AnIML meets this challenge, and how it offers both standardization and flexibility.
Paper 2230-5: THE ANIML DATA MODEL
Peter J. Linstrom, National Institute of Standards and Technology, 100 Bureau Drive Mail Stop 8380, Gaithersburg, MD 20899
ASTM Subcommittee E13.15 is developing the Analytical Information Markup Language (AnIML) for the storage of analytical instrument data. The language is able to store data from a wide range of analytical instruments. AnIML was designed to support complex experimental designs and meet data retention requirements imposed by regulatory agencies.
AnIML uses a generic approach to data storage based on a limited number of base data types. The AnIML data model allows storage of n-dimensional data sets along with some basic metadata. A compact representation for evenly monotonic data series is provided. Base data types supported by AnIML include integers in text format and floating-point values in both text and binary formats.
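The two storage ideas in this paragraph can be sketched in Python under stated assumptions. The sketch reads "evenly monotonic" as "evenly spaced" and assumes such a series is described by a start value, an increment, and a count. For the binary form it assumes little-endian 64-bit floats wrapped in base64 so they can sit inside a text document. The abstract specifies neither detail, and the function names are invented here.

```python
import base64
import struct

def expand_series(start, increment, count):
    """Expand a compact (start, increment, count) description into explicit values."""
    return [start + i * increment for i in range(count)]

def encode_floats(values):
    """Pack values as little-endian 64-bit floats, then base64-encode the bytes."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")

def decode_floats(text):
    """Reverse encode_floats: base64 text back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

# An evenly spaced wavelength axis needs three numbers instead of four values.
print(expand_series(200.0, 0.5, 4))

# Measured values are not evenly spaced, so they are stored explicitly.
absorbance = [0.10, 0.42, 0.51, 0.37]
encoded = encode_floats(absorbance)
print(encoded)
print(decode_floats(encoded) == absorbance)
```

The compact form pays off for long, regular axes such as time or wavelength, while the binary form avoids the rounding that can occur when floating-point values are written out as decimal text.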
This talk will discuss how AnIML stores data and illustrate the various data types supported by AnIML. Examples of applications will be provided.
Paper 2230-6: ANALYTICAL INSTRUMENT CONTROL USING XML-BASED WEB SERVICE
Alex Mutin, Shimadzu Scientific Instruments, 7102 Riverwood Drive, Columbia, MD 21046
There is a growing interest among analytical instrument users for multi-vendor support of their equipment in terms of instrument control, data acquisition and data processing capabilities.
Different vendors provide different software interfaces to control their instruments. Many users prefer to standardize on software to minimize validation and training costs, while keeping their hardware diverse. Because most laboratory software has limited multi-vendor support, users shopping for a new instrument are often forced to stay with one type of software.
An XML-based web service embedded into an analytical instrument is a new technology that can potentially solve the multi-vendor support limitations of current software. An HPLC equipped with a web server is connected directly to a computer network. Such a system can be controlled from any PC without a need for any additional software except for a web browser such as Internet Explorer. If laboratory software is linked with such a web service, one can easily assemble systems out of multi-vendor hardware components while controlling them from the same application. In addition, the data can be interchanged between instruments, applications, and databases using the Analytical Information Markup Language (AnIML) format.
Paper 2230-7: LONG TERM STORAGE OF CHROMATOGRAPHIC DATA: ANIML, TNF, VIEWERS, AND PLENTY OF CHALLENGES!
Mark Mullins, Agilent Technologies, 6612 Owens Drive, Pleasanton, CA 94588
Long term storage of chromatographic data is a necessity. Some data needs to be kept upwards of 100 years! In order to guarantee this data will be accessible for this extended time period, a Technology Neutral Format (TNF) must be utilized for the file format and storage of these files.
Currently, Extensible Markup Language (XML) is an ideal format for the TNF storage of files. Chromatographic data stored in XML format is technology neutral, as it can be read and understood without the original creating application. However, without some sort of standardization, every XML file will look different. With standardization, every XML file has the same format, and tools can be developed around the standard to make creating, editing, and viewing much easier. This is where the Analytical Information Markup Language (AnIML) comes into the picture. AnIML is a standard for storing analytical data in XML.
This workshop will cover some of the challenges that are faced in creating applications to transform data into AnIML format. Topics will include isolation from changes in the XML format and AnIML viewers. Examples of actual AnIML data files and a live AnIML viewer will be presented.
Paper 2230-8: ANIML IN REGULATED ENVIRONMENTS
Antony N. Davies, Waters Corporation, Europaallee 27-29, Frechen 50226, Germany
In an electronic age where regulatory compliance and the protection of intellectual property are of ever-increasing importance, one of the key technologies needed is the capability of ensuring electronic data longevity.
This talk will outline how the future IUPAC/ASTM AnIML data standards will provide this longevity and meet legal requirements.
Paper 2230-9: THE PATH TO THE NEW ASTM ANIML STANDARD
David P. Martinsen, American Chemical Society, 1155 16th Street NW, Washington, DC 20036
The Analytical Information Markup Language (AnIML) is being created within the framework of ASTM. This work is the focus of ASTM Subcommittee E13.15 on Analytical Data, a subcommittee of ASTM Committee E13 on Molecular Spectroscopy and Chromatography. The IUPAC Subcommittee on Electronic Data Standards, which is responsible for the JCAMP-DX standards, has joined with ASTM E13.15 to define this new standard for analytical information. The standard is centered on a core schema that will be used across all analytical techniques. A technique schema defines the framework for creating technique definition files for each specific analytical technique. The core and technique schemas are being created and will be maintained by ASTM E13.15. The technique definitions will require input from experts in each technique. For several common techniques (e.g., UV/Vis, IR, MS), these definition files will be created through collaboration of E13.15 with those experts. This talk will examine the standardization process, recount the work that has already taken place, and discuss the steps remaining to complete the standards process.