VOTable Handling (astropy.io.votable)#
Introduction#
The astropy.io.votable sub-package converts VOTable XML files to and
from numpy record arrays.
Note
If you want to read or write a single table in VOTable format, the recommended method is via the High-level Unified File I/O interface. In particular, see the Unified I/O VO Tables section.
Getting Started#
Reading a VOTable File#
To read a VOTable file, pass a file path to
parse:
from astropy.io.votable import parse
votable = parse("votable.xml")
votable is a VOTableFile object, which
can be used to retrieve and manipulate the data and save it back out
to disk.
Writing a VOTable File#
This section describes writing table data in the VOTable format using the
votable package directly. In many cases, however, the
High-level Unified File I/O interface will suffice and is somewhat more convenient to use. See
the Unified I/O VOTable section for details.
To save a VOTable file, call the
to_xml method. It accepts
either a string or Unicode path, or a Python file-like object:
votable.to_xml('output.xml')
There are a number of data storage formats supported by
astropy.io.votable. The TABLEDATA format is XML-based and
stores values as strings representing numbers. The BINARY format
is more compact, and stores numbers in base64-encoded binary. VOTable
version 1.3 adds the BINARY2 format, which allows for masking of
any data type, including integers and bit fields which cannot be
masked in the older BINARY format. The storage format can be set
on a per-table basis using the format
attribute, or globally using the
set_all_tables_format method:
votable.get_first_table().format = 'binary'
votable.set_all_tables_format('binary')
votable.to_xml('binary.xml')
The VOTable elements#
VOTables are built from nested elements. For example, to build a
VOTable containing an INFO element:
>>> from astropy.io.votable.tree import VOTableFile, Info
>>> vot = VOTableFile()
>>> vot.infos.append(Info(name="date_obs", value="2025-01-01"))
These elements can be:
Here are some detailed explanations on some of these elements:
Using astropy.io.votable#
Standard Compliance#
astropy.io.votable.tree.TableElement supports the VOTable Format Definition
Version 1.1,
Version 1.2,
Version 1.3,
Version 1.4,
and Version 1.5.
Some flexibility is provided to support the 1.0 draft version and
other nonstandard usage in the wild; see Verifying VOTables for more
details.
Note
Each warning and VOTABLE-specific exception emitted has a number and is documented in more detail in Warnings and Exceptions.
Output always conforms to the 1.1, 1.2, 1.3, 1.4, or 1.5 spec, depending on the input.
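When a file is built in memory rather than parsed, the version can be chosen explicitly. The sketch below assumes the `version` keyword of `VOTableFile` and checks that the chosen version is declared on the root element of the output.

```python
import io

from astropy.io.votable.tree import VOTableFile

# Create an empty VOTable that declares version 1.3.
votable = VOTableFile(version="1.3")

buf = io.BytesIO()
votable.to_xml(buf)
xml_bytes = buf.getvalue()

# The VOTABLE root element carries the version attribute.
print(votable.version)
print(b'version="1.3"' in xml_bytes)
```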
Verifying VOTables#
Many VOTable files in the wild do not conform to the VOTable specification. You
can set what should happen when a violation is encountered with the verify
keyword, which can take three values:
'ignore' - Attempt to parse the VOTable silently. This is the default setting.
'warn' - Attempt to parse the VOTable, but emit appropriate warnings. It is possible to limit the number of warnings of the same type to a maximum value using the astropy.io.votable.exceptions.conf.max_warnings item in the Configuration System (astropy.config).
'exception' - Do not parse the VOTable and raise an exception.
The verify keyword can be used with the parse()
or parse_single_table() functions:
from astropy.io.votable import parse
votable = parse("votable.xml", verify='warn')
It is possible to change the default verify value through the
astropy.io.votable.conf.verify item in the
Configuration System (astropy.config).
Note that 'ignore' or 'warn' mean that astropy will attempt to
parse the VOTable, but if the specification has been violated then success
cannot be guaranteed.
It is good practice to report any errors to the author of the application that generated the VOTable file to bring the file into compliance with the specification.
Data Serialization Formats#
VOTable supports a number of different serialization formats.
TABLEDATA stores the data in pure XML, where the numerical values are written as human-readable strings.
BINARY is a binary representation of the data, stored in the XML as an opaque base64-encoded blob.
BINARY2 was added in VOTable 1.3, and is identical to BINARY, except that it explicitly records the position of missing values rather than identifying them by a special value.
FITS stores the data in an external FITS file. This serialization is not supported by the astropy.io.votable writer, since it requires writing multiple files.
PARQUET stores the data in an external PARQUET file, similar to FITS serialization. Reading and writing are fully supported by the astropy.io.votable writer and the astropy.io.votable.parse reader. The parquet file can be referenced with either absolute or relative paths. The parquet serialization can be used as part of the unified Table I/O (see next section), by setting the format argument to 'votable.parquet'.
The serialization format can be selected in two ways:
1) By setting the format attribute of an astropy.io.votable.tree.TableElement object:
votable.get_first_table().format = "binary"
votable.to_xml("new_votable.xml")
2) By overriding the format of all tables using the tabledata_format keyword argument when writing out a VOTable file:
votable.to_xml("new_votable.xml", tabledata_format="binary")
Converting to/from an astropy.table.Table#
The VOTable standard does not map conceptually to an
astropy.table.Table. However, a single table within the VOTable
file may be converted to and from an astropy.table.Table:
from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml").to_table()
As a convenience, there is also a function to create an entire VOTable file with just a single table:
from astropy.io.votable import from_table, writeto
votable = from_table(table)
writeto(votable, "output.xml")
Note
By default, to_table will use the ID attribute from the files to
create the column names for the Table object. However,
it may be that you want to use the name attributes instead. For this,
set the use_names_over_ids keyword to True. Note that since field
names are not guaranteed to be unique in the VOTable specification,
but column names are required to be unique in numpy structured arrays (and
thus astropy.table.Table objects), the names may be renamed by appending
numbers to the end in some cases.
Performance Considerations#
File reads will be moderately faster if the TABLE element includes
an nrows attribute. If the number of rows is not specified, the
record array must be resized repeatedly during load.
Data Origin#
Introduction#
This module extracts basic provenance information from the VOTable header. The information is described in the DataOrigin IVOA note: https://www.ivoa.net/documents/DataOrigin/.
DataOrigin includes both the query information (such as publisher, contact, versions, etc.) and the dataset origin (such as creator, bibliographic links, URL, etc.).
This API retrieves metadata from the INFO elements in the VOTable.
Getting Started#
For the following example, we first reconstruct a VOTable carrying DataOrigin information, based on a query to VizieR catalogue J/AJ/167/18. In practice, you would obtain this table directly from the VO service of interest:
>>> from astropy.io.votable.dataorigin import add_data_origin_info
>>> from astropy.io.votable.tree import VOTableFile
>>> from astropy.table import Column, Table
>>> # For this example, the table data itself is irrelevant.
>>> table = Table([
... Column(name="id", data=[1, 2, 3, 4]),
... Column(name="bmag", unit="mag", data=[5.6, 7.9, 12.4, 11.3])])
>>> votable = VOTableFile().from_table(table)
>>> votable.description = "Period variations of 32 contact binaries (Hong+, 2024)"
>>> # Order is important here for the example.
>>> add_data_origin_info(votable, "ivoid", "ivo://cds.vizier/j/aj/167/18",
... content="IVOID of underlying data collection")
>>> add_data_origin_info(votable, "creator", "Hong K.",
... content="First author or institution")
>>> add_data_origin_info(votable, "cites", "bibcode:2024AJ....167...18H",
... content="Article or Data origin sources")
>>> add_data_origin_info(votable, "editor", "Astronomical Journal (AAS)",
... content="Editor name (article)")
>>> add_data_origin_info(votable, "original_date", "2024",
... content="Year of the article publication")
>>> # The rest in alphabetical order.
>>> add_data_origin_info(votable, "citation", "doi:10.26093/cds/vizier.51670018")
>>> add_data_origin_info(votable, "contact", "cds-question@unistra.fr")
>>> add_data_origin_info(votable, "publication_date", "2024-11-06")
>>> add_data_origin_info(votable, "publisher", "CDS")
>>> add_data_origin_info(votable, "reference_url", "https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18")
>>> add_data_origin_info(votable, "request_date", "2025-03-05T14:18:05")
>>> add_data_origin_info(votable, "rights_uri", "https://cds.unistra.fr/vizier-org/licences_vizier.html")
>>> add_data_origin_info(votable, "server_software", "7.4.5")
>>> add_data_origin_info(votable, "service_protocol", "ivo://ivoa.net/std/ConeSearch/v1.03")
To extract DataOrigin from VOTable:
>>> from astropy.io.votable.dataorigin import extract_data_origin
>>> data_origin = extract_data_origin(votable)
>>> print(data_origin)
publisher: CDS
server_software: 7.4.5
service_protocol: ivo://ivoa.net/std/ConeSearch/v1.03
request_date: 2025-03-05T14:18:05
contact: cds-question@unistra.fr
ivoid: ivo://cds.vizier/j/aj/167/18
citation: doi:10.26093/cds/vizier.51670018
reference_url: https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18
rights_uri: https://cds.unistra.fr/vizier-org/licences_vizier.html
creator: Hong K.
editor: Astronomical Journal (AAS)
cites: bibcode:2024AJ....167...18H
original_date: 2024
publication_date: 2024-11-06
Contents and metadata#
astropy.io.votable.dataorigin.extract_data_origin returns an astropy.io.votable.dataorigin.DataOrigin (class) container which is made of:
an astropy.io.votable.dataorigin.QueryOrigin (class) container describing the request. QueryOrigin is considered to be unique for the whole VOTable. It includes metadata like the publisher, the contact, date of execution, query, etc.
a list of astropy.io.votable.dataorigin.DatasetOrigin (class) containers, one for each Element having DataOrigin information. DatasetOrigin is a basic provenance of the datasets queried. Each attribute is a list. It includes metadata like authors, ivoid, landing pages, etc.
Examples#
Get the (Data Center) publisher and the Creator of the dataset:
>>> print(data_origin.query.publisher)
CDS
>>> print(data_origin.origin[0].creator)
['Hong K.']
Other capabilities#
The DataOrigin container includes VO Elements:
Extract list of astropy.io.votable.tree.Info:
>>> # get DataOrigin with the description of each INFO
>>> for dataset_origin in data_origin.origin:
...     for info in dataset_origin.infos:
...         print(f"{info.name}: {info.value} ({info.content})")
ivoid: ivo://cds.vizier/j/aj/167/18 (IVOID of underlying data collection)
creator: Hong K. (First author or institution)
cites: bibcode:2024AJ....167...18H (Article or Data origin sources)
editor: Astronomical Journal (AAS) (Editor name (article))
original_date: 2024 (Year of the article publication)
...
Extract tree node astropy.io.votable.tree.Element. The following example extracts the citation from the header (in APA style):
>>> # get the Title retrieved in Element
>>> origin = data_origin.origin[0]
>>> vo_elt = origin.get_votable_element()
>>> title = vo_elt.description if vo_elt else ""
>>> print(f"APA: {','.join(origin.creator)} ({origin.publication_date[0]}). {title} [Dataset]. {data_origin.query.publisher}. {origin.citation[0]}")
APA: Hong K. (2024-11-06). Period variations of 32 contact binaries (Hong+, 2024) [Dataset]. CDS. doi:10.26093/cds/vizier.51670018
Add Data Origin INFO into VOTable:
>>> from astropy.io.votable import dataorigin
>>> dataorigin.add_data_origin_info(votable, "query", "Data center name")
>>> dataorigin.add_data_origin_info(votable.resources[0], "creator", "Author name")