Introduction#
Extract basic provenance information from a VOTable header. The information is described in the DataOrigin IVOA note: https://www.ivoa.net/documents/DataOrigin/.
DataOrigin includes both the query information (such as publisher, contact, versions, etc.) and the dataset origin (such as creator, bibliographic links, URL, etc.).
This API retrieves the metadata from the INFO elements of the VOTable.
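As a rough illustration of what is being read, the sketch below pulls INFO elements out of a small hand-written VOTable header using only the Python standard library. It is not how astropy implements the extraction: the XML fragment and the helper name `read_infos` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written VOTable fragment (an assumption, for illustration only).
VOTABLE_XML = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <INFO name="publisher" value="CDS"/>
  <INFO name="contact" value="cds-question@unistra.fr"/>
  <RESOURCE>
    <INFO name="creator" value="Hong K."/>
  </RESOURCE>
</VOTABLE>
"""

# Namespace prefix mapping used in the XPath-like query below.
NS = {"vot": "http://www.ivoa.net/xml/VOTable/v1.3"}


def read_infos(xml_text):
    """Return (name, value) pairs for every INFO element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (info.get("name"), info.get("value"))
        for info in root.iterfind(".//vot:INFO", NS)
    ]


for name, value in read_infos(VOTABLE_XML):
    print(f"{name}: {value}")
```

INFO elements placed directly under VOTABLE describe the query, while those nested in a RESOURCE describe a dataset. The sketch flattens both levels, whereas the DataOrigin API keeps them apart.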
Getting Started#
For the following example, we first reconstruct a VOTable with DataOrigin information based on a query to VizieR catalogue J/AJ/167/18. In practice, you would obtain this table directly from the VO service of interest:
>>> from astropy.io.votable.dataorigin import add_data_origin_info
>>> from astropy.io.votable.tree import VOTableFile
>>> from astropy.table import Column, Table
>>> # For this example, the table data itself is irrelevant.
>>> table = Table([
... Column(name="id", data=[1, 2, 3, 4]),
... Column(name="bmag", unit="mag", data=[5.6, 7.9, 12.4, 11.3])])
>>> votable = VOTableFile.from_table(table)
>>> votable.description = "Period variations of 32 contact binaries (Hong+, 2024)"
>>> # Order is important here for the example.
>>> add_data_origin_info(votable, "ivoid", "ivo://cds.vizier/j/aj/167/18",
... content="IVOID of underlying data collection")
>>> add_data_origin_info(votable, "creator", "Hong K.",
... content="First author or institution")
>>> add_data_origin_info(votable, "cites", "bibcode:2024AJ....167...18H",
... content="Article or Data origin sources")
>>> add_data_origin_info(votable, "editor", "Astronomical Journal (AAS)",
... content="Editor name (article)")
>>> add_data_origin_info(votable, "original_date", "2024",
... content="Year of the article publication")
>>> # The rest in alphabetical order.
>>> add_data_origin_info(votable, "citation", "doi:10.26093/cds/vizier.51670018")
>>> add_data_origin_info(votable, "contact", "cds-question@unistra.fr")
>>> add_data_origin_info(votable, "publication_date", "2024-11-06")
>>> add_data_origin_info(votable, "publisher", "CDS")
>>> add_data_origin_info(votable, "reference_url", "https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18")
>>> add_data_origin_info(votable, "request_date", "2025-03-05T14:18:05")
>>> add_data_origin_info(votable, "rights_uri", "https://cds.unistra.fr/vizier-org/licences_vizier.html")
>>> add_data_origin_info(votable, "server_software", "7.4.5")
>>> add_data_origin_info(votable, "service_protocol", "ivo://ivoa.net/std/ConeSearch/v1.03")
To extract the DataOrigin from the VOTable:
>>> from astropy.io.votable.dataorigin import extract_data_origin
>>> data_origin = extract_data_origin(votable)
>>> print(data_origin)
publisher: CDS
server_software: 7.4.5
service_protocol: ivo://ivoa.net/std/ConeSearch/v1.03
request_date: 2025-03-05T14:18:05
contact: cds-question@unistra.fr
ivoid: ivo://cds.vizier/j/aj/167/18
citation: doi:10.26093/cds/vizier.51670018
reference_url: https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18
rights_uri: https://cds.unistra.fr/vizier-org/licences_vizier.html
creator: Hong K.
editor: Astronomical Journal (AAS)
cites: bibcode:2024AJ....167...18H
original_date: 2024
publication_date: 2024-11-06
Contents and metadata#
astropy.io.votable.dataorigin.extract_data_origin returns an astropy.io.votable.dataorigin.DataOrigin (class) container, which is made of:
an astropy.io.votable.dataorigin.QueryOrigin (class) container describing the request. QueryOrigin is considered to be unique for the whole VOTable. It includes metadata like the publisher, the contact, date of execution, query, etc.
a list of astropy.io.votable.dataorigin.DatasetOrigin (class) containers, one for each Element having DataOrigin information. DatasetOrigin is a basic provenance of the datasets queried. Each attribute is a list. It includes metadata like authors, ivoid, landing pages, etc.
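The shape of this container can be pictured with a few stand-in dataclasses. This is only a sketch of the layout described above, not astropy's implementation: the `...Sketch` class names are hypothetical, and only a handful of the metadata fields are shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryOriginSketch:
    """Request-level metadata: a single instance for the whole VOTable."""
    publisher: Optional[str] = None
    contact: Optional[str] = None
    request_date: Optional[str] = None


@dataclass
class DatasetOriginSketch:
    """Dataset-level provenance: every attribute is a list of values."""
    ivoid: List[str] = field(default_factory=list)
    creator: List[str] = field(default_factory=list)
    citation: List[str] = field(default_factory=list)


@dataclass
class DataOriginSketch:
    """One query description plus one entry per element carrying DataOrigin."""
    query: QueryOriginSketch = field(default_factory=QueryOriginSketch)
    origin: List[DatasetOriginSketch] = field(default_factory=list)


sketch = DataOriginSketch(
    query=QueryOriginSketch(publisher="CDS", contact="cds-question@unistra.fr"),
    origin=[
        DatasetOriginSketch(
            ivoid=["ivo://cds.vizier/j/aj/167/18"],
            creator=["Hong K."],
        )
    ],
)
print(sketch.query.publisher)
print(sketch.origin[0].creator)
```

Because dataset-level attributes are lists, a single value such as the creator is read with an index or joined into a string, while query-level attributes are plain scalars.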
Examples#
Get the (Data Center) publisher and the Creator of the dataset:
>>> print(data_origin.query.publisher)
CDS
>>> print(data_origin.origin[0].creator)
['Hong K.']
Other capabilities#
The DataOrigin container also gives access to the underlying VO elements:
Extract list of astropy.io.votable.tree.Info:
>>> # get DataOrigin with the description of each INFO
>>> for dataset_origin in data_origin.origin:
...     for info in dataset_origin.infos:
...         print(f"{info.name}: {info.value} ({info.content})")
ivoid: ivo://cds.vizier/j/aj/167/18 (IVOID of underlying data collection)
creator: Hong K. (First author or institution)
cites: bibcode:2024AJ....167...18H (Article or Data origin sources)
editor: Astronomical Journal (AAS) (Editor name (article))
original_date: 2024 (Year of the article publication)
...
Extract tree node astropy.io.votable.tree.Element. The following example extracts the citation from the header (in APA style):
>>> # get the Title retrieved in Element
>>> origin = data_origin.origin[0]
>>> vo_elt = origin.get_votable_element()
>>> title = vo_elt.description if vo_elt else ""
>>> print(f"APA: {','.join(origin.creator)} ({origin.publication_date[0]}). {title} [Dataset]. {data_origin.query.publisher}. {origin.citation[0]}")
APA: Hong K. (2024-11-06). Period variations of 32 contact binaries (Hong+, 2024) [Dataset]. CDS. doi:10.26093/cds/vizier.51670018
Add Data Origin INFO into VOTable:
>>> from astropy.io.votable import dataorigin
>>> dataorigin.add_data_origin_info(votable, "query", "Data center name")
>>> dataorigin.add_data_origin_info(votable.resources[0], "creator", "Author name")
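The APA-style citation shown earlier indexes `publication_date[0]` and `citation[0]` directly, which assumes those fields are present. A minimal sketch of a more defensive formatter follows. The function `format_apa` is hypothetical, not part of astropy, and works on plain lists and strings so that it can be fed from the DataOrigin attributes or from any other source.

```python
def format_apa(creators, publication_dates, title, publisher, citations):
    """Build an APA-like dataset citation, skipping parts that are missing.

    List arguments mirror the list-valued dataset attributes; None or an
    empty list is treated as "not provided".
    """
    authors = ", ".join(creators or [])
    date = publication_dates[0] if publication_dates else "n.d."
    identifier = citations[0] if citations else ""
    parts = [
        f"{authors} ({date})." if authors else f"({date}).",
        f"{title} [Dataset]." if title else "[Dataset].",
        f"{publisher}." if publisher else "",
        identifier,
    ]
    # Drop empty parts so no stray spaces or separators remain.
    return " ".join(part for part in parts if part)


print(
    format_apa(
        ["Hong K."],
        ["2024-11-06"],
        "Period variations of 32 contact binaries (Hong+, 2024)",
        "CDS",
        ["doi:10.26093/cds/vizier.51670018"],
    )
)
```

With the DataOrigin container, the arguments would typically be `origin.creator`, `origin.publication_date`, the element description, `data_origin.query.publisher` and `origin.citation`.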