Table element#
VOTable files can contain RESOURCE elements, each of
which may contain one or more TABLE elements. The TABLE
elements contain the arrays of data.
To get at the TABLE elements, you can write a loop over the
resources in the VOTABLE file:
for resource in votable.resources:
    for table in resource.tables:
        # ... do something with the table ...
        pass
However, if the nested structure of the resources is not important,
you can use iter_tables to iterate over all tables in the file,
regardless of nesting:
for table in votable.iter_tables():
    # ... do something with the table ...
    pass
Finally, if you expect only one table in the file, it might be most convenient
to use get_first_table:
table = votable.get_first_table()
Alternatively, there is a convenience function that parses a VOTable file and returns the first table in one step:
from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml")
From a TableElement object, you can get the data itself
in the array member variable:
data = table.array
This data is a numpy record array.
The columns get their names from both the ID and name
attributes of the FIELD elements in the VOTABLE file.
Suppose we had a FIELD specified as follows:
<FIELD ID="Dec" name="dec_targ" datatype="char" ucd="POS_EQ_DEC_MAIN"
       unit="deg">
  <DESCRIPTION>
    representing the ICRS declination of the center of the image.
  </DESCRIPTION>
</FIELD>
Note
The mapping from VOTable name and ID attributes to numpy
dtype names and titles is highly confusing.
In VOTable, ID is guaranteed to be unique, but is not
required. name is not guaranteed to be unique, but is
required.
In numpy record dtypes, names are required and must be unique.
titles are optional and are not required to be unique.
Therefore, VOTable’s ID most closely maps to numpy’s
names, and VOTable’s name most closely maps to numpy’s
titles. However, in some cases where a VOTable ID is not
provided, a numpy name will be generated based on the VOTable
name. Unfortunately, VOTable fields do not have an attribute
that is both unique and required, which would be the most
convenient mechanism to uniquely identify a column.
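The numpy side of this mapping can be seen without any VOTable machinery. This sketch uses plain numpy only; the field name "Dec", the title "dec_targ", and the sample values are chosen to mirror the FIELD above and are otherwise arbitrary.

```python
import numpy as np

# numpy spells a titled field as ((title, name), format).
dtype = np.dtype([(("dec_targ", "Dec"), "f8")])
data = np.zeros(3, dtype=dtype)
data["Dec"] = [17.1515, 17.1517, 17.1536]

print(data.dtype.names)   # only the name is listed here
print(data["dec_targ"])   # the title indexes the same column
```

Because the title is an alias for the name, either key reaches the same column, which is why both the ID and the name of a FIELD can be used to index the record array.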
When converting from an astropy.io.votable.tree.TableElement object to
an astropy.table.Table object, you can specify whether to give
preference to name or ID attributes when naming the
columns. By default, ID is given preference. To give
name preference, pass the keyword argument
use_names_over_ids=True:
>>> votable.get_first_table().to_table(use_names_over_ids=True)
The column defined by the FIELD element above can be extracted from the record array by its name:
>>> table.array['dec_targ']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
or equivalently:
>>> table.array['Dec']
array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
17.1553884541, 17.15539736932, 17.15539752176,
17.25736014763,
# ...
17.2765703], dtype=object)
Datatype Mappings#
The datatype specified by a FIELD element is mapped to a numpy
type according to the following table:
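=============================  =====================
VOTABLE type                   NumPy type
=============================  =====================
boolean                        b1
bit                            b1
unsignedByte                   u1
char (variable length)         O - A bytes() object.
char (fixed length)            S
unicodeChar (variable length)  O - A str object
unicodeChar (fixed length)     U
short                          i2
int                            i4
long                           i8
float                          f4
double                         f8
floatComplex                   c8
doubleComplex                  c16
=============================  =====================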
If the field is a fixed-size array, the data is stored as a numpy
fixed-size array.
If the field is a variable-size array (that is, arraysize contains
a ‘*’), the cell will contain a Python list of numpy values. Each
value may be either an array or scalar depending on the arraysize
specifier.
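The type codes in the table are ordinary numpy dtype strings, so they can be inspected directly. The sketch below uses numpy only; the VOTABLE_TO_NUMPY dict is a hand transcription of the scalar rows of the table for illustration and is not part of astropy, and the "matrix" field name is hypothetical.

```python
import numpy as np

# Illustrative transcription of the scalar rows of the table above;
# this dict is not an astropy object.
VOTABLE_TO_NUMPY = {
    "boolean": "b1",
    "bit": "b1",
    "unsignedByte": "u1",
    "short": "i2",
    "int": "i4",
    "long": "i8",
    "float": "f4",
    "double": "f8",
    "floatComplex": "c8",
    "doubleComplex": "c16",
}

for votable_type, code in VOTABLE_TO_NUMPY.items():
    dt = np.dtype(code)
    print(f"{votable_type:14s} {code:4s} {dt.name:10s} {dt.itemsize} byte(s)")

# A fixed-size array field (for example arraysize="2x2" of doubles)
# corresponds to a sub-array dtype: every cell has the same shape.
fixed = np.zeros(2, dtype=[("matrix", "f8", (2, 2))])

# A variable-size field is held in an object column instead, so each
# cell can have a different length.
variable = np.empty(2, dtype=object)
variable[0] = [1.0, 2.0]
variable[1] = [3.0]
```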
Examining Field Types#
To look up more information about a field in a table, you can use the
get_field_by_id method, which returns
the Field object with the given ID.
For example:
>>> field = table.get_field_by_id('Dec')
>>> field.datatype
'char'
>>> field.unit
'deg'
Note
Field descriptors should not be mutated. To change the set of
columns, convert the TableElement to an astropy.table.Table, make the
changes, and then convert it back.
Building a New Table from Scratch#
It is also possible to build a new table, define some field datatypes, and populate it with data.
To build a new table inside a new VOTable file:
from astropy.io.votable.tree import VOTableFile, Resource, TableElement, Field
# Create a new VOTable file...
votable = VOTableFile()
# ...with one resource...
resource = Resource()
votable.resources.append(resource)
# ... with one table
table = TableElement(votable)
resource.tables.append(table)
# Define some fields
table.fields.extend([
    Field(votable, name="filename", datatype="char", arraysize="*"),
    Field(votable, name="matrix", datatype="double", arraysize="2x2")])
# Now, use those field definitions to create the numpy record arrays, with
# the given number of rows
table.create_arrays(2)
# Now table.array can be filled with data
table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])
# Now write the whole thing to a file.
# Note, we have to use the top-level votable file object
votable.to_xml("new_votable.xml")
Missing Values#
Any value in the table may be “missing”. astropy.io.votable stores
a numpy masked array in each TableElement
instance. This behaves like an ordinary numpy masked array, except
for variable-length fields. For those fields, the datatype of the
column is “object” and another numpy masked array is stored there.
Therefore, masked-array operations on variable-length columns will not
work, because numpy masked arrays do not directly support
variable-length columns.
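The difference between the two kinds of column can be illustrated with numpy alone. This is a sketch of the layout described above, not of astropy internals: the variable-length column is modeled as an object array whose cells are themselves masked arrays, and all values are made up.

```python
import numpy as np
import numpy.ma as ma

# A fixed-size column: one mask entry per value.
flux = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
print(flux.mean())  # the masked entry is skipped

# A variable-length column, modeled as an object array whose cells are
# themselves masked arrays of different lengths.
cells = np.empty(2, dtype=object)
cells[0] = ma.array([1, 2], mask=[False, True])
cells[1] = ma.array([3, 4, 5], mask=[False, False, False])

# Process the cells one at a time so that each inner mask is honored.
per_cell_sums = [int(cell.sum()) for cell in cells]
print(per_cell_sums)
```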