STAR   Computing Tutorials main page
Basic Database Storage Structure
Offline computing tutorial Maintained by Jeff Porter
Last modified Fri Jul 03 00:19:00 1998


Introduction

To ease maintainence of databases, a common storage structure is used where applicable. Such a structure has been devised for such database types as Conditions, Configurations, Calibrations, and Geometry. Other database types (e.g. RunLog & Production Database) require additional low level structures specific to the access requirements for the data they contain. These shall be documented independently as they are developed. The base table structures are discussed according to their function. Within each sub-section, the real table names are written with purple leters.

*comment


Schema Definition

The data objects stored in DB are pre-defined by c-structs in header files (located in $STAR/StDb/include directories). The database is fed the information about these c-structs which it stores in a pair of tables used to re-construct the c-struct definitions used for storing the data. This information can also be used to provide the memory allocation for the tables returned in the C++ API, but this can also be over-ridden to request any c-struct type allowed by the request. That is, as long as the table requested contained (at some time) items in the user-input c-struct description, the data will be return in the format requested.

To perform such operations, the two DB tables used are,

structure
schema
The structure table holds the names of all c-structs included in the database where the structure table resides and contains the fields;
name Name of the c-struct
lastSchemaID more recent schema version identifier
ID internal index for the structure used by other tables
entryTime timestamp for when structure was last modified
Comment comments associated with structre
IsBinary Y/N enumeration for whether entire c-struct is stored as binary
The schema table holds the names of all elements associated with a c-struct, thier relative order in the c-struct, and in which schema-versions each element exists. The table contains the fields,
name name of the element
type basic type of this element (e.g. float, int, ...)
storeType "ascii/bin" enumeration for this storage of this element
length array length of element (is 1 for non-arrays)
mask schema-version association
ID internal index for element used by other tables
structName named association with structure
structID index link to structure table's ID
position relative position in structure
offset offset in memory of binary stored structure( *** not used )
entryTime timestamp for when element was last modified
Comment comments associated with element
*** Will replace "position" field and add information for total binary store of structure but requires that the schema-evolution model be modified. This will be for next interation of database storage structure


Named References

There are c-structs which are used for more than one purpose. For example, Daq's detector structure is common to all types of detectors but referenced by different names when a detector specific set of data is requested or stored. Thus a request for a structure is actually a request for a named reference which in many cases is simply the name of the structure but can be different.

This information is kept in the namedRef table which contains;

name reference name for a c-struct
ID internal index for the reference name used by other tables
structID index link to structure table's ID
entryTime timestamp for when named reference was last modified
Comment comments associated with named reference


Index

Data is accessed from the database via reference contained in an index table called dataIndex. The purpose of this table is to provide access to specific rows of specific storage tables per the request information of name, timestamp, and version . Additional requests for specific element identifiers are also allowed (e.g. Sector01 of the tpc)via this index. The table contains the following fields;

count automatic internal row count on the index
nameID internal index to the namedRef table identifying data type stored
entryTime timestamp for when index row was last modified
schemaID link to the schema-version used for storing data
beginTime timestamp of beginning point of validity interval (endpoint is next relevant beginTime)
version version key for data instance
elementID sector, module, .... division of rows as needed
numRows for binary store where many rows are stored in a single mysql row
dataID index to row of the data storage table


Data Storage

There are 2 general type of storage tables used in the mysql database. The first type of storage makes use of MySQL basic storage types to store the data as specified by the schema of the c-struct. That is, each element of the c-struct is given a column in which values are recorded and can subsequently be queried against. The name of the table is the same as the name of the structure obtained from the structure table and referenced directly from the namedRef table. Within this structure, large arrays can be handled using binary storage (within mysql) by defining the storage type as being binary. Each row of each storage table contains a unique row id that is used by the dataIndex table.

example: The tpcElectronics table contains the following fields;

dataID unique row index
numberOfTimeBins
nominalGain mV/fC
samplingFrequency MHz
tZero us (microseconds)
adcCharge fC/adc count
adcConversion mV/adc count
averagePedestal adc counts
shapingTime ns
tau ns
An additional storage table is used for those table which are more easily stored in purely binary form. This is different then the note above where some "columns" of a c-struct are large arrays and can be independently (column-specific) stored in binary form. What is meant by purely binary form is that the entire c-struct (&/or many rows) is stored as a single binary blob of data. This is (to be) handled via the bytes table. The rows in the bytes table are still pointed to via the dataIndex table.

The content of the stored data is extracted using the information about how it was stored from the namedRef, schema, & structure tables. The bytes table contains the following fields;

dataID unique row index
bytes the data blobs


Configurations

The term Configurations within the Star Databases refers to ensembles of data that are linked together in a common "named" structure. The extent of a Configuration may span many Star Databases (i.e. a super-configuration) to encompass a entire production configuration but is also complete at each database. For example, there is a data ensemble for the Calibrations_tpc database that can be requested explicity but that can also be linked together with a Calibrations_svt, Calibrations_emc, ... configurations to form a Calibrations configuration.

The data ensembles are organized by a pair of tables found in each database. These tables are

Nodes
NodeRelation
The Nodes table provides a listing of directories, tables, & other-databases that represent the pieces of the accessible data ensembles. The NodeRelation table provides the connections between the Node entities.

The Nodes table contains the following fields;

Name node name
KeyName Key or version associated with this Node instance
NodeType DB, directory, Configuration, or table
baseLine Y/N for whether data is retrieveable without timestamp
elementID ID-list (e.g. sector-#s, wafer-#s,...) that Node contains
ID internal index for the reference name used by other tables
entryTime timestamp for when Node was last modified
Comment comments associated with Node
The NodeRelation table contains the following fields;
ID internal index for the reference by other tables
ParentID Node Id of a parent node
NodeID Node Id of a child node
entryTime timestamp for when relation was last modified


Catalog

The Catalog structure in the database is used to organized different configurations into understandable groupings. This is much like a folder or directory system of configurations. Though very similar to the configuration database structure it is different in use. That is, the Nodes and NodeRelation tables de-reference configurations as a unique data ensemble per selection. The Catalog is used to de-references many similar configurations into groupings associated with tasks.

The Catalog table structure has only seen limited use in Online databases and is still under development. Currently there 2 tables associated with the catalog,

CatalogNodes
CatalogNodeRelation
(I said this was very similar to the configuration structure!)

The Nodes table contains the following fields;

Name Catalog Node Name
Author Author of this CatalogNode instance
ID internal index for the reference name used by other tables
entryTime timestamp for when Node was last modified
Comment comments associated with Node
ConfigID Index to a configuration
The NodeRelation table contains the following fields;
ID internal index for the reference by other tables
ParentID CatalogNode Id of a parent node
ChildID CatalogNode Id of a child node
entryTime timestamp for when relation was last modified


comment

The Offline production chain does not use the Configurations & Catalog structures provided by the database. Rather an independent configuration is stored and retrieved via an offline table, tables_hierarchy, that is the kept within the St_db_Maker. This table contains the fields;

parnam name of parent to this row
tabnam name of table of this row
typenam type of table of this row
This table is currently stored as a binary blob in the params database.