The FYSOS registry system specification 1.0.0-rc1

This document describes version 1.0.0-rc1 of the FYSOS registry system: a free, simple, portable, personal, fully featured registry system for embedded tasks and hobbyists alike. Minor changes made to this document (e.g. wording) that do not affect the registry system format are tracked by the third number in the document version number.

This registry system is in the release development stage: this document supersedes any previous version of the registry system specification, without regard for backward compatibility.

Since this is a new registry system, and some aspects are still to be defined, suggestions or corrections are welcome, for either the registry system or this document. Please contact the author at: fys at fysnet.net. The author wishes to thank those who have submitted comments and criticism in order to improve this system.

This document contains the formal specification of the FYSOS registry in-memory/on-disk format. For an overview of this system and for downloads, please see the overview page.

Differences from the previous versions

Definitions

The following terms and conventions will be used throughout this specification:

Structure identification and checksum

The structures of this registry system include identification and checksum fields that make the system more robust.

Sensitive structures, such as the Base Structure, store a checksum field in the first few bytes of the structure. The checksum is computed on all data within the registry block, not counting the checksum field itself. This checksum is defined and calculated as specified by the official CRC-32 standard. The checksum must be recomputed at least every time a sensitive structure or data area is modified and released to the system.

The following functions show how to calculate the checksum. The data parameter is a pointer to the data area to be checked, and the len parameter is the size, in bytes, of that area. A driver must initialize crc32_table once before calling any of the remaining routines, for example by calling crc32_initialize().

Please note that the value stored in the checksum field itself must not affect the checksum calculation. Setting this field to zero before computing the checksum allows the field to remain part of the checked area.

#include <stdint.h>

/* Predefined polynomial (CRC-32, normal bit order) */
#define CRC32_POLYNOMIAL 0x04C11DB7u

/* Lookup table. Must be initialized with crc32_initialize() before use. */
uint32_t crc32_table[256];

/* Prototypes, so the routines below may be defined in any order. */
uint32_t crc32_reflect(uint32_t reflect, char ch);
void crc32_partial(uint32_t *crc, const void *ptr, uint32_t len);

/* Initialize table.
 *  no parameters
 */
void crc32_initialize(void) {
  // One entry for each possible byte value (0 to 255).
  for (int i=0; i<256; i++) {
    crc32_table[i] = crc32_reflect((uint32_t) i, 8) << 24;
    
    // Test the top bit with an unsigned mask (1 << 31 on a signed int is undefined).
    for (int j=0; j<8; j++)
      crc32_table[i] = (crc32_table[i] << 1) ^ ((crc32_table[i] & 0x80000000u) ? CRC32_POLYNOMIAL : 0);
    
    crc32_table[i] = crc32_reflect(crc32_table[i], 32);
  }
}

/* Reflection:
 * reflect = current value to process
 * ch = size in bits of value
 * (Reflection is a requirement for the official CRC-32 standard.
 *  You can create CRCs without it, but they won't conform to the standard.)
 */
uint32_t crc32_reflect(uint32_t reflect, char ch) {
  uint32_t ret = 0;
  
  // Swap bit 0 for bit ch-1, bit 1 for bit ch-2, etc.
  for (int i=1; i<(ch + 1); i++) {
    if (reflect & 1)
      ret |= 1u << (ch - i);   // unsigned shift, valid up to bit 31
    reflect >>= 1;
  }
  
  return ret;
}

/* Compute the checksum of an area.
 * data -> data area to be checked.
 * len = count in bytes of area to check.
 */
uint32_t crc32(const void *data, uint32_t len) {
  uint32_t crc = 0xFFFFFFFF;
  crc32_partial(&crc, data, len);
  return (crc ^ 0xFFFFFFFF);
}

/* Compute the checksum of a partial area.
 * crc -> running checksum value.
 * ptr -> data area to be checked.
 * len = count in bytes of area to check.
 */
void crc32_partial(uint32_t *crc, const void *ptr, uint32_t len) {
  const uint8_t *data = (const uint8_t *) ptr;
  while (len--)
    *crc = (*crc >> 8) ^ crc32_table[(*crc & 0xFF) ^ *data++];
}

Some structures also contain one or more magic fields storing a 32-bit constant signature identifying the structure. This can be used as a first test to validate a sensitive structure.
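As a sketch of how the magic fields and the checksum might be combined when a registry image is loaded, the hypothetical function reg_validate() below first tests the 'BASE' and 'ESAB' signatures and only then computes the CRC-32. It assumes the Base Structure offsets given later in this document (magic at offset 0, checksum at offset 4, length at offset 24), a little-endian host, and that the checksum field is counted as zero during the calculation. To stay self-contained it uses a bitwise CRC-32 with the reflected polynomial 0xEDB88320, which yields the same values as the table-driven routines above, only more slowly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_BASE_MAGIC     0x42415345u  /* 'BASE' */
#define REG_BASE_END_MAGIC 0x45534142u  /* 'ESAB' */
#define REG_OFF_CHECKSUM   4            /* offset of RegistryBase.checksum */
#define REG_OFF_LENGTH     24           /* offset of RegistryBase.length */

/* Bitwise CRC-32 (reflected form): same result as the table-driven code. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t len) {
  while (len--) {
    crc ^= *p++;
    for (int k = 0; k < 8; k++)
      crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
  }
  return crc;
}

/* CRC of the first len bytes (len >= 8), with the checksum field counted as zero. */
static uint32_t reg_compute_checksum(const uint8_t *img, size_t len) {
  static const uint8_t zero[4] = {0, 0, 0, 0};
  uint32_t crc = 0xFFFFFFFFu;
  crc = crc32_update(crc, img, REG_OFF_CHECKSUM);
  crc = crc32_update(crc, zero, 4);
  crc = crc32_update(crc, img + REG_OFF_CHECKSUM + 4, len - REG_OFF_CHECKSUM - 4);
  return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical validator: cheap magic tests first, checksum last. */
bool reg_validate(const uint8_t *img, size_t img_size) {
  uint32_t magic, stored, end_magic;
  uint64_t length;

  if (img_size < 52)                /* at least a 48-byte base and a 4-byte end */
    return false;
  memcpy(&magic, img, 4);
  if (magic != REG_BASE_MAGIC)
    return false;
  memcpy(&length, img + REG_OFF_LENGTH, 8);
  if (length < 52 || length > img_size)
    return false;
  memcpy(&end_magic, img + length - 4, 4);  /* RegistryBaseEnd closes the image */
  if (end_magic != REG_BASE_END_MAGIC)
    return false;
  memcpy(&stored, img + REG_OFF_CHECKSUM, 4);
  return stored == reg_compute_checksum(img, (size_t) length);
}
```

Checking the magic values first lets a driver reject an obviously foreign or truncated image without touching every byte.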

Hive Layout

 Figure 1: Layout of a FYSOS registry system.

The layout of a registry is shown in Figure 1, with the three minimally required structures shown in Generation 0: The Base Structure, the base hive (with a name of System), and the Base End Structure.

A registry starts with a Base Structure, which stores information about the registry, such as what is needed to write it to a media device for storage. It is followed by a single hive capable of containing many child hives and cells. This main hive must have the case-sensitive name System. Following this hive, a single Base End Structure ends the registry and indicates the end of the data.

To allow a hierarchy of hives to be stored within the registry, any hive may contain child hives, each in turn containing children themselves, up to a depth of 256 generations.

Each hive may also contain an arbitrary number of cells, these cells containing the desired data to store within the registry. A cell must not contain child hives or cells.

A delimited character string is used to traverse the hive generations, ultimately pointing to a single cell.

For example, if an application wants to save a flag indicating if it has been initialized, it could use the following path: /System/Kernel/ApplicationName/Setup/Initialized

When sent to the registry driver, this path would be used to retrieve the TRUE value from the example shown in Figure 1.

Each name within the delimited path corresponds to one generation, and each generation may hold an arbitrary number of hives and cells.
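To illustrate how a driver might walk such a path one generation at a time, the hypothetical helper reg_path_next() below copies the next '/'-delimited name into a caller-supplied buffer. It assumes a leading '/' as in the example path above, and it rejects empty names and names that would not fit the 32-byte name field (31 bytes plus the null terminator).

```c
#include <stddef.h>
#include <string.h>

#define REG_NAME_SIZE 32  /* size of the name[] field, terminator included */

/* Copy the next name of a '/'-delimited path into name[].
 * *path is advanced past the returned name.
 * Returns 1 when a name was produced, 0 at the end of the path,
 * and -1 when a name is empty or too long for the name field. */
int reg_path_next(const char **path, char name[REG_NAME_SIZE]) {
  const char *p = *path;
  if (*p == '/')
    p++;                        /* skip the delimiter before this name */
  if (*p == '\0')
    return 0;                   /* no more generations */
  size_t n = strcspn(p, "/");   /* bytes up to the next delimiter */
  if (n == 0 || n >= REG_NAME_SIZE)
    return -1;
  memcpy(name, p, n);
  name[n] = '\0';
  *path = p + n;
  return 1;
}
```

For /System/Kernel/ApplicationName/Setup/Initialized this yields five names; every name but the last selects a hive, starting with System, and the last one names the cell.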

The use of names for hives and cells

Throughout the registry, a name is used to identify a hive or a cell. For example, a limb on the tree (a generation) needs a name so that it can be used as a parent, and each cell (and, optionally, any hive) within that child generation needs a name as well. A name is stored within the hive and cell structures. To keep things simple, these structures reserve a fixed number of bytes (32) for the name. The name is stored in UTF-8 format and must be null terminated.

Since a path uses the '/' character as a delimiter, all characters except for this forward slash are allowed within a name. Names are case-sensitive. For example, the names "ApplicationName" and "applicationname" are two different names and both may appear in a hive.

It is up to the registry driver to make sure that no two identically named hives and/or cells are included within the same generation.
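A driver could check a candidate name against these rules before storing it. The hypothetical reg_name_valid() below accepts a name only if it contains no '/' and fits in the 32-byte name field together with its null terminator; rejecting empty names is an added assumption (an empty name could not be addressed in a path), and UTF-8 well-formedness is not checked. Because names are case-sensitive, duplicates within a generation can be detected with a plain byte comparison such as strcmp().

```c
#include <stdbool.h>
#include <string.h>

#define REG_NAME_SIZE 32  /* bytes in Hive.name[] and Cell.name[] */

/* Hypothetical check of a hive or cell name before it is stored.
 * Assumption: empty names are rejected, since a path could not address them.
 * UTF-8 well-formedness is not checked here. */
bool reg_name_valid(const char *name) {
  size_t n = strlen(name);            /* length in bytes, not characters */
  if (n == 0 || n >= REG_NAME_SIZE)   /* must leave room for the null */
    return false;
  return strchr(name, '/') == NULL;   /* '/' is the path delimiter */
}

/* Names are case-sensitive, so a byte-wise comparison is sufficient. */
bool reg_name_equal(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}
```

With this comparison, "ApplicationName" and "applicationname" are treated as two different names, as the text above requires.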

The Base Structure

The format of the Base Structure is the following:

struct RegistryBase
uint32_t magic: This must be equal to 0x42415345 (the 'BASE' characters in ASCII), and it must be used to identify a valid registry system.
uint32_t checksum: The checksum value for the whole registry. All bytes from the start of this structure up to and including the RegistryBaseEnd structure are included in the calculation.
uint32_t version: This field identifies the version of the registry system and is provided for future development. Bits 15 to 8 identify the major version number and bits 7 to 0 the minor version number (for example, 0x0120 would mean version 1.32). At present, it must be set to 0x0100 (that is, version 1.0), and drivers must not try to access an unknown system version, since backward compatibility is not provided.
uint32_t padding: This field is reserved and must be preserved.
uint64_t size: This field is the size of the allocated memory used to hold this registry. It is only valid while the registry is loaded into memory, and it is considered reserved and preserved when the registry is written to a media device.
uint64_t length: This is the count of 8-bit bytes used to hold the registry, that is, the current size of the registry from the start of this structure through and including the RegistryBaseEnd structure. This field must remain valid both in memory and on media.
uint64_t lastModified: This is the timestamp of the last modification of this registry. This field must hold the number of microseconds elapsed since an epoch of 1 Jan 2000, 00:00:00 (leap seconds not included).
uint64_t reserved: This field is reserved and must be preserved.
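For reference, the Base Structure might be declared in C as follows. The static assertions confirm, on the compiler in use, that the fields land at the expected offsets with no inserted padding (48 bytes in total); a little-endian host is assumed. The helper names are hypothetical: reg_version_major() and reg_version_minor() decode the version field so that 0x0120 reads as 1.32, and reg_timestamp_from_unix() converts POSIX seconds to the lastModified format, assuming the epoch is interpreted as UTC (the specification does not name a time zone).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct RegistryBase {
  uint32_t magic;         /* 0x42415345, 'BASE' */
  uint32_t checksum;      /* CRC-32 of the whole registry */
  uint32_t version;       /* 0x0100 for version 1.0 */
  uint32_t padding;       /* reserved, preserve */
  uint64_t size;          /* allocated size, valid in memory only */
  uint64_t length;        /* bytes from RegistryBase through RegistryBaseEnd */
  uint64_t lastModified;  /* microseconds since 1 Jan 2000, 00:00:00 */
  uint64_t reserved;      /* reserved, preserve */
};

static_assert(offsetof(struct RegistryBase, size) == 16, "size offset");
static_assert(offsetof(struct RegistryBase, length) == 24, "length offset");
static_assert(sizeof(struct RegistryBase) == 48, "RegistryBase size");

/* Seconds from 1 Jan 1970 to 1 Jan 2000 in POSIX time (no leap seconds). */
#define REG_EPOCH_UNIX_SECONDS 946684800ULL

/* Bits 15..8 hold the major version, bits 7..0 the minor version. */
static inline unsigned reg_version_major(uint32_t v) { return (v >> 8) & 0xFFu; }
static inline unsigned reg_version_minor(uint32_t v) { return v & 0xFFu; }

/* Hypothetical: POSIX seconds plus microseconds to a lastModified value.
 * Assumes unix_seconds >= REG_EPOCH_UNIX_SECONDS and the epoch in UTC. */
static inline uint64_t reg_timestamp_from_unix(uint64_t unix_seconds, uint32_t micros) {
  return (unix_seconds - REG_EPOCH_UNIX_SECONDS) * 1000000ULL + micros;
}
```

Because every 64-bit field already sits on an 8-byte boundary, no packing attribute should be needed for this layout.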

A marker at the end of the registry helps ensure the integrity of the registry. Its format is shown below.

The Registry Base End Structure

The format of the Registry Base End Structure is the following:

struct RegistryBaseEnd
uint32_t magic: This must be equal to 0x45534142 (the 'ESAB' characters in ASCII), and it must be used to identify the end of a valid registry system.

The Hive Structure

Between the RegistryBase and RegistryBaseEnd structures, there is a single hive. This hive is the base hive, must be the only hive in this first generation, and must have the name of "System". This hive may and usually does contain child hives and cells.

A hive contains a starting tag, a name, a depth value, enough room to store any child hives and/or cells, and an ending tag.

The format of the Hive Structure is the following:

struct Hive
uint32_t startingTag: This must be equal to 0x48495645 (the 'HIVE' characters in ASCII), and it must be used to identify the start of a hive.
uint8_t name[32]: This is the name of the hive. It is stored in UTF-8 format and must be null terminated.
uint32_t depth: This is the generational depth of the hive. For example, the base hive has a value of zero, a child hive has a value of 1, and a grandchild hive has a value of 2. This helps keep the registry intact and is used for robustness. The maximum hive depth value is 255 (256 generations in total).
uint32_t reserved: Reserved and preserved. (Future plans: may be used for permissions and other flags.)
(child hives and/or cells): The child hives and cells stored in this hive, if any.
uint32_t endingTag: This must be equal to 0x45564948 (the 'EVIH' characters in ASCII), and it must be used to identify the end of a hive.

A hive must occupy 12 dwords (48 bytes), not counting the dwords used to store the child hives and cells.
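As a worked example of these sizes, the hypothetical reg_build_minimal() below writes the smallest registry described so far: a 48-byte Base Structure, an empty System hive (48 bytes, depth 0), and the 4-byte Base End Structure, for a length of 100 bytes. It assumes a little-endian host and leaves the checksum and timestamp at zero; a real driver would fill them in afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_BASE_SIZE 48  /* RegistryBase */
#define REG_HIVE_SIZE 48  /* hive fields, excluding children */
#define REG_END_SIZE  4   /* RegistryBaseEnd */

/* Store values at buf + off in host byte order (little-endian assumed). */
static void put32(uint8_t *buf, size_t off, uint32_t v) { memcpy(buf + off, &v, 4); }
static void put64(uint8_t *buf, size_t off, uint64_t v) { memcpy(buf + off, &v, 8); }

/* Write a hive with no children at buf + off; returns bytes written. */
static size_t put_empty_hive(uint8_t *buf, size_t off, const char *name, uint32_t depth) {
  put32(buf, off, 0x48495645u);                /* startingTag 'HIVE' */
  memset(buf + off + 4, 0, 32);                /* name[32], zero filled */
  strncpy((char *) buf + off + 4, name, 31);   /* byte 31 stays zero */
  put32(buf, off + 36, depth);                 /* depth */
  put32(buf, off + 40, 0);                     /* reserved */
  put32(buf, off + 44, 0x45564948u);           /* endingTag 'EVIH' */
  return REG_HIVE_SIZE;
}

/* Hypothetical: Base + empty "System" hive + Base End.
 * Returns the registry length, or 0 if buf_size is too small. */
size_t reg_build_minimal(uint8_t *buf, size_t buf_size) {
  size_t off = REG_BASE_SIZE;
  if (buf_size < REG_BASE_SIZE + REG_HIVE_SIZE + REG_END_SIZE)
    return 0;
  memset(buf, 0, REG_BASE_SIZE);               /* checksum, timestamp = 0 */
  put32(buf, 0, 0x42415345u);                  /* magic 'BASE' */
  put32(buf, 8, 0x0100u);                      /* version 1.0 */
  off += put_empty_hive(buf, off, "System", 0);
  put32(buf, off, 0x45534142u);                /* RegistryBaseEnd 'ESAB' */
  off += REG_END_SIZE;
  put64(buf, 16, buf_size);                    /* size: allocated memory */
  put64(buf, 24, off);                         /* length: bytes in use */
  return off;
}
```

In the resulting image the hive's name starts at byte 52, its endingTag sits at byte 92, and the Base End Structure occupies bytes 96 to 99.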

The Cell Structure

A cell is used to store the desired information. A cell must not contain any children.

A cell contains a starting tag, a name, a data type, a data length, enough room to store the data, and an ending tag.

The format of the Cell Structure is the following:

struct Cell
uint32_t startingTag: This must be equal to 0x43454C4C (the 'CELL' characters in ASCII), and it must be used to identify the start of a cell.
uint8_t name[32]: This is the name of the cell. It is stored in UTF-8 format and must be null terminated.
uint32_t type: This is the type of data stored. See enum dataType below.
uint32_t length: This is the length, in 32-bit dwords, of the data stored. It must be a value from 0 to 16384 inclusive.
uint32_t data[length]: This is the data stored.
uint32_t endingTag: This must be equal to 0x4C4C4543 (the 'LLEC' characters in ASCII), and it must be used to identify the end of a cell.

So that the size of a cell is always a multiple of a 32-bit dword, the length member is a count of 32-bit dwords. If the size of the data stored is not a multiple of sizeof(dword), the unused trailing bytes must be zero.

A cell must occupy 12 dwords (48 bytes), not counting the dwords used to store the data.
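The sizing rules above can be sketched as a small serializer. The hypothetical reg_put_cell() below writes one cell, rounds the data up to whole dwords, zeroes the pad bytes, and returns the cell's total size of 48 bytes plus 4 bytes per data dword. It assumes a little-endian host and a destination buffer large enough for the result; it does not check that the length suits the given type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_CELL_OVERHEAD   48u     /* fixed cell fields, in bytes */
#define REG_CELL_MAX_DWORDS 16384u  /* largest allowed length field */

/* Store a value at buf + off in host byte order (little-endian assumed). */
static void put32(uint8_t *buf, size_t off, uint32_t v) { memcpy(buf + off, &v, 4); }

/* Hypothetical: serialize one cell at buf.
 * nbytes of data are rounded up to whole dwords; the pad bytes are zeroed.
 * Returns the bytes written, or 0 if the data is too large. */
size_t reg_put_cell(uint8_t *buf, const char *name, uint32_t type,
                    const void *data, size_t nbytes) {
  size_t dwords = (nbytes + 3) / 4;            /* value of Cell.length */
  if (dwords > REG_CELL_MAX_DWORDS)
    return 0;
  put32(buf, 0, 0x43454C4Cu);                  /* startingTag 'CELL' */
  memset(buf + 4, 0, 32);                      /* name[32] */
  strncpy((char *) buf + 4, name, 31);         /* keeps a null terminator */
  put32(buf, 36, type);                        /* type */
  put32(buf, 40, (uint32_t) dwords);           /* length, in dwords */
  memset(buf + 44, 0, dwords * 4);             /* trailing pad bytes must be zero */
  if (nbytes)
    memcpy(buf + 44, data, nbytes);            /* data[] */
  put32(buf, 44 + dwords * 4, 0x4C4C4543u);    /* endingTag 'LLEC' */
  return REG_CELL_OVERHEAD + dwords * 4;
}
```

For example, 5 bytes of binary data need 2 dwords, so the cell occupies 48 + 8 = 56 bytes, and the three bytes after the data are zero.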

There are eight allowed data types, listed below.

enum dataType (symbolic name = value: description)
dtExist = 0: No data. Simply an existing empty cell used as a marker. Drivers must return a boolean value showing the existence of this cell.
  The length field must be zero.
  No data is stored; the cell's data[] field is non-existent.
dtBoolean = 1: Data type of Boolean. A zero value indicates FALSE; any non-zero value indicates TRUE. It is recommended, but not required, that all non-zero values be 0x00000001.
  The length field must be 1.
  Data stored in little-endian format. Example: 00 00 00 00 or 01 00 00 00.
dtInteger = 2: Data type of integer. A 32-bit signed integer in the range of -2,147,483,648 to 2,147,483,647 inclusive.
  The length field must be 1.
  Data stored in little-endian format.
dtUnsigned = 3: Data type of unsigned integer. A 32-bit unsigned integer in the range of 0 to 0xFFFFFFFF inclusive.
  The length field must be 1.
  Data stored in little-endian format.
dtIntegerLong = 4: Data type of long integer. A 64-bit signed long integer in the range of -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 inclusive.
  The length field must be 2.
  Data stored in little-endian format: low dword first, high dword last.
dtUnsignedLong = 5: Data type of unsigned long integer. A 64-bit unsigned long integer in the range of 0 to 0xFFFFFFFFFFFFFFFF inclusive.
  The length field must be 2.
  Data stored in little-endian format: low dword first, high dword last.
dtString = 6: Data type of a character string. A string of UTF-8 characters, which must be null terminated.
  The length field must be (utf8_strlen(string) + 1 + sizeof(dword) - 1) / sizeof(dword), where utf8_strlen(string) is the size of the string in bytes, not counting the null terminator, and the added 1 accounts for the terminator.
  Data stored as consecutive bytes.
dtBinary = 7: Data type of binary. A string of 8-bit bytes.
  The length field must be (length_of_data + sizeof(dword) - 1) / sizeof(dword), where length_of_data is the size of the data in bytes.
  Data stored as consecutive bytes.
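A driver reading a cell might check that its length field agrees with its type before using the data. The hypothetical reg_cell_length_ok() below applies the rules from the table above, including the dtString formula, and reg_get_u64() reads a dtIntegerLong or dtUnsignedLong payload byte by byte so that it works regardless of host byte order. Both assume data points to at least length dwords of cell data.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum dataType {
  dtExist = 0, dtBoolean = 1, dtInteger = 2, dtUnsigned = 3,
  dtIntegerLong = 4, dtUnsignedLong = 5, dtString = 6, dtBinary = 7
};

/* Hypothetical check that a cell's length field agrees with its type. */
bool reg_cell_length_ok(uint32_t type, uint32_t length, const uint8_t *data) {
  switch (type) {
  case dtExist:
    return length == 0;
  case dtBoolean: case dtInteger: case dtUnsigned:
    return length == 1;
  case dtIntegerLong: case dtUnsignedLong:
    return length == 2;
  case dtString: {
    if (length == 0 || length > 16384)
      return false;
    const uint8_t *nul = memchr(data, 0, (size_t) length * 4u);
    if (nul == NULL)
      return false;                        /* must be null terminated */
    size_t n = (size_t) (nul - data);      /* string size in bytes */
    return length == (n + 1 + 3) / 4;      /* formula from the table */
  }
  case dtBinary:
    return length <= 16384;
  default:
    return false;                          /* unknown data type */
  }
}

/* Read a 64-bit payload: little-endian, low dword first. */
uint64_t reg_get_u64(const uint8_t *data) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | data[i];
  return v;
}
```

For instance, "Hello" occupies 5 bytes plus its terminator, so (5 + 1 + 3) / 4 gives a length of 2 dwords, while "abc" fits in a single dword.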

An example registry

Here is an example registry, complete with the Base Structure, the base hive (with children), and the Base End Structure.

Figure 2: Example of a complete registry.

Requirements

The following is a list of notes and/or requirements.

End of the FYSOS Registry System specification