Migration and notes from V01.265 to V01.275

This document is intended for FileCatalog managers only who have previously deployed an earlier version of API and older database table layout. It is NOT intended for users.

Reasoning for this upgrade and core of the upgrade

One of the major problem with the preceding database layout started to show itself when we reached 4 Million entries (for some reason, we seem to have magic numbers). A dire restriction was the presence of the field 'path' and 'nodename' in the FileLocations table. This table became unnecessarily large (of the order of GB) and sorting and queries would become slow and IO demanding (regardless of our careful indexing). The main action was to move both field to separate tables. This change requires a two step modification :

  1. reshape of the database (leaving the old field), deployment of the database API in cross mode support
  2. run the normalization scripts filling the new table and fields, deployment of the final API and drop of the obsolete columns (+ index rebuild)

The steps are more carefully described below ...

Step by step migration instructions

Has to be made in several steps for safety a least interruption of service (although a pain to the manager). Note that you can do that much faster by cutting the Master/slave relationship, disabling all daemons auto-updating the database, proceed with table reshape and normalization script execution, drop and rebuild index, deploy the point-of-no-return API and restore Master/slave relation).

This upgrade is best if you have perl 5.8 or upper. Note that this transition will be the LAST one using perl 5.6 (get ready for a perl upgrade on your cluster).

We will assume you know how to connect to your database from an account able to manipulate and create any tables in the FileCatalog database.

Steps in Phase I

  1. (0) Create the following tables
      DROP TABLE IF EXISTS FilePaths; CREATE TABLE FilePaths
      (
        filePathID                    BIGINT         NOT NULL         AUTO_INCREMENT,
        filePathName                  VARCHAR(255)   NOT NULL         UNIQUE,
        filePathCount                 INT,
        PRIMARY KEY                   (filePathID)
      ) TYPE=MyISAM;
    
      DROP TABLE IF EXISTS Hosts; CREATE TABLE Hosts 
     (
        hostID      smallint(6) NOT NULL auto_increment,
        hostName    varchar(30) NOT NULL default 'localhost',
        hostIDate   timestamp(14) NOT NULL,
        hostCreator varchar(15) NOT NULL default 'unknown',
        hostCount   int(11) default NULL,
        hostComment text,
        PRIMARY KEY (hostID),
        UNIQUE KEY  hostName (hostName)
      ) TYPE=MyISAM;
    
    
  2. Modify some table and recreate one
         
         ALTER TABLE `FileLocations` ADD `filePathID` bigint(20) NOT NULL default '0' AFTER `fileDataID`;
         ALTER TABLE `FileLocations` ADD `hostID` bigint(20) NOT NULL default '1' AFTER `protection`;
         UPDATE TABLE `FileLocations` SET hostID=0;
    
         # note that I did that one from the Web interface (TBC)
         INSERT INTO Hosts VALUES(0,'localhost',NOW()+0,'',0,'Any unspecified node'); 
    
         ALTER TABLE `FileLocations` ADD INDEX ( `filePathID` )  
    
         ALTER TABLE `FilePaths` ADD `filePathIDate` TIMESTAMP NOT NULL AFTER `filePathName` ;
         ALTER TABLE `FilePaths` ADD `filePathCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `filePathIDate` ;
         ALTER TABLE `FilePaths` ADD `filePathComment` TEXT AFTER `filePathCount`;
    
         ALTER TABLE `StorageSites` ADD  `storageSiteIDate` TIMESTAMP NOT NULL AFTER `storageSiteLocation` ;
         ALTER TABLE `StorageSites` ADD  `storageSiteCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageSiteIDate` ;
         ALTER TABLE `StorageSites` DROP `storageComment`;
         ALTER TABLE `StorageSites` ADD  `storageSiteComment` TEXT AFTER `storageSiteCount`;
    
         ALTER TABLE `StorageTypes` ADD `storageTypeIDate` TIMESTAMP NOT NULL AFTER `storageTypeName` ;
         ALTER TABLE `StorageTypes` ADD `storageTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageTypeIDate` ;
    
    
         ALTER TABLE `FileTypes` ADD `fileTypeIDate` TIMESTAMP NOT NULL AFTER `fileTypeExtension` ;
         ALTER TABLE `FileTypes` ADD `fileTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `fileTypeIDate` ;
         ALTER TABLE `FileTypes` ADD `fileTypeComment` TEXT AFTER `fileTypeCount`;
    
    
         ALTER TABLE `TriggerSetups` ADD `triggerSetupIDate` TIMESTAMP NOT NULL AFTER `triggerSetupComposition` ;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `triggerSetupIDate`;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupCount`   INT AFTER `triggerSetupCreator`;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupComment` TEXT  AFTER `triggerSetupCount`;
    
         ALTER TABLE `EventGenerators` ADD `eventGeneratorIDate` TIMESTAMP NOT NULL AFTER `eventGeneratorParams` ;
         ALTER TABLE `EventGenerators` ADD `eventGeneratorCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `eventGeneratorIDate` ;
         ALTER TABLE `EventGenerators` ADD `eventGeneratorCount`   INT AFTER `eventGeneratorCreator`;
    
         ALTER TABLE `RunTypes` ADD `runTypeIDate` TIMESTAMP NOT NULL AFTER `runTypeName` ;
         ALTER TABLE `RunTypes` ADD `runTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `runTypeIDate` ;
    
         ALTER TABLE `ProductionConditions` DROP `productionComments`; 
         ALTER TABLE `ProductionConditions` ADD  `productionConditionIDate`   TIMESTAMP NOT NULL AFTER `libraryVersion`;
         ALTER TABLE `ProductionConditions` ADD  `productionConditionCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `productionConditionIDate`;
         ALTER TABLE `ProductionConditions` ADD  `productionConditionComment` TEXT AFTER `productionConditionCount`;
    
    
    
         #
         # This table was not shaped as a dictionary so needs to be re-created
         # Hopefully, was not filled prior (but will be this year)
         #
         DROP TABLE IF EXISTS TriggerWords; CREATE TABLE TriggerWords
         (
            triggerWordID           MEDIUMINT       NOT NULL        AUTO_INCREMENT,
            triggerWordName         VARCHAR(50)     NOT NULL,
            triggerWordVersion      CHAR(6)         NOT NULL DEFAULT "V0.0",
            triggerWordBits         CHAR(6)         NOT NULL,  
            triggerWordIDate        TIMESTAMP       NOT NULL,
            triggerWordCreator      CHAR(15)        DEFAULT 'unknown' NOT NULL,
            triggerWordCount        INT,
            triggerWordComment      TEXT,
            UNIQUE   TW_TriggerCharacteristic (triggerWordName, triggerWordVersion, triggerWordBits),
            PRIMARY KEY             (triggerWordID)
         ) TYPE=MyISAM;
  3. Deploy the new API CVS version 1.62 of FileCatalog.pm

  4. Run the following utility scripts

    util/path_convert.pl
    util/host_convert.pl

    Note that those scripts use a new method $fC->connect_as("Admin"); which assumes that the Master Catalog will be accessed using the XML connection description. Also, it should be obvious that

    use lib "/WhereverYourModulAPIisInstalled"; should be replaced by the appropriate path for your site (or test area). Finally, it uses API CVS version 1.62 which supports Xpath and Xnode transitional keywords allowing us to transfer the information from one field to one table.

  5. Check that Hosts table was filled properly and automatically with Creator/IDate
  6. Paranoia step : Re-run the scripts mentioned 2 steps ago

    At this stage and ideally, nothing should happen (as you have already modified the records).
    A few tips prior from doing that If it does return anything, contact me for further investigation and database repairs. As a side note, the -as keyword was introduced recently and you should update your get_file_list.pl script if not available.
  7. Make a backup copy of the database for security (optional but safer) Backup can be done by easer a dump of mysql or more trivially, a cp -r of the database directory.
  8. Leave it running for a few days (should be fine) for confidence consolidation ;-)

You are ready for phase II. Hang on tight now ...

Steps in Phase II

Those steps are no VERY intrusive and potentially destructive. Be careful from here on ...

  1. Stop all daemons, be sure that during the rest of the operations, NO command attempts to manipulate the database. If you want to shield your users from the upgrade, stop all Master/slave relations.
  2. Connect to the master FileCatalog as administrator for that database and execute the following SQL commands
      > ALTER TABLE `FileLocations` ADD INDEX FL_HostIndex (hostID);
      > ALTER TABLE `FileLocations` DROP INDEX `FL_FileLocationUnique`, ADD UNIQUE (fileDataID, storageTypeID, filePathID, storageSiteID, hostID);
    
      # drop the columns not in use anymore / should also get rid of the associated
      # indexes.
      > ALTER TABLE `FileLocations` DROP COLUMN nodeName;
      > ALTER TABLE `FileLocations` DROP COLUMN filePath;
    
      # "rename" index / was created with a name difference to avoid clash for transition
      # now renamed for consistency
      > ALTER TABLE `FileLocations` DROP INDEX `filePathID`, ADD INDEX  FL_FilePathIndex (filePathID);
  3. OK, you should be done. Deploy either CVS version 1.63 which correspond to the FileCatalog API version V01.275 and above ... (by the way, get_file_list.pl -V gives the API version).



A few notes