ORACLE / GITHUB

Backing up our Oracle applications and data is something all of us should be doing, especially if we're using Oracle Always Free. It only takes an extended period of inactivity before your database could be reclaimed and irretrievably lost.

I started using Oracle Always Free in early 2021 and I have to admit that I gave scant regard to the need for backups, even though I was building a substantial application comprising thousands of lines of code. And then there's the data curated over the years - dozens of articles like this plus thousands of rows of uploaded media metadata.

I have never had any problem with my OCI Always Free tenancy - however, I decided to stop working on it until I had a robust backup and recovery solution in place. 

The first rule of any successful backup solution is to store the backup files in a secure off-site location away from the data centres hosting the primary and recovery databases. 

Github offers a free service to store files in repositories, but warns when an individual file exceeds 50MB and recommends that the total size of a repository stays below 1GB. While it may seem almost heretical to store binary dump files in Github, it works and is supported; moreover, the freely available Advanced Compression Option enables us to significantly compress our data pump export files before uploading them to Github.

The main features of this backup/recovery solution are:

  1. Automatic secure backup of Oracle schemas to a Github repository
  2. Automatic backup of Apex applications, ORDS metadata, object and system grants, schema Packages and Table definitions to Github
  3. Automatic restore of selected files from Github to a different ADB database
  4. Automatic email detailing the backup and recovery status

One advantage of Github is that you can see at a glance when a file's content last changed (not merely when the file was last uploaded). The history also shows exactly which content changed at any particular point - which lines of a PLSQL package you changed and when, for example. Another advantage is that Github naturally permits code and data sharing - it's designed for that - which is particularly helpful if you're working remotely.

We can encrypt schema dump files (the Advanced Security Option is also freely available), so these can be uploaded to a public Github repository without fear of sensitive data being stolen; you could also separate your backup files into public and private repositories if that's your preference, or keep them all in a private repository. It's this flexible nature of Github which lends itself so well to the task of maintaining a secure, fully versioned history of Oracle data off-site.

A DBMS_SCHEDULER job running every night at 9PM in my parsing schema exports the schema and Apex applications, as well as ORDS metadata and system/object grants, to my public Github repository at https://github.com/xsf3190/oracle-to-github-backup. The schema and Apex applications are then automatically restored from Github to my “restore” database. In this way, the restore database provides an instant recovery platform in case I make some terrible mistake like truncating one of my main tables. At the same time, I could easily restore the schema and applications outside of my OCI tenancy, for example to a Docker image. If you have a look at the repository you will see that the schema export and import logs are included. The whole process takes about 2 minutes elapsed time, most of which is spent exporting and importing the encrypted schema dump file.
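
The job definition itself isn't shown here, but a nightly job of this kind can be created with DBMS_SCHEDULER along the following lines - pck_backup.backup_to_github is a hypothetical placeholder for whatever driver procedure performs the export and upload.

    BEGIN
        dbms_scheduler.create_job(
            job_name        => 'NIGHTLY_GITHUB_BACKUP',
            job_type        => 'PLSQL_BLOCK',
            job_action      => 'BEGIN pck_backup.backup_to_github; END;', -- hypothetical entry point
            start_date      => SYSTIMESTAMP,
            repeat_interval => 'FREQ=DAILY;BYHOUR=21;BYMINUTE=0',
            enabled         => TRUE,
            comments        => 'Nightly export of schema, Apex apps, ORDS metadata and grants to Github');
    END;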

Performing a scheduled backup and recovery, say once a day, is a highly CPU-intensive activity that exercises the databases involved enough to ensure the threat of an extended period of no use never arises.

GITHUB PERSONAL ACCESS TOKEN

To build this yourself you will need a Github personal access token (classic), created at https://github.com/settings/tokens, plus a Github repository that will receive the backup files. The personal access token is a 40-character code that is displayed only once, immediately after you generate it. Make a record of it, as it's needed in both the backup and recovery processes.
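
Before wiring the token into the backup code it's worth a quick check that the database can reach the Github API with it - something along these lines (a minimal sketch using APEX_WEB_SERVICE; replace the placeholder token with your own):

    DECLARE
        l_token    CONSTANT VARCHAR2(100) := 'ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; -- placeholder
        l_response CLOB;
    BEGIN
        apex_web_service.g_request_headers(1).name  := 'Authorization';
        apex_web_service.g_request_headers(1).value := 'token ' || l_token;
        apex_web_service.g_request_headers(2).name  := 'Accept';
        apex_web_service.g_request_headers(2).value := 'application/vnd.github+json';

        -- GET /user returns the account that owns the token
        l_response := apex_web_service.make_rest_request(
            p_url         => 'https://api.github.com/user',
            p_http_method => 'GET');

        dbms_output.put_line('HTTP status: ' || apex_web_service.g_status_code);
        dbms_output.put_line(substr(l_response, 1, 200));
    END;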

PROVISION RECOVERY DATABASE

You get a maximum of 2 ADB databases with your “always free” subscription. In order to properly test recovery you could use one of them - I created a new version 21 ADB database into which to restore my version 19 primary database. I can't overstate how important it is to test your recovery solution; I was happily backing up my data and applications for weeks, completely unaware that I had a bug in the code which was only backing up the first 12K of each object!
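
One simple guard against that sort of bug is to compare a few headline figures between the primary and the restored schema after each import - for example, using the standard dictionary views and checking the numbers against the same queries run on the primary:

    -- Object counts by type in the restored schema
    SELECT object_type, COUNT(*) AS object_count
      FROM user_objects
     GROUP BY object_type
     ORDER BY object_type;

    -- Total lines of stored PLSQL source - a truncated backup (e.g. only the
    -- first 12K of each object) shows up as a much lower figure than the primary
    SELECT COUNT(*) AS total_source_lines
      FROM user_source;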

BACKUP PROCESS

There are dozens of different ways that this could have been designed. Rather than go through the process in detail, I'll highlight some of the key elements, especially those I struggled with, in case it helps anyone needing a similar solution. You can copy the code in file “PACKAGE.PCK_BACKUP” or import the schema dump file “EXPORT_SCHEMA.EXAMPLE.METADATA.dmp”.

Each ADB database comes with the fully privileged ADMIN user, and a schema export taken as ADMIN will by default include the DDL to recreate the schema, with its exported password, in the resulting dump file; in order to keep the process least-privileged I built the backup process in the Apex parsing schema itself. By contrast, the recovery process described later runs as ADMIN and imports the dump file into a newly provisioned “no authentication” schema of the same name.

The name of the parsing schema being exported is EXAMPLE and it needs READ, WRITE privileges on the DATA_PUMP_DIR that is pre-defined in each Oracle OCI ADB; it also needs execute privilege on the DBMS_CLOUD package. Connect as ADMIN and run -

GRANT READ,WRITE ON DIRECTORY DATA_PUMP_DIR TO EXAMPLE;
GRANT EXECUTE ON DBMS_CLOUD TO EXAMPLE;

To create a schema export of EXAMPLE from a scheduled job we have to use the supplied DBMS_DATAPUMP package - you can't run external scripts (such as the expdp client) from PLSQL in ADB.

The following procedure is in the package PCK_BACKUP - which you can see as file “PACKAGE.PCK_BACKUP” in my Github repository; this is the version of the code backed up at 9pm GMT every day. From here on, all referenced objects are compiled in the EXAMPLE schema.

    PROCEDURE datapump_backup(pIncludeRows IN VARCHAR2, pPassword IN VARCHAR2 DEFAULT NULL) IS
        h1 NUMBER;
        l_schema VARCHAR2(30):=sys_context('userenv','current_schema');
        l_status varchar2(4000);
    BEGIN
        log('Starting datapump export job for Schema:'||l_schema || ' Include Rows:' || pIncludeRows);
        h1 := dbms_datapump.OPEN (operation => 'EXPORT', job_mode => 'SCHEMA', job_name => 'EXPORT_SCHEMA_JOB', version => 'COMPATIBLE'); 

        dbms_datapump.set_parameter(handle => h1, name => 'COMPRESSION', VALUE => 'ALL'); 
        dbms_datapump.set_parallel(handle => h1, degree => 1); 
        dbms_datapump.add_file(handle => h1, filename => l_schema || CASE WHEN pIncludeRows = '0' THEN '.METADATA' END || '.dmp', 
            directory => 'DATA_PUMP_DIR', filesize => '50M',  filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_DUMP_FILE, reusefile => 1); 
        dbms_datapump.add_file(handle => h1, filename => l_schema || CASE WHEN pIncludeRows = '0' THEN '.METADATA' END || '.log', 
            directory => 'DATA_PUMP_DIR', filetype => DBMS_DATAPUMP.KU$_FILE_TYPE_LOG_FILE, reusefile => 1); 
        dbms_datapump.set_parameter(handle => h1, name => 'KEEP_MASTER', VALUE => 0);
        dbms_datapump.metadata_filter(handle => h1, name => 'SCHEMA_EXPR', VALUE => 'IN(''' || l_schema || ''')');
        dbms_datapump.metadata_filter(handle => h1, name => 'EXCLUDE_PATH_EXPR', VALUE => 'IN (''STATISTICS'',''CLUSTER'',''DB_LINK'',''INDEXTYPE'',''PROCOBJ'',''JOB'',''SCHEDULER'')'); 
        
        IF (pIncludeRows='0') THEN
            dbms_datapump.data_filter(handle => h1, name => 'INCLUDE_ROWS', VALUE => 0);
        ELSE
            dbms_datapump.set_parameter(handle => h1, name => 'ENCRYPTION_MODE', VALUE => 'PASSWORD'); 
            dbms_datapump.set_parameter(handle => h1, name => 'ENCRYPTION_PASSWORD', VALUE => pPassword); 
        END IF;
        
        dbms_datapump.start_job(handle => h1, skip_current => 0, abort_step => 0); 
        dbms_datapump.wait_for_job( handle => h1, job_state => l_status);
        dbms_datapump.detach(handle => h1); 
        log('Completed datapump export job. Final status:' || l_status);
        
    EXCEPTION
        WHEN OTHERS THEN
            log_error(sqlcode,sqlerrm);
            RAISE;
    END;

Note that two exports are catered for - if the parameter “pIncludeRows” is set to “0” then we produce an unencrypted, metadata-only dump file; otherwise the dump file includes table rows and is encrypted with a supplied password. Elsewhere in the code the password is generated on each invocation from 16 random upper-case, lower-case, numeric and special characters.
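
The generator isn't shown here, but a password of that shape could be produced along the following lines - a minimal sketch using DBMS_RANDOM (which is not a cryptographically strong source); the function name random_password is illustrative only.

    CREATE OR REPLACE FUNCTION random_password RETURN VARCHAR2 IS
        l_upper   CONSTANT VARCHAR2(26) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
        l_lower   CONSTANT VARCHAR2(26) := 'abcdefghijklmnopqrstuvwxyz';
        l_digit   CONSTANT VARCHAR2(10) := '0123456789';
        l_special CONSTANT VARCHAR2(10) := '#$_+-!*=';
        l_all     CONSTANT VARCHAR2(80) := l_upper || l_lower || l_digit || l_special;
        l_pwd     VARCHAR2(16);

        FUNCTION pick(p_set IN VARCHAR2) RETURN VARCHAR2 IS
        BEGIN
            -- one random character from the supplied set
            RETURN substr(p_set, trunc(dbms_random.value(1, length(p_set) + 1)), 1);
        END pick;
    BEGIN
        -- guarantee at least one character from each class ...
        l_pwd := pick(l_upper) || pick(l_lower) || pick(l_digit) || pick(l_special);
        -- ... then pad to 16 characters from the combined set
        FOR i IN 5 .. 16 LOOP
            l_pwd := l_pwd || pick(l_all);
        END LOOP;
        RETURN l_pwd;
    END random_password;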

Following Oracle best practices, we do not export STATISTICS, nor do we export DB links, clusters and indextype objects. Very importantly, we also want to exclude SCHEDULER jobs as these would start running in the recovery database after we successfully perform a restore (yes, I did indeed make this error).

OCI “always free” is extremely generous - not only do you get substantial storage, memory and CPU capacity, you also get all the software options free of charge - and you certainly wouldn't want to pay for the Advanced Security Option just to encrypt your export dump files.

Once the schema exports are complete, we can send them, together with their log files, plus the Apex applications, ORDS metadata and object/system grants, to our Github repository.
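
The upload itself is a call to the Github Contents API. The following sketch - not the code in PCK_BACKUP - shows the idea for a single dump file using APEX_WEB_SERVICE; the owner, repository, file name and token are placeholders, and note that updating a file which already exists in the repository also requires its current “sha” in the request body.

    DECLARE
        l_owner    CONSTANT VARCHAR2(100) := 'xsf3190';                 -- repository owner
        l_repo     CONSTANT VARCHAR2(100) := 'oracle-to-github-backup'; -- target repository
        l_file     CONSTANT VARCHAR2(100) := 'EXAMPLE.METADATA.dmp';    -- file in DATA_PUMP_DIR
        l_token    CONSTANT VARCHAR2(100) := 'ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'; -- placeholder
        l_bfile    BFILE := bfilename('DATA_PUMP_DIR', l_file);
        l_blob     BLOB;
        l_b64      CLOB;
        l_response CLOB;
        l_dest_off INTEGER := 1;
        l_src_off  INTEGER := 1;
    BEGIN
        -- load the dump file from the database directory into a temporary BLOB
        dbms_lob.createtemporary(l_blob, TRUE);
        dbms_lob.fileopen(l_bfile, dbms_lob.lob_readonly);
        dbms_lob.loadblobfromfile(l_blob, l_bfile, dbms_lob.lobmaxsize, l_dest_off, l_src_off);
        dbms_lob.fileclose(l_bfile);

        -- the Contents API expects the file content base64-encoded inside a JSON body
        l_b64 := replace(replace(apex_web_service.blob2clobbase64(l_blob), chr(13)), chr(10));

        apex_web_service.g_request_headers(1).name  := 'Authorization';
        apex_web_service.g_request_headers(1).value := 'token ' || l_token;
        apex_web_service.g_request_headers(2).name  := 'Accept';
        apex_web_service.g_request_headers(2).value := 'application/vnd.github+json';

        l_response := apex_web_service.make_rest_request(
            p_url         => 'https://api.github.com/repos/' || l_owner || '/' || l_repo || '/contents/' || l_file,
            p_http_method => 'PUT',
            p_body        => '{"message":"nightly backup","content":"' || l_b64 || '"}');

        dbms_output.put_line('HTTP status: ' || apex_web_service.g_status_code);
    END;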