Sahana Eden

Synchronization

The Synchronization module synchronizes data resources between Sahana Eden instances. Synchronization jobs can be configured to run automatically in the background at regular intervals, without disrupting normal operation of the sites.

This module is part of the site administration module, and requires administrator privileges to view or modify its configuration. The synchronization module requires web2py revision 3566 (1.99.0) or newer.

Overview

The synchronization process is controlled entirely by the "active" Sahana Eden instance (master instance).

The active Eden instance runs the scheduler process, and initiates the update requests when they are due, while the passive repository (slave instance) merely responds to these requests.



The active Eden instance first downloads the available updates from the passive repository (pull) and imports them into the local database, and then uploads all available updates from the local database to the passive repository (push).

Pull and push are each a single RESTful HTTP request, using S3XML as the data format.
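The pull-then-push order described above can be sketched in a few lines of Python. This is a minimal in-memory simulation, not Sahana Eden code: the Repository class, its methods and the "keep the newer version" import rule are assumptions made for illustration, and the real exchange happens over HTTP with S3XML.

```python
from datetime import datetime


class Repository:
    """Hypothetical stand-in for a Sahana Eden instance (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.records = {}  # record UUID -> (modified_on, data)

    def export_updates(self, since=None):
        """Return all records modified after `since` (all records if None)."""
        return {uid: rec for uid, rec in self.records.items()
                if since is None or rec[0] > since}

    def import_updates(self, updates):
        """Import records, keeping whichever version is newer (assumed rule)."""
        for uid, rec in updates.items():
            current = self.records.get(uid)
            if current is None or rec[0] > current[0]:
                self.records[uid] = rec


def synchronize(active, passive, last_sync=None):
    """Run one cycle, driven entirely by the active instance."""
    # 1. Pull: download updates from the passive repository, import locally
    active.import_updates(passive.export_updates(last_sync))
    # 2. Push: upload local updates to the passive repository
    passive.import_updates(active.export_updates(last_sync))


active, passive = Repository("laptop"), Repository("public server")
active.records["a"] = (datetime(2011, 9, 21, 8, 0), "created on the laptop")
passive.records["b"] = (datetime(2011, 9, 21, 8, 5), "created on the server")
synchronize(active, passive)
print(sorted(active.records) == sorted(passive.records))  # prints True
```

Note that the passive repository never initiates anything in this sketch: it only answers the export and import calls made by the active side.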

Synchronization Homepage

Log in as administrator and open the Administration menu. In the left menu, you will then find the following entries:



Click on Synchronization here to open the homepage of the Synchronization Module:


Configuration

Follow this checklist to configure synchronization:

  1. Check the Prerequisites
  2. Make sure the passive site is up and running, and reachable over the network
  3. Log in as administrator at the active site and:
     1. Configure the default proxy server in Synchronization Settings as needed
     2. Register the passive site in Repository Configuration
     3. Configure the resources to synchronize in Resource Configuration
     4. Set up the Synchronization Schedule
  4. Ensure you have a Worker process running at the active site

Prerequisites

Both sites must have Sahana Eden installed and running. To avoid problems with different database structures, both Sahana Eden instances should always use the same version of the software.

Important: The system clocks at both sites must be synchronized with each other, which is best achieved by synchronizing both sites with the same NTP service.

Decide which one is the active and which one is the passive instance. The passive instance is typically a permanently and publicly accessible Sahana Eden instance, while the active instance could be a protected Eden installation (e.g. behind a firewall), or one with only temporary network access (e.g. on a notebook).

While performing synchronization jobs, the active site must be able to establish a connection to the passive site over the network using HTTP (or HTTPS).

If a proxy server has to be used for the HTTP connection, this can be configured in the Synchronization Settings (proxy authentication is currently not supported).

Check that both instances have the synchronization module enabled in the private/templates/<templatename>/config.py file. If the sync section is missing from the settings.modules dict, then add it as follows:

settings.modules = OrderedDict([
    ...
    # Add or uncomment this section, if it is missing or commented:
    ("sync", Storage(
            name_nice = T("Synchronization"),
            description = T("Synchronization"),
            restricted = True,
            access = "|1|",     # Only Administrators can see this module in the default menu & access the controller
            module_type = 0     # This item is handled separately for the menu
        )),
    ...
])

Synchronization Settings

Go to the Synchronization Homepage and click Settings to open this page:



This page shows the UUID (universally unique identifier) of the repository you are logged in to. You will need this identifier to register the repository at a peer site. The UUID is created during the first run of a Sahana Eden instance and cannot be changed.

If needed, enter the complete URL of the proxy server (including port number if not 80) that is to be used when connecting to the passive site (this is only necessary at the active site). Click Save to update the configuration.
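To illustrate what a complete proxy URL means for an HTTP client, here is a minimal sketch using only the Python 3 standard library. It does not show how Sahana Eden itself applies the setting; the helper name make_opener and the proxy address are hypothetical.

```python
import urllib.request


def make_opener(proxy_url=None):
    """Build a URL opener that routes HTTP(S) requests through proxy_url.

    With proxy_url=None an empty mapping is passed, which tells urllib
    to connect directly and ignore proxy settings from the environment.
    """
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Hypothetical proxy address; the port is given because it is not 80
opener = make_opener("http://proxy.example.org:3128")
# opener.open("http://www.example.org/eden") would now go through the proxy
```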

Repository Configuration

Go to the Synchronization Homepage and click Repositories. This will show you a list of all currently configured repositories:



To view and/or modify the configuration for a repository, click the Open button in the respective row in the list.

By clicking Add Repository, you can register a new repository:



Fill in the fields as follows:

Field | Instructions | At the active site | At the passive site
Name | Enter a name for the repository (for your own reference) | required | required
URL | Enter the URL of the repository (base URL of the Sahana Eden instance, e.g. http://www.example.org/eden) | required | -
Username | Enter the username to authenticate at the repository | required | -
Password | Enter the password to authenticate at the repository | required | -
Proxy Server | Enter the URL of a proxy server to connect to the repository, if different from the Synchronization Settings | fill in as needed | -
Accept Pushes | Check this if the repository is allowed to push updates | - | set as needed
UUID | Enter the UUID from the Synchronization Settings of the repository | required | required

Resource Configuration

Go to the Synchronization Homepage, click Repositories, then Open the repository you want to configure a resource for, and change to the Resources tab:



Fill in the fields as follows:

Field | Instructions | Example
Resource Name | Fill in the name of the master table of the resource. Details can be found in the documentation for the data model of your Sahana Eden application | req_req
Mode | Select the synchronization mode you wish to activate: pull, push or both. See Method Overview to understand the modes | pull and push
Strategy | Choose the import methods you wish to allow for the synchronization of this resource | create, update, delete
Update Policy | Choose in which situation records shall be updated, see explanations below | NEWER
Conflict Policy | Choose in which situation records shall be updated in case of conflicts, see explanations below | NEWER

Update Policy

If a record has been modified in one of the repositories, then the synchronization process has to decide whether to update the other repository with the new data or not. For this decision you can define a policy:

Policy | Meaning
THIS | Always update the remote repository with the local version of the record (overwrite remote updates)
NEWER | Update both repositories to the newest version of the record (keep the newer data)
MASTER | Update the record on either side only if the other side has originated the record (keep the master data)
OTHER | Always update the local repository with the remote version of the record (overwrite local updates)

Usually, you would choose "NEWER" here unless you have a good reason to do otherwise.
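The four policies can be read as a single decision: should an incoming remote version overwrite the local record? The following sketch expresses that decision from the point of view of the importing site. It is an interpretation of the table above, not the Sahana Eden implementation; the function name and its parameters are hypothetical.

```python
from datetime import datetime


def accept_remote_version(policy, local_modified, remote_modified,
                          remote_is_origin):
    """Decide whether the remote version should overwrite the local record.

    policy           -- "THIS", "NEWER", "MASTER" or "OTHER"
    local_modified   -- datetime of the last local modification
    remote_modified  -- datetime of the last remote modification
    remote_is_origin -- True if the remote repository originated the record
    """
    if policy == "THIS":      # the local version always wins
        return False
    if policy == "OTHER":     # the remote version always wins
        return True
    if policy == "NEWER":     # the more recently modified version wins
        return remote_modified > local_modified
    if policy == "MASTER":    # only the originating side may update
        return remote_is_origin
    raise ValueError("unknown policy: %r" % policy)


older = datetime(2011, 9, 21, 8, 30)
newer = datetime(2011, 9, 21, 8, 35)
print(accept_remote_version("NEWER", older, newer, False))   # True
print(accept_remote_version("NEWER", newer, older, False))   # False
print(accept_remote_version("MASTER", newer, older, True))   # True
```

The NEWER branch compares modification timestamps recorded at two different sites, which is why synchronized system clocks are listed as a prerequisite.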

Conflict Policy

If a record has been modified both in the local repository and the remote repository since the last synchronization time, then this is called a conflict situation, in which two concurrent record updates are available at the same time. You can define a policy for which of the updates to apply, similar to the Update Policy.

If you do not know what to select here, it is reasonable to choose the same option as for the Update Policy.

Policy Transfer

In most situations, you would want both repositories to apply the same policies. This is the default behavior - the policies from the active site are reported to the passive site during the synchronization, and are applied there as well (THIS and OTHER are replaced by the respective opposite at the passive site, of course).
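The swap of THIS and OTHER at the passive site amounts to a small mapping, sketched here with a hypothetical helper function (not part of Sahana Eden):

```python
def policy_at_passive_site(policy):
    """Translate a policy configured at the active site for the passive site."""
    swapped = {"THIS": "OTHER", "OTHER": "THIS"}
    return swapped.get(policy, policy)  # NEWER and MASTER are unchanged


for policy in ("THIS", "NEWER", "MASTER", "OTHER"):
    print(policy, "->", policy_at_passive_site(policy))
```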

If for some reason you need to define different policies at the passive site, then you have to configure the same resource at the passive site as well, and choose the policies explicitly.

Synchronization Schedule

Go to the Synchronization Homepage, click Repositories, then Open the repository configuration you want to schedule a synchronization job for and change to the Schedule tab. If there are already jobs configured for this repository, you will see a list of those jobs. Otherwise (or by clicking Add Job), you get to this form:



With every Job, all resources configured for this repository will be synchronized.

Fill in the fields as follows:

Field | Instructions | Example
Enabled | Set to True if the job shall actually be run, or set to False to disable the job | True
Start Time | Select date and time for the first run of this job (UTC) | 2011-09-21 08:30
End Time | Select date and time after which the job shall not be run anymore (UTC) | 2012-09-21 08:30
Repeat n times | Select how often the job shall be run, set to 0 to set no limit | 0
Run every | Select the time interval after which to repeat the job | 5 minutes
Timeout | Set a maximum time after which to abort the action | 600 seconds

If you need to switch between jobs (e.g. for maintenance periods, low-traffic periods), you can set up multiple schedules, and disable/enable them as needed.
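How the Start Time, End Time, Repeat and Run every fields interact can be sketched as follows. This is an illustration of the field descriptions above under the assumption that a run is still allowed exactly at the End Time; it is not the scheduler's actual code, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def run_times(start, end, every, repeats=0):
    """Yield the planned run times of a job (all times in UTC).

    repeats=0 means no limit on the number of runs, so only the
    end time stops the job.
    """
    count = 0
    current = start
    while current <= end and (repeats == 0 or count < repeats):
        yield current
        current += every
        count += 1


start = datetime(2011, 9, 21, 8, 30)
end = datetime(2011, 9, 21, 8, 45)
for t in run_times(start, end, timedelta(minutes=5)):
    print(t)  # 08:30, 08:35, 08:40 and 08:45 on 2011-09-21
```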

To consider:

You should choose meaningful time interval and timeout settings: the more resources are to be synchronized, the longer it will take (in this regard, also note that THIS- and OTHER-policies will always exchange all records in a resource, thus taking significantly longer).

How many records have to be exchanged per run depends on the average update frequency and the time interval between synchronizations: e.g. if there are on average 100 record updates per minute, and you set a 2-minute interval, then there would be 200 records on average to be transmitted every run. The import rate on a small server has been measured at about 18 records/second on average, so the synchronization process would take around 11 seconds in this case. To be on the safe side, choose a timeout value at least 10 times as high, e.g. 120 seconds.
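The estimate above can be reproduced with a short calculation. The function below is only a convenience sketch; its name and parameters are hypothetical, and the default import rate of 18 records/second is the figure quoted for a small server, which may differ considerably on other hardware.

```python
import math


def estimate_timeout(updates_per_minute, interval_minutes,
                     import_rate=18.0, safety_factor=10):
    """Return (records per run, seconds per run, suggested minimum timeout)."""
    records = updates_per_minute * interval_minutes
    duration = records / import_rate          # seconds needed for one run
    timeout = math.ceil(duration * safety_factor)
    return records, duration, timeout


records, duration, timeout = estimate_timeout(100, 2)
print(records)             # 200
print(round(duration, 1))  # 11.1
print(timeout)             # 112
```

The computed minimum of 112 seconds is rounded up to a convenient value such as 120 seconds.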

Note that the network traffic arising from synchronization depends mainly on the record update rate at the sites, not on the frequency of synchronization. Shorter synchronization intervals increase the traffic only slightly, but reduce the rate of conflicts and the risk of network-related problems. However, intervals that are too short (shorter than the average time between record updates at the site) may cause unnecessary network traffic from empty transmissions.

Worker

The scheduled synchronization jobs are performed by a separate asynchronous web2py worker process at the active site. Make sure the worker process for the Scheduler is running at the active site; see the chapter on the Scheduler.

Synchronization Log

Go to the Synchronization Homepage and click Log. This shows you a list of all prior log entries for all repositories.

If you instead want to see the log entries only for a particular repository, go to the Synchronization Homepage, click Repositories, then Open the respective repository configuration and go to the Log tab:

Note: the newest entries are shown on top of the list.

Click on Details for a log entry to see the complete entry:



Read the entries as follows:
Item | Explanation
Date/Time | Date and time of the transaction
Repository | Name of the repository synchronized with
Resource Name | Name of the resource synchronized
Mode | Transaction mode (pull or push) and direction of transmission (incoming or outgoing)
Action | Action performed to resolve problems (if any)
Result | Result of the transaction
Remote Error | Whether the error occurred at this site or at the repository synchronized with
Message | The log message