Geodatabase replication fundamentals

Various workflows require you to distribute your data to different geodatabases and synchronize changes made to the data in each geodatabase. The following provides a guide to help you determine how best to use distributed data, geodatabase replicas, and synchronization for your system.

Introduction to distributed data provides a good starting point to understand geodatabase replication and other methods to work with distributed data. Distributed data scenarios lists a number of common use cases for which you can use geodatabase replication. If geodatabase replication is the most appropriate method for your system, your next step is to create a replica.

Create replicas

The following can help you determine the best way to create replicas for your system:

  • Determine which replicas are needed—In some cases, you may need to create only one or two replicas, while in others, many replicas are needed. For example, many replicas are needed if you are distributing data to field crews to work with on their mobile devices on-site. In cases where you want to keep two enterprise geodatabases synchronized, you may need only one replica. See Geodatabase replication architecture to learn more about replicas and how they work in a geodatabase.
  • Decide on the type of replication—The Replication types topic describes the available replication types. Your system may require you to use different types of replicas for various scenarios. For example, you may want to use two-way replication to synchronize with another office and one-way replication to update your map publishing geodatabase.
  • Create your replicas—Use the Create Replica geoprocessing tool to create your replicas. Because geoprocessing tools can be run from models and scripts, this tool is well suited to creating replicas on a regular basis. For example, you can build a model that creates checkout replicas each day for each of your field crews.
  • Integrate replication into your versioning workflows—Geodatabase replication is built on top of traditional versioning. At replica creation time, a replica version is defined in both the parent and child replicas. This is the version from which you will send and receive changes during synchronization. See the Replication and versioning topic for more information.

    Since the replica version is the conduit through which changes are synchronized, plan how you will work with the replica versions before creating replicas. For example, you may plan to run some validation on the changes received during synchronization before integrating them into your main workflow. You can analyze the contents of the replica version after a synchronization and then reconcile and post it into your regular working version. Alternatively, you can use the default version as the replica version, which is helpful when you want synchronized changes to go directly to default.

  • Define the data to replicate—Geodatabase replication allows you to replicate some or all of the datasets in your enterprise geodatabase. It also allows you to define what features or rows to replicate using filters and relationship classes. During creation, filters are always applied first, and then relationship classes are used to append additional features and rows. See Prepare data for replication for more information.

    Consider your future needs when defining the data to replicate. Two-way and one-way replicas, for example, are created once and synchronized many times, and the filters you define at replica creation time are also applied at synchronization time. If your needs change over time, for example if you later require a larger replica area, the filters defined at creation still determine which changes are synchronized. It is also important to consider the type of data that you are replicating. To maintain data integrity, additional rules are applied when replicating complex data types such as topologies. The following help topics describe these rules and show examples: Topology in ArcGIS, Relationships and ArcGIS, Imagery and raster in ArcGIS Pro, Terrain dataset in ArcGIS Pro, and What is a network dataset. For additional considerations when defining data to replicate, see Replication with advanced geodatabase datasets.

  • Consider replica creation options—Several options are available to make replica creation more efficient. Each is designed for specific cases and may not apply to your workflow. Review the following list to see whether you can take advantage of them:
    • Re-use schema—Specify a target geodatabase that already contains the schema for the data you're replicating. This saves time because schema creation is skipped when the replica is created. This option applies only to checkout and check-in replicas; for those, it is recommended that you use it whenever possible.
    • Replicate related data—During replica creation, filters are applied first, and then relationship classes are processed to determine the data to replicate. Turning off relationship class processing saves time, and the Create Replica geoprocessing tool provides an option to turn it off for all relationship classes. When processing is turned off, the relationship classes are still included in the replica but are not processed during creation or synchronization.
    • Use archiving to track changes—When archiving is used to track changes instead of the delta tables associated with versioning, no system versions are created. Therefore, the reconcile, post, and compress operations are not affected, which makes version management and replication management independent. This also allows the synchronization schedule to be more flexible.
      Note:

      This option does not enable archiving; the data must already be registered as traditional versioned with archiving enabled. You must also create replicas from the default version when you use this option.

    • Register existing data only—If you are replicating a very large amount of data, consider using the Register existing data only option. This option bypasses the data copying step of replica creation and registers a new replica. To use this option successfully, you must meet a specific set of requirements before creating the replica.
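The order described in the Define the data to replicate and Replicate related data bullets (filters first, then relationship classes) can be illustrated with a toy model. This is a minimal sketch, not how ArcGIS Pro evaluates a replica: the dictionaries, the Relationship class, and select_replica_rows are hypothetical stand-ins, and each relationship is processed once, from origin to destination only.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    """Hypothetical one-to-many link: origin rows match destination rows on a key."""
    origin: str
    destination: str
    origin_key: str
    foreign_key: str


def select_replica_rows(datasets, filters, relationships, get_related=True):
    """Return {dataset name: set of row IDs} that the toy replica would contain.

    datasets: {name: {row_id: {field: value}}}, a stand-in for tables/feature classes
    filters: {name: predicate(row) -> bool}; a dataset without a filter keeps all rows
    """
    # Step 1: filters are always applied first.
    selected = {}
    for name, rows in datasets.items():
        keep = filters.get(name, lambda row: True)
        selected[name] = {rid for rid, row in rows.items() if keep(row)}

    if not get_related:
        return selected  # relationship processing turned off

    # Step 2: relationship classes append related rows to the filtered result.
    for rel in relationships:
        origin_rows = datasets[rel.origin]
        keys = {origin_rows[rid][rel.origin_key] for rid in selected[rel.origin]}
        selected[rel.destination] |= {
            rid for rid, row in datasets[rel.destination].items()
            if row[rel.foreign_key] in keys
        }
    return selected


if __name__ == "__main__":
    datasets = {
        "Parcels": {1: {"zone": "north", "pid": "A"}, 2: {"zone": "south", "pid": "B"}},
        "Owners": {10: {"pid": "A"}, 11: {"pid": "B"}},
    }
    filters = {
        "Parcels": lambda r: r["zone"] == "north",  # e.g. an area-of-interest filter
        "Owners": lambda r: False,                  # no rows by filter alone
    }
    rels = [Relationship("Parcels", "Owners", "pid", "pid")]
    print(select_replica_rows(datasets, filters, rels))
    # {'Parcels': {1}, 'Owners': {10}}
    print(select_replica_rows(datasets, filters, rels, get_related=False))
    # {'Parcels': {1}, 'Owners': set()}
```

With relationship processing turned off, the related owner row is not picked up: less work during creation, but related rows that the filters alone do not select are left out.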
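The Create your replicas bullet suggests a model that creates checkout replicas each day for every field crew. As a Python script, that loop might look like the minimal sketch below. The function name, naming scheme, and paths are hypothetical, the output geodatabases are assumed to exist already, and the replica-creation call is passed in as a parameter so the loop can run without ArcGIS. In practice you might pass arcpy.management.CreateReplica, but check its parameter order and the CHECK_OUT keyword against the tool reference for your release.

```python
import datetime


def create_daily_checkouts(crews, in_data, out_folder, create_replica, day=None):
    """Create one checkout replica per field crew and return the replica names.

    create_replica is injected; it is called positionally as
    (in_data, in_type, out_geodatabase, out_name). Assumption: this mirrors
    the leading parameters of the Create Replica tool; verify before use.
    """
    day = day or datetime.date.today()
    stamp = day.strftime("%Y%m%d")
    names = []
    for crew in crews:
        # Hypothetical naming scheme: one target geodatabase and replica per crew per day.
        out_gdb = f"{out_folder}/{crew}_{stamp}.gdb"
        name = f"{crew}_checkout_{stamp}"
        create_replica(in_data, "CHECK_OUT", out_gdb, name)
        names.append(name)
    return names


if __name__ == "__main__":
    # Stand-in for the real tool: report what would be created instead of creating it.
    def fake_create_replica(*args):
        print("CreateReplica", args)

    create_daily_checkouts(["crew_a", "crew_b"], ["Parcels", "Hydrants"],
                           "C:/replicas", fake_create_replica)
```

Injecting the tool keeps the scheduling and naming logic testable on its own; swapping in the real geoprocessing call is then a one-line change.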

Synchronize replicas

Once a replica is created, you can start synchronizing changes between the replica geodatabases. To make your system work effectively, it is important to devise a strategy for synchronizing changes. See What is synchronization to learn more.

Consider the following when determining the best strategy for your system:

  • Synchronization methods—First determine the best synchronization method for your needs. The following are some options:
    • Manual synchronization—If you work with only a small number of replicas and plan to synchronize changes only occasionally, use the Synchronize Changes geoprocessing tool in ArcGIS Pro.
    • Automated synchronization using agents—In a system where there are many replicas, frequent synchronization, or both, consider building a replication agent. Replication agents work by automatically connecting to replicated geodatabases and performing synchronizations. In this case, end users do not have to explicitly synchronize their databases, as synchronization happens automatically.
      • Synchronization using geoprocessing tools—With geoprocessing tools, you can build models to synchronize replicas using either local geodatabase connections or connections to geodata server objects running on the internet. You can export these models to Python scripts and run them through Python. To run the scripts on a regular basis, add the commands that execute them to scheduling software such as Windows Task Scheduler. For example, you may want to schedule a synchronization between two enterprise geodatabases once a week at an off-peak time.
  • Synchronization and conflicts—If edits made to a replica's data conflict with edits being synchronized from the relative replica, you must determine how to resolve the conflict. You can apply a reconcile policy to automatically resolve the conflicts. Review Synchronization and versioning to see if this is a concern for your system.
  • Data being synchronized—For checkout replicas, all data changes made in the child replica are synchronized. For two-way and one-way replicas, only changes that meet the requirements of the filters and relationship classes are applied. You can use the Manage Replicas pane to determine the filters and relationship class rules that have been applied to each replicated dataset. To maintain data integrity, additional rules are applied when synchronizing complex data types such as topologies. Relationship class processing may also add to the data that gets synchronized. Review the following topics to become familiar with synchronizing different types of data: Synchronizing topology and Synchronizing related data.

    Metadata for the data you choose to replicate is copied during the replica creation process. However, changes to the metadata are not applied during replica synchronization.

  • Data volume—When you synchronize, only changes made since the last synchronization are applied. ArcGIS Pro filters out any changes that have already been sent and acknowledged. Also, once a change has been sent, it is never returned to the original replica. In this way, data volumes are trimmed to just what is needed.

    Plan your synchronization frequency to match the rate at which your data changes. If you do not synchronize often enough for the volume of changes, each synchronization may take a long time. It is also recommended that you synchronize during off-peak hours.

  • The order in which replicas are synchronized—If you are working with several replicas, the order in which they are synchronized may be important. For example, consider the case where you create several two-way replicas from a single enterprise geodatabase. One strategy is for each child replica, in turn, to synchronize in both directions with the parent: the child sends its changes to the parent, and the parent then sends changes to the child. Another strategy is for every child replica to first send its changes to the parent; the parent incorporates all of these changes and then sends changes back to each child. In the first case, the parent sends each child only its own changes plus those received from replicas that have already synchronized; in the second case, it also sends changes received from all the other replicas. Depending on the requirements of your system, one strategy may be more appropriate than the other.
  • Schema changes—Geodatabase replication is designed to allow schema changes. This means that synchronization continues to work even if schema changes are made to the replicated data. In general, it is best to keep schema changes to a minimum.
  • Working through errors—Errors can occur during the synchronization process for a number of reasons. A computer network may fail, or you may try to synchronize a replica that is in conflict. The system is designed to remain in a consistent state. Changes are rolled back, and inappropriate data changes are rejected. You can use the replica activity log to find any errors that have occurred and determine what to do, if anything, to recover. In most cases, the system automatically recovers from errors if you continue synchronizing changes. Replicas also contain generation information, which indicates how many change sets have been sent and how many have been received. See A quick tour of replica management for more information.
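The Data volume and Working through errors bullets describe three behaviors: only changes that have not been acknowledged are sent, received changes are never returned to the replica they came from, and each replica tracks generations sent and received. The toy model below shows how generation numbers can produce all three. It is a sketch under simplifying assumptions (one value per change, a single relative replica), not the way ArcGIS Pro stores or tracks changes; Replica and synchronize are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of one side of a replica pair (not how ArcGIS stores changes)."""
    name: str
    data: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # (generation, row_id, value), local edits only
    sent_gen: int = 0      # generation of the last change set sent
    acked_gen: int = 0     # last generation the relative replica acknowledged
    received_gen: int = 0  # generation of the last change set received

    def edit(self, row_id, value):
        """Record a local edit, stamped with the generation it will first be sent in."""
        self.data[row_id] = value
        self.changes.append((self.sent_gen + 1, row_id, value))

    def export_changes(self):
        """Start a new generation holding every local change not yet acknowledged."""
        self.sent_gen += 1
        pending = [(rid, val) for gen, rid, val in self.changes if gen > self.acked_gen]
        return self.sent_gen, pending

    def import_changes(self, gen, pending):
        """Apply received changes without logging them as local edits, then acknowledge."""
        for rid, val in pending:
            self.data[rid] = val
        self.received_gen = gen
        return gen


def synchronize(sender, receiver, ack_lost=False):
    """Send changes in one direction; ack_lost simulates a failure before acknowledgment."""
    gen, pending = sender.export_changes()
    ack = receiver.import_changes(gen, pending)
    if not ack_lost:
        sender.acked_gen = ack
    return pending


if __name__ == "__main__":
    parent, child = Replica("parent"), Replica("child")
    child.edit(1, "paved")
    print(synchronize(child, parent))  # [(1, 'paved')]
    print(synchronize(child, parent))  # [] because the change was acknowledged
    print(synchronize(parent, child))  # [] because received changes are not echoed back
```

The ack_lost flag mimics a failure after changes arrive but before the acknowledgment returns: the next synchronization resends the unacknowledged changes, which is one way that continuing to synchronize can recover from an error.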
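To make the difference between the two synchronization-order strategies concrete, the sketch below treats each change as an opaque label and reports which changes each child receives from the parent in a single pass. The function names are hypothetical, and the model ignores filters and conflicts.

```python
def sync_each_child_in_turn(parent_changes, child_changes):
    """Strategy 1: each child sends to the parent, then immediately receives from it."""
    parent_has = set(parent_changes)
    received = {}
    for child, changes in child_changes.items():  # dict order = synchronization order
        parent_has |= changes                      # child -> parent
        received[child] = parent_has - changes     # parent -> child
    return received


def send_all_then_receive(parent_changes, child_changes):
    """Strategy 2: every child sends first; the parent then sends to each child."""
    parent_has = set(parent_changes)
    for changes in child_changes.values():
        parent_has |= changes
    return {child: parent_has - changes for child, changes in child_changes.items()}


if __name__ == "__main__":
    parent = {"p1"}
    children = {"A": {"a1"}, "B": {"b1"}, "C": {"c1"}}
    for strategy in (sync_each_child_in_turn, send_all_then_receive):
        result = strategy(parent, children)
        print(strategy.__name__, {c: sorted(r) for c, r in result.items()})
    # sync_each_child_in_turn {'A': ['p1'], 'B': ['a1', 'p1'], 'C': ['a1', 'b1', 'p1']}
    # send_all_then_receive {'A': ['b1', 'c1', 'p1'], 'B': ['a1', 'c1', 'p1'], 'C': ['a1', 'b1', 'p1']}
```

In the first strategy, child A sees nothing from B or C until the next round; in the second, every child receives everyone else's changes in one pass, at the cost of a larger change set per child.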
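The scheduling and error-recovery bullets fit together in a script: scheduling software starts it at an off-peak time, and the script retries if a synchronization fails. The helper below is a hypothetical sketch; sync_with_retry and its retry policy are not part of ArcGIS. The commented wiring assumes the arcpy.management.SynchronizeChanges parameter order and keyword values as we understand them, and the connection paths and replica name are placeholders; verify them in the tool reference before use.

```python
import time


def sync_with_retry(sync, attempts=3, delay_seconds=60.0, sleep=time.sleep):
    """Run sync() until it succeeds, retrying after failures.

    sync is any zero-argument callable that raises on failure. Because a
    failed synchronization is rolled back, running it again is a simple
    recovery strategy. Returns whatever sync() returns.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return sync()
        except Exception as exc:  # broad on purpose: network drops, locks, etc.
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed: {exc}; retrying in {delay_seconds} s")
            sleep(delay_seconds)


# Example wiring for a script run weekly by a scheduler (assumed parameters,
# hypothetical paths and replica name):
#
# import arcpy
# sync_with_retry(lambda: arcpy.management.SynchronizeChanges(
#     r"C:\connections\office_a.sde", "MyReplica", r"C:\connections\office_b.sde",
#     "BOTH_DIRECTIONS", "IN_FAVOR_OF_GDB1", "BY_OBJECT", "DO_NOT_RECONCILE"))
```

Retrying without inspection is reasonable here only because failed synchronizations are rolled back; when retries are exhausted, check the replica activity log before trying again.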
