Various workflows require you to distribute your data to different geodatabases and synchronize the changes made to the data in each geodatabase. This guide helps you determine how best to use distributed data, geodatabase replicas, and synchronization in your system.
Introduction to distributed data provides a good starting point to understand geodatabase replication and other methods to work with distributed data. Distributed data scenarios lists a number of common use cases for which geodatabase replication can be used. If geodatabase replication seems the most appropriate method for your system, your next step is to create a replica.
Create replicas
The following will help you determine the best way to create replicas for your system:
- Determine what replicas are needed—In some cases, you may need to create only one or two replicas, while in others, many replicas are needed. For example, many replicas are needed if you are distributing data to field crews to work with on their mobile devices on-site. In cases where you want to keep two enterprise geodatabases synchronized, you may need only one replica. See Geodatabase replication architecture to learn more about what replicas are and how they work within a geodatabase.
- Decide on the type of replication—The Replication types topic describes the available replication types. Your system may require you to use different types of replicas for various scenarios. For example, you may want to use two-way replication to synchronize with another office and one-way replication to update your map publishing geodatabase.
- Create your replicas—Use the Create Replica geoprocessing tool to create your replicas. This tool is ideal in cases where you need to create replicas on a regular basis. For example, a model can be built to create checkout replicas on a daily basis for each of your field crews.
- Integrate replication into your versioning workflows—Geodatabase replication is built on top of traditional versioning. At replica creation time, a replica version is defined in both the parent and child replicas. This is the version from which you will send and receive changes during synchronization. See the Replication and versioning topic for more information.
Since the replica version is the conduit through which changes are synchronized, plan how you will work with the replica versions before creating replicas. For example, you may plan to validate the changes received during synchronization before integrating them into your main workflow. You can do this by analyzing the contents of the replica version after a synchronization and then reconciling and posting that version into your regular working version. Alternatively, the default version can be used as the replica version, which is helpful when you want synchronized changes to go directly to default.
- Define the data to replicate—Geodatabase replication allows you to replicate some or all of the datasets in your enterprise geodatabase. It also allows you to define what features or rows to replicate using filters and relationship classes. During creation, filters are always applied first, and then relationship classes are used to append additional features and rows. See Prepare data for replication for more information.
Consider your future needs when defining the data to replicate. For example, two-way and one-way replicas are created once and synchronized many times. The filters you define at replica creation time are also applied at synchronization time. Over time, your needs may change to require a larger replica area. It is also important to consider the type of data that you are replicating. To maintain data integrity, additional rules are applied when replicating complex data types such as topologies. The following help topics describe these rules and show examples: Topologies, Relationship Classes, Raster Data, and Terrains and Network Datasets. For additional considerations when defining data to replicate, see Replication with advanced geodatabase datasets.
- Consider replica creation options—Some options have been added to make the replica creation process as efficient as possible. These options are designed to work for specific cases and may or may not be applicable to your workflow. Review the following list to see if you can take advantage of these options:
- Re-use schema—Specify a target geodatabase that already has the schema for the data you're replicating. This saves time because schema creation is skipped when the replica is created. This option applies only to checkout/check-in replicas, but use it whenever possible.
- Replicate related data—During replica creation, filters are applied first, and then relationship classes are processed to determine the data to replicate. The Create Replica geoprocessing tool has an option to turn off all relationship class processing, which saves time. When processing is turned off, the relationship classes are still included in the replica but are not processed during creation or synchronization.
- Use archiving to track changes—When using archiving to track changes instead of the delta tables associated with versioning, no system versions are created. Therefore, the reconcile, post, and compress processes are not affected, which makes version management and replication management independent. This also allows the synchronization schedule to be more flexible.
Note:
This option does not enable archiving and requires the data to be registered as traditional versioned with archiving enabled. Replicas must be created from the default version when this option is used.
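The creation options above correspond to parameters of the Create Replica geoprocessing tool. The following is a minimal sketch, not a definitive implementation, of how those choices might be collected into keyword arguments for `arcpy.management.CreateReplica`. The helper names (`build_create_replica_args`, `create_replica`), the dataset names, the crew names, and the output paths are all hypothetical. The tool itself is called only inside `create_replica`, which assumes an ArcGIS Pro Python environment.

```python
# Replica type keywords accepted by the Create Replica tool's in_type parameter.
REPLICA_TYPES = {
    "TWO_WAY_REPLICA",
    "ONE_WAY_REPLICA",
    "CHECK_OUT",
    "ONE_WAY_CHILD_TO_PARENT_REPLICA",
}


def build_create_replica_args(in_data, replica_type, out_geodatabase, replica_name,
                              reuse_schema=False, get_related=True,
                              use_archiving=False):
    """Translate planning choices into Create Replica keyword arguments.

    This helper is illustrative only; it is not part of ArcGIS.
    """
    if replica_type not in REPLICA_TYPES:
        raise ValueError(f"Unknown replica type: {replica_type}")
    # The re-use schema option applies only to checkout/check-in replicas.
    if reuse_schema and replica_type != "CHECK_OUT":
        raise ValueError("Re-use schema applies only to checkout/check-in replicas")
    return {
        "in_data": in_data,
        "in_type": replica_type,
        "out_geodatabase": out_geodatabase,
        "out_name": replica_name,
        "reuse_schema": "REUSE" if reuse_schema else "DO_NOT_REUSE",
        "get_related_data": "GET_RELATED" if get_related else "DO_NOT_GET_RELATED",
        "archiving": "ARCHIVING" if use_archiving else "DO_NOT_USE_ARCHIVING",
    }


def create_replica(args):
    """Run the tool. Requires an ArcGIS Pro Python environment."""
    import arcpy  # imported lazily so the argument builder works without ArcGIS
    return arcpy.management.CreateReplica(**args)


# Hypothetical daily checkout replicas, one per field crew.
# In practice, in_data would reference datasets in the enterprise geodatabase.
crew_args = [
    build_create_replica_args(
        in_data=["Parcels", "Hydrants"],
        replica_type="CHECK_OUT",
        out_geodatabase=f"C:/replicas/{crew}.gdb",
        replica_name=f"{crew}_checkout",
        reuse_schema=True,    # target geodatabase already holds the schema
        get_related=False,    # skip relationship class processing
    )
    for crew in ("crew_a", "crew_b")
]
```

Keeping the argument building separate from the tool call makes it possible to review or log the planned options for each replica before anything is created.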
Synchronize replicas
Once a replica is created, you can start synchronizing changes between the replica geodatabases. To make your system work effectively, it is important to devise a strategy for synchronizing changes. See What is synchronization to learn more.
The following should be considered when determining the best strategy for your system:
- Synchronization methods—First determine the best synchronization method for your needs. The following are some options:
- Manual synchronization—If you are working with only a small number of replicas and plan to synchronize changes only occasionally, use the Synchronize Changes geoprocessing tool in ArcGIS Pro.
- Automated synchronization using agents—In a system where there are many replicas, frequent synchronization, or both, you should consider building a replication agent. Replication agents work by automatically connecting to replicated geodatabases and performing synchronizations. In this case, end users do not have to explicitly synchronize their databases, as synchronization happens automatically.
- Synchronization using geoprocessing tools—With geoprocessing tools, you can build models to synchronize replicas using either local geodatabase connections or connections to geodata server objects running on the Internet. These models can be exported to Python scripts and executed through Python. The commands to execute the scripts can be added to scheduling software, such as Windows Task Scheduler, so that they run on a regular basis. For example, you may want to schedule a synchronization between two enterprise geodatabases once a week at an off-peak time.
- Synchronization and conflicts—If edits made to a replica's data conflict with edits being synchronized from the relative replica, you will need to determine how to resolve the conflict. A reconcile policy can be applied to automatically resolve the conflicts. Review Synchronization and versioning to see if this is a concern for your system.
- Data being synchronized—For checkout replicas, all data changes made in the child replica are synchronized. For two-way and one-way replicas, only changes that meet the requirements of the filters and relationship classes are applied. The Manage Replicas pane can be used to determine the filters and relationship class rules that have been applied to each replicated dataset.
To maintain data integrity, additional rules are applied when synchronizing complex data types such as topologies. Relationship class processing may also add to the data that gets synchronized. Review the following topics to become familiar with synchronizing different types of data: Synchronizing topology and Synchronizing related data.
Metadata for the data you choose to replicate is copied during the replica creation process. However, changes to the metadata are not applied during replica synchronization.
- Data volume—When you synchronize, only changes made since the last synchronization are applied. ArcGIS filters out any changes that have already been sent and acknowledged. Also, once a change has been sent, it is never returned to the original replica. In this way, data volumes are trimmed to just what is needed.
Plan the frequency at which you synchronize to correspond with the rate at which changes are applied to your data. If you do not synchronize frequently enough for the volume of changes, the process may be time-consuming. It is also recommended that you synchronize during off-peak hours.
- The order in which replicas are synchronized—If you are working with several replicas, the order in which they are synchronized may be important. For example, consider the case where you create several two-way replicas from a single enterprise geodatabase. One strategy for synchronizing these replicas would be for each child replica to synchronize in both directions with the parent. Here the child sends changes to the parent, and the parent sends changes to the child. Another strategy is for each child replica to first send its changes to the parent. The parent incorporates all the changes and sends changes back to each child. In the first case, the parent is sending only its changes along with those received from replicas that have already been synchronized, while in the second case, it is additionally sending changes incorporated from all the other replicas. Depending on the requirements of your system, one strategy may be more appropriate than the other.
- Schema changes—Geodatabase replication is designed to allow schema changes. This means that synchronization will continue to work even if schema changes are made to the replicated data. In general, it is best to keep schema changes to a minimum.
- Working through errors—Errors can occur during the synchronization process for a number of reasons. A computer network may fail, or you may try to synchronize a replica that is in conflict. The system is designed to remain in a consistent state. Changes are rolled back, and inappropriate data changes are rejected. The replica activity log can be used to find any errors that have occurred and determine what to do, if anything, to recover. In most cases, the system will automatically recover from errors if you continue synchronizing changes. Replicas also contain generation information, which indicates how many change sets have been sent and how many have been received. See Manage Replicas for more information.
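Several of the considerations above (scripted synchronization, a conflict policy, the order in which replicas are synchronized, and continuing after errors) can be combined in one scheduled script. The following is a minimal sketch under stated assumptions: the helper names (`build_sync_args`, `run_with_arcpy`, `synchronize_all`), the connection file paths, and the replica names are hypothetical, and the `arcpy.management.SynchronizeChanges` call runs only in an ArcGIS Pro Python environment.

```python
import logging

DIRECTIONS = {"BOTH_DIRECTIONS", "FROM_GEODATABASE1_TO_2", "FROM_GEODATABASE2_TO_1"}
CONFLICT_POLICIES = {"MANUAL", "IN_FAVOR_OF_GDB1", "IN_FAVOR_OF_GDB2"}


def build_sync_args(parent_gdb, replica_name, child_gdb,
                    direction="BOTH_DIRECTIONS",
                    conflict_policy="IN_FAVOR_OF_GDB1"):
    """Collect Synchronize Changes keyword arguments (illustrative helper)."""
    if direction not in DIRECTIONS:
        raise ValueError(f"Unknown direction: {direction}")
    if conflict_policy not in CONFLICT_POLICIES:
        raise ValueError(f"Unknown conflict policy: {conflict_policy}")
    return {
        "geodatabase_1": parent_gdb,
        "in_replica": replica_name,
        "geodatabase_2": child_gdb,
        "in_direction": direction,
        "conflict_policy": conflict_policy,
    }


def run_with_arcpy(args):
    """Run the tool. Requires an ArcGIS Pro Python environment."""
    import arcpy  # imported lazily so the rest of the script works without ArcGIS
    arcpy.management.SynchronizeChanges(**args)


def synchronize_all(jobs, runner=run_with_arcpy):
    """Synchronize replicas in list order and return the names that failed.

    A failure is logged and the loop moves on, so one unreachable
    geodatabase does not block the remaining replicas.
    """
    failed = []
    for args in jobs:
        try:
            runner(args)
            logging.info("Synchronized %s", args["in_replica"])
        except Exception as exc:  # broad on purpose: arcpy is not imported here
            logging.error("Failed to synchronize %s: %s", args["in_replica"], exc)
            failed.append(args["in_replica"])
    return failed


# Hypothetical weekly job: the list order is the synchronization order.
jobs = [
    build_sync_args("C:/conn/parent.sde", "DBO.NorthOffice", "C:/conn/north.sde"),
    build_sync_args("C:/conn/parent.sde", "DBO.SouthOffice", "C:/conn/south.sde"),
]
```

In a scheduled script, `synchronize_all(jobs)` would be called once per run. Replicas that fail can be retried on the next run, and the replica activity log remains the place to look for details on any failure.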