Updating and fixing data sources

There are numerous reasons why data sources need to be repaired or redirected to different locations. The Catalog View in ArcGIS Pro has capabilities for updating data sources. However, making these changes manually in every affected map or project can be overwhelming. Methods are available with the arcpy.mp scripting environment that make it possible to automate these changes without having to open a project. You have control of updating data sources for individual layers or tables, or you can update all the layers or tables in a common workspace at once.

The following members are used with changing data source workflows:

Using the updateConnectionProperties function

The updateConnectionProperties function can be thought of as a find-and-replace function with which you replace the current_connection_info parameter with the new_connection_info parameter. These parameters can be either a full path to a workspace, a partial string, a dictionary that contains connection properties, a partial dictionary that defines specific keys, or a path to a database connection (.sde) file.

Tip:

When updating data sources for enterprise geodatabase layers, database connection files can be used in the current_connection_info and new_connection_info parameters, such as the following:

aprx.updateConnectionProperties(r'C:\DBConnections\TestGDB.sde', 
                                r'C:\DBConnections\ProductionGDB.sde')

The auto_update_joins_and_relates parameter allows you to control whether joins and relates associated with a layer or table should be updated. The default is True. There may be times, especially when updating all data sources at the project level, that you do not want these associated sources to be updated. If that is the case, set this parameter to False.

By default, the updateConnectionProperties method only updates a data source if the new_connection_info is a valid data source. If the validate parameter is set to False, the data source is set to that location regardless of whether it exists. This can be useful for scenarios that require data sources to be updated ahead of the data being created. In these cases, the data appears broken in the associated maps.
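
The two optional parameters described above can be passed by keyword. The following is a minimal sketch, not a definitive implementation. The helper name repoint_ahead_of_data is hypothetical, and the sketch assumes the project object also provides listBrokenDataSources to report the layers and tables that end up broken. The function accepts any object that behaves like an arcpy.mp.ArcGISProject, so the logic can be tried without opening a project.

```python
def repoint_ahead_of_data(project, current_info, new_info):
    """Point data sources at a location that may not exist yet.

    `project` is assumed to behave like an arcpy.mp.ArcGISProject.
    Returns the names of the layers and tables reported as broken afterward.
    """
    # validate=False applies the change even if new_info is not yet a valid
    # data source; auto_update_joins_and_relates=False leaves the sources of
    # joins and relates untouched.
    project.updateConnectionProperties(
        current_info,
        new_info,
        auto_update_joins_and_relates=False,
        validate=False,
    )
    # Until the data is created, the redirected items are expected to be broken.
    return [item.name for item in project.listBrokenDataSources()]
```

With arcpy available, the helper could be called with a project created by arcpy.mp.ArcGISProject and the same path arguments used in the examples in this section.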

To change a layer's dataset to a feature class with a different name, see the Using the connectionProperties dictionary section below. To change the feature dataset that a layer's feature class resides in, see the Updating data sources via the CIM section below.

Here are several examples of using the updateConnectionProperties function:

  1. The following script changes the full path to a file geodatabase data source for all layers and tables in a project. In this example, a folder was renamed and all vector data was moved to this new location:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    aprx.updateConnectionProperties(r'C:\Projects\YosemiteNP\Data\Yosemite.gdb',
                                    r'C:\Projects\YosemiteNP\Vector_Data\Yosemite.gdb')
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  2. The following example is very similar to the one above but uses partial path strings to replace the data sources. When using a partial string, make sure it does not occur more than once in a path; otherwise, you may not get the results you expect.

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    aprx.updateConnectionProperties('Data','Vector_Data')
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  3. The following example replaces a personal geodatabase connection with a file geodatabase connection using a partial path for all layers and tables in a map:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    m = aprx.listMaps("Yose*")[0]
    m.updateConnectionProperties('Background.mdb', 'Background_fGDB.gdb')
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  4. The following example references a layer in a map and uses those connection properties to update the connection properties for the same layer in a layer file that has not been updated with the new data source:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    m = aprx.listMaps('Yose*')[0]
    lyr = m.listLayers('Ranger Stations')[0]
    lyrFile = arcpy.mp.LayerFile(r'C:\Projects\YosemiteNP\LYRXs\Yosemite\OperationalLayers.lyrx')
    
    for l in lyrFile.listLayers():
      if l.name == 'Ranger Stations':
        l.updateConnectionProperties(l.connectionProperties, lyr.connectionProperties)
    
    lyrFile.save()
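
The caution about partial strings in example 2 can be checked before a replacement is run. The sketch below uses plain Python only; the function names are hypothetical, and the list of paths is assumed to have been collected beforehand, for example from the dataSource property of each layer that supports it. The second path is invented for illustration.

```python
def count_fragment(path, fragment):
    """Return how many times fragment occurs in path (case-sensitive)."""
    return path.count(fragment)

def ambiguous_sources(paths, fragment):
    """Return the paths in which fragment occurs more than once."""
    return [p for p in paths if count_fragment(p, fragment) > 1]

# Sample paths: the first comes from the examples above, the second is
# an invented path in which 'Data' appears twice.
paths = [
    r'C:\Projects\YosemiteNP\Data\Yosemite.gdb',
    r'C:\Projects\Data\YosemiteNP\Data\Yosemite.gdb',
]

for p in ambiguous_sources(paths, 'Data'):
    print('Partial string is ambiguous in:', p)
```

If any path is reported, a longer partial string or the full path is the safer choice for the find-and-replace.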

Here are several examples of updating the data sources for enterprise geodatabase layers:

  1. The following example replaces a file geodatabase connection with a path to an enterprise geodatabase connection (.sde) file for all layers and tables in a project:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    aprx.updateConnectionProperties(r'C:\Projects\YosemiteNP\Vector_Data\Yosemite.gdb',
                                    r'C:\Projects\YosemiteNP\DBConnections\Server.sde')
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  2. The following example replaces the connection properties from an enterprise geodatabase connection file in the current_connection_info parameter with a new enterprise geodatabase connection file in the new_connection_info parameter:

    Note:

    The enterprise geodatabase connection file specified in the current_connection_info parameter does not need to be the actual connection file used to create the layer. Rather, the connection properties contained in the connection file will be used in the updateConnectionProperties find-and-replace functionality.

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    aprx.updateConnectionProperties(r'C:\Projects\YosemiteNP\DBConnections\TestGDB.sde',
                                    r'C:\Projects\YosemiteNP\DBConnections\ProductionGDB.sde')
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  3. The following example replaces an enterprise geodatabase connection file with a path to a file geodatabase for all layers and tables in a project:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    aprx.updateConnectionProperties(r'C:\Projects\YosemiteNP\DBConnections\Server.sde',
                                    r'C:\Projects\YosemiteNP\Local_Data\YosemiteLocal.gdb')
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")

Tip:
If no matches are found when you replace the current_connection_info parameter with the new_connection_info parameter in the updateConnectionProperties function, your script may complete, but nothing will get updated.
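
Because a script can complete without updating anything, it may help to compare data sources before and after the call. The following is a rough sketch under stated assumptions: the function names are hypothetical, layer names are assumed to be unique within the map, and the map object is only assumed to offer listLayers, with each layer offering name, supports, and dataSource.

```python
def snapshot_sources(map_like):
    """Record the data source of every layer that has one, keyed by layer name."""
    return {lyr.name: lyr.dataSource
            for lyr in map_like.listLayers()
            if lyr.supports("dataSource")}

def changed_layers(before, after):
    """Return the sorted names whose data source differs between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))

# Demonstration with plain dictionaries standing in for two snapshots.
before = {'Ranger Stations': r'C:\Projects\YosemiteNP\Data\Yosemite.gdb\RangerStations'}
after = dict(before)  # nothing matched, so nothing changed

if not changed_layers(before, after):
    print('No data sources were updated; check the current_connection_info value.')
```

In a real script, snapshot_sources would be called on the map once before and once after updateConnectionProperties.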

Using the connectionProperties dictionary

Using connectionProperties for updating data sources requires that you work with a dictionary of connection properties. The dictionary that is returned varies depending on whether it is a file-based workspace or a database connection, or if the layer or table has associated joins or relates. It is because of this variability that it is important to understand the different types of connection properties and how to navigate the dictionaries to make the appropriate changes. For example, a layer with a join or a relate returns a very different result than the same layer without a join or a relate. The approach to updating connection property dictionaries is to reference and retrieve the dictionary from a layer or table, make the necessary changes to it, then set the modified dictionary back to the layer or table you want to update using the updateConnectionProperties method.

A good way to display the dictionary structure is to use the Python pprint function.

import arcpy, pprint
p = arcpy.mp.ArcGISProject('current')
m = p.listMaps()[0]
l = m.listLayers()[0]
pprint.pprint(l.connectionProperties)

For example, a file-based data source with no joins or relates will look like the following:

{'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\Yosemite.gdb'}, 
 'dataset': 'RangerStations', 
 'workspace_factory': 'File Geodatabase'}

The above example is the most basic structure. A dictionary with three keys is returned. The value for the connection_info key is another dictionary that contains a path to the database.

One of the most common use cases for using the connectionProperties dictionary is changing a layer's dataset to a feature class with a different name. Here are several examples:

  1. The following example updates the data source's dataset name from RangerStations to RangerStationsNew. It also updates the geodatabase from Yosemite.gdb to YosemiteNew.gdb.

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    lyr = aprx.listMaps("Main*")[0].listLayers("Ranger Stations")[0]
    find_dict = {'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\Yosemite.gdb'}, 
                 'dataset': 'RangerStations', 
                 'workspace_factory': 'File Geodatabase'}
    replace_dict = {'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\YosemiteNew.gdb'}, 
                    'dataset': 'RangerStationsNew', 
                    'workspace_factory': 'File Geodatabase'}
    lyr.updateConnectionProperties(find_dict, replace_dict)
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  2. The script above can also be rewritten as follows:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    lyr = aprx.listMaps("Main*")[0].listLayers("Ranger Stations")[0]
    cp = lyr.connectionProperties
    cp['connection_info']['database'] = 'C:\\Projects\\YosemiteNP\\Data\\YosemiteNew.gdb'
    cp['dataset'] = 'RangerStationsNew'
    lyr.updateConnectionProperties(lyr.connectionProperties, cp)
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  3. A partial dictionary can also be used in the updateConnectionProperties method, as can be seen in the example below.

    The following example updates the data source's dataset name from PtsInterest to PointsOfInterest for layers in a project. This example doesn't change the geodatabase in which the feature class resides. Rather, it updates the layers to point to a different feature class in the same geodatabase:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    aprx.updateConnectionProperties({'dataset': 'PtsInterest'}, {'dataset': 'PointsOfInterest'})
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  4. The connectionProperties dictionary can also be used to update file-based data sources, such as shapefiles, raster files, and so on. The following example changes the data source of a layer to point to a new shapefile in a different folder.

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    lyr = aprx.listMaps('Main*')[0].listLayers('RoadsShp')[0]
    cp = lyr.connectionProperties
    cp['connection_info']['database'] = 'C:\\Projects\\YosemiteNP\\Data_New'
    cp['dataset'] = 'NewRoads.shp'
    lyr.updateConnectionProperties(lyr.connectionProperties, cp)
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
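
The retrieve, modify, and set-back pattern used in the examples above can be wrapped in a small helper that works on a copy, so the original dictionary stays available as the find value. This is a sketch for the simple three-key structure only (no joins or relates); the helper name with_new_source is hypothetical.

```python
import copy

def with_new_source(conn_props, database=None, dataset=None):
    """Return a modified copy of a simple connection properties dictionary.

    Only the three-key structure without joins or relates is handled here.
    """
    new_props = copy.deepcopy(conn_props)  # leave the original untouched
    if database is not None:
        new_props['connection_info']['database'] = database
    if dataset is not None:
        new_props['dataset'] = dataset
    return new_props

# The find dictionary from example 1, used here as plain data.
find_dict = {'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\Yosemite.gdb'},
             'dataset': 'RangerStations',
             'workspace_factory': 'File Geodatabase'}

replace_dict = with_new_source(
    find_dict,
    database='C:\\Projects\\YosemiteNP\\Data\\YosemiteNew.gdb',
    dataset='RangerStationsNew')
```

The resulting pair could then be passed to updateConnectionProperties in the same way as find_dict and replace_dict in example 1.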

Using the connectionProperties dictionary with enterprise geodatabase data

Below is an example of an enterprise geodatabase data source connectionProperties dictionary for a layer:

{'connection_info': {'authentication_mode': 'OSA',                     
                     'database': 'TestDB',                     
                     'db_connection_properties': 'TestServer',                     
                     'dbclient': 'sqlserver',                     
                     'instance': 'sde:sqlserver:TestServer',                     
                     'password': '*********',                     
                     'server': 'TestServer',                     
                     'user': 'User',                     
                     'version': 'sde.DEFAULT'}, 
'dataset': 'TestDB.USER.RangerStations', 
'workspace_factory': 'SDE'}

The same three keys are returned as for a file geodatabase layer, but this time the connection_info value is a dictionary with a larger set of database connection properties. Any of these properties can be modified.

The following example changes the enterprise geodatabase instance and server for all layers in a project. In this example, the enterprise geodatabase uses operating system authentication, and the database name is the same. If the user names and passwords are the same, the instance and server can be changed without knowing the credentials of layers in the project and without having to create new enterprise geodatabase connection files.

import arcpy
aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
find_dict = {'connection_info': {'db_connection_properties': 'TestServer',
                                 'instance': 'sde:sqlserver:TestServer',
                                 'server': 'TestServer'}}
replace_dict = {'connection_info': {'db_connection_properties': 'ProdServer',
                                    'instance': 'sde:sqlserver:ProdServer',
                                    'server': 'ProdServer'}}
aprx.updateConnectionProperties(find_dict, replace_dict)
aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
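
When several servers need to be swapped, the find and replace dictionaries above can be generated rather than typed. The sketch below assumes the instance value always follows the sde:dbclient:server pattern shown in the sample dictionary, which may not hold for every database platform; the helper name server_swap_dicts is hypothetical.

```python
def server_swap_dicts(old_server, new_server, dbclient='sqlserver'):
    """Build partial find and replace dictionaries for a server change.

    Assumes the instance string has the form 'sde:<dbclient>:<server>'.
    """
    def partial(server):
        return {'connection_info': {'db_connection_properties': server,
                                    'instance': f'sde:{dbclient}:{server}',
                                    'server': server}}
    return partial(old_server), partial(new_server)

find_dict, replace_dict = server_swap_dicts('TestServer', 'ProdServer')
```

For the server names used above, the generated dictionaries are the same as the hand-written find_dict and replace_dict.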

Using the connectionProperties dictionary with joins

The connectionProperties dictionary will also show the properties of any joins that are present on the layer. Any of these properties can be modified.

The example below shows a file-based data source connectionProperties dictionary with one join:

{'cardinality': 'one_to_many',
 'destination': {'connection_info': {'database': 'C:\\Projects\\FGDB.gdb'},
                 'dataset': 'tabular_eco',
                 'workspace_factory': 'File Geodatabase'},
 'foreign_key': 'ECO_CODE',
 'join_forward': False,
 'join_type': 'left_outer_join',
 'primary_key': 'CODE',
 'source': {'connection_info': {'database': 'C:\\Projects\\FGDB.gdb'},
            'dataset': 'mex_eco',
            'workspace_factory': 'File Geodatabase'}}

This example shows a file-based data source connectionProperties dictionary with two joins:

{'cardinality': 'one_to_many', 
 'destination': {'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\BackgroundData.gdb'},                 
                 'dataset': 'census2000',                 
                 'workspace_factory': 'File Geodatabase'}, 
 'foreign_key': 'State_Polygons.State_Name', 
 'join_forward': False, 
 'join_type': 'left_outer_join',
 'primary_key': 'STATE_NAME', 
 'source': {'cardinality': 'one_to_many',
            'destination': {'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\BackgroundData.gdb'},
                            'dataset': 'census2010',                            
                            'workspace_factory': 'File Geodatabase'},
            'foreign_key': 'State_Name',            
            'join_forward': False,            
            'join_type': 'left_outer_join',            
            'primary_key': 'STATE_NAME',            
            'source': {'connection_info': {'database': 'C:\\Projects\\YosemiteNP\\Data\\BackgroundData.gdb'},
                       'dataset': 'State_Polygons',                       
                       'workspace_factory': 'File Geodatabase'}}}

When joins are associated with a layer or table, the connectionProperties dictionary structure changes. You no longer have the same three root level keys, as you saw in previous examples. To understand why this is different, you need to understand how joins are persisted. Joins are nested. For example, if table one and table two are joined to a layer, table one is joined to the layer and table two is joined to the combination of the layer and table one. The root level dictionary describes the second join first. From the second join's source, you can trace the connection to the original layer and table one.
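
The nesting described above can be unwound with a short loop. The following sketch walks from the root dictionary down through each source key, collecting the joined datasets, and then reverses the list so the joins appear in the order they were applied. The function name unwrap_joins is hypothetical, and the sample dictionary is a trimmed version of the two-join example, keeping only the keys the function reads.

```python
def unwrap_joins(conn_props):
    """Return (base_source, joined_datasets) for a connection properties dictionary.

    joined_datasets is ordered from the first join applied to the last.
    """
    joins = []
    node = conn_props
    # Each join level has both a 'source' and a 'destination' key.
    while 'source' in node and 'destination' in node:
        joins.append(node['destination']['dataset'])
        node = node['source']
    joins.reverse()  # the root level describes the last join, so flip the order
    return node, joins

# Trimmed version of the two-join dictionary shown earlier.
two_joins = {
    'destination': {'dataset': 'census2000'},
    'source': {
        'destination': {'dataset': 'census2010'},
        'source': {'dataset': 'State_Polygons',
                   'workspace_factory': 'File Geodatabase'},
    },
}

base, joined = unwrap_joins(two_joins)
print(base['dataset'], joined)
```

A dictionary without joins is returned unchanged as the base source, with an empty list of joins.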

Here are several examples of using the connectionProperties dictionary with joins:

  1. The following example modifies the foreign key of a join for a specific layer:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\Mexico\MexicoEcology.aprx')
    mexLyr = aprx.listMaps('Layers')[0].listLayers('mex_eco')[0]
    conProps = mexLyr.connectionProperties
    conProps['foreign_key'] = 'ECO_CODE_NEW'
    mexLyr.updateConnectionProperties(mexLyr.connectionProperties, conProps)
    aprx.saveACopy(r"C:\Projects\Mexico\MexicoEcologyNew.aprx")
  2. A partial dictionary can also be used in the updateConnectionProperties method. The following example modifies the join properties for all layers in the project that use the specified foreign key:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\Mexico\MexicoEcology.aprx')
    aprx.updateConnectionProperties({'foreign_key': 'ECO_CODE'}, {'foreign_key': 'ECO_CODE_NEW'})
    aprx.saveACopy(r"C:\Projects\Mexico\MexicoEcologyNew.aprx")
  3. The following example modifies the source database and dataset for the primary layer both tables are joined to without changing the connection information for the joins:

    import arcpy
    aprx = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    lyr = aprx.listMaps("Main*")[0].listLayers("State_Polygons")[0]
    conProp = lyr.connectionProperties
    conProp['source']['source']['connection_info']['database'] = 'C:\\Projects\\YosemiteNP\\Vector_Data\\Census.gdb'
    conProp['source']['source']['dataset'] = 'States'
    lyr.updateConnectionProperties(lyr.connectionProperties, conProp)
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  4. The following example will create a join inventory of all the joins on all the layers in a map. This example employs Python recursive function logic to handle layers that have no joins or any number of joins:

    import arcpy
    
    def ListJoinsConProp(cp, join_count=0):
        if 'source' in cp:
            if 'destination' in cp:
                print(' '*6, 'Join Properties:')
                print(' '*9, cp['destination']['connection_info'])
                print(' '*9, cp['destination']['dataset'])
                join_count += 1
                return ListJoinsConProp(cp['source'], join_count)
        else:
            if join_count == 0:
                print(' '*6, '- no join')
    
    aprx = arcpy.mp.ArcGISProject(r"C:\Projects\Mexico\MexicoEcology.aprx")
    m = aprx.listMaps()[0]
    for lyr in m.listLayers():
        print(f"LAYER: {lyr.name}")
        if lyr.supports("dataSource"):
            cp = lyr.connectionProperties
            if cp is not None:
                ListJoinsConProp(cp)
  5. The following example will display the connection properties of layers in the map. Similar to the example above, this example also employs Python recursive function logic to handle layers that have no joins or any number of joins:

    import arcpy
    
    def ConPropsWithJoins(cp):
        if 'source' in cp:
            return ConPropsWithJoins(cp['source'])
        else:
            print(' '*6, 'database:', cp['connection_info']['database'])
            print(' '*6, 'dataset:', cp['dataset'])
            print(' '*6, 'workspace_factory:', cp['workspace_factory'])
    
    aprx = arcpy.mp.ArcGISProject(r"C:\Projects\Mexico\MexicoEcology.aprx")
    m = aprx.listMaps()[0]
    for lyr in m.listLayers():
        print(f"LAYER: {lyr.name}")
        if lyr.supports("dataSource"):
            cp = lyr.connectionProperties
            if cp is not None:
                ConPropsWithJoins(cp)

Updating data sources via the CIM

Starting with ArcGIS Pro 2.4, Python developers have fine-grained access to the Cartographic Information Model (CIM) and can access many more settings, properties, and capabilities that are persisted in a project or document. This can be useful in data source update workflows.

If a specific data source workflow is difficult to accomplish using the updateConnectionProperties function, modifying a layer's CIM structure is an option. The Python CIM Access topic describes the JSON structure of the CIM object model. Understanding this structure will allow you to update a layer's CIM.

For example, the following is a JSON representation of a CAD layer's data source. The JSON below is not the full CIM structure of the layer. Rather, it is a snippet showing only the dataConnection node.

"dataConnection" : {
  "type" : "CIMFeatureDatasetDataConnection",
  "featureDataset" : "parcels.dwg",
  "workspaceConnectionString" : "DATABASE=C:\\Projects\\YosemiteNP\\CAD",
  "workspaceFactory" : "Cad",
  "dataset" : "Polyline",
  "datasetType" : "esriDTFeatureClass"
}
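
The workspaceConnectionString shown above is a KEY=value string. When it needs to be changed, parsing it and setting the DATABASE entry directly can be more precise than a plain substring replacement. The sketch below assumes that multiple entries, when present, are separated by semicolons; the function names are hypothetical and the new folder name is invented for illustration.

```python
def parse_connection_string(conn_str):
    """Split 'KEY=value;KEY=value' into a dictionary, keeping entry order."""
    parts = {}
    for entry in conn_str.split(';'):
        if '=' in entry:
            key, value = entry.split('=', 1)  # split on the first '=' only
            parts[key] = value
    return parts

def set_database(conn_str, new_database):
    """Return conn_str with its DATABASE entry set to new_database."""
    parts = parse_connection_string(conn_str)
    parts['DATABASE'] = new_database
    return ';'.join(f'{key}={value}' for key, value in parts.items())

old = r'DATABASE=C:\Projects\YosemiteNP\CAD'
new = set_database(old, r'C:\Projects\YosemiteNP\CAD_New')
print(new)
```

The returned string could then be assigned to the workspaceConnectionString attribute of a data connection before calling setDefinition.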

Below are some examples of using the CIM to update data sources:

  1. The following example references a CAD layer in a map. It will then update the layer to point to a new CAD file. The script assumes that the new CAD file is in the same folder as the previous CAD file.

    Note:

    Updating the data source for CAD layers to point to a new CAD file requires modifying the CIM. However, just changing the folder that the CAD file resides in can be accomplished using the updateConnectionProperties function.

    import arcpy
    
    aprx = arcpy.mp.ArcGISProject(r"C:\Projects\YosemiteNP\Yosemite.aprx")
    m = aprx.listMaps('CAD')[0]
    # Select the CAD sub layer to update
    lyr = m.listLayers('Parcels')[0]
    
    # Access layer CIM
    lyrCIM = lyr.getDefinition("V2")
    dc = lyrCIM.featureTable.dataConnection
    
    # Update the feature dataset with the new CAD file name 
    dc.featureDataset = "NewParcels.dwg"
    
    # Update layer CIM
    lyr.setDefinition(lyrCIM)
    
    aprx.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  2. This script changes the data source of a relate on a layer. The script changes the dataset and the file geodatabase of the relate.

    import arcpy
    
    # Specify the new relate properties
    newGDB = "FGDB2.gdb"
    newFeatureClass = "Cities2"
    newRelateName = "New Relate"
    
    # Reference project, map and layer 
    p = arcpy.mp.ArcGISProject(r'C:\Projects\USA.aprx')
    m = p.listMaps('Relate Map')[0]
    l = m.listLayers('States')[0]
    
    # Get the layer's CIM definition
    lyrCIM = l.getDefinition('V2')         
    
    # Get the first relate on the layer
    relate = lyrCIM.featureTable.relates[0]
    
    # Get the data connection properties for the relate
    dc = relate.dataConnection
    
    # Change the connection string to point to the new File Geodatabase
    dc.workspaceConnectionString = dc.workspaceConnectionString.replace("FGDB.gdb", newGDB)
    
    # Change the dataset name
    dc.dataset = newFeatureClass
        
    # Change the relate's name
    relate.name = newRelateName
    
    # Set the layer's CIM definition
    l.setDefinition(lyrCIM)
    
    p.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  3. This script changes the data source of a layer where the feature dataset names are different between the existing and new geodatabase. Updating the feature dataset requires modifying the CIM.

    import arcpy
    
    # Specify the new geodatabase properties
    newGDB = "UpdatedParcels.gdb"
    newFeatureClass = "UpdatedParcelsFC"
    newFeatureDataSet = "UpdatedParcelsFDS"
    
    # Reference project, map and layer 
    p = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    m = p.listMaps('Parcels Map')[0]
    l = m.listLayers('Parcels')[0]
    
    # Get the layer's CIM definition
    lyrCIM = l.getDefinition('V2')         
    
    # Get the data connection properties for the layer
    dc = lyrCIM.featureTable.dataConnection
    
    # Change the connection string to point to the new File Geodatabase
    dc.workspaceConnectionString = dc.workspaceConnectionString.replace("Parcels.gdb", newGDB)
    
    # Change the dataset name
    dc.dataset = newFeatureClass
    
    # If the data is in a Feature Dataset, then update it 
    if hasattr(dc, "featureDataset"):
        dc.featureDataset = newFeatureDataSet
        
    # Set the layer's CIM definition
    l.setDefinition(lyrCIM)
    
    p.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")
  4. In some workflows, the CIM structure of the new data source is different than the existing structure, requiring you to create a new CIM data connection object. In this example, a layer that did not reside in a feature dataset is being updated to reference a feature class that is in a feature dataset. This requires creating a new CIM data connection object, as the feature dataset attribute is not in the existing layer's CIM structure. Note that you do not have to explicitly set the feature dataset attribute in the code. You only have to specify the feature class, and the feature dataset will be populated automatically. The same code can be used to update the data source of a layer that resides in a feature dataset to a feature class that does not reside in a feature dataset.

    import arcpy
    
    # Specify the new geodatabase properties
    newGDB = r'C:\Projects\Data\NewParcels.gdb'
    newFeatureClass = "UpdatedParcelsFC"
    
    # Reference project, map and layer 
    p = arcpy.mp.ArcGISProject(r'C:\Projects\YosemiteNP\Yosemite.aprx')
    m = p.listMaps('Parcels Map')[0]
    l = m.listLayers('Parcels')[0]
    
    # Get the layer's CIM definition
    lyrCIM = l.getDefinition('V2')         
    
    # Create a new CIM data connection
    dc = arcpy.cim.CreateCIMObjectFromClassName('CIMStandardDataConnection', 'V2')
    
    # Specify the geodatabase
    dc.workspaceConnectionString = f"DATABASE={newGDB}"
    
    # Specify the workspace type
    dc.workspaceFactory = "FileGDB"
    
    # Specify the dataset name
    dc.dataset = newFeatureClass
    
    # Set the new data connection to the layer's CIM featureTable
    lyrCIM.featureTable.dataConnection = dc
        
    # Set the layer's CIM definition
    l.setDefinition(lyrCIM)
    
    p.saveACopy(r"C:\Projects\YosemiteNP\YosemiteNew.aprx")