Geoprocessing considerations for shapefile output

Esri has two main data formats for storing geographic information: shapefiles and geodatabases. Shapefiles were developed to provide a simple format for storing geographic and attribute information. Because of the simplicity of shapefiles, they are a popular open data transfer format. While shapefiles may seem to be an easy choice because of their simplicity, there are limitations in their use that geodatabases address. In broad terms, the limitations include the following:

  • Geographic data is more than the simple features and attributes that a shapefile can store. For example, there are annotation, attribute relationships, topology relationships, attribute domains and subtypes, coordinate precision and resolution, and numerous other capabilities that are supported in geodatabases but not in shapefiles.
  • Shapefiles make use of the dBASE file format (.dbf file) to store attributes. dBASE is a non-Esri format developed in the early 1980s and was, at that time, the most popular format for storing tables of attributes. However, time has passed it by, and there have been a number of data representation improvements since then, such as the Unicode standard, to support most of the world's writing systems. This is one reason why shapefiles do not work well for storing information in a language other than English.
  • Unlike feature classes in a geodatabase, shapefiles do not have maintained shape length and shape area fields.

Shapefiles are a poor choice for active database management—they do not handle the modern life cycle of data creation, editing, versioning, and archiving.

Shapefiles can be beneficial under the following circumstances:

  • When exporting data for use in a non-Esri software application.
  • To quickly write simple features and attributes.

With some exceptions that are noted below, shapefiles are acceptable for storing simple feature geometry. However, shapefiles are problematic for attributes. For example, shapefiles cannot store null values, they can introduce rounding errors in real numbers, they have poor support for Unicode character strings, they do not allow field names longer than 10 characters, and they cannot store time in a date field. In addition, they do not support capabilities found in geodatabases, such as domains and subtypes. So unless your data has very simple attributes and does not require geodatabase capabilities, do not use shapefiles.

Shapefile components and file extensions

Shapefiles are stored in three or more files that all have the same prefix and are stored in the same folder. The individual files are visible when viewing the folder in File Explorer, not in ArcGIS Pro.

Extension | Description | Required?
.shp | The main file that stores the feature geometry. No attributes are stored in this file, only geometry. | Yes
.shx | A companion file to the .shp file that stores the position of individual feature IDs in the .shp file. | Yes
.dbf | The dBASE table that stores the attribute information of features. | Yes
.sbn and .sbx | Files that store the spatial index of the features. | No
.atx | Created for each dBASE attribute index. | No
.ixs and .mxs | Geocoding index for read-write shapefiles. | No
.prj | The file that stores the coordinate system information. | No
.xml | Metadata for ArcGIS; it stores information about the shapefile. | No

Shapefile extensions
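
As a rough illustration of the component files listed above, the following Python sketch reports which extensions exist for a given shapefile and which required ones are missing. The function name shapefile_components and the example path are hypothetical; only the extensions come from the table. It assumes that every component shares the shapefile's prefix and folder, as described above.

```python
from pathlib import Path

# The three required extensions from the table above.
REQUIRED = (".shp", ".shx", ".dbf")


def shapefile_components(shp_path):
    """Return (found_extensions, missing_required) for a shapefile.

    Every file in the same folder whose name starts with the shapefile's
    prefix followed by a dot is treated as a component file.
    """
    base = Path(shp_path)
    prefix = base.stem + "."
    found = {
        p.suffix.lower()
        for p in base.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    }
    missing = [ext for ext in REQUIRED if ext not in found]
    return sorted(found), missing


# Hypothetical usage:
# found, missing = shapefile_components("data/roads.shp")
# if missing:
#     print("Incomplete shapefile, missing:", missing)
```

A check like this can be useful before copying or sharing a shapefile, since moving only the .shp file leaves the geometry without its index and attributes.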

Shapefile limitations

Shapefiles have the following limitations:

  • There is a 2 GB size limit for any shapefile component file, which translates to a maximum of roughly 70 million point features. The number of line or polygon features in a shapefile depends on the number of vertices in each line or polygon (a vertex is equivalent to a point).
  • Shapefiles do not contain an x,y tolerance as do geodatabase feature classes. The x,y tolerance is the minimum distance between coordinates before they are considered equal. This x,y tolerance is used when evaluating relationships between features within the same feature class or between several feature classes. It is also used extensively when editing features. When using any operation that involves the comparison of features, such as tools in the Overlay toolset, the Clip tool, the Select Layer By Location tool, or any tool that takes two or more feature classes as input, use geodatabase feature classes (which have an x,y tolerance) rather than shapefiles.
  • A shapefile may take three to five times as much space as the same data in a file or enterprise geodatabase, because geodatabases apply shape compression methods that shapefiles lack.
  • Shapefiles support multipatches but lack support for the following advanced multipatch capabilities:
    • Texture coordinates
    • Textures and part color
    • Lighting normals
  • The spatial index for a shapefile is inefficient compared to that of a geodatabase feature class. This means that spatial queries (such as selecting features within a polygon) take longer compared to a geodatabase feature class. This inefficiency is only noticeable when dealing with large numbers of features.
  • Parametrically defined curves (also known as circular arc curves) are not supported in shapefiles. Parametric curves are created by editing geodatabase feature classes, as described in Create circular arcs. Circular arc curves use a mathematical formula to draw the curve. When a geodatabase feature class that contains circular arc curve features is exported to a shapefile, the curved features are converted to simple line features with closely spaced vertices that approximate the curved shape.
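
The size limit in the first bullet can be checked with simple arithmetic. The sketch below assumes that 2 GB means 2^31 bytes and uses the record sizes from the published shapefile format for 2D points (a 100-byte file header, then an 8-byte record header and 20 bytes of content per point). The function name max_point_features is hypothetical.

```python
# Sizes for a 2D point shapefile (.shp): a 100-byte file header, then for
# each point an 8-byte record header plus 20 bytes of content
# (4-byte shape type, 8-byte X, 8-byte Y).
FILE_HEADER_BYTES = 100
POINT_RECORD_BYTES = 8 + 4 + 8 + 8


def max_point_features(limit_bytes=2**31):
    """Upper bound on 2D point records that fit in a .shp of limit_bytes."""
    return (limit_bytes - FILE_HEADER_BYTES) // POINT_RECORD_BYTES


print(f"{max_point_features():,}")  # 76,695,841
```

This upper bound is in line with the rough figure of 70 million quoted above. In practice the .dbf file may reach 2 GB first when the features carry many or wide attribute fields, because the limit applies to each component file separately.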

Attribute limitations

Attributes in shapefiles have the following limitations:

  • Unlike other formats, shapefiles store numeric attributes in character format rather than binary format. For real numbers (that is, numbers containing decimal places), this may lead to rounding errors. This limitation does not apply to shape coordinates, only attributes. The following table summarizes the field width for each attribute data type.

    Geodatabase data type | dBASE field type | dBASE field width (number of characters)
    Object ID | Number | 9
    Short Integer | Number | 4
    Long Integer | Number | 9
    Float | Float | 13
    Double | Float | 13
    Text | Character | 254
    Date | Date | 8

    Field widths in a dBASE file
  • Date fields only support date; they do not support time.
    Caution: The lack of support for time in date fields is a serious limitation for any tool that performs temporal analysis.

  • Field names cannot be longer than 10 characters.
  • The maximum record length for an attribute is 4,000 bytes. The record length is the number of bytes used to define all the fields, not the number of bytes used to store the actual values.
  • The maximum number of fields is 255. A conversion to shapefile will convert the first 255 fields if this limit is exceeded.
  • The dBASE file must contain at least one field. When creating a shapefile or dBASE table, an integer ID field is included by default.
  • dBASE files do not support BLOB, GUID, Global ID, or Raster field types.
  • dBASE files have little SQL support aside from a Where clause.
  • Attribute indexes are deleted when saving edits, and must be re-created from scratch.
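
Three of the limitations above (character storage of numbers, date-only fields, and 10-character field names) can be imitated in plain Python. This is only a sketch: the helper names are hypothetical, the choice of 6 decimal places is an assumption (the table gives only the total width of 13), and the plain truncation of field names does not reproduce how ArcGIS resolves duplicate names.

```python
from datetime import datetime


def dbf_field_name(name, max_len=10):
    """Cut a field name to the 10-character limit (no de-duplication)."""
    return name[:max_len]


def dbf_number_text(value, width=13, decimals=6):
    """Format a real number as fixed-width text, as a dBASE field stores it.

    The decimals value is an assumption made for this example.
    """
    text = f"{value:{width}.{decimals}f}"
    if len(text) > width:
        raise ValueError(f"{value!r} does not fit in {width} characters")
    return text


def dbf_date_text(moment):
    """Format a datetime as an 8-character YYYYMMDD date; time is dropped."""
    return moment.strftime("%Y%m%d")


original = 1234.56789012345
stored = dbf_number_text(original)
print(repr(stored))                                   # '  1234.567890'
print(float(stored) == original)                      # False
print(dbf_field_name("shape_length"))                 # shape_leng
print(dbf_date_text(datetime(2024, 5, 17, 14, 30)))   # 20240517
```

The value read back from the text differs from the original because digits beyond the stored decimal places are lost, which is the rounding error described in the first bullet.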

By default, in ArcGIS Pro, dBASE files support the ANSI character set for their field names and values. Esri also includes Unicode support for field names and values in dBASE files. However, this additional support may not be available in non-Esri applications. To support other character sets, set the dbfDefault registry value to a supported code page identifier.

The length property of a string field varies by data source. In a dBASE file, the field length represents the number of bytes that are supported. In contrast, in a file geodatabase table, the field length represents the number of characters. This means that a shapefile field with a length of 10 can hold only three 3-byte Chinese characters. When exporting a table with multibyte characters to a dBASE table, the values may not transfer as expected. If the byte length of the multibyte characters is longer than the length of the field, field values will contain empty or unexpected values.
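
The difference between character length and byte length can be seen directly in Python. The sketch below assumes UTF-8, in which each of these Chinese characters takes 3 bytes; the encoding actually used for a dBASE file depends on its code page. The helper names are hypothetical.

```python
def fits_in_dbf_field(text, field_length, encoding="utf-8"):
    """True if the encoded text fits in a field measured in bytes."""
    return len(text.encode(encoding)) <= field_length


def truncate_to_bytes(text, field_length, encoding="utf-8"):
    """Drop whole trailing characters until the encoded text fits."""
    while len(text.encode(encoding)) > field_length:
        text = text[:-1]
    return text


value = "地理信息"  # 4 characters
print(len(value), len(value.encode("utf-8")))  # 4 12
print(fits_in_dbf_field(value, 10))            # False
print(truncate_to_bytes(value, 10))            # 地理信
```

Checking the byte length before export, and widening the field or shortening the value as needed, is one way to avoid the empty or unexpected values mentioned above.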

Null value representation

Null values are not supported in shapefiles. If a feature class containing nulls is converted to a shapefile, or a database table is converted to a dBASE file, the null values will be changed, as described in the table below.

Caution: When using shapefiles or dBASE (.dbf) files as inputs to tools, ArcGIS cannot determine whether a field value represents a null value or a legitimate value.

Data type containing a null value | Null value substitution
Number (Double type), when the tool requires that a NULL, infinity, or NaN (Not a Number) be output | -1.7976931348623158e+308 (IEEE standard for the maximum negative value)
Number (Long type), when the tool requires that a NULL, infinity, or NaN (Not a Number) be output | -214748364
Number (Double and Long types), all other geoprocessing tools | 0
Text | " " (blank, single space)
Date | Stored as zero, but displays <null>

Null value substitution in shapefiles and dBASE (.dbf) files
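
When reading a shapefile that was converted from data containing nulls, one possible approach is to flag values that match the substitutions listed above. The following Python sketch uses the numeric and text values from the table as written; the function name may_be_null and the field type labels are hypothetical, and date fields are left out. A match only means the value could have been a null, because a legitimate 0 or single space looks exactly the same.

```python
import sys

# Substitution values copied from the table above.
DOUBLE_NULL = -1.7976931348623158e+308
LONG_NULL = -214748364


def may_be_null(value, field_type):
    """Return True if value matches a null substitution for field_type.

    This cannot prove the original value was null; it only flags candidates.
    """
    if field_type == "double":
        return value in (DOUBLE_NULL, 0)
    if field_type == "long":
        return value in (LONG_NULL, 0)
    if field_type == "text":
        return value == " "
    raise ValueError(f"unsupported field type: {field_type!r}")


# The double substitution is the most negative finite double in Python.
print(DOUBLE_NULL == -sys.float_info.max)  # True
print(may_be_null(0, "long"))              # True
print(may_be_null(12.5, "double"))         # False
```

If nulls matter for the analysis, keeping the data in a geodatabase avoids this ambiguity altogether.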

Unsupported capabilities

Shapefiles have no extended data types at either the workspace or feature class level. Any conversion to shapefile from a geodatabase feature class or other format will result in the loss of the following:

  • Subtypes
  • Attribute domains
  • Geometric networks
  • Topologies
  • Annotation

Shape length and shape area

For line or polygon feature classes stored in a geodatabase, ArcGIS calculates and maintains a shape_length and a shape_area field. When editing the geometry of a line or polygon in a geodatabase feature class, the values in shape_length and shape_area fields are automatically updated. This is not true for shapefiles. Even if a shapefile has a shape_area or shape_leng field, the field will not be updated if edits are made to the shapefile.
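
Because these fields are not maintained in a shapefile, length and area values have to be recalculated after geometry edits. The sketch below shows the underlying arithmetic for a single polygon ring in planar coordinates, using the shoelace formula for area. It is a plain-Python illustration with hypothetical function names, not the method ArcGIS uses, and it ignores holes, multipart features, and geodesic measurement.

```python
import math


def _closed(ring):
    """Return the ring as a list whose last vertex repeats the first."""
    ring = list(ring)
    return ring if ring[0] == ring[-1] else ring + [ring[0]]


def ring_length(ring):
    """Planar perimeter of a ring given as (x, y) vertices."""
    pts = _closed(ring)
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def ring_area(ring):
    """Planar area of a ring, computed with the shoelace formula."""
    pts = _closed(ring)
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    return abs(twice_area) / 2


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(ring_length(rectangle), ring_area(rectangle))  # 14.0 12.0

# After moving one vertex, stored values would be stale until recalculated.
edited = [(0, 0), (4, 0), (6, 3), (0, 3)]
print(ring_area(edited))  # 15.0
```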

Shapefiles and geoprocessing

Most geoprocessing tools that output a feature class support both shapefiles and geodatabase feature classes as the output format. Similarly, most tools that output a table support both dBASE files (.dbf) and geodatabase tables as the output.

When using a tool in the Geoprocessing pane, the tool will automatically generate an output feature class or table path. If the Current Workspace environment is set to a folder, and not a geodatabase, the output feature class path will be a shapefile or dBASE file. By default, the Current Workspace environment is set to a geodatabase.
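
The rule described above can be imitated in a few lines of Python. This is a simplified sketch, not how ArcGIS generates output names: the function default_output_path is hypothetical, it recognizes a geodatabase only by a .gdb folder extension, and the paths are made up for the example.

```python
from pathlib import PurePath


def default_output_path(workspace, name, is_table=False):
    """Build an output path that follows the workspace type.

    A workspace ending in .gdb is treated as a geodatabase, so the output
    has no file extension. Any other workspace is treated as a folder, so
    the output becomes a shapefile (.shp) or dBASE table (.dbf).
    """
    ws = PurePath(workspace)
    if ws.suffix.lower() == ".gdb":
        return ws / name
    return ws / (name + (".dbf" if is_table else ".shp"))


print(default_output_path("project/Default.gdb", "roads_clip").name)  # roads_clip
print(default_output_path("project/output", "roads_clip").name)       # roads_clip.shp
print(default_output_path("project/output", "summary", True).name)    # summary.dbf
```

Checking the workspace type before running a tool is a simple way to avoid writing shapefile output unintentionally.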

Learn more about geoprocessing environments

Because shapefiles write quickly, they are sometimes used to write intermediate data in models, since this makes for a faster model run. However, writing to a file geodatabase is almost as fast as writing to a shapefile, so unless run speed is critical, file geodatabases are preferable for intermediate and output data. An alternative to using shapefiles for intermediate data is to write features to the memory workspace.

Spatial reference and shapefiles

Spatial reference and geoprocessing discusses the importance of spatial reference properties when using geoprocessing tools. There are a number of geoprocessing environments that control the spatial reference used by tools. The following environments are not honored when the output of a tool is a shapefile:
