Geoprocessing considerations for shapefile output

Over the years, Esri has developed three main data formats for storing geographic information: coverages, shapefiles, and geodatabases. Shapefiles were developed to provide a simple format for storing geographic and attribute information. Because of their simplicity, shapefiles are a very popular open data transfer format. Although that simplicity may make shapefiles seem an easy choice, they have limitations that geodatabases address, and you should be aware of them when using shapefiles. In broad terms, the limitations include the following:

  • Geographic data is more than the simple features and attributes that a shapefile can store. For example, there are annotation, attribute relationships, topology relationships, attribute domains and subtypes, coordinate precision and resolution, and numerous other capabilities that are supported in geodatabases but not in shapefiles.
  • Shapefiles use the dBASE file format (.dbf file) to store attributes. dBASE is a non-Esri format developed in the early 1980s and was, at that time, the most popular format for storing tables of attributes. Since then, however, data representation has improved considerably. For example, the Unicode standard now supports most of the world's writing systems, which dBASE does not. This is one reason why shapefiles do not work well for storing information in languages other than English.
  • ArcGIS does not calculate and maintain shape length and shape area fields for shapefiles, as it does for feature classes in a geodatabase.

These issues (and more) mean that shapefiles are an extremely poor choice for active database management—they do not handle the modern life cycle of data creation, editing, versioning, and archiving.

When should I use a shapefile?

  • When exporting data for use in a non-Esri software application.
  • When you need to write simple features and attributes quickly. (However, you must be aware of the limitations as detailed below.)

When should I not use a shapefile?

With some exceptions that are noted below, shapefiles are acceptable for storing simple feature geometry. However, shapefiles have serious problems with attributes. For example, they cannot store null values, they can introduce rounding errors in real numbers, they have poor support for Unicode character strings, they do not allow field names longer than 10 characters, and they cannot store time in a date field. These are just the main issues. Additionally, they do not support geodatabase capabilities such as domains and subtypes. So unless you have very simple attributes and require no geodatabase capabilities, do not use shapefiles.
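
Rounding errors arise because dBASE numeric fields hold numbers as fixed-width, right-aligned text rather than as binary values. The Python sketch below imitates that storage with a hypothetical to_fixed_width helper. It assumes a 13-character field with 6 decimal places; the decimal count is an illustrative assumption, not a documented ArcGIS setting.

```python
"""Illustrate precision loss when a double is written as fixed-width text.

Assumption: a 13-character numeric field with 6 decimal places. The helper
name and the decimal count are hypothetical, chosen only for illustration.
"""


def to_fixed_width(value: float, width: int = 13, decimals: int = 6) -> str:
    """Format value as right-aligned text, the way dBASE numeric fields store it."""
    text = f"{value:{width}.{decimals}f}"
    if len(text) > width:
        # dBASE cannot widen a field on the fly; the value simply does not fit.
        raise ValueError(f"{value!r} does not fit in {width} characters")
    return text


original = 1234.56789012345
stored = to_fixed_width(original)  # '  1234.567890'
round_trip = float(stored)         # digits beyond 6 decimal places are gone

print(repr(stored), round_trip, original - round_trip)
```

Reading the stored text back gives 1234.56789, so the trailing digits of the original value are lost. A geodatabase Double field stores the binary value itself and does not lose them this way.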

Shapefile components and file extensions

Shapefiles are stored in three or more files that all have the same prefix and are stored in the same system folder (shapefile workspace). You will see the individual files when viewing the folder in Windows Explorer, not in ArcGIS Pro.

Extension | Description | Required?
.shp | The main file that stores the feature geometry. Only geometry is stored in this file, no attributes. | Yes
.shx | A companion file to the .shp that stores the position of individual feature IDs in the .shp file. | Yes
.dbf | The dBASE table that stores the attribute information of features. | Yes
.sbn and .sbx | Files that store the spatial index of the features. | No
.atx | Created for each dBASE attribute index. | No
.ixs and .mxs | Geocoding index for read-write shapefiles. | No
.prj | The file that stores the coordinate system information. | No
.xml | Metadata for ArcGIS; stores information about the shapefile. | No

Shapefile extensions
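
Because every component shares the shapefile's prefix and folder, checking whether a shapefile is complete amounts to looking for sibling files with the required extensions. The sketch below does this with a hypothetical shapefile_components helper. It assumes lowercase extensions and checks only the optional files named exactly <prefix>.<ext> (.sbn, .sbx, .prj); the other optional files in the table are left out because their naming is not described here.

```python
"""List missing required and present optional shapefile components.

Assumptions: components share the .shp file's prefix and folder, extensions
are lowercase, and only a subset of optional files is checked.
shapefile_components is a hypothetical helper, not an ArcGIS function.
"""
from pathlib import Path

REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".sbn", ".sbx", ".prj")


def shapefile_components(shp_path):
    """Return (missing_required, present_optional) extensions for a .shp path."""
    shp = Path(shp_path)
    # with_suffix replaces only the final extension, so "roads.v2.shp" maps to "roads.v2.shx".
    missing = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    present = [ext for ext in OPTIONAL if shp.with_suffix(ext).exists()]
    return missing, present


if __name__ == "__main__":
    missing, present = shapefile_components("roads.shp")
    print("missing required:", missing, "optional present:", present)
```

A non-empty missing list means the shapefile is incomplete, for example when only the .shp file was copied to another folder.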

Geometry limitations

  • There is a 2 GB size limit for any shapefile component file, which translates to a maximum of roughly 70 million point features. The actual number of line or polygon features you can store in a shapefile depends on the number of vertices in each line or polygon (a vertex is equivalent to a point).
  • Shapefiles do not contain an x,y tolerance as do geodatabase feature classes. The x,y tolerance is the minimum distance between coordinates before they are considered equal. This x,y tolerance is used when evaluating relationships between features within the same feature class or between several feature classes. It is also used extensively when editing features. If you are performing any sort of operation involving comparison between features, such as use of the Overlay toolset, the Clip tool, the Select Layer By Location tool, or any tool that takes two or more feature classes as input, you should be using geodatabase feature classes (which have an x,y tolerance) rather than shapefiles.
  • A shapefile may take three to five times as much storage space as the same data in a file geodatabase or SDE (enterprise) geodatabase, because geodatabases use shape compression methods that shapefiles do not.
  • Shapefiles support multipatches but lack support for the following advanced multipatch capabilities:
    • Texture coordinates
    • Textures and part color
    • Lighting normals
  • The spatial index for a shapefile is inefficient compared to that of a geodatabase feature class, so spatial queries (such as selecting features within a polygon) take longer than they do on a geodatabase feature class. This inefficiency is only noticeable when dealing with large numbers of features.
  • Parametrically defined curves (also known as circular arc curves) are not supported in shapefiles. Parametric curves are created by editing geodatabase feature classes, as described in Create circular arcs. Circular arc curves use a mathematical formula to draw the curve. If you export a geodatabase feature class containing circular arc curve features to a shapefile, the curved features are transformed to simple line features with closely spaced vertices to capture the curved shape.
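
The 2 GB limit can be turned into rough feature counts using the record layout in the published ESRI Shapefile Technical Description: a 100-byte main file header, an 8-byte header per record, 20 bytes of content per point, and, for lines and polygons, a fixed 44 bytes plus 4 bytes per part and 16 bytes per vertex. The back-of-the-envelope sketch below covers the .shp file only. It assumes "2 GB" means 2**31 bytes and ignores Z and M values; the function names are hypothetical.

```python
"""Rough capacity of a .shp file under the 2 GB component-file limit.

Byte counts follow the ESRI Shapefile Technical Description record layout.
Assumption: "2 GB" is taken to mean 2**31 bytes; Z and M values are ignored.
"""

SHP_HEADER_BYTES = 100   # fixed-length main file header
RECORD_HEADER_BYTES = 8  # record number + content length (two 4-byte integers)
LIMIT_BYTES = 2**31      # assumed interpretation of the 2 GB limit


def point_record_bytes() -> int:
    """Bytes per Point record: header + shape type (4) + X (8) + Y (8)."""
    return RECORD_HEADER_BYTES + 4 + 8 + 8


def poly_record_bytes(num_parts: int, num_points: int) -> int:
    """Bytes per PolyLine or Polygon record with the given parts and vertices."""
    fixed = 4 + 32 + 4 + 4  # shape type, bounding box, NumParts, NumPoints
    return RECORD_HEADER_BYTES + fixed + 4 * num_parts + 16 * num_points


def max_features(record_bytes: int, limit: int = LIMIT_BYTES) -> int:
    """How many equal-sized records fit after the main file header."""
    return (limit - SHP_HEADER_BYTES) // record_bytes


if __name__ == "__main__":
    print(max_features(point_record_bytes()))       # 76695841 point features
    print(max_features(poly_record_bytes(1, 100)))  # 1296789 single-part, 100-vertex polygons
```

The point estimate of about 76.7 million is consistent with the "roughly 70 million" figure above. In practice, the .dbf file can reach the limit first when each attribute record takes more bytes than each geometry record.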

Attribute limitations

  • Unlike other formats, shapefiles store numeric attributes in character format rather than binary format. For real numbers (that is, numbers containing decimal places), this may lead to rounding errors. This limitation does not apply to shape coordinates, only attributes. The following table summarizes the field width for each attribute data type.

    Geodatabase data type | dBASE field type | dBASE field width (number of characters)
    Object ID | Number | 9
    Short Integer | Number | 4
    Long Integer | Number | 9
    Float | Float | 13
    Double | Float | 13
    Text | Character | 254
    Date | Date | 8

    Field widths in a dBASE file
  • The dBASE file standard only supports ANSI characters in its field names and values. Esri has added extensive Unicode support for dBASE files so that you can store Unicode field names and values, but this additional support resides only in ArcGIS and may not be available in non-Esri applications.
    Note:

    If you have to support Unicode in your field names or field values, it is strongly recommended that you use geodatabases rather than shapefiles.

  • Date fields only support date. They do not support time.
    Caution:

    The lack of time support in date fields can be a serious limitation for any tool that performs temporal analysis, such as the tools in the Space Time Pattern Mining toolbox. Avoid using shapefiles for any kind of temporal analysis or date-time calculation.

  • Field names cannot be longer than 10 characters.
  • The maximum record length for an attribute is 4,000 bytes. The record length is the number of bytes used to define all the fields, not the number of bytes used to store the actual values.
  • The maximum number of fields is 255. If this limit is exceeded, a conversion to shapefile converts only the first 255 fields.
  • The dBASE file must contain at least one field. When you create a shapefile or dBASE table, an integer ID field is created as a default.
  • dBASE files do not support blob, GUID, global ID, coordinate ID, or raster field types.
  • dBASE files have little SQL support aside from a WHERE clause.
  • Attribute indexes are deleted when you save edits, and you must re-create them from scratch.
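
Several of these limits can be checked before converting data to a shapefile. The sketch below validates a field list against the field-name length, field count, record length, text width, and unsupported-type limits above, using the widths from the dBASE field-width table. The type labels and check_dbase_schema are hypothetical, and treating "ANSI" as the Windows-1252 code page is an assumption.

```python
"""Check a field list against the dBASE limits described above.

Widths mirror the dBASE field-width table. The type labels, ANSI_CODEPAGE,
and check_dbase_schema are hypothetical names, not part of any ArcGIS API.
"""

DBF_WIDTHS = {"OID": 9, "SHORT": 4, "LONG": 9, "FLOAT": 13, "DOUBLE": 13, "DATE": 8}
UNSUPPORTED = {"BLOB", "GUID", "GLOBALID", "RASTER"}
MAX_NAME_LEN = 10
MAX_FIELDS = 255
MAX_RECORD_BYTES = 4000   # sum of the defined field widths, not of the stored values
MAX_TEXT_WIDTH = 254
ANSI_CODEPAGE = "cp1252"  # assumption: "ANSI" taken to mean Windows-1252


def check_dbase_schema(fields):
    """Return a list of issues for fields given as (name, type, text_length) tuples."""
    issues = []
    if not fields:
        issues.append("a dBASE table needs at least one field")
    if len(fields) > MAX_FIELDS:
        issues.append(f"{len(fields)} fields: only the first {MAX_FIELDS} are converted")
    record_bytes = 0
    for name, ftype, length in fields:
        if ftype in UNSUPPORTED:
            issues.append(f"{name}: {ftype} fields are not supported")
            continue
        if len(name) > MAX_NAME_LEN:
            issues.append(f"{name}: name is longer than {MAX_NAME_LEN} characters")
        try:
            name.encode(ANSI_CODEPAGE)
        except UnicodeEncodeError:
            issues.append(f"{name}: name is not representable in {ANSI_CODEPAGE}")
        if ftype == "TEXT":
            if length > MAX_TEXT_WIDTH:
                issues.append(f"{name}: text width {length} exceeds {MAX_TEXT_WIDTH}")
            record_bytes += min(length, MAX_TEXT_WIDTH)
        else:
            if ftype == "DATE":
                issues.append(f"{name}: time-of-day values will be dropped")
            record_bytes += DBF_WIDTHS[ftype]
    if record_bytes > MAX_RECORD_BYTES:
        issues.append(f"record length {record_bytes} bytes exceeds {MAX_RECORD_BYTES}")
    return issues


if __name__ == "__main__":
    schema = [
        ("POP_DENSITY_2020", "DOUBLE", 0),
        ("NAME", "TEXT", 300),
        ("SURVEYED", "DATE", 0),
        ("PHOTO", "BLOB", 0),
    ]
    for issue in check_dbase_schema(schema):
        print(issue)
```

Running such a check first makes silent losses, such as a truncated field name or a dropped time value, visible before the conversion rather than after it.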

Null value representation

Null values are not supported in shapefiles. If a feature class containing nulls is converted to a shapefile, or a database table is converted to a dBASE file, the null values will be changed as described in the table below.

Caution:

When using shapefiles or dBASE (.dbf) files as inputs to tools, ArcGIS cannot determine whether a field value represents a null value or a legitimate value.

Data type containing null value | Null value substitution
Number, when a tool requires that a NULL, infinity, or NaN (Not a Number) be output | -1.7976931348623158e+308 (IEEE standard for the maximum negative value)
Number (all other geoprocessing tools) | 0
Text | " " (blank, a single space)
Date | Stored as zero, but displays <null>

Null value substitution in shapefiles and dBASE (.dbf) files
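
The substitutions in this table also explain the caution above: once a null has been replaced, it looks exactly like a legitimate value. The sketch below mimics the table with a hypothetical substitute_null function; it only illustrates the mapping, since the actual conversion is performed by ArcGIS. Python's -sys.float_info.max is the same double as -1.7976931348623158e+308.

```python
"""Mimic the null substitutions listed in the table above.

substitute_null and its type labels are hypothetical; the replacement values
come from the table. A real conversion is performed by ArcGIS, not by this code.
"""
import sys

# Most negative finite IEEE 754 double; equals -1.7976931348623158e+308.
MOST_NEGATIVE_DOUBLE = -sys.float_info.max


def substitute_null(value, field_type, tool_needs_null_marker=False):
    """Return what a shapefile or dBASE table would hold in place of value."""
    if value is not None:
        return value  # non-null values pass through unchanged
    if field_type == "NUMBER":
        return MOST_NEGATIVE_DOUBLE if tool_needs_null_marker else 0
    if field_type == "TEXT":
        return " "  # a single space
    if field_type == "DATE":
        return 0  # stored as zero; ArcGIS displays <null>
    raise ValueError(f"unknown field type: {field_type}")


# A genuine zero and a converted null become indistinguishable:
print(substitute_null(0, "NUMBER") == substitute_null(None, "NUMBER"))  # True
```

The same ambiguity applies to text fields, where a value that really is a single space cannot be told apart from a converted null.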

Unsupported capabilities

Shapefiles have no extended data types at either the workspace or feature class level. Any conversion to shapefile from a geodatabase feature class or other format will result in the loss of the following:

  • Subtypes
  • Attribute domains
  • Geometric networks
  • Topologies
  • Annotation

Shape length and shape area

For line or polygon feature classes stored in a geodatabase, ArcGIS calculates and maintains a shape_length and a shape_area field; that is, when you edit the shape of a line or polygon in a geodatabase feature class, the values in shape_length and shape_area fields are recalculated to reflect edits made to the features. This is not the case for shapefiles. Even if your shapefile has a shape_area or shape_leng field, it will not be updated if edits are made to the shapefile.
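
Because these fields go stale after edits, their values must be recomputed if you need them. As a rough illustration of what such a recalculation involves, the sketch below computes planar area (shoelace formula) and perimeter for a single polygon ring given as (x, y) tuples. It assumes projected coordinates and a single ring with no holes, multiple parts, or curves; ring_area and ring_length are hypothetical helpers, not how ArcGIS computes these fields.

```python
import math


def _segments(ring):
    """Pair each vertex with the next one, wrapping from the last back to the first."""
    return zip(ring, ring[1:] + ring[:1])


def ring_area(ring):
    """Planar area of one ring using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _segments(ring))
    return abs(twice_area) / 2.0  # abs() makes the result independent of ring orientation


def ring_length(ring):
    """Perimeter of one ring; a repeated closing vertex adds a zero-length segment."""
    return sum(math.dist(p, q) for p, q in _segments(ring))


square = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]  # closed, as shapefile rings are
print(ring_area(square), ring_length(square))  # 100.0 40.0
```

In a geodatabase feature class this bookkeeping happens automatically on every edit, which is one more reason to prefer geodatabases when geometry is edited.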

Shapefiles and geoprocessing

Any geoprocessing tool that outputs a feature class allows you to choose either a shapefile or geodatabase feature class as the output format. Similarly, a tool that outputs a table allows you to choose either a dBASE file (.dbf) or a geodatabase table as the output. You should always be aware of which format you use and the consequences of converting a geodatabase input to a shapefile output.

Geoprocessing tools autogenerate an output feature class or table for you. If your scratch workspace environment is set to a system folder, and not a geodatabase, the autogenerated output feature class will be a shapefile or dBASE file.

It is suggested that you set your scratch workspace to a file geodatabase so that the autogenerated output is written to a file geodatabase, not a shapefile or .dbf table.
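
The rule above depends only on the kind of workspace the scratch workspace points to (in ArcPy, this environment is arcpy.env.scratchWorkspace). The sketch below encodes the rule with a hypothetical output_format helper. It assumes that a path ending in .gdb is a file geodatabase, that a path ending in .sde is an enterprise geodatabase connection file, and that any other path is a system folder.

```python
"""Predict the format of autogenerated output from the scratch workspace path.

Assumptions: .gdb means a file geodatabase, .sde means an enterprise geodatabase
connection file, and any other path is a system folder. output_format is a
hypothetical helper, not an ArcGIS function.
"""
from pathlib import PureWindowsPath

GEODATABASE_SUFFIXES = {".gdb", ".sde"}


def output_format(scratch_workspace, is_table=False):
    """Return the output format a tool would autogenerate in this workspace."""
    suffix = PureWindowsPath(scratch_workspace).suffix.lower()
    if suffix in GEODATABASE_SUFFIXES:
        return "geodatabase table" if is_table else "geodatabase feature class"
    return "dBASE table (.dbf)" if is_table else "shapefile (.shp)"


print(output_format(r"C:\projects\scratch"))      # shapefile (.shp)
print(output_format(r"C:\projects\scratch.gdb"))  # geodatabase feature class
```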

Learn more about geoprocessing environments

Because shapefiles write quickly, they are often used for intermediate data in models to speed up model execution. However, writing to a file geodatabase is almost as fast as writing to a shapefile, so unless execution speed is critical, you should always use a file geodatabase for intermediate and output data. If you do use shapefiles, be aware of the limitations described above and use shapefiles only for simple features and attributes. An alternative to shapefiles for intermediate data is to write features to the in_memory workspace.

Learn more about writing geoprocessing output to memory

Spatial reference and shapefiles

Spatial reference and geoprocessing discusses the importance of spatial reference properties when using geoprocessing tools. There are a number of geoprocessing environments that control the spatial reference used by tools. The following environments are not honored when the output of a tool is a shapefile:
