Structured Query Language (SQL) is a standard computer language that contains a set of defined syntax and expressions used for accessing and managing data in databases and in other data processing technologies.
The American National Standards Institute (ANSI) defines a standard for SQL. Most RDBMSs use that standard and have extended it, making SQL syntax across different RDBMSs slightly different from one another.
Query expressions in ArcGIS adhere to standard SQL expressions. The SQL syntax you use within an expression differs depending on the data source. Each data source has its own variant of SQL, which are referred to as SQL dialects, such as the following:
- File-based data, including file geodatabases, shapefiles, in-memory table views, text files such as .dbf, .csv, .txt, .xlsx tables, and feature services that use standardized queries use the ArcGIS SQL dialect that supports a subset of SQL capabilities.
- Mobile geodatabases, ST_geometry SQLite, GeoPackage, and Excel use SQLite SQL dialect.
- Databases or enterprise geodatabases use the SQL syntax of the underlying RDBMS, such as, Oracle, SQL Server, PostgreSQL, SAP HANA, and IBM Db2, in which each database uses its own slightly different SQL dialect.
When using ArcGIS dialog boxes to construct a SQL expression, autocomplete helps you apply the correct syntax for the data source you're querying. As you type, a prompt appears, showing the field names, values, keywords, and operators supported by your data source.
Tip:
Review the following to help determine when ArcGIS SQL syntax is used or when the SQL syntax of the underlying RDBMS is used when creating an SQL expression.
- If the data within your SQL expression comes from a mixture of data source locations, the following will occur:
- When the data sources come from both file-based data and from an RDBMS, ArcGIS SQL syntax will be used.
- If all the data within your SQL expression comes from the same data source location, the following will occur:
- When the data source is file-based data, ArcGIS SQL syntax will be used.
- When the data source is a database or enterprise geodatabase, ArcGIS will pass the SQL expression to the RDBMS for resolution and you will need to consult the documentation for your database management system for the specific expression syntax and data types supported.
Within ArcGIS Pro, the SQL expression dialog box can be found in the following locations:
- Select by attributes using the Select Layer by Attribute geoprocessing tool.
- Definition Query tab on the Layer Properties dialog box.
- Display filters tab in the Symbology pane.
- Create reports using the Create New Reports pane.
- Export tables using the Export Table geoprocessing tool.
- Export features using the Export Features geoprocessing tool.
- Use the Calculate Field geoprocessing tool to create an expression to perform simple or complex calculations on field values.
- Use Select to query data for further analysis.
- Use the Make Query Table geoprocessing tool to create a layer or table view.
- Use the Make Feature Layer geoprocessing tool to create a feature layer.
- Create a view in a database or geodatabase using the Create Database View geoprocessing tool.
- Use the Append geoprocessing tool to append multiple input datasets into a target dataset.
- Use ProSDK Core.Data.QueryDef.
SQL expression syntax
A SQL expression contains a combination of one or more values, operators, and SQL functions that can be used to query or select a subset of features and table records within ArcGIS.
All SQL queries are expressed using the keyword SELECT.
SELECT * FROM forms the first part of the SQL expression and is automatically supplied for you on most ArcGIS dialog boxes. For example, when you construct a query by writing SQL syntax, a SELECT statement is used to select fields from a layer or table and is supplied for you.
The next part of the SQL expression that comes after SELECT * FROM <Layer_name> is the WHERE clause. The WHERE clause is used to get records that meet specific criteria and is the part of the expression you must build.
Tip:
The asterisk (*) in a SQL expression is used to ask for all columns.
Here is a basic form of a SQL expression WHERE clause:
- <Field_name> <Operator> <Value or String>
For example, STATE_NAME = 'Florida'. This expression contains a single clause and selects all features containing 'Florida' in the STATE_NAME field.
For compound expressions, the following form is used:
- <Field_name> <Operator> <Value or String> <Connector> <Field_name> <Operator> <Value or String> ...
For example, STATE_NAME = 'Florida' OR (STATE_NAME = 'South Carolina' AND POP2010 > 15000). This compound expression is comprised of multiple clauses connected by a logical operator, AND or OR, and selects all features containing Florida in the STATE_NAME field, and all the features that contain both South Carolina in the STATE_NAME field and have a value greater than 15,000 in the field named POP2010.
Tip:
Optionally, parentheses () can be used for defining the order of operations in compound expressions.
Because you are selecting columns as a whole, you cannot restrict the SELECT statement to return only some of the columns in the corresponding table because the SELECT * syntax is hard-coded. For this reason, keywords, such as DISTINCT, ORDER BY, and GROUP BY, cannot be used in an SQL expression in ArcGIS except when using subqueries. To learn more, see the Subqueries section below.
The following sections describe the elements of common SQL query expressions used in ArcGIS.
Common queries: Searching strings
Strings must always be enclosed in single quotation marks in queries, for example:
STATE_NAME = 'California'
Strings are case sensitive in expressions, except when run on geodatabases in Microsoft SQL Server. To make a case-insensitive search in other data sources, you can use an SQL function to convert all values to the same case. For file-based data sources such as file geodatabases or shapefiles, you can use the UPPER or LOWER function to set the case for a selection. For example, the following expression selects the state whose name is stored as 'Rhode Island' or 'RHODE ISLAND':
UPPER(STATE_NAME) = 'RHODE ISLAND'
If the string contains a single quotation mark, you first need to use another single quotation mark as an escape character, for example:
NAME = 'Alfie''s Trough'
Use the LIKE operator (instead of the = operator) to build a partial string search. For example, this expression selects Mississippi and Missouri among United States state names:
STATE_NAME LIKE 'Miss%'
The percent symbol (%) means that anything is acceptable in its place: one character, a hundred characters, or no character. Alternatively, to search with a wildcard that represents one character, use an underscore (_). For example, this expression finds Catherine Smith and Katherine Smith:
OWNER_NAME LIKE '_atherine Smith'
You can use greater than (>), less than (<), greater than or equal (>=), less than or equal (<=), not equal (<>), and BETWEEN operators to select string values based on sorting order. For example, this expression selects all the cities in a coverage with names starting with the letters M through Z:
CITY_NAME >= 'M'
String functions can be used to format strings. For instance, the LEFT function returns a certain number of characters starting on the left of the string. In this example, the query returns all states starting with the letter A:
LEFT(STATE_NAME,1) = 'A'
Refer to the documentation of your database management system (DBMS) for a list of supported functions.
Common expressions: Searching for NULL values
You can use the NULL keyword to select features and records that have null values for the specified field. The NULL keyword is always preceded by IS or IS NOT. For example, to find cities whose 1996 population has not been entered, you can use the following:
POPULATION IS NULL
Alternatively, to find cities whose 1996 population has been entered, you can use the following:
POPULATION96 IS NOT NULL
Common expressions: Searching numbers
The decimal point (.) is always used as the decimal delimiter, regardless of your locale or regional settings. The comma cannot be used as a decimal or thousands delimiter in an expression.
You can query numbers using the equal (=), not equal (<>), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), and BETWEEN operators, for example:
POPULATION >= 5000
Numeric functions can be used to format numbers. For instance, the ROUND function rounds a number to a given number of decimals in a file geodatabase:
ROUND(SQKM,0) = 500
Refer to your DBMS documentation for a list of supported numeric functions.
Dates and time
General rules and common expressions
Geodatabase data sources store dates in a date-time field. However, shapefiles do not. Therefore, most of the query syntax listed below contains a reference to the time. In some cases, the time part of the query may be safely omitted if the field is known to contain only dates; in other cases, it needs to be stated, or the query will return a syntax error.
Searching date fields requires careful attention to the syntax required by your data source. If you build a date query in Clause mode of the Query Builder, the correct syntax will be automatically generated for you. Here is an example of a query that will return all records on or after January 1, 2011, for a file geodatabase data source:
INCIDENT_DATE >= date '2011-01-01 00:00:00'
Note:
Dates are stored in the underlying database as a reference to December 30, 1899, at 00:00:00. This is valid for all the data sources listed here.
The purpose of this section is only to help you query dates, not time values. When a time that is not null is stored with the dates (for instance, January 12, 1999, 04:00:00), querying the date only will not return the record because when you pass only a date to a date-time field, it will fill the time with zeros and retrieve only the records in which the time is 12:00:00 a.m.
The attribute table shows date and time in a user-friendly format, depending on your regional settings, rather than the underlying database's format. This is fine most of the time, but it also has a few drawbacks:
- The string shown in the SQL query may only slightly resemble the value shown in the table, especially when time is involved. For instance, a time entered as 00:00:15 shows as 12:00:15 a.m. in the attribute table, with the United States as your regional settings, and the comparable query syntax is Datefield = '1899-12-30 00:00:15'.
- The attribute table does not know what the underlying data source is until you save your edits. It will first try to format the value entered to fit its own format, and upon saving edits, it will try to tweak the resulting value to fit into the database. Because of this, you can enter a time in a shapefile, but you will find that it is dropped when you save your edits. The field will then contain a value '1899-12-30' that will show as 12:00:00 a.m. or the equivalent depending on your regional settings.
Date-time syntax for enterprise geodatabases
Oracle
Datefield = date 'yyyy-mm-dd'
Keep in mind this will not return records in which the time is not null.
An alternative format for querying dates in Oracle follows:
Datefield = TO_DATE('yyyy-mm-dd hh:mm:ss','YYYY-MM-DD HH24:MI:SS')
The second parameter 'YYYY-MM-DD HH24:MI:SS' describes the format used for querying. An actual query looks like this:
Datefield = TO_DATE('2003-01-08 14:35:00','YYYY-MM-DD HH24:MI:SS')
You can use a shorter version:
TO_DATE('2003-11-18','YYYY-MM-DD')
Again, this will not return records in which the time is not null.
SQL Server
Datefield = 'yyyy-mm-dd hh:mm:ss'
The hh:mm:ss part of the query can be omitted when the time is not set in the records.
The following is an alternative format:
Datefield = 'mm/dd/yyyy'
IBM Db2
Datefield = TO_DATE('yyyy-mm-dd hh:mm:ss','YYYY-MM-DD HH24:MI:SS')
The hh:mm:ss part of the query cannot be omitted even if the time is equal to 00:00:00.
PostgreSQL
Datefield = TIMESTAMP 'YYYY-MM-DD HH24:MI:SS' Datefield = TIMESTAMP 'YYYY-MM-DD'
You must specify the full time stamp when using equal-to queries or no records will be returned. You can successfully query with the following statements if the table you query contains date records with these exact time stamps (2007-05-29 00:00:00 or 2007-05-29 12:14:25):
select * from table where date = '2007-05-29 00:00:00';
or
select * from table where date = '2007-05-29 12:14:25';
If you use other operators—such as greater than, less than, greater than or equal to, or less than or equal to—you don't need to designate the time, but you can if you want to be that precise. Both of the following statements work:
select * from table where date < '2007-05-29';
select * from table where date < '2007-05-29 12:14:25';
File geodatabases, shapefiles, coverages, and other file-based data sources
Datefield = date 'yyyy-mm-dd'
File geodatabases support the use of a time in the date field, so this can be added to the expression:
Datefield = date 'yyyy-mm-dd hh:mm:ss'
Shapefiles and coverages do not support the use of time in a date field.
Note:
All SQL used by the file geodatabase is based on the SQL-92 standard.
Known limitations
Querying a date on the left part (first table) of a join only works with file-based data sources, such as file geodatabases, shapefiles, and DBF tables. However, there is a possible workaround for working with data that is not file-based, such as enterprise data as described below.
Querying a date on the left part of a join will be successful when using the limited version of SQL developed for file-based data sources. If you are not using such a data source, you can force the expression to use this format. This can be done by ensuring the query expression involves fields from more than one join table. For example, if a feature class and a table (FC1 and Table1) are joined and are both from an enterprise geodatabase, the following expressions will fail or return no data:
FC1.date = date #01/12/2001# FC1.date = date '01/12/2001'
To query successfully, you can create a query as follows:
FC1.date = date '01/12/2001' and Table1.OBJECTID > 0
Since the query involves fields from both tables, the limited SQL version will be used. In this expression, Table1.OBJECTID is always > 0 for records that matched during join creation, so this expression is true for all rows that contain join matches.
To ensure that every record with FC1.date = date '01/12/2001' is selected, use the following query:
FC1.date = date '01/12/2001' and (Table1.OBJECTID IS NOT NULL OR Table1.OBJECTID IS NULL)
This query will select all records with FC1.date = date '01/12/2001', whether or not there was a join match for each particular record.
Combining expressions
Compound expressions can be built by combining expressions with the AND and OR operators. For example, the following expression selects all the houses that have more than 1,500 square feet and a garage for three or more cars:
AREA > 1500 AND GARAGE > 3
When you use the OR operator, at least one side of the expression of the two separated by the OR operator must be true for the record to be selected, for example:
RAINFALL < 20 OR SLOPE > 35
Use the NOT operator at the beginning of an expression to find features or records that don't match the specified expression, for example:
NOT STATE_NAME = 'Colorado'
NOT expressions can be combined with AND and OR. For example, this expression selects all the New England states except Maine:
SUB_REGION = 'New England' AND NOT STATE_NAME = 'Maine'
Calculations
Calculations can be included in expressions using the arithmetic operators +, -, *, and /. Calculations can be between fields and numbers, for example:
AREA >= PERIMETER * 100
Calculations can also be performed between fields. For example, to find the countries with a population density of less than or equal to 25 people per square mile, you can use this expression:
POP1990 / AREA <= 25
Operator precedence
Expressions are evaluated according to standard operator precedence rules. For example, the part of an expression enclosed in parentheses is evaluated before the part that isn't enclosed.
HOUSEHOLDS > MALES * (POP90_SQMI + AREA)
You can add parentheses in SQL Edit mode by typing them, or use the Group and Ungroup commands in Clause mode to add or remove them.
Subqueries
A subquery is a query nested in another query and is supported by geodatabase data sources only. It can be used to apply predicate or aggregate functions or to compare data with values stored in another table. This can be done with the IN or ANY keyword. For example, this query selects only the countries that are not also listed in the indep_countries table:
COUNTRY_NAME NOT IN (SELECT COUNTRY_NAME FROM indep_countries)
Note:
Shapefiles and other nongeodatabase file-based data sources do not support subqueries. Subqueries that are performed on versioned enterprise feature classes and tables will not return features that are stored in the delta tables. File geodatabases provide the limited support for subqueries explained in this section, while enterprise geodatabases provide full support. For information on the full set of subquery capabilities of enterprise geodatabases, refer to your DBMS documentation.
This query returns the features with a GDP2006 greater than the GDP2005 of any of the features contained in countries:
GDP2006 > (SELECT MAX(GDP2005) FROM countries)
Subquery support in file geodatabases is limited to the following:
- Scalar subqueries with comparison operators. A scalar subquery returns a single value, for example:
GDP2006 > (SELECT MAX(GDP2005) FROM countries)
For file geodatabases, the set functions AVG, COUNT, MIN, MAX, and SUM can only be used in scalar subqueries. - EXISTS predicate, for example:
EXISTS (SELECT * FROM indep_countries WHERE COUNTRY_NAME = 'Mexico')
Operators
The following is the full list of query operators supported by file geodatabases, shapefiles, coverages, and other file-based data sources. They are also supported by enterprise geodatabases, although these data sources may require different syntax. In addition to the operators below, enterprise geodatabases support other capabilities. See your DBMS documentation for details.
Arithmetic operators
You use an arithmetic operator to add, subtract, multiply, and divide numeric values.
Operator | Description |
---|---|
* | Arithmetic operator for multiplication |
/ | Arithmetic operator for division |
+ | Arithmetic operator for addition |
- | Arithmetic operator for subtraction |
Comparison operators
You use comparison operators to compare one expression to another.
Operator | Description |
---|---|
< | Less than. It can be used with strings (comparison is based on alphabetical order), numbers, and dates. |
<= | Less than or equal to. It can be used with strings (comparison is based on alphabetical order), numbers, and dates. |
<> | Not equal to. It can be used with strings (comparison is based on alphabetical order), numbers, and dates. |
> | Greater than. It can be used with strings (comparison is based on alphabetical order), numbers, and dates. |
>= | Greater than or equal to. It can be used with strings (comparison is based on alphabetical order), numbers, and dates. |
[NOT] BETWEEN x AND y | Selects a record if it has a value greater than or equal to x and less than or equal to y. When preceded by NOT, it selects a record if it has a value outside the specified range. For example, this expression selects all records with a value greater than or equal to 1 and less than or equal to 10: OBJECTID BETWEEN 1 AND 10 This is the equivalent of the following expression: OBJECTID >= 1 AND OBJECTID <= 10 However, the expression with BETWEEN provides better performance if you're querying an indexed field. |
[NOT] EXISTS | Returns TRUE if the subquery returns at least one record; otherwise, it returns FALSE. For example, this expression returns TRUE if the OBJECTID field contains a value of 50: EXISTS (SELECT * FROM parcels WHERE OBJECTID = 50) EXISTS is supported in file and enterprise geodatabases only. |
[NOT] IN | Selects a record if it has one of several strings or values in a field. When preceded by NOT, it selects a record if it doesn't have one of several strings or values in a field. For example, this expression searches for four state names: STATE_NAME IN ('Alabama', 'Alaska', 'California', 'Florida') |
IS [NOT] NULL | Selects a record if it has a null value for the specified field. When NULL is preceded by NOT, it selects a record if it has any value for the specified field. |
x [NOT] LIKE y [ESCAPE 'escape-character'] | Use the LIKE operator (instead of the = operator) with wildcards to build a partial string search. The percent symbol (%) means that anything is acceptable in its place: one character, a hundred characters, or no character. Alternatively, to search with a wildcard that represents one character, use an underscore (_). If you need to access noncharacter data, use the CAST function. For example, this query returns numbers that begin with 8 from the integer field SCORE_INT: CAST (SCORE_INT AS VARCHAR(10)) LIKE '8%' To include the percent symbol or underscore in your search string, use the ESCAPE keyword to designate another character as the escape character, which in turn indicates that a real percent sign or underscore immediately follows. For example, this expression returns any string containing 10%, such as 10% DISCOUNT or A10%: AMOUNT LIKE '%10$%%' ESCAPE '$' |
Logical operators
Operator | Description |
---|---|
AND | Combines two conditions and selects a record if both conditions are true. For example, the following expression selects any house with more than 1,500 square feet and a garage for more than two cars: AREA > 1500 AND GARAGE > 2 |
OR | Combines two conditions and selects a record if at least one condition is true. For example, the following expression selects any house with more than 1,500 square feet or a garage for more than two cars: AREA > 1500 OR GARAGE > 2 |
NOT | Selects a record if it doesn't match the expression. For example, the following expression selects all states but California: NOT STATE_NAME = 'California' |
String operators
Operator | Description |
---|---|
|| | Returns a character string that is the result of concatenating two or more string expressions. FIRST_NAME || MIDDLE_NAME || LAST_NAME |
Functions
The following is the full list of functions supported by file geodatabases, shapefiles, coverages, and other file-based data sources. The functions are also supported by enterprise geodatabases, although these data sources may require different syntax or function names. In addition to the functions below, enterprise geodatabases support other capabilities. See your DBMS documentation for details.
Date functions
Function | Description |
---|---|
CURRENT_DATE | Returns the current date. |
EXTRACT(extract_field FROM extract_source) | Returns the extract_field portion of the extract_source. The extract_source argument is a date-time expression. The extract_field argument can be one of the following keywords: YEAR, MONTH, DAY, HOUR, MINUTE, or SECOND. |
CURRENT TIME | Returns the current time. |
String functions
Arguments denoted as string_exp can be the name of a column, a character string literal, or the result of another scalar function in which the underlying data type can be represented as a character type.
Arguments denoted as character_exp are variable-length character strings.
Arguments denoted asstart or length can be a numeric literal or the result of another scalar function in which the underlying data type can be represented as a numeric type.
These string functions are 1 based; that is, the first character in the string is character 1.
Function | Description |
---|---|
CHAR_LENGTH(string_exp) | Returns the length in characters of the string expression. |
LOWER(string_exp) | Returns a string equal to that in string_exp, with all uppercase characters converted to lowercase. |
POSITION(character_exp IN character_exp) | Returns the position of the first character expression in the second character expression. The result is an exact numeric with an implementation-defined precision and a scale of zero. |
SUBSTRING(string_exp FROM start FOR length) | Returns a character string that is derived from string_exp, beginning at the character position specified by start for length characters. |
TRIM(BOTH | LEADING | TRAILING trim_character FROM string_exp) | Returns the string_exp with the trim_character removed from the leading, trailing, or both ends of the string. |
UPPER(string_exp) | Returns a string equal to that in string_exp, with all lowercase characters converted to uppercase. |
Numeric functions
All numeric functions return a numeric value.
Arguments denoted as numeric_exp, float_exp, or integer_exp can be the name of a column, the result of another scalar function, or a numeric literal, in which the underlying data type could be represented as a numeric type.
Function | Description |
---|---|
ABS(numeric_exp) | Returns the absolute value of numeric_exp. |
ACOS(float_exp) | Returns the arccosine of float_exp as an angle, expressed in radians. |
ASIN(float_exp) | Returns the arcsine of float_exp as an angle, expressed in radians. |
ATAN(float_exp) | Returns the arctangent of float_exp as an angle, expressed in radians. |
CEILING(numeric_exp) | Returns the smallest integer greater than or equal to numeric_exp. |
COS(float_exp) | Returns the cosine of float_exp in which float_exp is an angle expressed in radians. |
FLOOR(numeric_exp) | Returns the largest integer less than or equal to numeric_exp. |
LOG(float_exp) | Returns the natural logarithm of float_exp. |
LOG10(float_exp) | Returns the base 10 logarithm of float_exp. |
MOD(integer_exp1, integer_exp2) | Returns the remainder of integer_exp1 divided by integer_exp2. |
POWER(numeric_exp, integer_exp) | Returns the value of numeric_exp to the power of integer_exp. |
ROUND(numeric_exp, integer_exp) | Returns numeric_exp rounded to integer_exp places to the right of the decimal point. If integer_exp is negative, numeric_exp is rounded to |integer_exp| places to the left of the decimal point. |
SIGN(numeric_exp) | Returns an indicator of the sign of numeric_exp. If numeric_exp is less than zero, -1 is returned. If numeric_exp equals zero, 0 is returned. If numeric_exp is greater than zero, 1 is returned. |
SIN(float_exp) | Returns the sine of float_exp in which float_exp is an angle expressed in radians. |
TAN(float_exp) | Returns the tangent of float_exp in which float_exp is an angle expressed in radians. |
TRUNCATE(numeric_exp, integer_exp) | Returns numeric_exp truncated to integer_exp places to the right of the decimal point. If integer_exp is negative, numeric_exp is truncated to |integer_exp| places to the left of the decimal point. |
CAST function
The CAST() function converts a value or an expression from one data type to another specified data type. The syntax is as follows:
CAST (expression AS data_type(length))
- Where expression is a required parameter and can be a literal value or a valid expression of any type (for example, column name, variable) that will be converted.
- Where data_type is a required parameter and the keyword used is the resulting data type to which the expression will be cast. See the table below for a list of keywords to use for valid data types.
- Where length is an optional parameter and specifies the length of the resulting data type.
For example, in some scenarios, a string operation may be necessary, but if the data is stored in a number type field, the query wouldn't work. However, using the CAST() function, you can cast the number field to a string for a SQL operation. This code casts the number field SQLNUM as a text field, which can then be used in a text operation.
CAST(SQLNUM AS CHARACTER(12))
The following table contains the keywords to use for data type conversions and can be specified in uppercase or lowercase.
Data type | Keyword |
---|---|
Long Integer |
|
Short Integer |
|
Float (single-precision floating point) |
|
Double (double-precision floating point) |
|
String |
|
Datetime |
|
Note:
|
CAST function examples
- Example 1:
CAST(AREA AS INTEGER)
Casting AREA, which is a Float data type, to an INTEGER returns an integer and truncates any result value after the decimal.
- Example 2:
CAST(Rent AS FLOAT) + Utilities > 2000.45
Casting Rent, which is a CHARACTER data type, to a FLOAT data type and where Utilities is also a FLOAT data type.