A look into Snowflake Data Types

A look into Snowflake Data Types

As a Database as a Service (DBaaS), Snowflake is a relational Cloud Data Warehouse that can be accessed online. This Data Warehouse can give your company more leeway to adapt to shifting market conditions and grow as needed. Its Cloud Storage is powerful enough to accommodate endless volumes of both structured and semi-structured data. As a result, information from numerous sources can be combined. In addition, the Snowflake Data Warehouse will prevent your company from needing to buy extra hardware.

Snowflake allows you to use the usual SQL data types in your columns, local variables, expressions, and parameters (with certain limitations). An identifier and data type will be assigned to each column in a table. The data type tells Snowflake how much space to set aside for a column’s storage and what form the data must take.

Snowflake’s great global success can be attributed to the following characteristics: 

    • Snowflake’s scalability stems from the fact that it provides storage facilities independent of its computation facilities. Data is stored in a database and processed in a virtual data warehouse. As a result, Snowflake guarantees excellent scalability at a low cost.
    • Snowflake requires little upkeep because it was made with the user in mind. It has a low barrier to entry and needs little in the way of upkeep.
    • Automated query optimization is supported in Snowflake, saving you time and effort over the hassle of improving queries manually.
    • Snowflake allows you to divide your company’s regular workloads into different virtual Data Warehouses. As a result, this facilitates Data Analytics management, particularly under extremely regular loads.

Six Important Snowflake Data Types

The first step in becoming a Snowflake Data Warehouse expert is learning the ins and outs of the different types of data it stores. There are 6 different kinds of data that can be used with Snowflake.

    1. Numeric Data Types
    2. String & Binary Data Types
    3. Logical Data Types
    4. Date & Time Data Types
    5. Semi-structured Data Types
    6. Geospatial Data Types

1) Numeric Data Types

Knowing what precision and scale are is crucial before diving into the various sorts of numeric data types. 

    • A number’s precision is the maximum number of significant digits that can be included in the number itself.
    • Scale is the maximum number of digits that can be displayed following a decimal point.

Precision has no effect on storage; for example, the same number in columns with different precisions, such as NUMBER(5,0) and NUMBER(25,0), will have the same storage requirements. However, the scale has an effect on storage; for example, the same data saved in a column of type NUMBER(20,5) requires more space than NUMBER(20,0). Additionally, processing bigger scale values may take a little more time and space in memory.

So here are a few types of numeric data types:

    • NUMBER is a data type for storing whole numbers. The default scale and precision settings are 0 and 38, respectively.
    • DECIMAL and NUMERIC are the same as NUMBER.
    • The prefixes INT, INTEGER, BIGINT, and SMALLINT all mean the same thing as NUMBER. But you can’t change the scale or precision; these serial data types are permanently stuck at 0 and 38.
    • Snowflake uses double-precision IEEE 754 floating-point values (FLOAT, FLOAT4, FLOAT8). 
    • FLOAT is a synonym for DOUBLE, DOUBLE PRECISION, and REAL.
    • Numeric Constants are numbers that have fixed values. It supports the following format:

2) String & Binary Data Types

The following character-related data types are supported in Snowflake:

    • With a maximum size of 16 MB, VARCHAR can store Unicode characters of any size. There are BI/ETL tools that can set the maximum allowed length of VARCHAR data before storing or retrieving it.
    • CHARACTER, CHAR is like  VARCHAR, but with the default length as VARCHAR(1).
    • If you’re familiar with VARCHAR, you’ll feel right at home with STRING.
    • Just like VARCHAR, TEXT can store any kind of character.
    • The BINARY data type does not understand Unicode characters; hence its size is always expressed in bytes rather than characters. There’s an upper limit of 8 MB.
    • To put it simply, VARBINARY is another name for BINARY.
    • String Constants are fixed values. When using Snowflake, string constants must always be separated by delimiter characters. Delimiting string literals in Snowflake can be done with either single quotes or dollar signs.

3) Logical Data Types

In logical data type, you can only use BOOLEAN with one of two values: TRUE or FALSE. Sometimes it will show up as NULL if the value is unknown. The BOOLEAN data type offers the necessary Ternary Logic functionality.

SQL requires using a ternary logic, often known as three-valued logic (3VL), which has three possible truth values (TRUE, FALSE, and UNKNOWN). To indicate the unknown value in Snowflake, NULL is used. The outcomes of logical operations like AND, OR, and NOT are affected by ternary logic when applied to the evaluation of Boolean expressions and predicates.

    • UNKNOWN values are interpreted as NULL when used in expressions (like a SELECT list).
    • Use of UNKNOWN as a predicate (in a WHERE clause, for example) always returns FALSE

4) Date & Time Data Types

This details the date/time and time data types that can be managed in Snowflake. It also explains the allowed formats for string constants to manipulate dates, times, and timestamps.

    • The DATE data type is supported in Snowflake (with no time elements). It supports the most typical dates format (YYYY-MM-DD, DD-MON-YYYY, etc.).
    • DATETIME is shorthand for TIMESTAMP NTZ.
    • A TIME data type represented as HH:MM: SS is supported by Snowflake. Additionally, a precision setting for fractional seconds is available. The default precision is 9. The valid range for All-TIME values is between 00:00:00 to 23:59:59.999999999. 
    • An alternative name for any of the TIMESTAMP_* functions is TIMESTAMP, which can be set by the user. The TIMESTAMP_* variant is used in place of TIMESTAMP whenever possible. This data type is not stored in tables.
    • Snowflake supports three different timestamp formats: TIMESTAMP LTZ, TIMESTAMP NTZ, and TIMESTAMP TZ.

       

      • The TIMESTAMP LTZ function accurately records UTC. The TIMEZONE session parameter determines the time zone in which each operation is executed.
      • TIMESTAMP NTZ accurately records wallclock time. Without regard to local time, all tasks are carried out.
      • By default, TIMESTAMP TZ stores UTC time plus the appropriate time zone offset. The session time zone offset will be utilized if the time zone is not specified.

5) Semi-Structured Data Types

Semi-structured data formats, such as JSON, Avro, ORC, Parquet, or XML, stand in for free-form data structures and are used to load and process data. To maximize performance and efficiency, Snowflake stores these in a compressed columnar binary representation internally.

    • VARIANT is a generic data type that can hold information of any other type, including OBJECT and ARRAY. Its 16 MB of storage space makes it perfect for archiving large files.
    • OBJECT comes in handy to save collections of key-value pairs, where the key is always a non-empty string and the value is always a VARIANT. Explicitly-typed objects are currently not supported in Snowflake.
    • Display both sparse and dense arrays of any size with ARRAY. The values are of the VARIANT type, and indices can be any positive integer up to 2^31-1. Arrays of a fixed size or containing values of a non-VARIANT type are not currently supported in Snowflake.

6) Geospatial Data Types

Snowflake has built-in support for geographic elements like points, lines, and polygons. The GEOGRAPHY data type, which Snowflake provides, treats Earth as though it were a perfect sphere. It is aligned with WGS 84 standards.

Degrees of longitude (from -180 to +180) and latitude (from -90 to +90) are used to locate points on Earth’s surface. As of right now, altitude is not a supported option.  More so, Snowflake provides GEOGRAPHY data-type-specific geographic functions.

Instead of retaining geographical data in their native formats in VARCHAR, VARIANT, or NUMBER columns, you should transform and save this data in GEOGRAPHY columns. The efficiency of geographical queries can be greatly enhanced if data is stored in GEOGRAPHY columns.

The following geospatial objects are compatible with the GEOGRAPHY data type:

    • Point
    • MultiPoint
    • MultiLineString
    • LineString
    • GeometryCollection
    • Polygon
    • MultiPolygon
    • Feature
    • FeatureCollection

Unsupported Data Types

If the above list of SQL server data types is clear, then what is the type of data that is incompatible with Snowflake? Here is your answer.

  • LOB (Large Object) 
    • BLOB: You can also utilize BINARY, with a maximum size of 8,388,608 bytes. 
    • CLOB: You can also use VARCHAR, with a maximum size of 16,777,216 bytes (for a single byte).
  • Other
    • ENUM
    • User-defined data types

Conclusion

While your primary focus should be on learning how to use customer data, you may be questioning why it’s necessary to know so many different data types. There is one motive for doing this, and that is to amass reliable information. Data collection and instrumentation aren’t the only areas where you can use your data type knowledge; you’ll also find that data administration, data integration, and developing internal applications are much less of a challenge now that you have a firm grasp on the topic.

Also, without a good database management system, it is impossible to deal with the massive amounts of data already in existence. Get in touch with our experts for more information.