Data compression

Applies to: SQL Server (all supported versions), Azure SQL Database, Azure SQL Managed Instance

SQL Server, Azure SQL Database, and Azure SQL Managed Instance support row and page compression for rowstore tables and indexes, and support columnstore and columnstore archival compression for columnstore tables and indexes.

For rowstore tables and indexes, use the data compression feature to help reduce the size of the database. In addition to saving space, data compression can help improve performance of I/O intensive workloads because the data is stored in fewer pages and queries need to read fewer pages from disk. However, extra CPU resources are required on the database server to compress and decompress the data, while data is exchanged with the application. You can configure row and page compression on the following database objects:

  • A whole table that is stored as a heap.
  • A whole table that is stored as a clustered index.
  • A whole nonclustered index.
  • A whole indexed view.
  • For partitioned tables and indexes, you can configure the compression option for each partition, and the various partitions of an object do not have to have the same compression setting. A minimal example follows this list.
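
As a minimal sketch, the following statements enable row or page compression on a table; the table name dbo.Orders is hypothetical.

    -- Create a table with row compression.
    CREATE TABLE dbo.Orders
    (
        OrderId   INT           NOT NULL,
        OrderDate DATE          NOT NULL,
        Amount    DECIMAL(10,2) NOT NULL
    )
    WITH (DATA_COMPRESSION = ROW);

    -- Switch an existing table to page compression by rebuilding it.
    ALTER TABLE dbo.Orders
    REBUILD WITH (DATA_COMPRESSION = PAGE);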

For columnstore tables and indexes, columnstore compression is always used and is not user configurable. Use columnstore archival compression to further reduce the data size for situations when you can afford extra time and CPU resources to store and retrieve the data. You can configure columnstore archival compression on the following database objects:

  • A whole columnstore table or a whole clustered columnstore index. Because a columnstore table is stored as a clustered columnstore index, both approaches have the same results.
  • A whole nonclustered columnstore index.
  • For partitioned columnstore tables and columnstore indexes, you can configure the archival compression option for each partition, and the various partitions do not have to have the same archival compression setting.

Note

Data can also be compressed using the GZIP algorithm format. This is an additional step and is most suitable for compressing portions of the data when archiving old data for long-term storage. Data compressed using the COMPRESS function cannot be indexed. For more information, see COMPRESS (Transact-SQL).
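
For illustration, a minimal sketch of COMPRESS and its counterpart DECOMPRESS; the table and column names are hypothetical.

    -- Store a compressed copy of a large text column for long-term archiving.
    UPDATE dbo.DocumentArchive
    SET CompressedBody = COMPRESS(Body)
    WHERE ArchivedDate < '20200101';

    -- Read it back: DECOMPRESS returns VARBINARY, so cast to the original type.
    SELECT CAST(DECOMPRESS(CompressedBody) AS NVARCHAR(MAX)) AS Body
    FROM dbo.DocumentArchive;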

Row and page compression considerations

When you apply row and page compression, be aware of the following considerations:

  • The details of data compression are subject to change without notice in service packs or subsequent releases.

  • Compression is available in Azure SQL Database.

  • Compression is not available in every edition of SQL Server. For more information, see Features Supported by the Editions of SQL Server 2016, Editions and supported features of SQL Server 2017, and Editions and supported features of SQL Server 2019.

  • Compression is not available for system tables.

  • Compression can allow more rows to be stored on a page, but does not modify the maximum row size of a table or index.

  • A table cannot be enabled for compression when the maximum row size plus the compression overhead exceeds the maximum row size of 8060 bytes. For example, a table that has the columns c1 CHAR(8000) and c2 CHAR(53) cannot be compressed because of the additional compression overhead. When the vardecimal storage format is used, the row-size check is performed when the format is enabled. For row and page compression, the row-size check is performed when the object is initially compressed, and then checked as each row is inserted or modified. Compression enforces the following two rules:

    • An update to a fixed-length type must always succeed.
    • Disabling data compression must always succeed.

    Even if the compressed row fits on the page, which means that it is less than 8060 bytes, SQL Server prevents updates that would not fit on the row when it is uncompressed.
  • Off-row data is not compressed when enabling data compression. For example, an XML record that's larger than 8060 bytes will use out-of-row pages, which are not compressed.

  • Several data types are not affected by data compression. For more detail, see How row compression affects storage.

  • When a list of partitions is specified, the compression type can be set to ROW, PAGE, or NONE on individual partitions. If the list of partitions is not specified, all partitions are set with the data compression property that is specified in the statement. When a table or index is created, data compression is set to NONE unless otherwise specified. When a table is modified, the existing compression is preserved unless otherwise specified. See the sketch below for a per-partition example.
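
    As a hedged sketch, the following statement creates a partitioned table with different compression settings per partition. The table name and the partition scheme psSalesByYear are hypothetical and assumed to exist.

        -- Partitions 1 through 3 use page compression; partition 4 is uncompressed.
        CREATE TABLE dbo.SalesHistory
        (
            SalesId  INT           NOT NULL,
            SaleDate DATE          NOT NULL,
            Amount   DECIMAL(10,2) NOT NULL
        )
        ON psSalesByYear (SaleDate)
        WITH
        (
            DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
            DATA_COMPRESSION = NONE ON PARTITIONS (4)
        );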

  • If you specify a list of partitions or a partition that is out of range, an error is generated.

  • Nonclustered indexes do not inherit the compression property of the table. To compress indexes, you must explicitly set the compression property of the indexes, as shown in the example below. By default, the compression setting for indexes is set to NONE when the index is created.
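
    For example, a minimal sketch with hypothetical table and index names:

        -- The index does not inherit the table's compression; set it explicitly.
        CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
            ON dbo.Orders (OrderDate)
            WITH (DATA_COMPRESSION = PAGE);

        -- Or compress an existing index by rebuilding it.
        ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
            REBUILD WITH (DATA_COMPRESSION = PAGE);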

  • When a clustered index is created on a heap, the clustered index inherits the compression state of the heap unless an alternative compression state is specified.

  • When a heap is configured for page-level compression, pages receive page-level compression only in the following ways:

    • Data is bulk imported with bulk optimizations enabled.
    • Data is inserted using INSERT INTO ... WITH (TABLOCK) syntax and the table does not have a nonclustered index.
    • A table is rebuilt by executing the ALTER TABLE ... REBUILD statement with the PAGE compression option.
  • New pages allocated in a heap as part of DML operations do not use PAGE compression until the heap is rebuilt. Rebuild the heap by removing and reapplying compression, or by creating and removing a clustered index.

  • Changing the compression setting of a heap requires all nonclustered indexes on the table to be rebuilt so that they have pointers to the new row locations in the heap.

  • You can enable or disable ROW or PAGE compression online or offline, as sketched below. Enabling compression on a heap is single-threaded for an online operation.
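
    A hedged sketch of an online rebuild, using the hypothetical table from the earlier example:

        -- Enable page compression while keeping the table available.
        ALTER TABLE dbo.Orders
        REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);

        -- Disable compression again; this also rebuilds the table.
        ALTER TABLE dbo.Orders
        REBUILD WITH (DATA_COMPRESSION = NONE);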

  • The disk space requirements for enabling or disabling row or page compression are the same as for creating or rebuilding an index. For partitioned data, you can reduce the space that is required by enabling or disabling compression for one partition at a time.

  • To determine the compression state of partitions in a partitioned table, query the data_compression column of the sys.partitions catalog view, as in the example below.
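
    For example (hypothetical table name):

        -- Show the compression state of each partition of a table.
        SELECT p.partition_number,
               p.data_compression_desc
        FROM sys.partitions AS p
        WHERE p.object_id = OBJECT_ID('dbo.SalesHistory')
        ORDER BY p.partition_number;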

  • When you are compressing indexes, leaf-level pages can be compressed with both row and page compression. Non-leaf-level pages do not receive page compression.

  • Because of their size, large-value data types are sometimes stored separately from the normal row data on special-purpose pages. Data compression is not available for the data that is stored separately.

  • Tables that implemented the vardecimal storage format in SQL Server 2005 (9.x) retain that setting when upgraded. You can apply row compression to a table that has the vardecimal storage format. However, because row compression is a superset of the vardecimal storage format, there is no reason to retain the vardecimal storage format. Decimal values gain no additional compression when you combine the vardecimal storage format with row compression. You can apply page compression to a table that has the vardecimal storage format; however, the vardecimal storage format columns probably will not achieve additional compression.

    Note

    All supported versions of SQL Server support the vardecimal storage format; however, because data compression achieves the same goals, the vardecimal storage format is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.

Columnstore and columnstore archive compression

Columnstore tables and indexes are always stored with columnstore compression. You can further reduce the size of columnstore data by configuring an additional compression called archival compression. To perform archival compression, SQL Server runs the Microsoft XPRESS compression algorithm on the data. Add or remove archival compression by using the following data compression types:

  • Use COLUMNSTORE_ARCHIVE data compression to compress columnstore data with archival compression.
  • Use COLUMNSTORE data compression to decompress archival compression. The resulting data continue to be compressed with columnstore compression.

To add archival compression, use ALTER TABLE (Transact-SQL) or ALTER INDEX (Transact-SQL) with the REBUILD option and DATA_COMPRESSION = COLUMNSTORE_ARCHIVE.

For example:

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (2, 4));

To remove archival compression and restore the data to columnstore compression, use ALTER TABLE (Transact-SQL) or ALTER INDEX (Transact-SQL) with the REBUILD option and DATA_COMPRESSION = COLUMNSTORE.

For example:

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = COLUMNSTORE);

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = COLUMNSTORE);

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = COLUMNSTORE ON PARTITIONS (2, 4));

This next example sets the data compression to columnstore on some partitions, and to columnstore archival on other partitions.

    ALTER TABLE ColumnstoreTable1
    REBUILD PARTITION = ALL
    WITH
    (
        DATA_COMPRESSION = COLUMNSTORE ON PARTITIONS (4, 5),
        DATA_COMPRESSION = COLUMNSTORE_ARCHIVE ON PARTITIONS (1, 2, 3)
    );

Performance

Compressing columnstore indexes with archival compression causes the index to perform slower than columnstore indexes that do not have archival compression. Use archival compression only when you can afford to use extra time and CPU resources to compress and retrieve the data.

The benefit of archival compression is reduced storage, which is useful for data that is not accessed often. For example, if you have a partition for each month of data, and most of your activity is for the most recent months, you could archive older months to reduce the storage requirements.

Metadata

The following system views contain information about data compression for columnstore indexes:

  • sys.indexes (Transact-SQL) - The type and type_desc columns include CLUSTERED COLUMNSTORE and NONCLUSTERED COLUMNSTORE.
  • sys.partitions (Transact-SQL) - The data_compression and data_compression_desc columns include COLUMNSTORE and COLUMNSTORE_ARCHIVE.
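
For example, a minimal sketch that joins the two views to list columnstore indexes and the compression of each of their partitions:

    SELECT OBJECT_NAME(i.object_id) AS table_name,
           i.name AS index_name,
           i.type_desc,
           p.partition_number,
           p.data_compression_desc
    FROM sys.indexes AS i
    INNER JOIN sys.partitions AS p
        ON p.object_id = i.object_id
       AND p.index_id = i.index_id
    WHERE i.type_desc IN ('CLUSTERED COLUMNSTORE', 'NONCLUSTERED COLUMNSTORE');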

The procedure sp_estimate_data_compression_savings (Transact-SQL) also applies to columnstore indexes.
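
For example, a hedged sketch using the table name from the earlier examples; note that the columnstore options for this procedure require SQL Server 2019 or later.

    -- Estimate space savings from archival compression for a columnstore table.
    EXEC sp_estimate_data_compression_savings
        @schema_name = 'dbo',
        @object_name = 'ColumnstoreTable1',
        @index_id = NULL,
        @partition_number = NULL,
        @data_compression = 'COLUMNSTORE_ARCHIVE';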

Impact on partitioned tables and indexes

When you use data compression with partitioned tables and indexes, be aware of the following considerations:

  • When partitions are split by using the ALTER PARTITION statement, both partitions inherit the data compression attribute of the original partition.

  • When two partitions are merged, the resultant partition inherits the data compression attribute of the destination partition.

  • To switch a partition, the data compression property of the partition must match the compression property of the table.

  • There are two syntax variations that you can use to alter the compression of a partitioned table or index:

    • The following syntax rebuilds only the referenced partition:

        ALTER TABLE <table_name>
        REBUILD PARTITION = 1
        WITH (DATA_COMPRESSION = <option>);
    • The following syntax rebuilds the whole table by using the existing compression setting for any partitions that are not referenced:

        ALTER TABLE <table_name>
        REBUILD PARTITION = ALL
        WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (<range>), ...);

    Partitioned indexes follow the same principle using ALTER INDEX.

  • When a clustered index is dropped, the corresponding heap partitions retain their data compression setting unless the partitioning scheme is modified. If the partitioning scheme is changed, all partitions are rebuilt to an uncompressed state. To drop a clustered index and change the partitioning scheme requires the following steps:

    1. Drop the clustered index.
    2. Modify the table by using the ALTER TABLE ... REBUILD option that specifies the compression option.

    Dropping a clustered index OFFLINE is a very fast operation, because only the upper levels of clustered indexes are removed. When a clustered index is dropped ONLINE, SQL Server must rebuild the heap two times, once for step 1 and once for step 2. A hedged sketch of the two steps follows.
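
        -- Step 1: drop the clustered index; the table becomes a heap.
        -- Index and table names are hypothetical.
        DROP INDEX CIX_Orders ON dbo.Orders;

        -- Step 2: rebuild the heap, specifying the desired compression.
        ALTER TABLE dbo.Orders
        REBUILD WITH (DATA_COMPRESSION = PAGE);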

How compression affects replication

When you are using data compression with replication, be aware of the following considerations:

  • When the Snapshot Agent generates the initial schema script, the new schema uses the same compression settings for both the table and its indexes. Compression cannot be enabled on just the table and not the index.

  • For transactional replication, the article schema option determines what dependent objects and properties have to be scripted. For more information, see sp_addarticle.

    The Distribution Agent does not check for down-level Subscribers when it applies scripts. If the replication of compression is selected, creating the table on down-level Subscribers fails. In the case of a mixed topology, do not enable the replication of compression.

  • For merge replication, publication compatibility level overrides the schema options and determines the schema objects that are scripted.

    In the case of a mixed topology, if it is not required to support the new compression options, the publication compatibility level should be set to the down-level Subscriber version. If it is required, compress tables on the Subscriber after they have been created.

The following table shows replication settings that control compression during replication.

User intent | Replicate partition scheme for a table or index | Replicate compression settings | Scripting behavior
--- | --- | --- | ---
To replicate the partition scheme and enable compression on the Subscriber on the partition. | True | True | Scripts both the partition scheme and the compression settings.
To replicate the partition scheme but not compress the data on the Subscriber. | True | False | Scripts out the partition scheme but not the compression settings for the partition.
To not replicate the partition scheme and not compress the data on the Subscriber. | False | False | Does not script partition or compression settings.
To compress the table on the Subscriber if all the partitions are compressed on the Publisher, but not replicate the partition scheme. | False | True | Checks if all the partitions are enabled for compression. Scripts out compression at the table level.

Impact on other SQL Server components

Applies to: SQL Server (all supported versions), Azure SQL Database, Azure SQL Managed Instance

Compression occurs in the storage engine and the data is presented to most of the other components of SQL Server in an uncompressed state. This limits the effects of compression on the other components to the following:

  • Bulk import and export operations
    • When data is exported, even in native format, the data is output in the uncompressed row format. This can cause the size of the exported data file to be significantly larger than the source data.
    • When data is imported, if the target table has been enabled for compression, the data is converted by the storage engine into compressed row format. This can cause increased CPU usage compared to when data is imported into an uncompressed table.
    • When data is bulk imported into a heap with page compression, the bulk import operation tries to compress the data with page compression when the data is inserted.
  • Compression does not affect backup and restore.
  • Compression does not affect log shipping.
  • Data compression is incompatible with sparse columns. Therefore, tables containing sparse columns cannot be compressed, nor can sparse columns be added to a compressed table.
  • Enabling compression can cause query plans to change because the data is stored using a different number of pages and number of rows per page.

See also

  • Row Compression Implementation
  • Page Compression Implementation
  • Unicode Compression Implementation
  • CREATE PARTITION SCHEME (Transact-SQL)
  • CREATE PARTITION FUNCTION (Transact-SQL)
  • CREATE TABLE (Transact-SQL)
  • ALTER TABLE (Transact-SQL)
  • CREATE INDEX (Transact-SQL)
  • ALTER INDEX (Transact-SQL)