With InterSystems IRIS 2022.2, we introduced Columnar Storage as a new option for persisting your IRIS SQL tables that can boost your analytical queries by an order of magnitude. The capability is marked as experimental in 2022.2 and 2022.3, but will "graduate" to a fully supported production capability in the upcoming 2023.1 release.
The product documentation and this introductory video, already describe the differences between row storage, still the default on IRIS and used throughout our customer base, and columnar table storage and provide high-level guidance on choosing the appropriate storage layout for your use case. In this article, we'll elaborate on this subject and share some recommendations based on industry-practice modelling principles, internal testing, and feedback from Early Access Program participants.
Generally, our guidance on choosing an appropriate table layout for your IRIS SQL schema is as follows:
- If you’re deploying an application that leverages IRIS SQL or Objects, such as an EHR, ERP or transaction processing application, there is no need to change its current row storage layout to a columnar one. Most SQL queries issued for end user applications or programmatic transactions only retrieve or update a limited number of rows, and result rows usually correspond to table rows, with very limited use of aggregate functions. In such cases, the benefits offered by columnar storage and vectorized query processing don’t apply.
- If such an application also embeds operational analytics, consider adding columnar indices if the corresponding analytical queries’ current performance is not satisfactory. This includes, for example, dashboards showing the current inventory or basic financial reporting on live data. Look for numeric fields used in aggregations (e.g. quantities, currencies) or high-cardinality fields used in range conditions (e.g. timestamps). A good indicator for such opportunities is current use of bitmap indices to speed up the filtering of large numbers of rows, usually on low-cardinality fields (e.g. categorical or ordinal fields). There is no need to replace these bitmap indices; the additional columnar indices work well in conjunction with them and are meant to avoid excessive reads from the master map or regular index maps (single gref per row).
- If your IRIS SQL tables contain less than a million rows, there is no need to consider columnar storage. We prefer not to pin ourselves to specific numbers, but the benefits of vectorized query processing are unlikely to make a difference in these low ranges.
- If you’re deploying an IRIS SQL schema for Data Warehouse, Business Intelligence, or similar analytical use cases, consider changing it to default to columnar storage. Star schemas, snowflake schemas or other denormalized table structures as well as broad use of bitmap indices and batch ingestion are good indicators for these use cases. Analytical queries that will benefit most from columnar storage are those that scan large numbers of rows and aggregate values across them. When defining a “columnar table”, IRIS will transparently resort to a row layout for columns in that table that aren’t a good fit for columnar storage, such as streams, long strings or serial fields. IRIS SQL fully supports such mixed table layouts and will use vectorized query processing for eligible parts of the query plan. The added value of bitmap indices on columnar tables is limited, so they can be left out.
Mileage will vary based on both environmental and data-related parameters. Therefore, we highly recommend customers test the different layouts in a representative setup. Columnar indices are easy to add to a regular row-organized table and will quickly yield a realistic perspective on query performance benefits. This, along with the flexibility of mixed table layouts, is a key differentiator of InterSystems IRIS that helps customers achieve an order-of-magnitude performance improvement.
We intend to make these recommendations more concrete as we get more real-world experience on the full production release. Obviously, we can provide more concrete advice based on customers’ actual schema and workload through the Early Access Program and POC engagements, and look forward to feedback from customers and community members. Columnar Storage is part of the InterSystems IRIS Advanced Server license and also enabled in the Community Edition of InterSystems IRIS and IRIS for Health. For a fully scripted demo environment, please refer to this GitHub repository.