Download Set Analysis Primer

x

Subset Ratio and Information Density

Three important visualizations or data storytelling aspects are quality, quantity, and manner.

Quality = Data Quality

Quantity = Data Volume and Required Number of Fields

Manner = Using the most appropriate chart form for compelling storytelling

For data quality in QlikView or Qlik Sense application, as a Qlik developer, you have to pay attention to subset ratio and information density.

Let’s start with the subset ratio.

Since a key field exists in multiple tables, the subset ratio provides insights into the key field values related to a specific table.

Let’s compare the CustomerID field in the Customers and Facts tables.

Customers Table
Fact table

The Customers table shows a 100% subset ratio, whereas the Facts table shows 97.8%

The subset ratio for a given table shows how many unique values of the key field exist in a particular table.

Subset Ratio = distinct values of a given key field present in a given table / total distinct values of a given key field.

 

SUBSET RATIO

Since all 91 unique customer values exist in the CustomerID field in the Customers table, its subset ratio is 100%. Whereas only 89 unique customer values exist in the CustomerID field in the Fact table, its subset ratio is 97.8%.

The 89% subset ratio in the fact table indicates that two customers have never placed any orders in a layman’s term.

It’s common to have less than a 100% subset ratio in the fact table, but it warrants further data quality check when the subset ratio is too low.

If you associate two fields either by aliasing or by having a common field name between tables, and if the subset ratio is 0% in a key field in one of the tables, the key field showing a 0% subset ratio has no present distinct values. The key field in that table may have all null values. Again, you have to investigate to find the culprit but the subset ratio in QlikView or Qlik Sense data model sheds light on the possible data quality issue.

JOINS

The subset ratio is also helpful in determining a proper join type if you decide to join two tables.

For example, if the subset ratio of a key field is 100% in both tables, you can use an inner join to join these two tables since the key field has all unique values present in both tables.

Conversely, if the subset ratio of a given key field, CustomerID, is 100% in the Customers table.  And the subset ratio for the same field is 89% in the fact table — assuming that the fact table load occurred before the Customers table —  you can’t use Left Join if your users expect to see two customers with zero sales value in a table.

Since Primary/Foreign key relationships don’t exist in QlikView or Qlik Sense data model, viewing the subset ratio provides essential insights regarding data quality.

INFORMATION DENSITY

Information density shows the number of records with non-null values in a field compared to the total number of records in a table. You should expect 100% information density for the key fields as a key field with null values will mislead end-users when they create visualizations in a self-service Qlik Sene application.

It’s important to note that if a key field has all null values, both the subset ratio and information density will be zero since the subset ratio calculation ignores the null values.

-Shilpan

If you enjoyed this article, get email updates (it’s free).

* indicates required
About the author

Leave a Reply