In Assette, it is sometimes necessary to ensure that blank or NULL-valued columns do not appear in the output when preparing data for a report or presentation. This cleanup must be performed at the Data Block level, not in the Data Object. The approach uses Python and the pandas library to remove the affected columns before Smart Shell processing.
This article outlines the correct method for removing such columns while preserving the behavior of the Data Object settings.
Important Distinction: Data Block vs. Data Object #
It is essential to understand that this cleanup must occur in the Data Block, not in the Data Object.
The Data Object includes a setting in the Columns tab labeled “Drop if not available in data source”, which controls whether a column is removed when the data source is missing the column entirely. However, this setting does not remove columns that exist in the Data Block output but contain only NULL or blank values. If the setting is unchecked and the Smart Shell references the column, the column will be included even though it carries no meaningful data.
To fully exclude NULL or blank columns from the output, developers must perform cleanup within the Data Block itself.
How to Remove Blank or NULL Columns in the Data Block #
To programmatically drop columns that contain only NULL or blank values from a Data Block, use the following pandas code pattern:
import pandas as pd
# Assume `df` is the pandas DataFrame that represents your Data Block data
# Drop columns where all values are NULL
df = df.dropna(axis=1, how='all')
# Optionally, drop columns where all values are blank or whitespace
df = df.loc[:, ~(df.apply(lambda col: col.astype(str).str.strip()).eq("").all())]
This approach ensures that:
- Columns entirely consisting of NaN values are removed.
- Columns that contain only blank strings (e.g., " ", "", etc.) are also removed.
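As a quick illustration, applying the same two steps to a small sample DataFrame (the column names below are purely illustrative, not actual Assette fields) leaves only the columns that carry real values:
import pandas as pd
# Illustrative sample only; column names are hypothetical
df = pd.DataFrame({
    "AccountName": ["Fund A", "Fund B"],
    "OneYearReturn": [None, None],   # entirely NULL
    "Benchmark": ["  ", ""],         # only blanks/whitespace
    "MTDReturn": [0.012, 0.015],
})
# Apply the cleanup pattern shown above
df = df.dropna(axis=1, how='all')
df = df.loc[:, ~(df.apply(lambda col: col.astype(str).str.strip()).eq("").all())]
print(list(df.columns))  # ['AccountName', 'MTDReturn']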
Interaction with the “Drop if not available in data source” Data Object Setting #
The “Drop if not available in data source” setting in the Data Object applies only when a column is missing from the Data Block’s output entirely. If the column exists but contains only NULLs or blanks, that setting does not drop the column—because from the system’s perspective, the column exists.
This distinction highlights the need to programmatically remove NULL/blank columns from the Data Block itself to ensure they are truly absent when the Data Object and Smart Shell process the data.
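A small sketch makes the distinction concrete (again with hypothetical column names): a column that is present but entirely NULL is still part of the DataFrame, so only the Data Block cleanup actually removes it from what flows downstream:
import pandas as pd
# Hypothetical example: the column exists in the Data Block output but holds no data
df = pd.DataFrame({
    "AccountName": ["Fund A", "Fund B"],
    "SinceInception": [None, None],
})
print("SinceInception" in df.columns)     # True  -- the column exists, so the Data Object setting will not drop it
print(df["SinceInception"].isna().all())  # True  -- yet it contains no values
# Removing it in the Data Block is what excludes it from downstream processing
df = df.dropna(axis=1, how='all')
print("SinceInception" in df.columns)     # False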
Using a Data Setting to Control Dropping NULL Columns #
In certain highly specific reporting scenarios, it may be necessary to conditionally remove blank columns depending on the context of the output. For example, business rules for multi-horizon performance tables may require omitting NULL or N/A columns in Smart Docs used for newly incepted accounts, while retaining them in other views.
To accommodate this, developers can introduce a Data Setting to dynamically control whether NULL columns should be dropped. This allows Smart Docs or Smart Pages to selectively apply the cleanup logic depending on runtime conditions.
Example Use Case #
In performance tables tailored to newly onboarded accounts, you might suppress columns such as “1-Year Return” or “Since Inception” if those values are not available yet. In contrast, the same table for long-standing accounts might retain all columns.
How to Implement #
- Define a Data Setting in the Data Object, such as dropNullColumns, with values like "yes" or "no" depending on the Smart Page or Smart Doc context.
- Add conditional logic to the Data Block that evaluates this setting and applies the cleanup as needed.
Here is an example of how this can be implemented in the Data Block code:
# Assume `df` is the DataFrame representing the Data Block data
# and `dropNullColumns` holds the value of the Data Setting (e.g., "yes" or "no")
if dropNullColumns == "yes":
    # Drop columns where all values are NULL
    df = df.dropna(axis=1, how='all')
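Putting it together, the following sketch assumes dropNullColumns has already been populated from the Data Setting (how that value reaches the Data Block depends on your configuration) and reuses the hypothetical performance columns from the use case above; the same toggle can guard both the NULL and blank-string cleanup:
import pandas as pd
# Hypothetical data for a newly incepted account; column names are illustrative
df = pd.DataFrame({
    "AccountName": ["New Fund"],
    "MTDReturn": [0.004],
    "1-Year Return": [None],
    "Since Inception": [None],
})
dropNullColumns = "yes"  # assumed to be supplied by the Data Setting at runtime
if dropNullColumns == "yes":
    # Drop columns that are entirely NULL, then columns that are only blank strings
    df = df.dropna(axis=1, how='all')
    df = df.loc[:, ~(df.apply(lambda col: col.astype(str).str.strip()).eq("").all())]
print(list(df.columns))  # ['AccountName', 'MTDReturn']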
This approach enables greater flexibility and supports specialized reporting logic. However, using Data Settings to control column cleanup is not common practice and should be reserved for narrowly scoped situations that require this level of control.
Best practice remains to perform column cleanup directly in the Data Block without relying on runtime toggles.
Best Practices #
- Perform column cleanup in your Python logic or preprocessing script at the Data Block level.
- Do not rely solely on the Data Object setting to control blank or missing columns.
- Always test your output in Smart Preview or in the final report to confirm that unwanted columns have been excluded; a quick programmatic check is sketched below.
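As a final, optional sanity check (a sketch only, assuming df is the cleaned Data Block DataFrame from the snippets above), you can assert in the Data Block that no all-NULL or all-blank columns remain before reviewing the output in Smart Preview:
# Optional check after cleanup: fail fast if an empty column slipped through
empty_cols = [
    col for col in df.columns
    if df[col].isna().all() or df[col].astype(str).str.strip().eq("").all()
]
assert not empty_cols, f"Unexpected empty columns remain: {empty_cols}"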