
Manage column profiling information

Profiling adds context to columns in relational stores. A profile summarizes the data in a column, including:

  • numerical statistics (min, max, mean, median, standard deviation, sum, variance) for numeric columns
  • minimum, maximum, and average lengths for string columns
  • distinct value counts and percentages
  • missing value counts and percentages

Profiling is only available on columns

You can populate this summary information only on columns, not on other assets in Atlan.
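
To make the count and percentage metrics listed above concrete, here is a minimal sketch in plain Java (no Atlan SDK involved) that derives them from a small in-memory sample. The class `CommonProfile` and its methods are hypothetical. The definitions are also assumptions: "distinct" counts the different non-null values, "missing" counts null entries, and percentages are taken over the total row count. Atlan's own profiler may define these differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Atlan SDK.
final class CommonProfile {

    // Number of different non-null values in the sample (assumed meaning of "distinct").
    static int distinctCount(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            if (v != null) {
                seen.add(v);
            }
        }
        return seen.size();
    }

    // Number of null entries in the sample (assumed meaning of "missing").
    static int missingCount(List<String> values) {
        int missing = 0;
        for (String v : values) {
            if (v == null) {
                missing++;
            }
        }
        return missing;
    }

    // Share of the total row count, expressed as a percentage from 0 to 100.
    static double percentage(int count, int total) {
        return total == 0 ? 0.0 : 100.0 * count / total;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements.
        List<String> status = Arrays.asList("OK", "OK", "FAILED", null, "OK", null, "RUNNING", "OK");
        int distinct = distinctCount(status);
        int missing = missingCount(status);
        System.out.println("distinct = " + distinct + " (" + percentage(distinct, status.size()) + "%)");
        System.out.println("missing  = " + missing + " (" + percentage(missing, status.size()) + "%)");
    }
}
```

For this eight-row sample there are three distinct values and two missing entries, so the missing percentage is 25.0.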

Retrieve profiles

Since profiles are only available on columns, you will need to retrieve column assets to see the profiles:

Retrieve profiles
Column column = Column.get(client, // (1)
    "default/hive/1657025257/OPS/DEFAULT/RUN_STATS/STATUS", true); // (2)
column.getColumnDistinctValuesCount(); // (3)
column.getColumnUniqueValuesCount();
column.getColumnUniquenessPercentage();
column.getColumnDuplicateValuesCount();
column.getColumnMissingValuesCount();
column.getColumnMissingValuesPercentage();
column.getColumnMax(); // (4)
column.getColumnMin();
column.getColumnMean();
column.getColumnMedian();
column.getColumnStandardDeviation();
column.getColumnVariance();
column.getColumnSum();
column.getColumnMinimumStringLength(); // (5)
column.getColumnMaximumStringLength();
column.getColumnAverageLength();
  1. Use the get() method to retrieve all details about a specific column. Because this operation retrieves the asset from Atlan, you must pass it an AtlanClient through which to connect to the tenant.
  2. Provide the full, case-sensitive qualifiedName of the column.
  3. Some profile information is common, regardless of the data type of the column.
  4. Some profile information is specific to numeric columns.
  5. Some profile information is specific to string columns.

Add your own profiles

If Atlan doesn't profile the source, you can add your own profiles, either by setting the profile attributes when you create the column programmatically or by updating them on an existing column:

Add or update profiles
Column column = Column.updater( // (1)
        "default/hive/1657025257/OPS/DEFAULT/RUN_STATS/STATUS", // (2)
        "STATUS") // (3)
    .columnDistinctValuesCount(123) // (4)
    .columnUniqueValuesCount(123)
    .columnUniquenessPercentage(50.0)
    .columnDuplicateValuesCount(123)
    .columnMissingValuesCount(123)
    .columnMissingValuesPercentage(50.0)
    .columnMax(321.0) // (5)
    .columnMin(1.0)
    .columnMean(123.0)
    .columnMedian(123.0)
    .columnStandardDeviation(3.0)
    .columnVariance(1.0)
    .columnSum(654321.0)
    .columnMinimumStringLength(0) // (6)
    .columnMaximumStringLength(123)
    .columnAverageLength(123.0)
    .build(); // (7)
AssetMutationResponse response = column.save(client); // (8)
  1. Use the updater() method to update an existing column asset (for more details, see Updating an asset).
  2. Provide the full, case-sensitive qualifiedName of the column.
  3. Provide the case-sensitive name of the column.
  4. Some profile information is common, regardless of the data type of the column. All of these are optional, so set only the values you have.
  5. Some profile information is specific to numeric columns. All of these are optional, so set only the values you have.
  6. Some profile information is specific to string columns. All of these are optional, so set only the values you have.
  7. Use the build() method to construct the column object to be updated in Atlan.
  8. Call the save() method on the built object to apply the update in Atlan. Because this operation persists the asset in Atlan, you must pass it an AtlanClient through which to connect to the tenant.
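
If you need to compute the statistics yourself before setting them, the following is a minimal sketch in plain Java of one way to do so from sampled values. The class `ColumnProfileCalculator`, its nested `NumericStats` type, and its methods are hypothetical and not part of the Atlan SDK. It also assumes population variance (dividing by n rather than n - 1) and ignores null strings when measuring lengths. Check both choices against how you want the profile to be interpreted.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Objects;

// Hypothetical helper for illustration; not part of the Atlan SDK.
final class ColumnProfileCalculator {

    // Holds the numeric statistics computed for one column sample.
    static final class NumericStats {
        final double min, max, sum, mean, median, variance, standardDeviation;

        NumericStats(double min, double max, double sum, double mean,
                     double median, double variance) {
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.mean = mean;
            this.median = median;
            this.variance = variance;
            this.standardDeviation = Math.sqrt(variance);
        }
    }

    // Computes min, max, sum, mean, median, and population variance for a non-empty sample.
    static NumericStats numericStats(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double[] sorted = values.clone(); // sort a copy so the caller's array is untouched
        Arrays.sort(sorted);
        DoubleSummaryStatistics summary = Arrays.stream(sorted).summaryStatistics();
        double mean = summary.getAverage();
        int n = sorted.length;
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        double squaredDiffs = 0.0;
        for (double v : sorted) {
            squaredDiffs += (v - mean) * (v - mean);
        }
        double variance = squaredDiffs / n; // population variance: an assumption
        return new NumericStats(summary.getMin(), summary.getMax(), summary.getSum(),
                mean, median, variance);
    }

    // Computes minimum, maximum, and average length over the non-null strings.
    static IntSummaryStatistics lengthStats(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToInt(String::length)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        NumericStats stats = numericStats(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println("mean = " + stats.mean + ", median = " + stats.median);
        IntSummaryStatistics lengths = lengthStats(Arrays.asList("OK", null, "FAILED", "RUNNING"));
        System.out.println("min length = " + lengths.getMin() + ", max length = " + lengths.getMax());
    }
}
```

The computed values would then map onto the builder calls shown above, for example `stats.min` for `columnMin(...)` and `lengths.getMax()` for `columnMaximumStringLength(...)`.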