Classification is a tag 🏷️ that helps you group data assets with similar access policies.
We don't deny that classifying each data asset can be a bit daunting. To make the process easier, say hi 👋 to the Atlan Bot 🤖
The Atlan Bot supports you by intelligently identifying data assets that contain Personally Identifiable Information (PII), and then attaching the PII classification and its access policies.
Let's see how 👀
The Atlan Bot uses two methods to auto-detect Personally Identifiable Information (PII).
It first checks column metadata, such as column headers, against our internal master database of PII terms (credit card, bank account, etc.). If the matching (Levenshtein) score between the column header and any PII term in the master data is above a threshold value, the column gets tagged as PII.
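To make the header check concrete, here is a minimal sketch of Levenshtein-based matching. It is only an illustration, not the Atlan Bot's actual code: the term list `PII_TERMS`, the `THRESHOLD` value of 0.8, the normalization step, and all function names are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the edit distance to a score between 0.0 and 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def normalize(header: str) -> str:
    """Lowercase the header and treat underscores and hyphens as spaces."""
    return header.lower().replace("_", " ").replace("-", " ").strip()


# Hypothetical stand-ins for the internal PII terms master data.
PII_TERMS = ["credit card", "bank account", "email address", "phone number"]
THRESHOLD = 0.8  # assumed value, the real threshold is not documented here


def header_looks_like_pii(header: str) -> bool:
    """True if the header scores above the threshold against any PII term."""
    name = normalize(header)
    return any(similarity(name, term) > THRESHOLD for term in PII_TERMS)
```

With these assumed values, a header such as `credit_crd` still matches "credit card" because it is only one edit away, while `order_total` does not match any term.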
As column headers are not always explicit, the Atlan Bot also checks for patterns inside the column values. It checks sample values from the column for the presence of any type of PII value (like credit card numbers, email addresses, etc.).
For example, to detect credit card numbers, we have converted pattern guidelines given by credit card providers like Visa, Mastercard, AMEX, and others around the world into regular expressions that machines can understand and use for detection.
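As a rough illustration of the value check, the sketch below tests sample values against a few widely published card-number formats. The patterns, the `min_fraction` cut-off, and the function names are assumptions for this example and are not the bot's actual rules. The Mastercard pattern covers only the classic 51 to 55 prefixes.

```python
import re
from typing import Iterable, Optional

# Commonly published card-number formats (illustrative, not exhaustive).
CARD_PATTERNS = {
    "visa": re.compile(r"4\d{12}(?:\d{3})?"),  # starts with 4, 13 or 16 digits
    "mastercard": re.compile(r"5[1-5]\d{14}"),  # starts with 51-55, 16 digits
    "amex": re.compile(r"3[47]\d{13}"),         # starts with 34 or 37, 15 digits
}


def card_brand(value: str) -> Optional[str]:
    """Return the matching brand name, or None if the value is not card-like."""
    digits = re.sub(r"[ -]", "", value.strip())  # drop spaces and hyphens
    for brand, pattern in CARD_PATTERNS.items():
        if pattern.fullmatch(digits):
            return brand
    return None


def column_has_card_numbers(samples: Iterable[object],
                            min_fraction: float = 0.5) -> bool:
    """True if at least min_fraction of the sample values look like card numbers.

    min_fraction is a hypothetical cut-off chosen for this sketch.
    """
    values = [str(v) for v in samples]
    if not values:
        return False
    matches = sum(1 for v in values if card_brand(v) is not None)
    return matches / len(values) >= min_fraction
```

A format match alone can produce false positives on other long numeric IDs, which is one reason to look at several sample values rather than a single one.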
If either of the two methods above indicates PII, the asset gets tagged with the PII classification.
Go to the Discover list, and click the name of the data table you want to auto-classify.
Click the third tab from the right, labeled "Profile".
Click the blue button on the right-hand side of the "Profile" tab. As soon as you click it, a configuration set-up modal will open.
In the set-up modal, choose "Yes" for the auto-classify option, and click "Update".
The data quality profile will then run and auto-classify the table's columns as PII if they match the conditions given in the bot's algorithm (see the previous section).
Want to know more about classification? Read the article on this topic below 👇