Traverse categories
You can populate glossaries in Atlan with arbitrarily deep category hierarchies.
To traverse these categories efficiently (without retrieving each level through a separate API call), you need to search for all categories in a glossary and reconstruct the hierarchy in memory. This reconstruction can be cumbersome, so the SDKs provide a helper method for it.
Retrieve hierarchy
To retrieve a traversable hierarchy for a glossary:
- Java
- Python
- Kotlin
- Raw REST API
Glossary glossary = Glossary.findByName(client, "Concepts"); // (1)
Glossary.CategoryHierarchy tree = glossary.getHierarchy(client); // (2)
1. Start by retrieving the glossary itself, for example using Glossary.findByName(). The glossary object used must have its qualifiedName present, so if you already know the qualifiedName you could also use Glossary._internal().qualifiedName("...").build(); as a shortcut, which doesn't require making any API call. Because this operation will look up the asset in Atlan, you must provide it an AtlanClient through which to connect to the tenant.
2. Call the .getHierarchy() method on the glossary to retrieve a traversable Glossary.CategoryHierarchy object. Because this operation will look up the asset in Atlan, you must provide it an AtlanClient through which to connect to the tenant.
More details
The .getHierarchy() method will only retrieve the bare minimum information about each category (its GUID, qualifiedName and name). If you want to retrieve additional details, such as the terms in that category or certificate for the category, you need to pass these as an additional argument. To do this, use the .getHierarchy(AtlanClient, List<String>) method, and pass a list of strings giving the names of any additional attributes you want to retrieve for each category. (For example, to retrieve terms you would use terms, for certificates you would use certificateStatus.)
from pyatlan.client.atlan import AtlanClient
client = AtlanClient()
glossary = client.asset.find_glossary_by_name("Concepts") # (1)
hierarchy = client.asset.get_hierarchy(glossary) # (2)
1. Start by retrieving the glossary itself, for example using find_glossary_by_name(). The glossary object used must have its qualified_name present.
2. Call get_hierarchy() to retrieve a traversable AtlasGlossary.CategoryHierarchy object.
More details
The .get_hierarchy() method will only retrieve the bare minimum information about each category (its GUID, qualifiedName and name). If you want to retrieve additional details, such as the terms in that category or certificate for the category, you need to pass these as an additional argument. To do this, add the additional attributes parameter and pass a list of strings giving the names of any additional attributes you want to retrieve for each category. (For example, to retrieve terms you would use terms, for certificates you would use certificateStatus.)
val glossary = Glossary.findByName(client, "Concepts") // (1)
val tree = glossary.getHierarchy(client) // (2)
1. Start by retrieving the glossary itself, for example using Glossary.findByName(). The glossary object used must have its qualifiedName present, so if you already know the qualifiedName you could also use Glossary._internal().qualifiedName("...").build() as a shortcut, which doesn't require making any API call. Because this operation will look up the asset in Atlan, you must provide it an AtlanClient through which to connect to the tenant.
2. Call the .getHierarchy() method on the glossary to retrieve a traversable Glossary.CategoryHierarchy object. Because this operation will look up the asset in Atlan, you must provide it an AtlanClient through which to connect to the tenant.
More details
The .getHierarchy() method will only retrieve the bare minimum information about each category (its GUID, qualifiedName and name). If you want to retrieve additional details, such as the terms in that category or certificate for the category, you need to pass these as an additional argument. To do this, use the .getHierarchy(AtlanClient, List<String>) method, and pass a list of strings giving the names of any additional attributes you want to retrieve for each category. (For example, to retrieve terms you would use terms, for certificates you would use certificateStatus.)
Retrieving all categories in a glossary could require multiple API operations to page through the results. You do this by incrementing the from value in each subsequent call (in increments equal to the size) to get the next page of results.
Each page of results from the search will return a flat list of categories. You will need to use the parentCategory relationship within each result to reverse-engineer the hierarchical structure of the categories from the flat lists.
{
"dsl": { // (1)
"from": 0, // (2)
"size": 20, // (3)
"query": {
"bool": {
"filter": [
{
"term": { // (4)
"__state": {
"value": "ACTIVE"
}
}
},
{
"term": { // (5)
"__typeName.keyword": {
"value": "AtlasGlossaryCategory"
}
}
},
{
"term": { // (6)
"__glossary": {
"value": "LD5Tb30qbuYCZKsmFRpmS"
}
}
}
]
}
},
"sort": [ // (7)
}
],
"track_total_hits": true
},
"attributes": [
"parentCategory" // (8)
],
"suppressLogs": true,
"showSearchScore": false,
"excludeMeanings": false,
"excludeClassifications": false
}
1. You should run a search to efficiently retrieve many categories at the same time.
2. Use the from parameter to define the start of each page. If you have many categories in the glossary, page through them rather than trying to retrieve them all in a single request. The from should be incremented in multiples of the size, so in this example would be 0, 20, 40, and so on.
3. The size parameter controls how many categories you will try to retrieve per search request.
4. You will probably want to filter the categories to only those that are active (excluding any archived or soft-deleted categories).
5. You should filter the search by a specific type. In this example, AtlasGlossaryCategory is the name of the type in Atlan for categories.
6. Finally, you should also filter the search for the specific glossary in which to find the categories.

   Requires qualifiedName of the glossary
   Note that this requires the qualifiedName of the glossary, which therefore must first be known or found by an earlier search on glossaries.
7. When you expect to page through results, it's always a good idea to sort the results so that each page returns them in a consistent order.
8. Since we want to be able to understand the hierarchy of categories, we also need to include the parentCategory in each result.
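The paging and reconstruction described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: fetch_page is a hypothetical callable standing in for one search request, and each category is simplified to a record with assumed guid, name and parent_guid keys (in a real response you would read the parent from the parentCategory relationship). The loop stops on a short page, which is one simple way to detect the end of the results.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Category = dict  # assumed simplified shape: {"guid", "name", "parent_guid"}


def fetch_all_categories(
    fetch_page: Callable[[int, int], List[Category]], size: int = 20
) -> List[Category]:
    """Collect every page by advancing the offset in steps of `size`."""
    categories: List[Category] = []
    start = 0  # plays the role of "from" in the search request
    while True:
        page = fetch_page(start, size)  # hypothetical: sends one search request
        categories.extend(page)
        if len(page) < size:  # a short or empty page means nothing is left
            return categories
        start += size


def index_by_parent(
    categories: List[Category],
) -> Dict[Optional[str], List[Category]]:
    """Group categories under their parent's GUID (None for top-level ones)."""
    children: Dict[Optional[str], List[Category]] = defaultdict(list)
    for category in categories:
        children[category["parent_guid"]].append(category)
    return children


# In-memory stand-in for the search endpoint, so the sketch runs offline.
SAMPLE = [
    {"guid": "g1", "name": "1", "parent_guid": None},
    {"guid": "g1a", "name": "1a", "parent_guid": "g1"},
    {"guid": "g1ai", "name": "1a(i)", "parent_guid": "g1a"},
    {"guid": "g1aii", "name": "1a(ii)", "parent_guid": "g1a"},
    {"guid": "g1b", "name": "1b", "parent_guid": "g1"},
    {"guid": "g2", "name": "2", "parent_guid": None},
]


def fake_fetch_page(start: int, size: int) -> List[Category]:
    return SAMPLE[start:start + size]


all_categories = fetch_all_categories(fake_fetch_page, size=4)
children = index_by_parent(all_categories)
print([c["name"] for c in children[None]])  # top-level categories
print([c["name"] for c in children["g1"]])  # children of category "1"
```

With a page size of 4, the six sample categories arrive in two requests (offsets 0 and 4). The resulting index, keyed by parent GUID, is the structure that any of the traversals can then walk.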
Traverse hierarchy
To traverse the hierarchy of categories you then have a few options.
Depth-first traversal
To list every category in the hierarchy in depth-first order:
- Java
- Python
- Kotlin
- Raw REST API
List<IGlossaryCategory> dfs = tree.depthFirst(); // (1)
for (IGlossaryCategory category : dfs) { // (2)
    // Do something with the category
}
1. The .depthFirst() method will return an ordered list of all the categories in the glossary, ordered by a depth-first traversal.
2. You can then iterate through them in this particular order.
for category in hierarchy.depth_first:  # (1)
    ...  # Do something with the category
    ...  # Order [1, 1a, 1a(i), 1a(ii), 1b, 2]
1. The depth_first property will return an ordered list of all the categories in the glossary, ordered by a depth-first traversal. You can then iterate through them in this particular order.
val dfs = tree.depthFirst() // (1)
for (category in dfs) { // (2)
    // Do something with the category
}
1. The .depthFirst() method will return an ordered list of all the categories in the glossary, ordered by a depth-first traversal.
2. You can then iterate through them in this particular order.
Once you have retrieved the categories using the search approach outlined above, traversing them becomes an operation entirely in your own program (it doesn't interact with Atlan APIs).
For a depth-first traversal:
1. Start by listing a single top-level category (one whose parentCategory relationship is empty).
2. Then output a single child category of that top-level category.
3. Then output a single child category of (2).
4. Continue in this way down the hierarchy.
5. Once exhausted, move on to the next (grand-)child category and exhaust its (grand-)children.
6. Continue in this way until all categories are listed.
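The steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the SDK's own implementation: each category is a simplified record with assumed guid, name and parent_guid keys, and siblings are sorted by name so the output is deterministic. An explicit stack is used instead of recursion so that arbitrarily deep hierarchies do not hit Python's recursion limit.

```python
from typing import Dict, List, Optional


def depth_first(categories: List[dict]) -> List[str]:
    """Return category names in depth-first (pre-order) order."""
    # Group categories under their parent's GUID (None for top-level ones).
    children: Dict[Optional[str], List[dict]] = {}
    for category in categories:
        children.setdefault(category["parent_guid"], []).append(category)
    for siblings in children.values():
        siblings.sort(key=lambda c: c["name"])

    ordered: List[str] = []
    # Push in reverse so the first sibling is popped (visited) first.
    stack = list(reversed(children.get(None, [])))
    while stack:
        category = stack.pop()
        ordered.append(category["name"])
        stack.extend(reversed(children.get(category["guid"], [])))
    return ordered


# Flat search results, deliberately out of order (assumed record shape).
flat = [
    {"guid": "g2", "name": "2", "parent_guid": None},
    {"guid": "g1b", "name": "1b", "parent_guid": "g1"},
    {"guid": "g1aii", "name": "1a(ii)", "parent_guid": "g1a"},
    {"guid": "g1", "name": "1", "parent_guid": None},
    {"guid": "g1ai", "name": "1a(i)", "parent_guid": "g1a"},
    {"guid": "g1a", "name": "1a", "parent_guid": "g1"},
]

dfs_order = depth_first(flat)
print(dfs_order)  # ['1', '1a', '1a(i)', '1a(ii)', '1b', '2']
```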
Breadth-first traversal
To list every category in the hierarchy in breadth-first order:
- Java
- Python
- Kotlin
- Raw REST API
List<IGlossaryCategory> bfs = tree.breadthFirst(); // (1)
for (IGlossaryCategory category : bfs) { // (2)
    // Do something with the category
}
1. The .breadthFirst() method will return an ordered list of all the categories in the glossary, ordered by a breadth-first traversal.
2. You can then iterate through them in this particular order.
for category in hierarchy.breadth_first:  # (1)
    ...  # Do something with the category
    ...  # Order [1, 2, 1a, 1b, 1a(i), 1a(ii)]
1. The breadth_first property will return an ordered list of all the categories in the glossary, ordered by a breadth-first traversal. You can then iterate through them in this particular order.
val bfs = tree.breadthFirst() // (1)
for (category in bfs) { // (2)
    // Do something with the category
}
1. The .breadthFirst() method will return an ordered list of all the categories in the glossary, ordered by a breadth-first traversal.
2. You can then iterate through them in this particular order.
Once you have retrieved the categories using the search approach outlined above, traversing them becomes an operation entirely in your own program (it doesn't interact with Atlan APIs).
For a breadth-first traversal:
1. Start by listing the top-level categories (those whose parentCategory relationship is empty).
2. For each of these categories, then list all of its children.
3. Continue the logic from (1) for each child category.
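One way to implement these steps is with a queue, so that every category at one level is listed before any category at the next level. This is a minimal sketch, not the SDK's implementation: the record shape (guid, name, parent_guid) is an assumption made for illustration, and siblings are sorted by name to keep the output deterministic.

```python
from collections import deque
from typing import Dict, List, Optional


def breadth_first(categories: List[dict]) -> List[str]:
    """Return category names level by level, top-level categories first."""
    # Group categories under their parent's GUID (None for top-level ones).
    children: Dict[Optional[str], List[dict]] = {}
    for category in categories:
        children.setdefault(category["parent_guid"], []).append(category)
    for siblings in children.values():
        siblings.sort(key=lambda c: c["name"])

    ordered: List[str] = []
    queue = deque(children.get(None, []))  # start with the top-level categories
    while queue:
        category = queue.popleft()
        ordered.append(category["name"])
        # Children wait at the back of the queue until the current level is done.
        queue.extend(children.get(category["guid"], []))
    return ordered


# Flat search results (assumed record shape).
flat = [
    {"guid": "g1", "name": "1", "parent_guid": None},
    {"guid": "g1a", "name": "1a", "parent_guid": "g1"},
    {"guid": "g1ai", "name": "1a(i)", "parent_guid": "g1a"},
    {"guid": "g1aii", "name": "1a(ii)", "parent_guid": "g1a"},
    {"guid": "g1b", "name": "1b", "parent_guid": "g1"},
    {"guid": "g2", "name": "2", "parent_guid": None},
]

bfs_order = breadth_first(flat)
print(bfs_order)  # ['1', '2', '1a', '1b', '1a(i)', '1a(ii)']
```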
Build-your-own traversal
Alternatively, you may want to iterate through the hierarchy in your own order. From the traversable hierarchy you can retrieve the top-level categories, and then decide what to do from there:
- Java
- Python
- Kotlin
- Raw REST API
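for (IGlossaryCategory top : tree.getRootCategories()) { // (1)
    for (IGlossaryCategory child : top.getChildrenCategories()) { // (2)
        // Do something with the child categories
    }
}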
1. The .getRootCategories() method will return a list of only those categories at the root of the glossary. (The categories that have no parent categories themselves.)
2. You can then retrieve the child categories using .getChildrenCategories(). And you can do this iteratively as you traverse down the hierarchy.
for top in hierarchy.root_categories:  # (1)
    for child in top.children_categories or []:  # (2)
        for gc in child.children_categories or []:
            ...  # Do something with the grand-children categories [1a(i), 1a(ii)]
            ...  # ... and so on
1. The root_categories property will return a list of only those categories at the root of the glossary. (The categories that have no parent categories themselves.)
2. You can then retrieve the child categories using the children_categories property. And you can do this iteratively as you traverse down the hierarchy.
for (top in tree.rootCategories) { // (1)
    for (child in top.childrenCategories) { // (2)
        // Do something with the child categories
    }
}
1. The .rootCategories member will return a list of only those categories at the root of the glossary. (The categories that have no parent categories themselves.)
2. You can then retrieve the child categories using .childrenCategories. And you can do this iteratively as you traverse down the hierarchy.
Once you have retrieved the categories using the search approach outlined above, traversing them becomes an operation entirely in your own program (it doesn't interact with Atlan APIs).
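As an illustration of a custom traversal over search results, the sketch below starts from the top-level categories and walks down to a chosen depth, producing an indented outline. The record shape (guid, name, parent_guid) and the max_depth parameter are assumptions made for this example, not part of any Atlan API. It uses plain recursion for brevity, so a very deep hierarchy might call for an explicit stack instead.

```python
from typing import Dict, List, Optional


def outline(categories: List[dict], max_depth: Optional[int] = None) -> List[str]:
    """Render the hierarchy as indented lines, optionally stopping at max_depth."""
    # Group categories under their parent's GUID (None for top-level ones).
    children: Dict[Optional[str], List[dict]] = {}
    for category in categories:
        children.setdefault(category["parent_guid"], []).append(category)

    lines: List[str] = []

    def walk(parent_guid: Optional[str], depth: int) -> None:
        if max_depth is not None and depth > max_depth:
            return  # custom rule: do not descend any further
        for category in sorted(children.get(parent_guid, []), key=lambda c: c["name"]):
            lines.append("  " * depth + category["name"])
            walk(category["guid"], depth + 1)

    walk(None, 0)  # None selects the top-level categories
    return lines


# Flat search results (assumed record shape).
flat = [
    {"guid": "g1", "name": "1", "parent_guid": None},
    {"guid": "g1a", "name": "1a", "parent_guid": "g1"},
    {"guid": "g1ai", "name": "1a(i)", "parent_guid": "g1a"},
    {"guid": "g1aii", "name": "1a(ii)", "parent_guid": "g1a"},
    {"guid": "g1b", "name": "1b", "parent_guid": "g1"},
    {"guid": "g2", "name": "2", "parent_guid": None},
]

full_outline = outline(flat)
two_levels = outline(flat, max_depth=1)
print("\n".join(full_outline))
```

Calling outline(flat, max_depth=1) keeps only the top-level categories and their direct children, which shows how a custom walk can apply rules that the fixed depth-first and breadth-first orders cannot.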