Build your first metadata workflow

This walkthrough takes you from connecting to Atlan through reading and updating metadata in one guided session. You work with real SDK methods, learn why they behave the way they do, and finish with the core patterns you use in almost every integration you build.

Connect to Atlan

Every SDK operation starts with an AtlanClient. You create one by providing your tenant URL and an API token, once the SDK is included in your project.

The SDK is available on Maven Central, ready to be included in your project:

build.gradle.kts
repositories {
    mavenCentral()
}

dependencies {
    implementation("com.atlan:atlan-java:+") // (1)
    testRuntimeOnly("ch.qos.logback:logback-classic:1.2.11") // (2)
}
  1. Include the latest version of the Java SDK in your project as a dependency. You can also give a specific version instead of the +, if you'd like.
  2. The Java SDK uses slf4j for logging purposes. You can include logback as a simple binding mechanism to send any logging information out to your console (standard out).

Provide two values to create an Atlan client:

AtlanLiveTest.java
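import com.atlan.AtlanClient;

public class AtlanLiveTest {
    public static void main(String[] args) {
        try (AtlanClient client = new AtlanClient(
                "https://tenant.atlan.com", // (1)
                "...")) { // (2)
            // (3)
        }
    }
}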
  1. Provide your Atlan tenant URL as the first parameter. You can also read the value from an environment variable, if you leave out both parameters.
  2. Provide your API token as the second parameter. You can also read the value from another environment variable, by leaving out this parameter.
  3. You can then start writing some actual code to run within a static main method. (Examples appear further in this tutorial.) Once the block is complete, any resources held by the client (that is, for caching) are automatically released.

Set up logging for the SDK

You can also check out the advanced configuration section of the SDK documentation to learn how to set up logging.

Understand assets and identifiers

Before writing any retrieval or update code, you need two mental models: what an asset is, and how Atlan uniquely identifies each one. These concepts underpin every SDK operation.

Assets

In Atlan, every object that provides context to your data is called an asset.

Each type of asset has a set of:

  • Properties, such as:

    • Certificates
    • Announcements
  • Relationships to other assets, such as:

    • Schema child tables
    • Table parent schema
    • Table child columns
    • Column parent table

Assets are instances of metadata.

In an object-oriented programming sense, think of an asset as an instance of a class. The structure of an asset (the class itself, in this analogy) is defined by something called a type definition, but that's for another day.

There are many different kinds of assets—tables, columns, schemas, databases, BI dashboards, reports, and more. Assets inter-relate with each other and share common properties (like certificates) while also having properties unique to their type (like columnCount, which only exists on tables, not on schemas or databases).

Every asset has two identifiers

Every operation that reads or writes an asset requires an identifier. Atlan uses two, and understanding the difference between them is important before you write any update code:

GUID

Atlan uses globally unique identifiers (GUIDs) to identify each asset. They look something like this:

17f0356e-75f6-4e0b-8b05-32cebe8cd953

As the name implies, GUIDs are:

  • Globally unique (across all systems).
  • Generated in a way that makes it nearly impossible for anything else to ever generate that same ID.[^2]

Note that this means the GUID itself is _not_:

  • Meaningful or capable of being interpreted in any way.
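
As a small illustration of the GUID shape, the sample value above happens to parse with the JDK's built-in java.util.UUID class. The sketch below only inspects the format. GuidShape is a hypothetical helper name, and treating Atlan GUIDs in general as random (version 4) UUIDs is an assumption drawn from this single sample, not something the SDK guarantees.

```java
import java.util.UUID;

// Hypothetical helper: inspects the shape of a GUID string using only the JDK.
final class GuidShape {

    // Returns true when the text parses as a UUID and round-trips unchanged.
    // The round-trip check rejects abbreviated forms that fromString tolerates.
    static boolean looksLikeGuid(String text) {
        try {
            return UUID.fromString(text).toString().equals(text);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "17f0356e-75f6-4e0b-8b05-32cebe8cd953"; // sample GUID from the text
        System.out.println(looksLikeGuid(sample));               // true
        System.out.println(UUID.fromString(sample).version());   // 4 (randomly generated)
        // A qualifiedName is not a GUID: it carries meaning instead of random bits.
        System.out.println(looksLikeGuid("default/snowflake/1234567890/DB/SCHEMA")); // false
    }
}
```

Nothing in the string tells you which asset it belongs to, which is exactly the "not meaningful" property described above.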

qualifiedName

Atlan uses qualifiedNames to uniquely identify assets based on their characteristics. They look something like this:

default/snowflake/1234567890/DB/SCHEMA

Qualified names are _not_:

  • Globally unique (across all systems).

Instead, they're:

  • Consistently constructed in a meaningful way, making it possible for them to be reconstructed.

Note that this means the qualifiedName is:

  • Meaningful and capable of being interpreted.
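
To make "consistently constructed" concrete, here is a minimal sketch that rebuilds the sample qualifiedName from its parts and splits it back into segments. QualifiedNames is a hypothetical helper, not part of the SDK, and the reading of the first three segments as a connection prefix followed by database and schema names is an assumption based on the sample value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds and splits slash-delimited qualifiedNames.
final class QualifiedNames {

    // Appends a child name to its parent's qualifiedName ("/" is the assumed delimiter).
    static String child(String parentQualifiedName, String childName) {
        return parentQualifiedName + "/" + childName;
    }

    // Splits a qualifiedName into segments so that each part can be interpreted.
    static List<String> segments(String qualifiedName) {
        return Arrays.asList(qualifiedName.split("/"));
    }

    public static void main(String[] args) {
        String connection = "default/snowflake/1234567890"; // assumed connection prefix
        String schema = child(child(connection, "DB"), "SCHEMA");
        System.out.println(schema);           // default/snowflake/1234567890/DB/SCHEMA
        System.out.println(segments(schema)); // [default, snowflake, 1234567890, DB, SCHEMA]
    }
}
```

Any system that knows the same parts arrives at the same string, which is what makes the identifier reconstructable.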

How these impact updates

Because GUIDs are truly unique, operations that identify an asset by GUID can only update that asset; they never create one. Conversely, operations that take a qualifiedName can:

  • Create an asset, if no exactly-matching qualifiedName is found in Atlan.
  • Update an asset, if an exact-match for the qualifiedName is found in Atlan.

These operations also require a typeName, so that if creation does occur the correct type of asset is created.

Unintended consequences of this behavior

Be careful when using operations with only the qualifiedName. You may end up creating assets when you expected them to be updated, or expected the operation to fail because the asset didn't already exist. This is particularly likely when you don't give the exact, case-sensitive qualifiedName of an asset: a/b/c/d is not the same as a/B/c/d when it comes to qualifiedNames.
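
The create-or-update behavior can be modeled in a few lines of plain Java. This is a sketch of the semantics only: UpsertModel is a hypothetical in-memory stand-in, not the Atlan SDK, and it leaves out the typeName that real operations also require.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a metadata store; not the Atlan SDK.
final class UpsertModel {
    enum Outcome { CREATED, UPDATED, NOT_FOUND }

    private final Map<String, String> guidByQualifiedName = new HashMap<>();
    private final Map<String, String> descriptionByGuid = new HashMap<>();

    // Saving by qualifiedName: an exact, case-sensitive match updates; anything else creates.
    Outcome saveByQualifiedName(String qualifiedName, String description) {
        String guid = guidByQualifiedName.get(qualifiedName);
        Outcome outcome = Outcome.UPDATED;
        if (guid == null) {
            guid = UUID.randomUUID().toString();
            guidByQualifiedName.put(qualifiedName, guid);
            outcome = Outcome.CREATED;
        }
        descriptionByGuid.put(guid, description);
        return outcome;
    }

    // Saving by GUID: can only update an existing asset, never create one.
    Outcome saveByGuid(String guid, String description) {
        if (!descriptionByGuid.containsKey(guid)) {
            return Outcome.NOT_FOUND;
        }
        descriptionByGuid.put(guid, description);
        return Outcome.UPDATED;
    }

    int assetCount() {
        return descriptionByGuid.size();
    }

    public static void main(String[] args) {
        UpsertModel store = new UpsertModel();
        System.out.println(store.saveByQualifiedName("a/b/c/d", "first"));  // CREATED
        System.out.println(store.saveByQualifiedName("a/b/c/d", "second")); // UPDATED
        System.out.println(store.saveByQualifiedName("a/B/c/d", "typo"));   // CREATED
        System.out.println(store.saveByGuid("no-such-guid", "ignored"));    // NOT_FOUND
        System.out.println(store.assetCount());                             // 2
    }
}
```

The third call is the trap: a single wrong-case character silently produces a second asset instead of an error.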

Perhaps this leaves you wondering: why have a qualifiedName at all?

The qualifiedName's purpose is to establish what counts as a unique asset. Many different tools might all have information about that asset. Having a common "identity" means that each of those systems can independently construct the asset's identifier the same way.

  • If a crawler gets table details from Snowflake it can upsert based on those identity characteristics in Atlan. The crawler won't create duplicate tables every time it runs. This gives idempotency.
  • Looker knows the same identity characteristics for the Snowflake tables and columns. So if you get details from Looker about the tables it uses for reporting, you can link them together in lineage. (Looker can construct the same identifier for the table as Snowflake itself.)

These characteristics aren't possible using GUIDs alone.
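
A rough sketch of why a shared identity gives idempotency, under stated assumptions: SharedIdentity and its Catalog are hypothetical, the table names ORDERS and CUSTOMERS are invented for illustration, and the catalog is just a map keyed by qualifiedName rather than anything from the SDK.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of two tools agreeing on one identity; not the Atlan SDK.
final class SharedIdentity {

    // Catalog keyed by qualifiedName; the value records which source last wrote the entry.
    static final class Catalog {
        final Map<String, String> lastWriter = new LinkedHashMap<>();

        void upsert(String qualifiedName, String source) {
            lastWriter.put(qualifiedName, source);
        }
    }

    // Every tool derives the identifier from the same characteristics, in the same order.
    static String tableIdentity(String connection, String db, String schema, String table) {
        return String.join("/", connection, db, schema, table);
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog();
        List<String> tables = Arrays.asList("ORDERS", "CUSTOMERS"); // invented table names
        for (int run = 0; run < 2; run++) { // the crawler runs twice
            for (String table : tables) {
                catalog.upsert(
                        tableIdentity("default/snowflake/1234567890", "DB", "SCHEMA", table),
                        "crawler");
            }
        }
        System.out.println(catalog.lastWriter.size()); // 2, not 4: the second run only updates

        // A BI tool that knows the same characteristics finds the same entry.
        String fromBiTool = tableIdentity("default/snowflake/1234567890", "DB", "SCHEMA", "ORDERS");
        System.out.println(catalog.lastWriter.containsKey(fromBiTool)); // true
    }
}
```

With randomly generated GUIDs as the only key, the second crawler run would have no way to know that its tables already exist.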

Retrieve metadata

With a client connected and a mental model in place, you're ready to read data from Atlan. There are two patterns: fetch directly by identifier when you know it, or search by criteria when you don't.

Retrieve asset by identifier

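Use the static get() method on an asset type to fetch a single known asset, passing the client and either the asset's GUID or its qualifiedName. (The Python SDK exposes the same operation as get_by_guid() and get_by_qualified_name().) The call returns the full asset object.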

Retrieve an asset (AtlanLiveTest.java)
try (AtlanClient client = new AtlanClient()) {
    Table table = Table.get(client, "17f0356e-75f6-4e0b-8b05-32cebe8cd953"); // (1)
}
  1. You can retrieve an asset using the static get() method on any asset type, providing the client and either the asset's GUID or qualifiedName. (Each asset type is its own unique class in the SDK.)

Search for assets by criteria

Use FluentSearch when you don't know an asset's identifier, or when you want to retrieve many assets that share a common set of characteristics. Build a query, convert it to a request, and iterate the results—the SDK handles pagination automatically.

Search for an asset (AtlanLiveTest.java)
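try (AtlanClient client = new AtlanClient()) {
    Table.select(client) // (1)
        .where(Table.NAME.eq("MY_TABLE")) // (2)
        .stream() // (3)
        .forEach(t -> System.out.println(t.getQualifiedName()));
}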
  1. You can search all active assets of a given type using the select() static method.
  2. Chain onto this method any conditions you want to apply to the search, in this example a where clause that matches any table whose name equals MY_TABLE.
  3. You can then stream the results from this search and process them as any standard Java stream: filter them, limit them, or apply an action to each one. The results of the search are automatically paged and each page is lazily-fetched.
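
The "lazily-fetched" part is worth seeing in isolation. The following is a plain-Java model of the idea, not the SDK's implementation: LazyPages is a hypothetical class, a list stands in for results held on the server, and a counter records how many page requests a stream pipeline actually triggers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical model of lazily fetched result pages; not the Atlan SDK.
final class LazyPages {
    private final List<String> all; // stands in for results held on the server
    private final int pageSize;
    int pagesFetched = 0;           // number of simulated page requests

    LazyPages(List<String> all, int pageSize) {
        this.all = all;
        this.pageSize = pageSize;
    }

    // Simulates one request for the page that starts at the given offset.
    private List<String> fetchPage(int offset) {
        pagesFetched++;
        return all.subList(offset, Math.min(offset + pageSize, all.size()));
    }

    // Exposes all results as one stream, fetching a page only when it is needed.
    Stream<String> stream() {
        Iterator<String> results = new Iterator<String>() {
            private int offset = 0;
            private Iterator<String> page = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                while (!page.hasNext() && offset < all.size()) {
                    page = fetchPage(offset).iterator();
                    offset += pageSize;
                }
                return page.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.next();
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(results, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        LazyPages search = new LazyPages(Arrays.asList("T1", "T2", "T3", "T4", "T5", "T6"), 2);
        List<String> firstThree = search.stream().limit(3).collect(Collectors.toList());
        System.out.println(firstThree + " used " + search.pagesFetched + " page requests");
    }
}
```

Because limit() short-circuits the stream, the third page is never requested in this model, which is the practical benefit of lazy paging: you pay only for the results you consume.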

To request specific properties alongside each result, add includeOnResults() to your query (include_on_results() in the Python SDK). This also improves performance: you retrieve only what you need instead of fetching the full asset separately:

Search for an asset (AtlanLiveTest.java)
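try (AtlanClient client = new AtlanClient()) {
    Table.select(client)
        .where(Table.NAME.eq("MY_TABLE"))
        .includeOnResults(Table.CERTIFICATE_STATUS) // (1)
        .stream()
        .forEach(t -> System.out.println(t.getQualifiedName() + ": " + t.getCertificateStatus()));
}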
  1. Only this line differs from the original query. You can chain as many includeOnResults calls as you want to specify the properties and relationships you want to retrieve for matching assets.

Update metadata

Most operations that write to Atlan are upserts—they create the asset if it doesn't exist, or update it if it does. This section covers two patterns: updating a single asset, and making bulk changes across many assets at once.

Update a single asset

Use the updater() method to build a minimal change set. Provide the asset's identifier and only the properties you want to change—Atlan merges these into the existing asset and leaves everything else untouched.

Update an asset (AtlanLiveTest.java)
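try (AtlanClient client = new AtlanClient()) {
    Table table = Table.updater( // (1)
            "default/snowflake/1234567890/DB/SCHEMA/MY_TABLE",
            "MY_TABLE")
        .certificateStatus(CertificateStatus.VERIFIED) // (2)
        .build(); // (3)
    AssetMutationResponse response = table.save(client); // (4)
}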
  1. You can update an asset without first looking the asset up, if you know (can construct) its identifying qualifiedName. Using the updater() static method on any asset type, you pass in (typically) the qualifiedName and name of the asset. This returns a builder onto which you can then chain any updates.
  2. You can then chain onto the returned builder as many updates as you want. In this example, this sets the certificate status to VERIFIED.
  3. At the end of your chain of updates, you need to build the builder (into an object, in-memory).
  4. And then, finally, you need to .save() that object to persist those changes in Atlan (passing the client for the tenant you want to save it in). The response contains details of the change: whether the asset was created, updated, or nothing happened because the asset already had those changes.
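
The merge behavior behind a minimal change set can be sketched without the SDK. In this hypothetical model (ChangeSetMerge and its string-valued attribute map are assumptions made for illustration), only the attributes present in the change set are written, and the outcome reports whether anything actually changed. The "created" outcome is left out to keep the sketch short.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of merging a minimal change set; not the Atlan SDK.
final class ChangeSetMerge {
    enum Outcome { UPDATED, UNCHANGED }

    // Copies only the attributes present in the change set onto the stored asset.
    // Assumes the change set contains no null values.
    static Outcome merge(Map<String, String> stored, Map<String, String> changes) {
        Outcome outcome = Outcome.UNCHANGED;
        for (Map.Entry<String, String> change : changes.entrySet()) {
            String previous = stored.put(change.getKey(), change.getValue());
            if (!change.getValue().equals(previous)) {
                outcome = Outcome.UPDATED;
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("name", "MY_TABLE");
        stored.put("description", "Existing description");
        Map<String, String> changes = Collections.singletonMap("certificateStatus", "VERIFIED");

        System.out.println(merge(stored, changes));    // UPDATED
        System.out.println(merge(stored, changes));    // UNCHANGED: nothing new to apply
        System.out.println(stored.get("description")); // Existing description (untouched)
    }
}
```

Sending the full asset instead would risk overwriting attributes that someone else changed between your read and your write, which is why the minimal change set is the safer default.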

Update many assets at once

To update multiple assets efficiently, combine FluentSearch with a batch (AssetBatch in the Java SDK). Use search to find the assets you want to change, call trimToRequired() on each result (trim_to_required() in the Python SDK) to strip it down to its minimal required attributes, apply your changes, and add the result to the batch. The batch handles grouping and sending updates to Atlan automatically.

Bulk changes (AtlanLiveTest.java)
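try (AtlanClient client = new AtlanClient()) {
    AssetBatch batch = new AssetBatch(client, 20); // (1)
    Table.select(client) // (2)
        .where(Table.NAME.eq("MY_TABLE"))
        .includeOnResults(Table.CERTIFICATE_STATUS) // (3)
        .pageSize(20) // (4)
        .stream(true) // (5)
        .forEach(a -> {
            try {
                batch.add( // (6)
                    a.trimToRequired() // (7)
                        .certificateStatus(CertificateStatus.VERIFIED)
                        .build());
            } catch (AtlanException e) {
                throw new IllegalStateException("Unable to queue update for " + a.getGuid(), e);
            }
        });
    batch.flush(); // (8)
    List<Asset> created = batch.getCreated(); // (9)
    List<Asset> updated = batch.getUpdated();
}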
  1. Start by initializing a batch. Through this batch, you can automatically queue up and bulk-upsert assets—in this example, 20 at a time.
  2. Then use the search pattern discussed earlier to find all the assets you want to update.
  3. Be sure to include any details you might need to make a decision about whether to update the asset or not (and what to update it with).
  4. It's a good idea to set the page size for search results to match the asset batch size, for maximal efficiency.
  5. When you stream the results of the search, you can send an optional boolean parameter. If set to true, the SDK streams the pages of results in parallel (across multiple threads), improving throughput.
  6. When you then operate on each search result, you can add() any updates directly into the batch you created earlier. The batch itself handles saving these to Atlan when a sufficient number have been queued up (20, in this example).
  7. To make an update to a search result, first call trimToRequired() on the result. This pares down the asset to its minimal required attributes and returns a builder. You can then chain as many updates onto this builder as you want, keeping to the same pattern of sending only changes.
  8. You must flush() the batch outside of any loop where you've added assets into it. This ensures any final remaining elements in the batch are still sent to Atlan, even if the batch isn't "full."
  9. Finally, from the batch you can retrieve the minimal details about any assets it created or updated.
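
To see why the final flush() matters, here is a minimal model of a size-triggered batch in plain Java. SimpleBatch is a hypothetical class written for this illustration and is not the SDK's batch implementation; it records the size of each simulated request instead of calling an API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a size-triggered batch; not the Atlan SDK's batch class.
final class SimpleBatch<T> {
    private final int maxSize;
    private final List<T> queue = new ArrayList<>();
    private final List<Integer> sentSizes = new ArrayList<>(); // size of each simulated request

    SimpleBatch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.maxSize = maxSize;
    }

    // Queues one item and sends the queue as soon as it is full.
    synchronized void add(T item) {
        queue.add(item);
        if (queue.size() >= maxSize) {
            flush();
        }
    }

    // Sends whatever is queued, even if the batch is not full.
    synchronized void flush() {
        if (queue.isEmpty()) {
            return;
        }
        sentSizes.add(queue.size()); // a real batch would make the API call here
        queue.clear();
    }

    synchronized List<Integer> sentSizes() {
        return new ArrayList<>(sentSizes);
    }

    public static void main(String[] args) {
        SimpleBatch<String> batch = new SimpleBatch<>(20);
        for (int i = 0; i < 45; i++) {
            batch.add("asset-" + i);
        }
        System.out.println(batch.sentSizes()); // [20, 20]: five assets are still queued
        batch.flush();
        System.out.println(batch.sentSizes()); // [20, 20, 5]
    }
}
```

Without the last flush(), the five leftover assets in this model would never be sent. The methods are synchronized because results may be added from several threads when the search is streamed in parallel.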

What's next

Now that you know the core patterns (connect, retrieve, search, and update), you can explore further using search (upper-right) or the top-level menu.
