Build custom app
This lesson guides you through building a custom Atlan application from scratch, applying all the patterns and tools you learned in the previous chapters. Think of this as moving from touring model homes to building your own custom home: you take everything you've learned and create something entirely your own.
What you learn here: How to build a local extractor that processes JSON files and converts the data into Atlan's standardized format. By the end, you have a working extractor application.
Before you begin
Before you start, make sure you have:
- Completed lesson 1: Set up your development environment
- Completed lesson 2: Run your first sample app
Core concepts you apply
Before you start building, understand what your application demonstrates. Think of these concepts as the key systems in the house you're about to construct:
- Process files: Your application reads JSON files containing Table metadata and validates the content structure before processing.
- Transform data: Convert raw metadata into Atlan's standardized format with owner lists, certificates, and hierarchical relationships (see the sketch after this list).
- Manage resources: Implement proper separation of concerns, with handlers managing SDK interactions and clients handling file operations.
- Web interface: Use the template's web interface to test your extractor and monitor processing status.
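To make the "Transform data" idea concrete before you build it, here's a simplified preview of the field mapping you implement later in `app/activities.py`. The sample record is illustrative only:

```python
# Simplified preview of the transformation built later in this lesson.
# The raw record below is illustrative sample data.
raw = {"Type": "Table", "Name": "CUSTOMER", "Owner_Users": "jsmith\njdoe"}

transformed = {
    "typeName": raw["Type"],
    "name": raw.get("Name", ""),
    # Newline-separated owners become a proper list
    "ownerUsers": raw["Owner_Users"].split("\n") if raw.get("Owner_Users") else [],
}

print(transformed)
# {'typeName': 'Table', 'name': 'CUSTOMER', 'ownerUsers': ['jsmith', 'jdoe']}
```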
Now that you understand what you're building, time to create your project structure and see these concepts work together in a real application.
Create your project workspace
Time to build your own application! You are going to create a local file extractor using the template from the sample apps repository you worked with in the Run your first sample app lesson. This gives you a proven foundation to build upon.
1. Create your project directory: A dedicated directory keeps your extractor code separate from other projects and becomes your workspace:

   ```bash
   mkdir atlan-local-file-extractor-app
   cd atlan-local-file-extractor-app
   ```

2. Copy the template structure: The generic template contains the required files and structure for an Atlan application. It sets up the correct organization, configurations, and placeholder files for your extractor:

   ```bash
   cp -r ../atlan-sample-apps/templates/generic/ .
   ```

3. Create your Python environment: A virtual environment keeps your project dependencies separate. This prevents package version conflicts between different Python projects and maintains consistent behavior:

   ```bash
   uv venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```
What you just created
You now have a complete application workspace on your computer. This is the same template structure you used in Run your first sample app, but now it's yours to customize for file extraction.
Explore your project structure:

```
atlan-local-file-extractor-app/
├── main.py             # Application entry point
├── pyproject.toml      # Python project configuration
├── app/
│   ├── __init__.py     # Package initialization
│   ├── activities.py   # File processing activities
│   ├── workflow.py     # Extraction workflow orchestration
│   ├── client.py       # File I/O operations
│   ├── handler.py      # SDK interface
│   └── templates/      # Project templates
│       └── workflow.json  # Workflow configuration template
└── tests/              # Test files
```
Each file has a specific purpose in making your extraction application work properly.
With your workspace ready, the next step is customizing the application components to handle file extraction. Just like a contractor following blueprints to build specific rooms, you customize each component for your file processing needs.
✨ Your workspace is configured! You have the SDK installed and the project structure ready for customization.
Build your workflow orchestration
Time to customize your workflow for file extraction! Your workflow defines the sequence of operations that transform raw JSON files into standardized metadata. You are going to modify the template's workflow to handle file processing.
💻 Open `app/workflow.py` and replace all template code with the snippet below. Pay attention to the extraction activity call and the activity registration in `get_activities`; that's where your custom logic sits.
```python
from datetime import timedelta
from typing import Any, Callable, Dict, Sequence

from app.activities import ActivitiesClass
from application_sdk.activities import ActivitiesInterface
from application_sdk.workflows import WorkflowInterface
from temporalio import workflow


@workflow.defn
class WorkflowClass(WorkflowInterface):
    @workflow.run
    async def run(self, workflow_config: Dict[str, Any]) -> None:
        """Orchestrate the extraction flow."""
        activities_instance = ActivitiesClass()

        # Merge any provided args (from frontend POST body or server config)
        workflow_args: Dict[str, Any] = await workflow.execute_activity_method(
            activities_instance.get_workflow_args,
            workflow_config,
            start_to_close_timeout=timedelta(seconds=10),
        )

        # Extract and transform the metadata
        extraction_result: Dict[str, Any] = await workflow.execute_activity(
            activities_instance.extract_and_transform_metadata,
            workflow_args,
            start_to_close_timeout=timedelta(seconds=30),
        )

    @staticmethod
    def get_activities(activities: ActivitiesInterface) -> Sequence[Callable[..., Any]]:
        """Declare which activity methods are part of this workflow for the worker."""
        if not isinstance(activities, ActivitiesClass):
            raise TypeError("Activities must be an instance of ActivitiesClass")
        return [activities.get_workflow_args, activities.extract_and_transform_metadata]
```
What you just built
The workflow itself doesn't process files. Instead, it ensures that every step runs in the right order, with Temporal managing error handling and retries. What was once just a template is now the conductor that orchestrates your extractor.
- Temporal integration: Uses the `@workflow.defn` decorator and `WorkflowInterface` inheritance to connect with Temporal.
- Run method: Defines a `run` method that accepts configuration and drives the workflow.
- Activity orchestration: Instead of a simple "hello world" activity, you're chaining two real steps: `get_workflow_args` gathers configuration, and `extract_and_transform_metadata` processes files and transforms data.
- Timeouts: Adds a 30-second timeout to file processing since it takes longer than trivial operations (see the retry sketch after this list).
- Activity registration: Updates `get_activities` to register your custom extraction activities instead of placeholders.
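The snippet above relies on Temporal's default retry behavior. If you want explicit control, Temporal's Python SDK lets you attach a `RetryPolicy` to the activity call. A minimal sketch of how you could harden the extraction step; this policy is an optional addition, not part of the lesson's snippet:

```python
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

# Excerpt from run(): the same extraction call, now with a bounded retry
# policy so transient failures (for example, a briefly locked file) are
# retried up to three times before the workflow fails.
extraction_result = await workflow.execute_activity(
    activities_instance.extract_and_transform_metadata,
    workflow_args,
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=RetryPolicy(maximum_attempts=3),
)
```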
✨ Orchestration layer complete! Your workflow can now sequence activities with proper timeout management.
Implement your extraction activities
Your workflow can now orchestrate activities, but what activities does it run? Time to create the extraction logic! Activities contain the business logic for reading files and transforming data. Your workflow calls these activities in sequence to complete the extraction process.
💻 Open `app/activities.py` and replace all template code with the snippet below. Pay attention to the `extract_and_transform_metadata` activity; that's where your custom logic sits.
```python
import json
import os
from typing import Any, Dict

from application_sdk.activities import ActivitiesInterface
from application_sdk.observability.logger_adaptor import get_logger
from temporalio import activity

from .handler import HandlerClass

logger = get_logger(__name__)
activity.logger = logger


class ActivitiesClass(ActivitiesInterface):
    """Activities for the Extractor app using the handler/client pattern."""

    def __init__(self, handler: HandlerClass | None = None):
        self.handler = handler or HandlerClass()

    @activity.defn
    async def extract_and_transform_metadata(
        self, config: Dict[str, Any]
    ) -> Dict[str, Any]:
        """Extract and transform Table metadata from a JSON file."""
        input_file = output_file = ""
        try:
            if not self.handler or not self.handler.client:
                raise ValueError("Handler or extractor client not initialized")

            output_file = config.get("output_file", "")
            input_file = config.get("input_file", "")
            if not input_file or not os.path.exists(input_file):
                raise FileNotFoundError(f"File not found: {input_file}")

            raw_data = json.load(self.handler.client.create_read_handler(input_file))

            transformed_data = []
            for item in raw_data:
                if item.get("Type") == "Table":
                    # Process owner users - split by newline if present
                    owner_users_str = item.get("Owner_Users", "")
                    owner_users = owner_users_str.split("\n") if owner_users_str else []

                    # Process owner groups - split by newline if present
                    owner_groups_str = item.get("Owner_Groups", "")
                    owner_groups = owner_groups_str.split("\n") if owner_groups_str else []

                    transformed_data.append(
                        {
                            "typeName": "Table",
                            "name": item.get("Name", ""),
                            "displayName": item.get("Display_Name", ""),
                            "description": item.get("Description", ""),
                            "userDescription": item.get("User_Description", ""),
                            "ownerUsers": owner_users,
                            "ownerGroups": owner_groups,
                            "certificateStatus": item.get("Certificate_Status", ""),
                            "schemaName": item.get("Schema_Name", ""),
                            "databaseName": item.get("Database_Name", ""),
                        }
                    )

            # Write the transformed records as newline-delimited JSON (NDJSON)
            with open(output_file, "w", encoding="utf-8") as file:
                for item in transformed_data:
                    json.dump(item, file, ensure_ascii=False)
                    file.write("\n")

            return {"status": "success", "records_processed": len(transformed_data)}
        except Exception as e:
            logger.error(
                f"Failed to extract and transform table metadata: {e}", exc_info=True
            )
            raise
        finally:
            # Always release the file handle, even on errors
            if self.handler and self.handler.client:
                self.handler.client.close_file_handler()
```
What you just built
This activity code does the heavy lifting: it reads raw metadata and transforms it into a format Atlan understands. This is where the real work of your extractor happens:
- Temporal integration: Uses the `@activity.defn` decorator to define activities that can be called from the workflow.
- Separation of concerns: Follows a handler/client pattern for cleaner structure, with built-in logging and error handling.
- Configuration handling: Keeps the `get_workflow_args` activity to merge parameters, adapted from the template.
- File validation: Adds existence checks before processing, so failures are clear and fast.
- JSON processing: Uses the client's file handler (coming next) to safely read JSON data, replacing placeholder logic.
- Owner list transformation: Splits newline-separated owners (`"jsmith\njdoe"`) into arrays (`["jsmith", "jdoe"]`).
- Data transformation loop: Filters for Table assets and maps fields from your source format to Atlan's standard format.
- Output generation: Writes transformed data as newline-delimited JSON (NDJSON), a format well suited to streaming large datasets (see the streaming sketch after this list).
- Resource cleanup: Ensures files close properly, even when errors occur.
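NDJSON pays off on the consumer side: records can be processed one line at a time without loading the whole file into memory. A minimal sketch of reading the output back, assuming the output file name used later in this lesson:

```python
import json

# Stream the NDJSON output one record at a time: memory use stays constant
# no matter how large the file grows.
with open("extractor-app-input-table_transformed.json", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(record["typeName"], record["name"])
```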
✨ Data transformation ready! Your extractor can parse JSON and convert owner lists from newline-separated strings to arrays.
Create your client for file operations
Your activity knows WHAT to do, but it needs help with HOW to safely read files. Time to build the client that handles file operations! The client manages file I/O operations with proper resource handling. Your activity uses this client to read JSON files safely and efficiently.
💻 Open `app/client.py` and replace all template code with the snippet below. Pay attention to the two methods between the CUSTOM METHODS banners; that's where your custom logic sits.
```python
import os
from typing import Optional, TextIO

from application_sdk.observability.logger_adaptor import get_logger

logger = get_logger(__name__)


class ClientClass:
    """Client for handling JSON file operations and resource management."""

    def __init__(self):
        # Template: Initialize file handler attribute
        self.file_handler: Optional[TextIO] = None

    # ========== ✨ CUSTOM METHODS ADDED ==========
    def create_read_handler(self, file_path: str) -> TextIO:
        """Create and return a file handler for reading JSON files."""
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"File not found: {file_path}")
        if self.file_handler:
            self.close_file_handler()
        self.file_handler = open(file_path, "r", encoding="utf-8")
        logger.info(f"Created file handler for: {file_path}")
        return self.file_handler

    def close_file_handler(self) -> None:
        """Close the current file handler and clean up resources."""
        if self.file_handler:
            self.file_handler.close()
            self.file_handler = None
            logger.info("File handler closed successfully")
    # ========== ✨ END CUSTOM METHODS ==========
```
What you just built
The client abstracts file operations away from your activities. If you later switch from local files to cloud storage or a database, you only update this client; your activities stay unchanged. The separation keeps your extractor clean, maintainable, and testable:
- Class structure: Defines a `ClientClass` with logging and clear initialization, ready for extension.
- File opening with safety: The `create_read_handler` method adds critical protections:
  - Validates file existence before opening.
  - Cleans up previously opened files to avoid leaks.
  - Uses UTF-8 encoding to handle international characters.
- Resource cleanup: The `close_file_handler` method ensures files are always closed and the handler is reset, preventing exhaustion of file handles (see the usage sketch after this list).
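Here's how the client is used in practice. This mirrors what the activity does through `self.handler.client`, condensed into a standalone snippet:

```python
import json

from app.client import ClientClass

# Open, read, and always close: the same pattern the activity follows.
client = ClientClass()
try:
    data = json.load(client.create_read_handler("extractor-app-input-table.json"))
    print(f"Loaded {len(data)} raw records")
finally:
    client.close_file_handler()  # release the handle even if parsing fails
```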
✨ Resource management implemented! Files are now handled safely with automatic cleanup, even during errors.
Create your handler interface
Now you have activities that know what to do and a client that knows how to do it safely. But how does the outside world communicate with your extractor? The handler is the bridge! It provides the SDK interface between the web interface and your business logic, coordinating between the frontend, the SDK, and your client.
💻 Review `app/handler.py` to see how the handler bridges your components:
What the handler provides
The handler acts as your application's SDK interface, and the template version includes everything your extractor needs:
- Initializes with your `ClientClass` to make it available to activities
- Manages credentials passed from the SDK
- Performs connectivity and preflight checks before processing
- Inherits from `HandlerInterface` for SDK compatibility
The template handler works as-is because your file extraction happens in activities, not in the handler. The handler just needs to bridge the SDK with your client, which the template already does perfectly.
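For orientation, the shape of that bridge looks roughly like the sketch below. This is a simplified illustration, not the template's actual code: the real file also handles credentials and preflight checks, and the `HandlerInterface` import path here is an assumption:

```python
# Simplified sketch of app/handler.py (illustrative only).
# The import path for HandlerInterface is an assumption; check the
# template's actual file for the real base class and extra methods.
from application_sdk.handlers import HandlerInterface

from .client import ClientClass


class HandlerClass(HandlerInterface):
    def __init__(self):
        # Expose the client so activities can reach it as self.handler.client
        self.client = ClientClass()
```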
✨ SDK bridge connected! Your handler links the web interface to your extraction logic. Next, let's build the web interface to complete your extractor.
Build your web interface
Your extractor has all the backend logic, but how do users interact with it? Time for the frontend! The web interface allows users to specify file paths and monitor processing. It demonstrates how configuration flows from the UI through your handler, to the workflow, and finally to your activities.
💻 Create `frontend/config.json` with the configuration manifest:

```json
{
  "id": "Extractor",
  "name": "Extractor",
  "logo": "https://assets.atlan.com/assets/atlan-bot.svg",
  "config": {
    "properties": {
      "input_file": {
        "type": "string",
        "required": true,
        "ui": {
          "label": "Input JSON file path",
          "placeholder": "for example, extractor-app-input-table.json",
          "grid": 8
        }
      },
      "output_file": {
        "type": "string",
        "required": true,
        "ui": {
          "label": "Output file path",
          "placeholder": "for example, extractor-app-input-table_transformed.json",
          "grid": 8
        }
      }
    },
    "steps": [
      {
        "id": "payload",
        "properties": [
          "input_file",
          "output_file"
        ]
      }
    ]
  }
}
```
What you just built
The configuration manifest defines how your extractor's UI gets generated. The template's frontend components automatically transform this JSON into a working interface:
- Properties definition: Each property (`input_file`, `output_file`) becomes a form field with type validation, required flags, and UI hints.
- UI configuration: The `ui` object controls the visual presentation: labels for clarity, placeholders for guidance, and grid sizing for layout.
- Steps array: Groups properties into logical sections. Here you have one step called "payload" that contains both file paths.
- JSON-Schema validation: The frontend uses JSON-Schema to validate user input before submission, preventing invalid configurations from reaching your workflow.
- Automatic form generation: The template's form engine reads this manifest and generates the HTML form dynamically, with no manual HTML editing needed.
- Configuration flow: When users submit, the form data gets normalized into the exact structure your workflow expects in `workflow_config` (see the payload sketch after this section).
The template's existing frontend components in `frontend/static/` and `frontend/templates/` automatically use this configuration to generate the user interface. When users submit the form, it creates the normalized configuration payload that gets passed to your workflow.
🎯 All Components Ready! Your extractor now has a complete user interface. You've built every component your application needs. Let's fire it up and watch it run!
Test your extraction application
The moment of truth! You've built a complete extraction pipeline: the workflow orchestrates, activities transform, the client handles files safely, and the handler connects everything to the web interface. Time to see your file extractor in action with real data.
1. Create test data: Create a file called `extractor-app-input-table.json` in your project directory:

   ```json
   [
     {
       "Type": "Table",
       "Name": "CUSTOMER",
       "Display_Name": "Customer",
       "Description": "Staging table for invoice data.",
       "User_Description": "",
       "Owner_Users": "jsmith\njdoe",
       "Owner_Groups": "",
       "Certificate_Status": "VERIFIED",
       "Schema_Name": "PEOPLE",
       "Database_Name": "TEST_DB"
     },
     {
       "Type": "Table",
       "Name": "CUSTOMER_2",
       "Display_Name": "Customer 2",
       "Description": "Production table for customer data.",
       "User_Description": "",
       "Owner_Users": "jsmith\njdoe",
       "Owner_Groups": "",
       "Certificate_Status": "VERIFIED",
       "Schema_Name": "PEOPLE",
       "Database_Name": "TEST_DB"
     }
   ]
   ```
2. Install all required packages: This command downloads and installs all Python libraries and tools your application needs to run locally:

   ```bash
   uv sync --all-extras --all-groups
   ```

3. Set up pre-commit hooks: Configure automatic code quality checks that run before you commit changes, keeping the code clean and consistent:

   ```bash
   uv run pre-commit install
   ```

4. Download required components: Get the Temporal server, database, and other services your application needs to run workflows:

   ```bash
   uv run poe download-components
   ```

5. Start the supporting services: Bring up Temporal, the database, and the other services your application depends on:

   ```bash
   uv run poe start-deps
   ```

6. Start your extractor application:

   ```bash
   uv run main.py
   ```
7. Access the application: Open your web browser and go to http://localhost:8000 to access the interface.

8. Provide input: When the interface loads, provide the input and output file paths, then submit the form to start the extraction process.
   - Input file: Enter `extractor-app-input-table.json` (the file you just created)
   - Output file: Enter `extractor-app-input-table_transformed.json` (where you want to save the transformed data)

9. Confirm submission: After submission, go to http://localhost:8233 to open the Temporal workflow dashboard and verify that the workflow has completed.
10. Check your results: After processing completes, look for the generated `extractor-app-input-table_transformed.json` file in your project directory. The file contains the transformed Table records:

    ```
    {"typeName": "Table", "name": "CUSTOMER", "displayName": "Customer", "description": "Staging table for invoice data.", "userDescription": "", "ownerUsers": ["jsmith", "jdoe"], "ownerGroups": [], "certificateStatus": "VERIFIED", "schemaName": "PEOPLE", "databaseName": "TEST_DB"}
    {"typeName": "Table", "name": "CUSTOMER_2", "displayName": "Customer 2", "description": "Production table for customer data.", "userDescription": "", "ownerUsers": ["jsmith", "jdoe"], "ownerGroups": [], "certificateStatus": "VERIFIED", "schemaName": "PEOPLE", "databaseName": "TEST_DB"}
    ```
Your extractor successfully transforms raw metadata into Atlan's standardized format, including processing owner lists, certificates, and hierarchical relationships.
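If you'd rather verify the output programmatically than by eye, a small check script does the job (a sketch assuming the output path you entered above):

```python
import json

# Load the NDJSON output and spot-check the transformation.
with open("extractor-app-input-table_transformed.json", encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

assert len(records) == 2
# Newline-separated owners were split into arrays
assert records[0]["ownerUsers"] == ["jsmith", "jdoe"]
print(f"All {len(records)} records transformed as expected")
```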
You did it! You have completed the tutorial series and built a solid foundation in Atlan application development.
What's next
You successfully built a complete local file extraction application using the Atlan Application SDK. Your application demonstrates professional patterns including Temporal workflow orchestration, proper separation of concerns with the handler/client architecture, and reliable file processing with comprehensive error handling.
The generic template you worked with is a GitHub template, which means you can create your own copy and use it as the foundation for your own custom applications.
Continue learning
Deepen your understanding of Atlan application development with these resources:
- Application Architecture: Deep dive into the technical architecture and how Temporal, Dapr, and the SDK work together in your applications
- Sample Applications: Explore more examples to learn different patterns and use cases
- Application Structure: Reference guide for organizing your application code