Skip to main content

Set up query history for mining

Before you can mine query history from Microsoft SQL Server, enable Query Store to capture query history, export the queries in JSON format, and store them in an S3 bucket.

Prerequisites

Before you begin, make sure you have:

  • Administrative access to your Microsoft SQL Server instance or contact with your SQL Server administrator
  • Access to an S3 bucket where you can store query history files
  • Reviewed the order of operations for workflow setup

Enable query store to capture query history

Atlan recommends enabling SQL Server Query Store on your database to capture query history. Query Store is available from SQL Server 2016 and later. If you already capture query history through another mechanism, export the queries and related fields in the JSON format described in the next section.

Enable Query Store:

ALTER DATABASE <DB> SET QUERY_STORE = ON;
ALTER DATABASE <DB>
SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);

Then run a query like the following to export query text and related context:

SELECT
DB_NAME() AS DATABASE_NAME,
OBJECT_SCHEMA_NAME(q.object_id, DB_ID()) AS SCHEMA_NAME,
OBJECT_NAME(q.object_id, DB_ID()) AS PARENT_PROCEDURE_NAME,
qt.query_sql_text AS QUERY_TEXT
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
WHERE q.is_internal_query = 0
ORDER BY
SCHEMA_NAME,
PARENT_PROCEDURE_NAME,
QUERY_TEXT;

Structure and store query files

Format your exported query history as JSON files and store them in an S3 bucket for the miner to access.

  1. Make sure your query history files:

    • Use a .json extension
    • Are present in a single S3 bucket and prefix (directory)
  2. Format each line as a single JSON value. The JSON object can't be pretty-formatted or span multiple lines.

  3. Place each SQL query on its own line. Don't use commas to separate lines.

  4. Include default database and schema, and session IDs in the JSON:

    • If a SQL query has only the table or view name, Atlan uses the default database and schema to generate lineage.
    • Including the session ID speeds up processing. Make sure all queries belonging to the same session are next to each other in the file.
    • To link lineage to stored procedures, include PARENT_PROCEDURE_NAME in the JSON for queries executed by that procedure. This key is fixed and must use this exact name.

    Here is an example (shown across multiple lines for readability, but in the file it must be a single line):

    {
    "QUERY_TEXT": "select * from Sales.Orders where OrderDate >= '2024-01-01';",
    "DATABASE_NAME": "ContosoDB",
    "SCHEMA_NAME": "Sales",
    "PARENT_PROCEDURE_NAME": "usp_ProcessOrders",
    "SESSION_ID": "5c2f0a41-5d02-46f1-b9bd-ef80ad571013"
    }

    The JSON property names are configurable during miner setup. If a SQL query references only an unqualified table or view name, Atlan uses the default database (DATABASE_NAME) and schema (SCHEMA_NAME) to qualify the asset during lineage generation.

  5. Configure access and permissions for the S3 bucket that hosts your query files:

    • Follow the guidance in Mine queries through S3
    • Use the steps in the "Set up the S3 bucket" section, including the options for using the Atlan S3 bucket or your own S3 bucket, and KMS policy updates if applicable

Next steps