Set up on-premises ThoughtSpot access
You will need access to a machine that can run Docker on-premises. You will also need your ThoughtSpot instance details, including credentials.
In some cases you will not be able to expose your ThoughtSpot instance for Atlan to crawl and ingest metadata. For example, this may happen when security requirements restrict access to sensitive, mission-critical data.
In such cases you may want to decouple the extraction of metadata from its ingestion in Atlan. This approach gives you full control over your resources and metadata transfer to Atlan.
Prerequisites
To extract metadata from your on-premises ThoughtSpot instance, you will need to use Atlan's thoughtspot-extractor tool.
Atlan uses exactly the same thoughtspot-extractor behind the scenes when it connects to ThoughtSpot in the cloud.
Install Docker Compose
Docker Compose is a tool for defining and running applications composed of many Docker containers. (Any guesses where the name came from? 😉)
To install Docker Compose, follow Docker's official installation instructions for your operating system.
Instructions provided in this documentation should be enough even if you are completely new to Docker and Docker Compose. However, you can also walk through the Get started with Docker Compose tutorial if you want to learn Docker Compose basics first.
Get the thoughtspot-extractor tool
To get the thoughtspot-extractor tool:
- Raise a support ticket to get the link to the latest version.
- Download the image using the link provided by support.
- Load the image to the server you'll use to crawl ThoughtSpot:
  sudo docker load -i /path/to/thoughtspot-extractor-master.tar
Get the compose file
Atlan provides you with a Docker compose file for the thoughtspot-extractor tool.
To get the compose file:
- Download the latest compose file.
- Save the file to an empty directory on the server you'll use to access your on-premises ThoughtSpot instance.
- The file is docker-compose.yaml.
Define ThoughtSpot connections
The structure of the compose file includes three main sections:
- x-templates contains configuration fragments. You should ignore this section - do not make any changes to it.
- services is where you will define your ThoughtSpot connections.
- volumes contains mount information. You should ignore this section as well - do not make any changes to it.
Define services
For each on-premises ThoughtSpot instance, define an entry under services in the compose file.
Each entry will have the following structure:
services:
  connection-name:
    <<: *extract
    environment:
      <<: *thoughtspot-defaults
      EXCLUDE_TAGS_REGEX: "Test1.*|Test2.*"
      WITHOUT_TAGS: "true"
    volumes:
      - ./output/connection-name/filter:/output/filter
- Replace connection-name with the name of your connection.
- <<: *extract tells the thoughtspot-extractor tool to run.
- environment contains all parameters for the tool.
  - EXCLUDE_TAGS_REGEX - specify a regular expression to exclude ThoughtSpot assets based on ThoughtSpot tags.
  - WITHOUT_TAGS - specify a Boolean configuration to determine whether to crawl ThoughtSpot assets without any ThoughtSpot tags.
- volumes specifies where to store results. In this example, the extractor will store results in the ./output/connection-name/filter folder on the local file system.
You can add as many ThoughtSpot connections as you want.
Docker's documentation describes the services format in more detail.
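Because each connection writes to its own folder under ./output, it can help to create those folders before the first run. When Docker creates a missing bind-mount folder itself, the folder is typically owned by root. The following is a minimal sketch; the connection names thoughtspot-prod and thoughtspot-dev are hypothetical placeholders for the service names in your compose file.

```shell
# Hypothetical connection names; replace them with the service names
# you defined under "services" in docker-compose.yaml.
for connection in thoughtspot-prod thoughtspot-dev; do
  # Matches the host side of the volume mapping
  # ./output/<connection-name>/filter:/output/filter
  mkdir -p "./output/$connection/filter"
done

# List the folders that now exist for the extractor output.
ls -d ./output/*/filter
```

Run this from the directory that contains the compose file so the relative ./output paths line up with the volumes entries.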
Provide credentials
To define the credentials for your ThoughtSpot connections, you will need to provide a ThoughtSpot configuration file.
The ThoughtSpot configuration is a .ini file with the following format:
[ThoughtSpotConfig]
host=atlan.thoughtspot.cloud
port=443
auth_type=basic_auth; This will use BasicAuth;
auth_type=trusted_auth; This will use TrustedAuth;
auth_type=oauth_access_token; This will use OAuth;
[BasicAuth]
username={{username}}
password={{password}}
[TrustedAuth]
username={{username}}
secret_key={{secret_key}}
[OAuth]
token={{oauth_access_token}}
[ExtractionConfig]
offset=1
limit=10
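The three auth_type lines in the format above appear to list the available alternatives. A reasonable assumption, which the source does not confirm, is that a working file keeps only one of them. The sketch below checks a file under that assumption. The helper check_auth_type is hypothetical and is not part of the thoughtspot-extractor tool.

```shell
# check_auth_type FILE: hypothetical helper (not shipped with the extractor).
# Succeeds only if FILE has exactly one uncommented auth_type line.
# Assumption: a working configuration selects a single auth_type.
check_auth_type() {
  count=$(grep -c '^[[:space:]]*auth_type[[:space:]]*=' "$1" || true)
  [ "$count" -eq 1 ]
}

# Demonstrate the check on a throwaway sample that uses basic_auth.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[ThoughtSpotConfig]
host=atlan.thoughtspot.cloud
port=443
auth_type=basic_auth
[BasicAuth]
username={{username}}
password={{password}}
EOF

if check_auth_type "$sample"; then
  echo "exactly one auth_type line found"
else
  echo "expected exactly one auth_type line"
fi
```

To check your own file, call check_auth_type with its path, for example check_auth_type ./thoughtspot.ini.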
Secure credentials
Using local files
If you decide to keep ThoughtSpot credentials in plaintext files, we recommend you restrict access to the directory and the compose file. For extra security, we recommend you use Docker secrets to store the sensitive passwords.
To specify the local files in your compose file:
secrets:
  thoughtspot_config:
    file: ./thoughtspot.ini
This secrets section sits at the top level of the compose file, alongside the services section described earlier. It is not a sub-section of the services section.
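The recommendation to restrict access to the directory and the files can be applied with standard file permissions. This is a minimal sketch, assuming the compose file and thoughtspot.ini live in a directory named thoughtspot-extractor. That directory name and the EXTRACTOR_DIR variable are hypothetical.

```shell
# Hypothetical working directory holding docker-compose.yaml and thoughtspot.ini.
EXTRACTOR_DIR="${EXTRACTOR_DIR:-./thoughtspot-extractor}"
mkdir -p "$EXTRACTOR_DIR"

# Only the owning user may list or enter the directory.
chmod 700 "$EXTRACTOR_DIR"

# Only the owning user may read or modify the files, if they exist yet.
for f in docker-compose.yaml thoughtspot.ini; do
  if [ -f "$EXTRACTOR_DIR/$f" ]; then
    chmod 600 "$EXTRACTOR_DIR/$f"
  fi
done

# Show the resulting permissions.
ls -la "$EXTRACTOR_DIR"
```

If the extractor container runs as a different user than the file owner, it may still need read access to thoughtspot.ini, so test a run after tightening permissions.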
Using Docker secrets
To create and use Docker secrets:
- Store the ThoughtSpot configuration file:
  sudo docker secret create thoughtspot_config path/to/thoughtspot.ini
- At the top of your compose file, add a secrets element to access your secret:
  secrets:
    thoughtspot_config:
      external: true
      name: thoughtspot_config
  - The name should be the same one you used in the docker secret create command above.
  - Once stored as a Docker secret, you can remove the local ThoughtSpot configuration file.
- Within the service section of the compose file, add a new secrets element and specify the name of the secret within your service to use it.
Example
Let's explain in detail with an example:
secrets:
  thoughtspot_config:
    external: true
    name: thoughtspot_config
x-templates:
  # ...
services:
  thoughtspot-example:
    <<: *extract
    environment:
      <<: *thoughtspot-defaults
      EXCLUDE_TAGS_REGEX: "Test1.*|Test2.*"
      WITHOUT_TAGS: "true"
    volumes:
      - ./output/thoughtspot_example/filter:/output/filter
    secrets:
      - thoughtspot_config
- In this example, we've defined the secrets at the top of the file (you could also define them at the bottom). The thoughtspot_config refers to an external Docker secret created using the docker secret create command.
- The name of this service is thoughtspot-example. You can use any meaningful name you want.
- The <<: *thoughtspot-defaults sets the connection type to ThoughtSpot.
- The ./output/thoughtspot_example/filter:/output/filter line tells the extractor where to store results. In this example, the extractor will store results in the ./output/thoughtspot_example/filter directory on the local file system. We recommend you output the extracted metadata for different connections in separate directories.
- The secrets section within services tells the extractor which secrets to use for this service. Each of these refers to the name of a secret listed at the beginning of the compose file.