Set up a private network link to Hive
AWS PrivateLink creates a secure, private connection between services running in AWS. This document describes the steps to set this up between Hive and Atlan.
You need your AWS administrator involved—you may not have access to run these tasks yourself.
Prerequisites
Verify you have the following:
- Hive instance running in AWS (private EMR instance).
- Atlan hosted in the same region as the Hive instance.
You also need Atlan's AWS account ID later in this process. If you don't already have it, request it from Atlan support now.
Set up network to EMR instance
To set up the private network of your Hive EMR instance, from within AWS:
Copy network settings
To copy the network settings of your Hive EMR instance:
- From the left menu, under EMR on EC2, click Clusters.
- In the Clusters table, click on your Hive EMR cluster.
- From the cluster's Network and security tab, under Network, for Virtual Private Cloud (VPC), click on your VPC to view more details.
- Under your VPC's Details tab, copy and save the value under the IPv4 CIDR column.
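If you would rather look up the CIDR from a script than from the console, the following is a minimal sketch, assuming the boto3 library and credentials with permission to describe VPCs. The function name `get_vpc_ipv4_cidr` and the `vpc_id` parameter are illustrative, not part of Atlan's or AWS's documented procedure.

```python
import ipaddress


def get_vpc_ipv4_cidr(ec2_client, vpc_id):
    """Return the primary IPv4 CIDR block of the given VPC.

    ec2_client is assumed to be an EC2 client such as boto3.client("ec2").
    """
    response = ec2_client.describe_vpcs(VpcIds=[vpc_id])
    cidr = response["Vpcs"][0]["CidrBlock"]
    # Fail early if the value is not a well-formed IPv4 network.
    ipaddress.IPv4Network(cidr)
    return cidr
```

With boto3 installed, this could be called as `get_vpc_ipv4_cidr(boto3.client("ec2"), "<your-vpc-id>")`. The returned string is the value to save for the inbound rule.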
Create inbound rule
To create an inbound rule allowing your VPC access to your Hive EMR instance:
- From the left menu, under EMR on EC2, click Clusters.
- In the Clusters table, click on your Hive EMR cluster.
- From the cluster's Network and security tab, click the downward arrow for EC2 security groups (firewall) to expand this section.
- Under EC2 security groups (firewall), click on a security group for the cluster.
- Under the Inbound rules tab, click the Edit inbound rules button.
- At the bottom left of the Inbound rules table, click the Add rule button.
- For Type, select Custom TCP, so that the rule can be limited to a single port.
- For Port, enter the port on which Hive is accessible.
- For Source, choose Custom and enter the IPv4 CIDR value you saved for your VPC (see Copy network settings).
- Below the bottom right of the Inbound rules table, click the Save rules button.
- Repeat the steps above, from selecting a security group through saving the rules, for each security group in the cluster.
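The same rule can be expressed programmatically. This is a sketch only, assuming boto3 and a TCP rule limited to the Hive port. The helper names, the rule description text, and the example port are hypothetical, so substitute the port your Hive instance actually uses.

```python
def build_hive_ingress_rule(port, vpc_cidr):
    """Build one inbound permission: TCP on a single port from the VPC CIDR."""
    return {
        "IpProtocol": "tcp",  # assumption: Hive is reached over TCP
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": vpc_cidr, "Description": "Hive access from within the VPC"}
        ],
    }


def allow_hive_port(ec2_client, security_group_ids, port, vpc_cidr):
    """Add the same inbound rule to every security group of the cluster.

    ec2_client is assumed to be an EC2 client such as boto3.client("ec2").
    """
    rule = build_hive_ingress_rule(port, vpc_cidr)
    for group_id in security_group_ids:
        ec2_client.authorize_security_group_ingress(
            GroupId=group_id,
            IpPermissions=[rule],
        )
```

Looping over the group IDs mirrors the instruction to repeat the rule for each security group in the cluster.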
Create internal Network Load Balancer
Start creating NLB
To create an NLB, from within AWS:
- Navigate to Services, then Compute, then EC2.
- On the left, under Load Balancing, click on Load Balancers.
- At the top of the screen, click the Create Load Balancer button.
- Under the Network Load Balancer option, click the Create button.
- Enter the following Basic configuration settings for the load balancer:
- For Load balancer name, enter a unique name.
- For Scheme, select Internal.
- For IP address type, select IPv4.
- Enter the following Network mapping settings for the load balancer:
- For VPC, select the VPC where the Hive instance is located (see Copy network settings).
- For Mappings, select the availability zones with private subnets.
- Enter the following Listeners and routing settings for the load balancer:
- For Port, enter the port value used in Create inbound rule.
- For Default action, click the Create target group link. This opens the target group creation in a new browser tab.
Create target group
To create a target group for the NLB:
- Enter the following Basic configuration settings for the target group:
- For Choose target type, select Instances.
- For Target group name, enter a name.
- For Port, enter the port value used in Create inbound rule.
- For VPC, select the VPC where the Hive instance is located (see Copy network settings).
- At the bottom of the form, click the Next button.
- From the Available instances table:
- Click the checkbox next to your Hive instance.
- Enter the port value used in Create inbound rule.
- Click the Include as pending below button.
- At the bottom right of the form, click the Create target group button.
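As an illustration of what the form above configures, here is a hedged sketch using the boto3 `elbv2` client. It assumes a TCP target group, which the console steps do not state explicitly. The function and parameter names are hypothetical.

```python
def create_hive_target_group(elbv2_client, name, vpc_id, instance_id, port):
    """Create an instance-type target group and register the Hive instance.

    elbv2_client is assumed to be a client such as boto3.client("elbv2").
    """
    response = elbv2_client.create_target_group(
        Name=name,
        Protocol="TCP",  # assumption: plain TCP forwarding to Hive
        Port=port,
        VpcId=vpc_id,
        TargetType="instance",
    )
    target_group_arn = response["TargetGroups"][0]["TargetGroupArn"]
    # Equivalent of ticking the instance and clicking "Include as pending below".
    elbv2_client.register_targets(
        TargetGroupArn=target_group_arn,
        Targets=[{"Id": instance_id, "Port": port}],
    )
    return target_group_arn
```

The returned ARN identifies the target group when attaching it to the load balancer's listener.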
Finish creating NLB
Return to the browser tab where you started the NLB creation, and continue:
- Under Listeners and routing, click the refresh arrow to the far right of the Default action drop-down box.
- Select the target group you created in the Default action drop-down.
- At the bottom right of the form, click the Create load balancer button.
- In the resulting screen, click the View load balancer button.
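The load balancer settings chosen across these steps (internal scheme, IPv4, private subnets, one listener forwarding to the target group) could be scripted roughly as follows. This is a sketch assuming boto3 and a TCP listener. `create_internal_nlb` and its parameters are illustrative names.

```python
def create_internal_nlb(elbv2_client, name, private_subnet_ids, port, target_group_arn):
    """Create an internal IPv4 Network Load Balancer with one forwarding listener.

    elbv2_client is assumed to be a client such as boto3.client("elbv2").
    """
    response = elbv2_client.create_load_balancer(
        Name=name,
        Subnets=list(private_subnet_ids),
        Scheme="internal",
        Type="network",
        IpAddressType="ipv4",
    )
    load_balancer_arn = response["LoadBalancers"][0]["LoadBalancerArn"]
    # The listener plays the role of the "Default action" in the console form.
    elbv2_client.create_listener(
        LoadBalancerArn=load_balancer_arn,
        Protocol="TCP",  # assumption: plain TCP forwarding to Hive
        Port=port,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": target_group_arn}],
    )
    return load_balancer_arn
```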
Verify target group is healthy
To verify the target group is healthy:
- From the EC2 menu on the left, under Load Balancing, click Target Groups.
- From the Target groups table, click the row for the target group you created.
- At the bottom of the screen, under the Details tab, check that there is a 1 under both Total targets and Healthy.
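The same health check can be read from the API. Below is a minimal sketch, assuming boto3. It returns the two numbers the console shows as Total targets and Healthy. The function name is hypothetical.

```python
def count_healthy_targets(elbv2_client, target_group_arn):
    """Return (total_targets, healthy_targets) for a target group.

    elbv2_client is assumed to be a client such as boto3.client("elbv2").
    """
    response = elbv2_client.describe_target_health(TargetGroupArn=target_group_arn)
    descriptions = response["TargetHealthDescriptions"]
    healthy = sum(
        1 for item in descriptions if item["TargetHealth"]["State"] == "healthy"
    )
    return len(descriptions), healthy
```

For the single Hive instance registered earlier, a result of `(1, 1)` corresponds to the check described above. A newly registered target may report another state for a short while before it becomes healthy.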
Create endpoint service
To create an endpoint service, from within AWS:
- Navigate to Services, then Networking & Content Delivery, then VPC.
- From the menu on the left, under Virtual private cloud, click Endpoint services.
- At the top of the page, click the Create endpoint service button.
- Enter the following Endpoint service settings:
- For Name, enter a meaningful name.
- For Load balancer type, choose Network.
- For Available load balancers, select the load balancer you created in Create internal Network Load Balancer.
- Enter the following Additional settings:
- For Require acceptance for endpoint, enable Acceptance required.
- For Supported IP address types, enable IPv4.
- At the bottom right of the form, click the Create button.
- Under the Details of the endpoint service, copy the hostname under Service name.
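For reference, the endpoint service settings above map onto a single EC2 API call. The sketch below assumes boto3 and assumes the console's Name field corresponds to a `Name` tag. The function name and parameters are illustrative.

```python
def create_hive_endpoint_service(ec2_client, name, nlb_arn):
    """Create an endpoint service in front of the NLB, with acceptance required.

    ec2_client is assumed to be an EC2 client such as boto3.client("ec2").
    Returns (service_id, service_name).
    """
    response = ec2_client.create_vpc_endpoint_service_configuration(
        AcceptanceRequired=True,
        NetworkLoadBalancerArns=[nlb_arn],
        SupportedIpAddressTypes=["ipv4"],
        # Assumption: the console's Name field is stored as a Name tag.
        TagSpecifications=[
            {
                "ResourceType": "vpc-endpoint-service",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    )
    config = response["ServiceConfiguration"]
    return config["ServiceId"], config["ServiceName"]
```

The second returned value is the service name to pass on to Atlan support later.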
Allow Atlan account access
To grant Atlan's account access to the service, from within the endpoint service screen:
- At the bottom of the screen, change to the Allow principals tab.
- At the top of the Allow principals table, click the Allow principals button.
- Under Principals to add, for ARN, enter the Atlan account ID as a principal ARN. AWS expects the form arn:aws:iam::<account-id>:root to allow a whole account.
- At the bottom right of the form, click the Allow principals button.
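A scripted version of this step might look like the sketch below, which assumes boto3 and assumes that the whole Atlan account is allowed through its root principal ARN. The helper names are hypothetical. Confirm with Atlan support if they give you a different principal.

```python
def account_root_arn(account_id):
    """Turn a 12-digit AWS account ID into its account-level principal ARN."""
    account_id = str(account_id).strip()
    if len(account_id) != 12 or not (account_id.isascii() and account_id.isdigit()):
        raise ValueError("expected a 12-digit AWS account ID")
    return f"arn:aws:iam::{account_id}:root"


def allow_account(ec2_client, service_id, account_id):
    """Allow the given account to create endpoints to the endpoint service.

    ec2_client is assumed to be an EC2 client such as boto3.client("ec2").
    """
    principal = account_root_arn(account_id)
    ec2_client.modify_vpc_endpoint_service_permissions(
        ServiceId=service_id,
        AddAllowedPrincipals=[principal],
    )
    return principal
```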
Notify Atlan support
Once all the previous steps are complete, provide Atlan support with the following information:
- The hostname for the endpoint service you created.
- The port number for your Hive instance.
There are additional steps Atlan then needs to complete:
- Creating a security group.
- Creating an endpoint.
Once the Atlan team has confirmed the configuration is ready, please continue with the remaining steps.
Accept consumer connection request
To accept the consumer connection request, from within AWS:
- Navigate to Services, then Networking & Content Delivery, then VPC.
- From the menu on the left, under Virtual private cloud, click Endpoint services.
- From the Endpoint services table, select the endpoint service you created in Create endpoint service.
- At the bottom of the screen, change to the Endpoint connections tab.
- Verify a row in the Endpoint connections table has a State of Pending.
- Select this row, click the Actions button, and then click Accept endpoint connection request.
- If prompted to confirm, type accept into the field and click the Accept button.
- Wait for this to complete. It typically takes about 30 seconds.
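The acceptance steps above can also be sketched in code. This example assumes boto3. The optional `expected_owner` check (the Atlan account ID) is an added safeguard that is not part of the console procedure, and the function name is hypothetical.

```python
def accept_pending_connections(ec2_client, service_id, expected_owner=None):
    """Accept pending endpoint connection requests for an endpoint service.

    ec2_client is assumed to be an EC2 client such as boto3.client("ec2").
    If expected_owner is given, only requests from that account are accepted.
    Returns the list of accepted endpoint IDs.
    """
    response = ec2_client.describe_vpc_endpoint_connections(
        Filters=[{"Name": "service-id", "Values": [service_id]}]
    )
    pending = [
        conn["VpcEndpointId"]
        for conn in response["VpcEndpointConnections"]
        if conn["VpcEndpointState"] == "pendingAcceptance"
        and (expected_owner is None or conn["VpcEndpointOwner"] == expected_owner)
    ]
    if pending:
        ec2_client.accept_vpc_endpoint_connections(
            ServiceId=service_id,
            VpcEndpointIds=pending,
        )
    return pending
```

Passing the Atlan account ID as `expected_owner` avoids accepting a request from an unexpected account.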
😅 The connection is now established. You can now use the service endpoint provided by Atlan support as the hostname to crawl Hive in Atlan! 🎉