# Troubleshooting Hive connectivity
This guide helps you resolve common issues when connecting Atlan to Hive, including authentication failures, Kerberos errors, certificate problems, and network connectivity issues.
## Basic authentication issues

### Invalid username or password

**Problem:** Authentication fails with an "Invalid username or password" error.

**Cause:** Credentials are incorrect, or the user account is locked or disabled.

**Solution:**

- Verify credentials by testing them with a Hive client (Beeline):

  ```shell
  beeline -u "jdbc:hive2://hostname:10000/default" -n username -p password
  ```

- Check that the user account is active in your authentication system
- Verify the username format matches your Hive metastore configuration
- Verify no special characters in the password are being escaped incorrectly
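On the special-character point, percent-encoding the credentials before embedding them in a connection URL avoids most escaping surprises. The sketch below uses Python's standard `urllib.parse.quote`; `jdbc_url` is a hypothetical helper, and the `;user=...;password=...` session-variable form is one common Hive JDBC URL style — check your driver's documentation for the exact syntax it accepts.

```python
from urllib.parse import quote

def jdbc_url(host, port, database, user, password):
    """Build a Hive JDBC URL, percent-encoding credential characters
    (such as '@' and ';') that would otherwise break URL parsing."""
    return (
        f"jdbc:hive2://{host}:{port}/{database}"
        f";user={quote(user, safe='')};password={quote(password, safe='')}"
    )

# '@' and ';' in the raw password are encoded as %40 and %3B
print(jdbc_url("hostname", 10000, "default", "alice", "p@ss;word"))
```

If the encoded URL works where the raw one failed, the original error was an escaping problem rather than bad credentials.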
### User lacks permissions

**Problem:** Connection succeeds but no metadata is extracted.

**Cause:** The user account lacks SELECT permissions on database objects.

**Solution:**

- Grant SELECT permissions on all databases you want to crawl:

  ```sql
  GRANT SELECT ON DATABASE database_name TO USER username;
  ```

- Verify the permissions are applied:

  ```sql
  SHOW GRANT USER username;
  ```

- Check HDFS ACLs, LDAP groups, and any policy engines (Ranger, Sentry) that may restrict access
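When many databases need the same grant, generating the statements programmatically is less error-prone than typing them by hand. A minimal sketch (`grant_statements` is a hypothetical helper; database and user names are placeholders):

```python
def grant_statements(databases, user):
    """Generate one GRANT SELECT statement per database for the given user."""
    return [
        f"GRANT SELECT ON DATABASE {db} TO USER {user};"
        for db in databases
    ]

# Paste the output into beeline, or feed it via `beeline -f grants.sql`
for stmt in grant_statements(["sales", "marketing"], "atlan_crawler"):
    print(stmt)
```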
## Kerberos authentication issues

### KDC unreachable

**Problem:** Error message `Cannot contact any KDC for realm REALM_NAME`.

**Cause:** The network can't reach the Kerberos Key Distribution Center (KDC).

**Solution:**

- Verify the KDC hostname or IP address is correct in krb5.conf
- Test network connectivity to the KDC:

  ```shell
  telnet kdc-hostname 88
  nc -zv kdc-hostname 88
  ```

- Verify firewall rules permit traffic to port 88 (TCP and UDP)
- For Self-Deployed Runtime, verify the runtime can reach the KDC from within your network
- Check that the `[realms]` section of krb5.conf has the correct KDC addresses:

  ```ini
  [realms]
  YOUR.REALM = {
    kdc = kdc.example.com:88
    admin_server = kdc.example.com:749
  }
  ```
### Server not found in Kerberos database

**Problem:** Error message "Server <service_principal> not found in Kerberos database".

**Cause:** The service principal name doesn't match what's registered in the KDC.

**Solution:**

- Verify the service principal exists in the KDC:

  ```shell
  kadmin.local -q "listprincs hive/*"
  ```

- Check the exact service principal format with your Hadoop administrator
- Verify the service name in the Atlan configuration matches your Hive setup (typically `hive`)
- Confirm the hostname matches:
  - Use the FQDN (fully qualified domain name) for the Hive host
  - Check DNS resolution:

    ```shell
    nslookup hostname
    host hostname
    ```
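The checks above boil down to a few shape rules for the principal string: `service/host@REALM`, with the expected service name, an uppercase realm, and an FQDN for the host. A small validator sketch (the function and its rule set are illustrative, not part of any Hive or Atlan API):

```python
import re

# Expected shape of a Kerberos service principal: service/host@REALM
PRINCIPAL_RE = re.compile(r"^(?P<service>[^/@]+)/(?P<host>[^/@]+)@(?P<realm>[^/@]+)$")

def check_principal(principal, expected_service="hive"):
    """Return a list of likely problems with a service principal string."""
    m = PRINCIPAL_RE.match(principal)
    if not m:
        return ["not in service/host@REALM form"]
    problems = []
    if m.group("service") != expected_service:
        problems.append(f"service is {m.group('service')!r}, expected {expected_service!r}")
    if m.group("realm") != m.group("realm").upper():
        problems.append("realm should normally be uppercase")
    if "." not in m.group("host"):
        problems.append("host is not an FQDN")
    return problems

print(check_principal("hive/hs2.example.com@EXAMPLE.COM"))  # []
print(check_principal("hive/hs2@example.com"))
```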
### Wrong realm derived from hostname

**Problem:** Error message "Server krbtgt/WRONG.[email protected] not found in Kerberos database".

**Cause:** DNS hostname canonicalization is deriving the wrong realm from the server hostname.

**Solution:**

- Disable DNS canonicalization in krb5.conf:

  ```ini
  [libdefaults]
  dns_canonicalize_hostname = false
  rdns = false
  ```

- Add explicit domain-to-realm mappings:

  ```ini
  [domain_realm]
  .your-domain.com = YOUR.REALM
  your-domain.com = YOUR.REALM
  .amazonaws.com = YOUR.REALM
  amazonaws.com = YOUR.REALM
  .compute.internal = YOUR.REALM
  compute.internal = YOUR.REALM
  ```

- Verify the mapping works by testing `kinit` locally
- Re-upload the updated krb5.conf file in Atlan

This issue commonly occurs when connecting to cloud-hosted Hive clusters (AWS, Azure, GCP), where the FQDN includes cloud provider domains.
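To see why both the dotted and bare entries are needed, here is a sketch of the `[domain_realm]` matching rule: a bare entry matches a host exactly, while an entry starting with `.` matches any host under that domain, with the most specific suffix winning. This mimics the lookup for illustration only; it doesn't replace testing with `kinit`.

```python
def realm_for_host(hostname, domain_realm):
    """Resolve a hostname to a realm using [domain_realm]-style entries.

    Exact (bare) entries win outright; otherwise the longest matching
    dotted suffix determines the realm. Returns None when nothing matches.
    """
    host = hostname.lower().rstrip(".")
    if host in domain_realm:
        return domain_realm[host]
    best = None
    for domain, realm in domain_realm.items():
        if domain.startswith(".") and host.endswith(domain):
            if best is None or len(domain) > len(best[0]):
                best = (domain, realm)
    return best[1] if best else None

mapping = {
    ".your-domain.com": "YOUR.REALM",
    "your-domain.com": "YOUR.REALM",
    ".compute.internal": "YOUR.REALM",
}
print(realm_for_host("hs2.your-domain.com", mapping))  # YOUR.REALM
```

A hostname that resolves to `None` here is one that Kerberos would fall back to deriving a realm for — which is exactly where the wrong-realm error comes from.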
### Invalid keytab file

**Problem:** Error message "Key version number for principal in key table is incorrect".

**Cause:** The keytab file doesn't match the current principal keys in the KDC.

**Solution:**

- Regenerate the keytab file:

  ```shell
  kadmin.local -q "ktadd -k /path/to/new.keytab principal@REALM"
  ```

- Test the new keytab locally before uploading:

  ```shell
  kinit -kt /path/to/new.keytab principal@REALM
  klist
  kdestroy
  ```

- Upload the new keytab file in Atlan
- If the principal was recently changed or its password was reset, generate a fresh keytab
### Ticket expiration issues

**Problem:** Connection works initially but fails after several hours.

**Cause:** Kerberos tickets expired and couldn't be renewed.

**Solution:**

- Check the ticket lifetime settings in krb5.conf:

  ```ini
  [libdefaults]
  ticket_lifetime = 24h
  renew_lifetime = 7d
  ```

- Atlan automatically renews tickets, but workflows that run longer than the renewable lifetime will fail
- Consider shorter extraction schedules or longer renewal lifetimes
- Verify the keytab file is valid and can generate new tickets
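To check whether a workflow can outlive its tickets, compare the expected run time against `renew_lifetime`. A small sketch for converting the simple single-unit krb5 duration forms shown above (the helper name and the 8-day workflow figure are illustrative):

```python
def lifetime_seconds(value):
    """Convert a krb5-style duration such as '24h' or '7d' to seconds.
    Handles only the simple single-unit forms; a bare number means seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)

renewable = lifetime_seconds("7d")      # renew_lifetime from krb5.conf
workflow = 8 * 86400                    # a hypothetical 8-day extraction
print(workflow > renewable)             # True: this workflow would fail
```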
### SASL handshake failed

**Problem:** Error message "TSocket read 0 bytes" or "SASL handshake failed".

**Cause:** Multiple possible causes related to Kerberos configuration or the network.

**Solution:**

- Verify the service principal format is correct:
  - Expected: `hive/hostname@REALM`
  - The service name (typically `hive`) must match the HiveServer2 configuration
- Check that HiveServer2 is configured for Kerberos:

  ```xml
  <!-- In hive-site.xml -->
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  ```

- Verify network connectivity to the HiveServer2 port (default 10000):

  ```shell
  telnet hive-hostname 10000
  ```

- Check the HiveServer2 logs for details on why it rejected the connection
- Verify the client can resolve the HiveServer2 hostname properly:
  - For Self-Deployed Runtime, add `hostAliases` (Kubernetes) or `extra_hosts` (Docker) if DNS resolution fails
## TLS/mTLS certificate issues

### CA certificate validation failed

**Problem:** Error message "SSL certificate verification failed" or "Unable to verify server certificate".

**Cause:** The CA certificate doesn't match the certificate presented by HiveServer2.

**Solution:**

- Obtain the correct CA certificate that signed your HiveServer2 SSL certificate
- Inspect the certificate chain presented by your server:

  ```shell
  openssl s_client -connect hostname:10000 -showcerts
  ```

- Verify the CA certificate file is valid:

  ```shell
  openssl x509 -in ca-cert.pem -text -noout
  ```

- Check the certificate expiration date
- Upload the correct CA certificate in Atlan
### Client certificate rejected

**Problem:** Error message "Client certificate rejected" or "SSL handshake failed".

**Cause:** HiveServer2 doesn't trust the client certificate, or the certificate is invalid.

**Solution:**

- Verify HiveServer2 is configured to accept your client certificate
- Check that the client certificate is signed by a CA trusted by HiveServer2
- Verify the certificate and key pair match:

  ```shell
  # Extract the public key from the certificate
  openssl x509 -in client-cert.pem -pubkey -noout > cert-pubkey.pem
  # Extract the public key from the private key
  openssl pkey -in client-key.pem -pubout > key-pubkey.pem
  # Compare (the two files should be identical)
  diff cert-pubkey.pem key-pubkey.pem
  ```

- Check the certificate expiration:

  ```shell
  openssl x509 -in client-cert.pem -noout -dates
  ```

- Verify the certificate includes the required extensions (Extended Key Usage: Client Authentication)
### Client key passphrase incorrect

**Problem:** Error message "Could not decrypt key" or "Invalid passphrase".

**Cause:** The passphrase for the encrypted client key is incorrect, or the key format is wrong.

**Solution:**

- Verify the passphrase is correct by testing locally:

  ```shell
  openssl rsa -in client-key.pem -check
  ```

- If the key is encrypted and you don't need encryption, decrypt it:

  ```shell
  openssl rsa -in client-key-encrypted.pem -out client-key-plain.pem
  ```

- Re-upload the key file (encrypted or plain) and provide the correct passphrase if encrypted
- Verify there's no extra whitespace in the passphrase field
### Certificate format issues

**Problem:** Error message "Could not load certificate" or "Invalid certificate format".

**Cause:** The certificate or key file is in the wrong format or corrupted.

**Solution:**

- Convert certificates to PEM format if needed:

  ```shell
  # From DER to PEM
  openssl x509 -inform der -in certificate.der -out certificate.pem
  # From PKCS#12 to PEM
  openssl pkcs12 -in certificate.p12 -out certificate.pem -nodes
  ```

- Verify the file format:

  ```shell
  # Should show "BEGIN CERTIFICATE" for certificates
  head certificate.pem
  # Should show "BEGIN PRIVATE KEY" or "BEGIN RSA PRIVATE KEY" for keys
  head client-key.pem
  ```

- Verify the files weren't corrupted during upload (try uploading as a .zip if direct upload fails)
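The `head` check above can be made precise: a PEM file starts with a `-----BEGIN <label>-----` header, and the label tells you what the file contains. A sketch of that check (the helper name is illustrative; real files may also contain several PEM blocks):

```python
def pem_kind(text):
    """Return the label of the first PEM BEGIN header in the text,
    e.g. 'CERTIFICATE' or 'RSA PRIVATE KEY'; None means no PEM header
    was found (the file is likely DER-encoded or corrupted)."""
    for line in text.splitlines():
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            return line[len("-----BEGIN "):-len("-----")]
    return None

sample = "-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----"
print(pem_kind(sample))  # CERTIFICATE
```

A `None` result on a file you expected to be PEM is a strong hint to run the DER-to-PEM conversion shown above.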
## Network connectivity issues

### Connection timeout

**Problem:** Error message "Connection timeout" or "Failed to connect to hostname:port".

**Cause:** The network can't reach HiveServer2, or a firewall blocks the connection.

**Solution:**

- Test network connectivity:

  ```shell
  telnet hostname 10000
  nc -zv hostname 10000
  ```

- Verify HiveServer2 is running:

  ```shell
  # Check the HiveServer2 process
  jps | grep HiveServer2
  ```

- Check that firewall rules permit traffic to port 10000
- For Direct connectivity, verify the Hive server accepts connections from Atlan's IP addresses
- For Self-Deployed Runtime, verify the runtime can reach Hive from within your network
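When `telnet` and `nc` aren't installed, the same probe can be run from Python's standard `socket` module. A sketch (`port_open` is a hypothetical helper; the hostname below is a placeholder):

```python
import socket

def port_open(host, port, timeout=5.0):
    """Equivalent of `nc -zv host port`: attempt a TCP connection to
    host:port and report whether it succeeded within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refusals, timeouts, and DNS failures
        return False

print(port_open("hive-hostname", 10000))
```

A `False` here distinguishes a network/firewall problem from an application-level one: if the port isn't reachable at all, no amount of Kerberos or TLS tuning will help.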
### Port already in use

**Problem:** HiveServer2 won't start, with an "Address already in use" error.

**Cause:** Another process is using port 10000.

**Solution:**

- Identify the process using the port:

  ```shell
  lsof -i :10000
  netstat -tuln | grep 10000
  ```

- Stop the conflicting process, or configure HiveServer2 to use a different port
- Update the port number in the Atlan configuration if using a non-standard port
### DNS resolution issues

**Problem:** Error message "Could not resolve hostname".

**Cause:** DNS can't resolve the HiveServer2 hostname.

**Solution:**

- Verify DNS resolution:

  ```shell
  nslookup hostname
  host hostname
  dig hostname
  ```

- Use the IP address instead of the hostname if DNS is unreliable
- For Self-Deployed Runtime with problematic DNS:
  - Kubernetes: add `hostAliases` to the pod spec:

    ```yaml
    hostAliases:
      - ip: "10.0.1.100"
        hostnames:
          - "hadoop-master.company.com"
    ```

  - Docker Compose: add `extra_hosts`:

    ```yaml
    extra_hosts:
      - "hadoop-master.company.com:10.0.1.100"
    ```

- Verify `/etc/hosts` entries if using host file resolution
## Metastore limitations

### Unable to connect to metastore

**Problem:** HiveServer2 can't connect to the Hive metastore database.

**Cause:** The metastore database is down, or the connection is misconfigured.

**Solution:**

- Verify the metastore database (MySQL, PostgreSQL) is running
- Check the HiveServer2 metastore configuration in hive-site.xml:

  ```xml
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/hive</value>
  </property>
  ```

- Test metastore database connectivity from the HiveServer2 host
- Check that the metastore database credentials are correct
- Review the HiveServer2 logs for specific metastore errors
### Metastore schema version mismatch

**Problem:** Error message "Metastore schema version doesn't match".

**Cause:** The Hive metastore schema version is incompatible with the HiveServer2 version.

**Solution:**

- Check the current schema version in the metastore database:

  ```sql
  SELECT * FROM VERSION;
  ```

- Back up the metastore database before upgrading
- Run the schema upgrade tool:

  ```shell
  schematool -dbType mysql -upgradeSchema
  ```

- Consult your Hadoop administrator if unsure about schema compatibility
## Performance issues

### Slow metadata extraction

**Problem:** Metadata extraction takes hours to complete.

**Cause:** A large number of databases/tables, or inefficient queries.

**Solution:**

- Use include/exclude filters to limit the scope of extraction
- Exclude system schemas and temporary tables
- Schedule extractions during low-usage periods
- For very large metastores (millions of tables), consider the offline extraction method
- Check HiveServer2 and metastore database performance
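The include/exclude idea can be sketched with glob patterns against qualified `database.table` names: a table is crawled only if it matches some include pattern (when includes are given) and no exclude pattern. This is an illustration of the filtering logic only; Atlan's own filter syntax may differ.

```python
from fnmatch import fnmatch

def should_crawl(qualified_name, include=None, exclude=()):
    """Decide whether a database.table name passes include/exclude
    glob filters. An empty include list means 'include everything'."""
    if include and not any(fnmatch(qualified_name, p) for p in include):
        return False
    return not any(fnmatch(qualified_name, p) for p in exclude)

tables = ["sales.orders", "sales.tmp_load", "sys.columns"]
kept = [t for t in tables
        if should_crawl(t, include=["sales.*"], exclude=["*.tmp_*", "sys.*"])]
print(kept)  # ['sales.orders']
```

Filtering out system schemas and temporary tables this way cuts both extraction time and memory pressure, which also helps with the out-of-memory failures below.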
### Out of memory errors

**Problem:** HiveServer2 or the Atlan workflow fails with an out-of-memory error.

**Cause:** Extracting too much metadata, or insufficient memory allocation.

**Solution:**

- Increase the HiveServer2 heap size if the server is running out of memory
- Use include filters to reduce the scope of extraction
- Break the extraction into multiple workflows for different database groups
- For Self-Deployed Runtime, increase the memory allocation:

  ```yaml
  resources:
    limits:
      memory: 16Gi
    requests:
      memory: 16Gi
  ```
## See also

- Set up Hive: configure authentication and permissions
- Crawl Hive: run metadata extraction workflows
- Preflight checks for Hive: verify prerequisites