Arrange federated entry to Amazon Athena for Microsoft AD FS customers utilizing AWS Lake Formation and a JDBC consumer


    Tens of 1000’s of AWS prospects select Amazon Easy Storage Service (Amazon S3) as their knowledge lake to run massive knowledge analytics, interactive queries, high-performance computing, and synthetic intelligence (AI) and machine studying (ML) purposes to realize enterprise insights from their knowledge. On prime of those knowledge lakes, you need to use AWS Lake Formation to ingest, clear, catalog, remodel, and assist safe your knowledge and make it accessible for evaluation and ML. After you have setup your knowledge lake, you need to use Amazon Athena which is an interactive question service that makes it simple to research knowledge in Amazon Easy Storage Service (Amazon S3) utilizing customary SQL.

    With Lake Formation, you may configure and handle fine-grained entry management to new or present databases, tables, and columns outlined within the AWS Glue Knowledge Catalog for knowledge saved in Amazon S3. After you set entry permissions utilizing Lake Formation, you need to use analytics providers equivalent to Amazon Athena, Amazon Redshift, and Amazon EMR without having to configure insurance policies for every service.

    A lot of our prospects use Microsoft Lively Listing Federation Providers (AD FS) as their id supplier (IdP) whereas utilizing cloud-based providers. On this put up, we offer a step-by-step walkthrough of configuring AD FS because the IdP for SAML-based authentication with Athena to question knowledge saved in Amazon S3, with entry permissions outlined utilizing Lake Formation. This permits end-users to log in to their SQL consumer utilizing Lively Listing credentials and entry knowledge with fine-grained entry permissions.

    Resolution overview

    To construct the answer, we begin by establishing belief between AD FS and your AWS account. With this belief in place, AD customers can federate into AWS utilizing their AD credentials and assume permissions of an AWS Id and Entry Administration (IAM) position to entry AWS assets such because the Athena API.

    To create this belief, you add AD FS as a SAML supplier into your AWS account and create an IAM position that federated customers can assume. On the AD FS aspect, you add AWS as a relying social gathering and write SAML declare guidelines to ship the precise person attributes to AWS (particularly Lake Formation) for authorization functions.

    The steps on this put up are structured into the next sections:

    1. Arrange an IAM SAML supplier and position.
    2. Configure AD FS.
    3. Create Lively Listing customers and teams.
    4. Create a database and tables within the knowledge lake.
    5. Arrange the Lake Formation permission mannequin.
    6. Arrange a SQL consumer with JDBC connection.
    7. Confirm entry permissions.

    The next diagram gives an summary of the answer structure.

    The stream for the federated authentication course of is as follows:

    1. The SQL consumer which has been configured with Lively Listing credentials sends an authentication request to AD FS.
    2. AD FS authenticates the person utilizing Lively Listing credentials, and returns a SAML assertion.
    3. The consumer makes a name to Lake Formation, which initiates an inner name with AWS Safety Token Service (AWS STS) to imagine a task with SAML for the consumer.
    4. Lake Formation returns non permanent AWS credentials with permissions of the outlined IAM position to the consumer.
    5. The consumer makes use of the non permanent AWS credentials to name the Athena API StartQueryExecution.
    6. Athena retrieves the desk and related metadata from the AWS Glue Knowledge Catalog.
    7. On behalf of the person, Athena requests entry to the info from Lake Formation (GetDataAccess). Lake Formation assumes the IAM position related to the info lake location and returns non permanent credentials.
    8. Athena makes use of the non permanent credentials to retrieve knowledge objects from Amazon S3.
    9. Athena returns the outcomes to the consumer primarily based on the outlined entry permissions.

    For our use case, we use two pattern tables:

    • LINEORDER – A reality desk containing orders
    • CUSTOMER – A dimension desk containing buyer info together with Personally Identifiable Info (PII) columns (c_name, c_phone, c_address)

    We even have knowledge shopper customers who’re members of the next groups:

    • CustomerOps – Can see each orders and buyer info, together with PII attributes of the client
    • Finance – Can see orders for analytics and aggregation functions however solely non-PII attributes of the client

    To reveal this use case, we create two customers referred to as CustomerOpsUser and FinanceUser and three AD teams for various entry patterns: data-customer (buyer info entry excluding PII attributes), data-customer-pii (full buyer info entry together with PII attributes), and data-order (order info entry). By including the customers to those three teams, we are able to grant the precise degree of entry to totally different tables and columns.


    To comply with together with this walkthrough, you will need to meet the next stipulations:

    Arrange an IAM SAML supplier and position

    To arrange your SAML supplier, full the next steps:

    1. Within the IAM console, select Id suppliers within the navigation pane.
    2. Select Add supplier.
    3. For Supplier Kind, select SAML.
    4. For Supplier Title, enter adfs-saml-provider.
    5. For Metadata Doc, obtain your AD FS server’s federation XML file by coming into the next handle in a browser with entry to the AD FS server:

    6. Add the file to AWS by selecting Select file.
    7. Select Add supplier to complete.

    Now you’re able to create a brand new IAM position.

    1. Within the navigation pane, select Roles.
    2. Select Create position.
    3. For the kind of trusted entity, select SAML 2.0 federation.
    4. For SAML supplier, select the supplier you created (adfs-saml-provider).
    5. Select Enable programmatic and AWS Administration Console entry.
    6. The Attribute and Worth fields ought to mechanically populate with SAML:aud and
    7. Select Subsequent:Permissions.
    8. Add the mandatory IAM permissions to this position. For this put up, connect AthenaFullAccess.

    If the Amazon S3 location on your Athena question outcomes doesn’t begin with aws-athena-query-results, add one other coverage to permit customers write question outcomes into your Amazon S3 location. For extra info, see Specifying a Question Outcome Location Utilizing the Athena Console and Writing IAM Insurance policies: Tips on how to Grant Entry to an Amazon S3 Bucket.

    1. Go away the defaults within the subsequent steps and for Function identify, enter adfs-data-access.
    2. Select Create position.
    3. Be aware of the SAML supplier and IAM position names to make use of in later steps when creating the belief between the AWS account and AD FS.

    Configure AD FS

    SAML-based federation has two participant events: the IdP (Lively Listing) and the relying social gathering (AWS), which is the service or utility that wishes to make use of authentication from the IdP.

    To configure AD FS, you first add a relying social gathering belief, you then configure SAML declare guidelines for the relying social gathering. Declare guidelines are the way in which that AD FS varieties a SAML assertion despatched to a relying social gathering. The SAML assertion states that the details about the AD person is true, and that it has authenticated the person.

    Add a relying social gathering belief

    To create your relying social gathering in AD FS, full the next steps:

    1. Log in to the AD FS server.
    2. On the Begin menu, open ServerManger.
    3. On the Instruments menu, select the AD FS Administration console.
    4. Underneath Belief Relationships within the navigation pane, select Relying Social gathering Trusts.
    5. Select Add Relying Social gathering Belief.
    6. Select Begin.
    7. Choose Import knowledge concerning the relying social gathering revealed on-line or on an area community and enter the URL

    The metadata XML file is an ordinary SAML metadata doc that describes AWS as a relying social gathering.

    1. Select Subsequent.
    2. For Show identify, enter a reputation on your relying social gathering.
    3. Select Subsequent.
    4. Choose I don’t need to configure multi-factor authentication.

    For elevated safety, we advocate that you simply configure multi-factor authentication to assist shield your AWS assets. We don’t allow multi-factor authentication for this put up as a result of we’re utilizing a pattern dataset.

    1. Select Subsequent.
    2. Choose Allow all customers to entry this relying social gathering and select Subsequent.

    This enables all customers in Lively Listing to make use of AD FS with AWS as a relying social gathering. You need to take into account your safety necessities and alter this configuration accordingly.

    1. End creating your relying social gathering.

    Configure SAML declare guidelines for the relying social gathering

    You create two units of declare guidelines on this put up. The primary set (guidelines 1–4) accommodates AD FS declare guidelines which are required to imagine an IAM position primarily based on AD group membership. These are the foundations that you simply additionally create if you wish to set up federated entry to the AWS Administration Console. The second set (guidelines 5–6) are declare guidelines which are required for Lake Formation fine-grained entry management.

    To create AD FS declare guidelines, full the next steps:

    1. On the AD FS Administration console, discover the relying social gathering you created within the earlier step.
    2. Proper-click the relying social gathering and select Edit Declare Guidelines.
    3. Select Add Rule and create your six new guidelines.
    4. Create declare rule 1, referred to as NameID:
      1. For Rule template, use Remodel an Incoming Declare.
      2. For Incoming declare kind, select Home windows account identify.
      3. For Outgoing declare kind, select Title ID.
      4. For Outgoing identify ID format, select Persistent Identifier.
      5. Choose Cross via all declare values.
    5. Create declare rule 2, referred to as RoleSessionName:
      1. For Rule template, use Ship LDAP Attribute as Claims.
      2. For Attribute retailer, select Lively Listing.
      3. For Mapping of LDAP attributes to outgoing declare sorts, add the attribute E-Mail-Addresses and outgoing declare kind
    6. Create declare rule 3, referred to as Get AD Teams:
      1. For Rule template, use Ship Claims Utilizing a Customized Rule.
      2. For Customized rule, enter the next code:
        c:[Type == "", Issuer == "AD AUTHORITY"]
        => add(retailer = "Lively Listing", sorts = ("http://temp/variable"), question = ";tokenGroups;{0}", param = c.Worth);

    7. Create declare rule 4, referred to as Roles:
      1. For Rule template, use Ship Claims Utilizing a Customized Rule.
      2. For Customized rule, enter the next code (enter your account quantity and identify of the SAML supplier you created earlier):
        c:[Type == "http://temp/variable", Value =~ "(?i)^aws-"]
        => challenge(Kind = "", Worth = RegExReplace(c.Worth, "aws-", "arn:aws:iam::<AWS ACCOUNT NUMBER>:saml-provider/<adfs-saml-provider>,arn:aws:iam::<AWS ACCOUNT NUMBER>:position/"));

    Declare guidelines 5 and 6 permit Lake Formation to make authorization choices primarily based on person identify or the AD group membership of the person.

    1. Create declare rule 5, referred to as LF-UserName, which passes the person identify and SAML assertion to Lake Formation:
      1. For Rule template, use Ship LDAP Attributes as Claims.
      2. For Attribute retailer, select Lively Listing.
      3. For Mapping of LDAP attributes to outgoing declare sorts, add the attribute Person-Principal-Title and outgoing declare kind
    2. Create declare rule 6, referred to as LF-Teams, which passes knowledge and analytics-related AD teams that the person is a member of, together with the SAML assertion to Lake Formation:
      1. For Rule template, use Ship Claims Utilizing a Customized Rule.
      2. For Customized rule, enter the next code:
        c:[Type == "http://temp/variable", Value =~ "(?i)^data-"]
        => challenge(Kind = "", Worth = c.Worth);

    The previous rule snippet filters AD group names beginning with data-. That is an arbitrary naming conference; you may undertake your most well-liked naming conference for AD teams which are associated to knowledge lake entry.

    Create Lively Listing customers and teams

    On this part, we create two AD customers and required AD teams to reveal various ranges of entry to the info.

    Create customers

    You create two AD customers: FinanceUser and CustomerOpsUser. Every person corresponds to a person who’s a member of the Finance or Buyer enterprise models. The next desk summarizes the main points of every person.


    FinanceUser CustomerOpsUser
    First Title FinanceUser CustomerOpsUser
    Person logon identify
    E mail

    To create your customers, full the next steps:

    1. On the Server Supervisor Dashboard, on the Instruments menu, select Lively Listing Customers and Computer systems.
    2. Within the navigation pane, select Customers.
    3. On the software bar, select the Create person icon.
    4. For First identify, enter FinanceUser.
    5. For Full identify, enter FinanceUser.
    6. For Person logon identify, enter
    7. Select Subsequent.
    8. Enter a password and deselect Person should change password at subsequent logon.

    We select this selection for simplicity, however in real-world eventualities, newly created customers should change their password for safety causes.

    1. Select Subsequent.
    2. In Lively Listing Customers and Computer systems, select the person identify.
    3. For E mail, enter

    Including an e-mail is necessary as a result of it’s used because the RoleSessionName worth within the SAML assertion.

    1. Select OK.
    2. Repeat these steps to create CustomerOpsUser.

    Create AD teams to characterize knowledge entry patterns

    Create the next AD teams to characterize three totally different entry patterns and likewise the power to imagine an IAM position:

    • data-customer – Members have entry to non-PII columns of the buyer desk
    • data-customer-pii – Members have entry to all columns of the buyer desk, together with PII columns
    • data-order – Members have entry to the lineorder desk
    • aws-adfs-data-access – Members assume the adfs-data-access IAM position when logging in to AWS

    To create the teams, full the next steps:

    1. On the Server Supervisor Dashboard, on the Instruments menu, select Lively Listing Customers and Computer systems.
    2. On the software bar, select the Create new group icon.
    3. For Group identify¸ enter data-customer.
    4. For Group scope, choose International.
    5. For Group kind¸ choose Safety.
    6. Select OK.
    7. Repeat these steps to create the remaining teams.

    Add customers to applicable teams

    Now you add your newly created customers to their applicable teams, as detailed within the following desk.

    Person Group Membership Description
    CustomerOpsUser data-customer-pii
    Sees all buyer info together with PII and their orders
    FinanceUser data-customer
    Sees solely non-PII buyer knowledge and orders

    Full the next steps:

    1. On the Server Supervisor Dashboard, on the Instruments menu, select Lively Listing Customers and Computer systems.
    2. Select the person FinanceUser.
    3. On the Member Of tab, select Add.
    4. Add the suitable teams.
    5. Repeat these steps for CustomerOpsUser.

    Create a database and tables within the knowledge lake

    On this step, you copy knowledge recordsdata to an S3 bucket in your AWS account by working the next AWS Command Line Interface (AWS CLI) instructions. For extra info on methods to arrange the AWS CLI, check with Configuration Fundamentals.

    These instructions copy the recordsdata that include knowledge for buyer and lineorder tables. Exchange <BUCKET NAME> with the identify of an S3 bucket in your AWS account.

    aws s3 sync s3://awssampledb/load/ s3://<BUCKET NAME>/buyer/ 
    --exclude "*" --include "customer-fw.tbl-00*" --exclude "*.bak"
    aws s3api copy-object --copy-source awssampledb/load/lo/lineorder-single.tbl000.gz 
    --key lineorder/lineorder-single.tbl000.gz --bucket <BUCKET NAME> 
    --tagging-directive REPLACE

    For this put up, we use the default settings for storing knowledge and logging entry requests inside Amazon S3. You may improve the safety of your delicate knowledge with the next strategies:

    • Implement encryption at relaxation utilizing AWS Key Administration Service (AWS KMS) and buyer managed encryption keys
    • Use AWS CloudTrail and audit logging
    • Prohibit entry to AWS assets primarily based on the least privilege precept

    Moreover, Lake Formation is built-in with CloudTrail, a service that gives a report of actions taken by a person, position, or AWS service in Lake Formation. CloudTrail captures all Lake Formation API calls as occasions and is enabled by default if you create a brand new AWS account. When exercise happens in Lake Formation, that exercise is recorded as a CloudTrail occasion together with different AWS service occasions in occasion historical past. For audit and entry monitoring functions, all federated person logins are logged through CloudTrail underneath the AssumeRoleWithSAML occasion identify. You too can view particular person exercise primarily based on their person identify in CloudTrail.

    To create a database and tables within the Knowledge Catalog, open the question editor on the Athena console and enter the next DDL statements. Exchange <BUCKET NAME> with the identify of the S3 bucket in your account.

    CREATE DATABASE salesdata;
    CREATE EXTERNAL TABLE salesdata.buyer
        c_custkey VARCHAR(10),
        c_name VARCHAR(25),
        c_address VARCHAR(25),
        c_city VARCHAR(10),
        c_nation VARCHAR(15),
        c_region VARCHAR(12),
        c_phone VARCHAR(15),
        c_mktsegment VARCHAR(10)
    -- The information recordsdata include fastened width columns therefore utilizing RegExSerDe
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
        "enter.regex" = "(.{10})(.{25})(.{25})(.{10})(.{15})(.{12})(.{15})(.{10})"
    LOCATION 's3://<BUCKET NAME>/buyer/';
    CREATE EXTERNAL TABLE salesdata.lineorder(
      `lo_orderkey` int, 
      `lo_linenumber` int, 
      `lo_custkey` int, 
      `lo_partkey` int, 
      `lo_suppkey` int, 
      `lo_orderdate` int, 
      `lo_orderpriority` varchar(15), 
      `lo_shippriority` varchar(1), 
      `lo_quantity` int, 
      `lo_extendedprice` int, 
      `lo_ordertotalprice` int, 
      `lo_discount` int, 
      `lo_revenue` int, 
      `lo_supplycost` int, 
      `lo_tax` int, 
      `lo_commitdate` int, 
      `lo_shipmode` varchar(10))
    LOCATION 's3://<BUCKET NAME>/lineorder/';

    Confirm that tables are created and you’ll see the info:

    SELECT * FROM "salesdata"."buyer" restrict 10;
    SELECT * FROM "salesdata"."lineorder" restrict 10;

    Arrange the Lake Formation permission mannequin

    Lake Formation makes use of a mixture of Lake Formation permissions and IAM permissions to realize fine-grained entry management. The really useful strategy contains the next:

    • Coarse-grained IAM permissions – These apply to the IAM position that customers assume when working queries in Athena. IAM permissions management entry to Lake Formation, AWS Glue, and Athena APIs.
    • High quality-grained Lake Formation grants – These management entry to Knowledge Catalog assets, Amazon S3 areas, and the underlying knowledge at these areas. With these grants, you may give entry to particular tables or solely columns that include particular knowledge values.

    Configure IAM position permissions

    Earlier within the walkthrough, you created the IAM position adfs-data-access and connected the AWS managed IAM coverage AthenaFullAccess to it. This coverage has all of the permissions required for the needs of this put up.

    For extra info, see the Knowledge Analyst Permissions part in Lake Formation Personas and IAM Permissions Reference.

    Register an S3 bucket as a knowledge lake location

    The mechanism to manipulate entry to an Amazon S3 location utilizing Lake Formation is to register a knowledge lake location. Full the next steps:

    1. On the Lake Formation console, select Knowledge lake areas.
    2. Select Register location.
    3. For Amazon S3 path, select Browse and find your bucket.
    4. For IAM position, select AWSServiceRoleForLakeFormationDataAccess.

    On this step, you specify an IAM service-linked position, which Lake Formation assumes when it grants non permanent credentials to built-in AWS providers that entry the info on this location. This position and its permissions are managed by Lake Formation and may’t be modified by IAM principals.

    1. Select Register location.

    Configure knowledge permissions

    Now that you’ve got registered the Amazon S3 path, you may give AD teams applicable permissions to entry tables and columns within the salesdata database. The next desk summarizes the brand new permissions.

    Database and Desk AD Group Title Desk Permissions Knowledge Permissions
    salesdata.buyer data-customer Choose c_city, c_custkey, c_mktsegment, c_nation, and c_region
    salesdata.buyer data-customer-pii Choose All knowledge entry
    salesdata.lineorder data-order Choose All knowledge entry
    1. On the Lake Formation console, select Tables within the navigation pane.
    2. Filter tables by the salesdata database.
    3. Choose the buyer desk and on the Actions menu, select View permissions.

    You need to see following present permissions. These entries permit the present knowledge lake administrator to entry the desk and all its columns.

    1. So as to add new permissions, choose the desk and on the Actions menu, select Grant.
    2. Choose SAML person and teams.
    3. For SAML and Amazon QuickSight customers and teams, enter arn:aws:iam::<AWS ACCOUNT NUMBER>:saml-provider/adfs-saml-provider:group/data-customer.

    To get this worth, get the ARN of the SAML supplier from the IAM console and append :group/data-customer to the tip of it.

    1. Choose Named knowledge catalog assets.
    2. For Databases, select the salesdata database.
    3. For Tables, select the buyer desk.
    4. For Desk permissions, choose Choose.
    5. For Knowledge permissions, choose Column-based entry.
    6. For Choose columns, add the columns c_city, c_custkey, c_mktsegment, c_nation, and c_region.
    7. Select Grant.

    You’ve gotten now allowed members of the AD group data-customer to have entry to columns of the buyer desk that don’t embody PII.

    1. Repeat these steps for the buyer desk and data-customer-pii group with all knowledge entry.
    2. Repeat these steps for the lineorder desk and data-order group with all knowledge entry.

    Arrange a SQL consumer with JDBC connection and confirm permissions

    On this put up, we use SQL Workbench to entry Athena via AD authentication and confirm the Lake Formation permissions you created within the earlier part.

    Put together the SQL consumer

    To arrange the SQL consumer, full the next steps:

    1. Obtain and extract the Lake Formation-compatible Athena JDBC driver with AWS SDK (2.0.14 or later model) from Utilizing Athena with the JDBC Driver.
    2. Go to the SQL Workbench/J web site and obtain the newest secure bundle.
    3. Set up SQL Workbench/J in your consumer laptop.
    4. In SQL Workbench, on the File menu, select Handle Drivers.
    5. Select the New driver icon.
    6. For Title, enter Athena JDBC Driver.
    7. For Library, browse to and select the Simba Athena JDBC .jar file that you simply simply downloaded.
    8. Select OK.

    You’re now able to create connections in SQL Workbench on your customers.

    Create connections in SQL Workbench

    To create your connections, full the next steps:

    1. On the File menu, select Join.
    2. Enter the identify Athena-FinanceUser.
    3. For Driver, select the Simba Athena JDBC driver.
    4. For URL, enter the next code (change the placeholders with precise values out of your setup and take away the road breaks to make a single line connection string):
    jdbc:awsathena://AwsRegion=<AWS Area Title e.g. ap-southeast-2>;
    S3OutputLocation=s3://<Athena Question Outcome Bucket Title>/jdbc;
    idp_host=<adfs-server-name e.g.>;
    preferred_role=<ARN of the position created in step1 e.g. arn>;
    person=financeuser@<Area Title e.g.>;

    For this put up, we used a self-signed certificates with AD FS. This certificates just isn’t trusted by the consumer, due to this fact authentication doesn’t succeed. That is why the SSL_Insecure attribute is ready to true to permit authentication regardless of the self-signed certificates. In real-world setups, you’d use legitimate trusted certificates and may take away the SSL_Insecure attribute.

    1. Create a brand new SQL workbench profile named Athena-CustomerOpsUser and repeat the sooner steps with CustomerOpsUser within the connection URL string.
    2. To check the connections, select Check for every person, and make sure that the connection succeeds.

    Confirm entry permissions

    Now we are able to confirm permissions for FinanceUser. Within the SQL Workbench Assertion window, run the next SQL SELECT assertion:

    SELECT * FROM "salesdata"."lineorder" restrict 10;
    SELECT * FROM "salesdata"."buyer" restrict 10;

    Confirm that solely non-PII columns are returned from the buyer desk.

    As you see within the previous screenshots, FinanceUser solely has entry to non-PII columns of the buyer desk and full entry to (all columns) of the lineorder desk. This enables FinanceUser, for instance, to run mixture and abstract queries primarily based on market section or location of shoppers with out getting access to their private info.

    Run an identical question for CustomerOpsUser. You need to be capable to see all columns, together with columns containing PII, within the buyer desk.


    This put up demonstrated methods to configure your knowledge lake permissions utilizing Lake Formation for AD customers and teams. We configured AD FS 3.0 in your Lively Listing and used it as an IdP to federate into AWS utilizing SAML. This put up additionally confirmed how one can combine your Athena JDBC driver to AD FS and use your AD credentials straight to hook up with Athena.

    Integrating your Lively Listing with the Athena JDBC driver offers you the pliability to entry Athena from enterprise intelligence instruments you’re already conversant in to research the info in your Amazon S3 knowledge lake. This lets you have a constant central permission mannequin that’s managed via AD customers and their group memberships.

    Concerning the Authors

    Mostafa Safipour is a Options Architect at AWS primarily based out of Sydney. Over the previous decade he has helped many giant organizations within the ANZ area construct their knowledge, digital, and enterprise workloads on AWS.

    Praveen Kumar is a Specialist Resolution Architect at AWS with experience in designing, constructing, and implementing fashionable knowledge and analytics platforms utilizing cloud-native providers. His areas of pursuits are serverless expertise, streaming purposes, and fashionable cloud knowledge warehouses.


    Please enter your comment!
    Please enter your name here