Data Locker—storage setup

At a glance: Stream Data Locker data to your AWS or GCS storage. Integrate GCS with BigQuery and Google Data Studio. [Beta] Connect to Snowflake.

Stream Data Locker to your storage

Related reading: Data Locker

Data Locker enables you to stream data to a storage solution that you select and own. Set up your storage using one of the following procedures.

GCS storage

  • The procedure in this section needs to be performed by your Google Cloud admin.
  • You can delete files from Data Locker 25 or more hours after they were written. Don't delete them earlier.

Information for the GCS admin

Data Locker is the AppsFlyer solution for streaming data to storage.

Requirements

  • Create a bucket on GCS for the exclusive use of Data Locker. Exclusive means no other service writes data to the bucket. 
  • Suggested bucket name: af-datalocker.
  • Grant Data Locker permissions using the procedure that follows.

To grant Data Locker permissions:

In this procedure, replace data-locker-example with the name of the bucket you previously created for Data Locker. If you prefer to grant the permission by script, see the sketch after this procedure.

  1. Sign in to your GCS console.
  2. Go to Storage > Storage browser.


  3. Select the bucket you previously created, for example, data-locker-example.
  4. Go to the Permissions tab. 
  5. Click +Add.
    The Add members window opens.
  6. Complete as follows:
    1. In New members, paste the service account that follows.
      af-data-delivery@af-raw-data.iam.gserviceaccount.com
    2. In Select a role, choose Cloud Storage > Storage Object Admin.


  7. Click Save.
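
If you prefer to grant the permission by script rather than through the console, the following is a minimal sketch using the google-cloud-storage Python client library. It assumes the bucket already exists and that your credentials can manage IAM on it; the bucket name is a placeholder.

  # Minimal sketch: grant AppsFlyer's service account Storage Object Admin on the bucket.
  # Requires: pip install google-cloud-storage (run with credentials of a GCS admin).
  from google.cloud import storage

  BUCKET_NAME = "data-locker-example"  # placeholder: your Data Locker bucket
  AF_MEMBER = "serviceAccount:af-data-delivery@af-raw-data.iam.gserviceaccount.com"

  client = storage.Client()
  bucket = client.bucket(BUCKET_NAME)

  # Read the bucket's IAM policy, append the binding, and save it back.
  policy = bucket.get_iam_policy(requested_policy_version=3)
  policy.bindings.append({"role": "roles/storage.objectAdmin", "members": {AF_MEMBER}})
  bucket.set_iam_policy(policy)
  print(f"Granted roles/storage.objectAdmin on {BUCKET_NAME}")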

AWS storage

  • The procedure in this section needs to be performed by your AWS admin.
  • You can delete files from Data Locker 25 or more hours after they were written. Don't delete them earlier.

Information for the AWS admin

Data Locker is the AppsFlyer solution for streaming data to storage.

Requirements

  • Create an AWS bucket with a name like af-datalocker-mybucket. The prefix af-datalocker- is mandatory; the suffix is free text.
  • We suggest af-datalocker-yyyy-mm-dd-hh-mm-free-text, where yyyy-mm-dd-hh-mm is the current date and time, followed by any other text you want, as depicted in the figure that follows.

User interface in AWS console


After creating the bucket, grant AppsFlyer permissions using the procedure that follows. 

To create a bucket and grant AppsFlyer permissions: 

  1. Sign in to the AWS console.
  2. Go to the S3 service.
  3. To create the bucket:
    1. Click Create bucket.
    2. Complete the Bucket name as follows: Start with af-datalocker- and then add any other text as described previously.
    3. Click Create bucket.
  4. To grant AppsFlyer permissions:
    1. Select the bucket. 
    2. Go to the Permissions tab. 
    3. In the Bucket policy section, click Edit. 
      The Bucket policy window opens.
    4. Paste the following snippet into the window.
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AF_DataLocker_Direct",
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::195229424603:user/product=datalocker__envtype=prod__ns=default"
            },
            "Action": [
              "s3:GetObject",
              "s3:ListBucket",
              "s3:DeleteObject",
              "s3:PutObject"
            ],
            "Resource": [
              "arn:aws:s3:::af-datalocker-my-bucket",
              "arn:aws:s3:::af-datalocker-my-bucket/*"
            ]
          }
        ]
      }
      
  5. In the snippet, replace af-datalocker-my-bucket with the name of the bucket you created.

  6. Click Save changes.

  7. Complete the Setup Data Locker procedure.
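
If your AWS admin prefers to script the bucket creation and policy, the following is a minimal sketch using boto3 that performs the same steps. The bucket name is a placeholder; outside us-east-1, create_bucket also needs a CreateBucketConfiguration with your region.

  # Minimal sketch: create the Data Locker bucket and attach the AppsFlyer policy.
  # Requires: pip install boto3 (run with credentials of an AWS admin).
  import json
  import boto3

  BUCKET_NAME = "af-datalocker-2024-01-01-00-00-mybucket"  # placeholder name

  s3 = boto3.client("s3")
  s3.create_bucket(Bucket=BUCKET_NAME)  # add CreateBucketConfiguration outside us-east-1

  # Same policy as the snippet above, with the bucket name substituted.
  policy = {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "AF_DataLocker_Direct",
              "Effect": "Allow",
              "Principal": {
                  "AWS": "arn:aws:iam::195229424603:user/product=datalocker__envtype=prod__ns=default"
              },
              "Action": ["s3:GetObject", "s3:ListBucket", "s3:DeleteObject", "s3:PutObject"],
              "Resource": [f"arn:aws:s3:::{BUCKET_NAME}", f"arn:aws:s3:::{BUCKET_NAME}/*"],
          }
      ],
  }
  s3.put_bucket_policy(Bucket=BUCKET_NAME, Policy=json.dumps(policy))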

Basic guide to connecting BigQuery and Google Data Studio to Data Locker

The sections that follow are a basic guide to connecting your GCS Data Locker to BigQuery and Google Data Studio. Their purpose is to show that the connection is quick, straightforward, and works out of the box. AppsFlyer doesn't provide support services for BigQuery and Google Data Studio.

Connect GCS Data Locker bucket to BigQuery

The steps in this section are a guide to importing data from your Data Locker bucket into BigQuery.

Related reading: Quickstart using the Google Cloud Console

To load the installs report from your Data Locker GCS bucket into BigQuery, complete the procedures that follow.


Prerequisites 

  • Set up Data Locker with GCS as your storage destination.
  • Have the necessary permissions in Google Cloud to set up your dataset. 


Step 1—Create a BigQuery dataset:

  1. In your Google Cloud Platform console, go to BigQuery.
  2. Create a project or use an existing project. 
  3. In the project, click CREATE DATASET.


  4. Give the dataset a suitable ID.
  5. Complete the remaining settings as required. 
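
If you prefer to create the dataset programmatically, a minimal sketch using the google-cloud-bigquery client library follows; the project and dataset IDs are placeholders.

  # Minimal sketch: create a BigQuery dataset for Data Locker data.
  # Requires: pip install google-cloud-bigquery
  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")          # placeholder project ID
  dataset = bigquery.Dataset("my-project.data_locker")    # placeholder dataset ID
  dataset.location = "US"                                 # pick the location you need
  client.create_dataset(dataset, exists_ok=True)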


Step 2—Connect a BigQuery table to Data Locker:

  1. In the dataset, click CREATE TABLE.


  2. Set the source as follows:
    1. Create table from: Google Cloud Storage
    2. Select file from GCS bucket: Browse to your bucket and select a report. For example, t=installs. 
    3. To get data from all the subfolders of the t=installs folder, set the *.gz wildcard.
    4. Set the File format to CSV.
    5. Select an existing project or enter a new project name.
    6. Set Table type to Native table.

 

Step 3—You are all set to query the data 

The data is automatically loaded to BigQuery.
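
The same load can also be scripted. The sketch that follows uses the google-cloud-bigquery client library to load the gzip-compressed CSV files under the t=installs folder into a table; the project, dataset, table, and GCS path are placeholders you need to adapt.

  # Minimal sketch: load Data Locker installs files from GCS into a BigQuery table.
  # Requires: pip install google-cloud-bigquery
  from google.cloud import bigquery

  client = bigquery.Client(project="my-project")            # placeholder project ID
  table_id = "my-project.data_locker.installs"               # placeholder table ID
  uri = "gs://data-locker-example/<path-in-bucket>/t=installs/*.gz"  # placeholder path

  job_config = bigquery.LoadJobConfig(
      source_format=bigquery.SourceFormat.CSV,
      autodetect=True,        # let BigQuery infer the schema
      skip_leading_rows=1,    # skip the CSV header row
  )

  load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
  load_job.result()  # wait for the load job to finish
  print(client.get_table(table_id).num_rows, "rows loaded")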

Display Data Locker data in Google Data Studio

You can connect Google Data Studio to your Data Locker data. To do so, you must first connect Data Locker to BigQuery as described in the previous section.

Prerequisites

  • Connect Data Locker to BigQuery. 

To display Data Locker data in Google Data Studio:

  1. Create a report in Google Data Studio.
  2. Select BigQuery as the data source.


  3. Select a project and tables to add to your Google Data Studio report, and begin analyzing the data.

Connectors

Snowflake

The Snowflake option is currently available as a beta.

Connect Data Locker to your Snowflake account. When you do, data is sent to Snowflake and also remains available in your selected cloud storage. To take part in the Snowflake beta connector, contact your CSM.

Considerations for BI developers

  • The data freshness rate is the same as that of data provided in a bucket. 
  • The table and column structure of the data is equivalent to that found in the data available directly from a Data Locker bucket. 
  • As rows are added to the Snowflake share, the _ingestion_time column is populated. To ensure row uniqueness and to prevent ingestion of the same row more than once:
    1. Save the max_ingestion_time per table ingested.
    2. Each time you run your ingestion process, ingest only those rows where _ingestion_time > max_ingestion_time, as shown in the sketch after this list.
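
The sketch that follows illustrates this incremental pattern using the Snowflake Python connector. It is a minimal example; the connection details, database, schema, and table names are placeholders you need to adapt.

  # Minimal sketch: ingest only rows newer than the last _ingestion_time watermark.
  # Requires: pip install snowflake-connector-python
  import snowflake.connector

  conn = snowflake.connector.connect(
      account="<account_id>", user="<user>", password="<password>",
      warehouse="<warehouse>",
  )
  cur = conn.cursor()

  # 1. Read the max _ingestion_time already copied into your own table.
  cur.execute("SELECT MAX(_ingestion_time) FROM analytics.public.installs")
  max_ingestion_time = cur.fetchone()[0]  # None on the first run

  # 2. Ingest only rows above the watermark from the shared database.
  cur.execute(
      """
      INSERT INTO analytics.public.installs
      SELECT * FROM appsflyer_share_db.public.installs
      WHERE _ingestion_time > COALESCE(%s, '1970-01-01'::TIMESTAMP)
      """,
      (max_ingestion_time,),
  )
  conn.close()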

Complete the procedures that follow to connect Snowflake to Data Locker. 

Snowflake connector procedures

To get your Snowflake account ID and region:

  1. In Snowflake, log in to your Snowflake account.
  2. In the menu bar, select your name.
    Your account ID and region display.


  3. Send your Snowflake account ID and region to your AppsFlyer CSM, and ask them to enable Snowflake in your Data Locker. 

To connect Data Locker to Snowflake:

  1. In AppsFlyer, go to Integration > Data Locker.
  2. Select Snowflake.
  3. Complete the Snowflake account ID and Snowflake region using the information you previously got from Snowflake. 
  4. Click Save.

To create a database from a share in Snowflake:

  1. In Snowflake, log in to your Snowflake account.
  2. Switch role to Accountadmin. See Create a database from a share.
  3. Select Shares.
  4. Select the AppsFlyer share. For example, APPSFLYER_ACC_XXX_DATA_LOCKER. 
  5. Click Create Database from Secure Share, and complete the required details. Note! You must load the data from the shared database into your own tables, because data in the shared database is only available for a limited period (currently 14 days); see the sketch after this procedure.
  6. In your database, the imported tables are displayed. Table names and structures are equivalent to those in Data Locker buckets.
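
The following is a minimal sketch of this procedure using the Snowflake Python connector. The provider account, share name, warehouse, and target database, schema, and table names are placeholders; use the exact share name shown on your Shares page.

  # Minimal sketch: create a database from the AppsFlyer share and copy the data
  # into a table you own (shared rows are retained for a limited period).
  # Requires: pip install snowflake-connector-python; run with the ACCOUNTADMIN role.
  import snowflake.connector

  conn = snowflake.connector.connect(
      account="<account_id>", user="<user>", password="<password>",
      role="ACCOUNTADMIN", warehouse="<warehouse>",
  )
  cur = conn.cursor()

  # Provider account and share name are placeholders; copy them from the Shares page.
  cur.execute(
      "CREATE DATABASE IF NOT EXISTS appsflyer_share_db "
      "FROM SHARE <provider_account>.APPSFLYER_ACC_XXX_DATA_LOCKER"
  )

  # Initial copy into a table you own (target database and schema must already exist).
  # Keep it fresh with the incremental _ingestion_time pattern described above.
  cur.execute(
      "CREATE TABLE IF NOT EXISTS analytics.public.installs AS "
      "SELECT * FROM appsflyer_share_db.public.installs"
  )
  conn.close()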