Data Clean Room—Cloud storage and data file setup

At a glance: Set up cloud service buckets, folder paths, and files for use by the AppsFlyer Data Clean Room. Buckets can be located on AWS S3, GCS, or both.

Overview

The AppsFlyer Data Clean Room (DCR) allows advertisers to leverage the value of their first-party user-level data by matching and enriching it with AppsFlyer user-level attribution data. The resulting aggregated reports preserve user privacy while allowing advertisers the valuable insights that only this combined data can provide.

Preparing to use the DCR involves setting up cloud service storage and ensuring that the data files you will upload are properly formatted and transmitted to the DCR.

Cloud service storage

Cloud service storage is used by the AppsFlyer Data Clean Room (DCR) for 2 primary purposes—

  • Input: Location from which AppsFlyer reads first-party data files produced by your BI system
  • Output: Destination to which AppsFlyer delivers reports after DCR processing

You can use one or more buckets for these purposes (on AWS, GCS, or both). However, in most cases, the easiest-to-manage structure includes:

  • A single bucket on a single cloud service
  • A folder identified by your DCR Key directly underneath the bucket
  • 2 separate folder paths underneath the top-level folder: one for input and one for output

This article provides the instructions for creating this structure.

DCR naming requirements

The following naming requirements apply to all DCR data entities (buckets, folders, and files):

  • Maximum length: 200 characters
  • Valid characters:
    • letters (A-Z, a-z)
    • numbers (0-9), cannot be the first character of a name
    • hyphens (-), cannot be the first character of a name
  • Invalid characters:
    • spaces
    • all other symbols or special characters
  • Characters used for special purposes only:
    • slash (/): the folder delimiter (see the note below)
    • underscore (_) and equals sign (=): used only in DCR-defined names, such as _SUCCESS, dt=, and v= (described later in this article)

 Note

AWS and GCS automatically append a slash (/) to the end of each folder name. Do not include this character when naming your buckets or folders.
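The naming rules above can be captured in a short validation check. The following Python sketch is illustrative (the function name is ours, not part of any AppsFlyer SDK) and applies to bucket and folder names you choose yourself, not to the DCR-defined names such as dt=, v=, and _SUCCESS:

```python
import re

def is_valid_dcr_name(name: str) -> bool:
    """Check a bucket or folder name you choose against the DCR naming
    rules: letters, numbers, and hyphens only; a number or hyphen cannot
    be the first character; maximum 200 characters."""
    name = name.rstrip("/")  # ignore the slash AWS/GCS append to folder names
    return bool(re.fullmatch(r"[A-Za-z][A-Za-z0-9-]{0,199}", name))
```

For example, is_valid_dcr_name("af-dcr-example-bucket") returns True, while names containing spaces, or starting with a hyphen or number, are rejected.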

Creating a bucket

Buckets are created using the interface of your selected cloud service, as described in the tabs below.

The following requirements are relevant to buckets on both cloud services:

  • Bucket name:
    • The bucket name must begin with af-dcr-
    • Example: af-dcr-example-bucket
  • Additional:
    • The AppsFlyer DCR service must be given bucket permissions. See instructions for granting these permissions in the tabs for each cloud service below.
    • The bucket must be for the exclusive use of AppsFlyer Data Clean Room. In other words, no other service can write data to the bucket.

AWS bucket

Note: The following procedure must be performed by your AWS admin.

To create a bucket and grant AppsFlyer permissions: 

  1. Sign in to the AWS console.
  2. Go to the S3 service.
  3. To create the bucket:
    1. Click Create bucket.
    2. In Bucket name, enter a name starting with af-dcr-, followed by your own text (as described above).
    3. Click Create bucket.
  4. To grant AppsFlyer bucket permissions:
    1. Select the bucket you created. 
    2. Go to the Permissions tab. 
    3. In the Bucket policy section, click Edit.
      The Edit bucket policy window opens.
    4. Paste the following code snippet into the window.
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "AF-DCR",
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::195229424603:user/product=dcr-reporter__envtype=prod__ns=default"
            },
            "Action": [
              "s3:GetObject",
              "s3:ListBucket",
              "s3:DeleteObject",
              "s3:PutObject"
            ],
            "Resource": [
              "arn:aws:s3:::af-dcr-mybucket",
              "arn:aws:s3:::af-dcr-mybucket/*"
            ]
          }
        ]
      }
      
  5. In the snippet, replace af-dcr-mybucket (in the 2 lines in which it appears) with the name of the bucket you created.
    Caution! When replacing the bucket name in the snippet, be sure not to overwrite /* in the second line in which the bucket name appears.

  6. Click Save changes.
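If your team scripts bucket setup, the same policy can also be generated programmatically. The Python sketch below only builds the policy document; the principal ARN and actions are copied from the snippet above, and the bucket name is a placeholder you replace with your own:

```python
import json

AF_DCR_PRINCIPAL = (
    "arn:aws:iam::195229424603:user/"
    "product=dcr-reporter__envtype=prod__ns=default"
)

def build_dcr_bucket_policy(bucket: str) -> str:
    """Return the DCR bucket policy as a JSON string, with the bucket
    name substituted into both Resource lines."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AF-DCR",
                "Effect": "Allow",
                "Principal": {"AWS": AF_DCR_PRINCIPAL},
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:DeleteObject",
                    "s3:PutObject",
                ],
                # Both the bucket itself and its objects (note the /*)
                # must be listed.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

You could then attach the result with the AWS SDK, for example: boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=build_dcr_bucket_policy(bucket)).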

GCS bucket

Note: The following procedure must be performed by your Google Cloud admin.

To create a bucket and grant AppsFlyer permissions: 

  1. Sign in to your GCS console.
  2. Go to the Cloud Storage Browser page.
  3. To create the bucket:
    1. Click Create bucket.
    2. Enter your bucket information on the Create a bucket page. Include the bucket name, starting with af-dcr- and followed by your text (as described above).
    3. Click Continue.
    4. Click Create.
  4. To grant AppsFlyer bucket permissions:
    1. Select the bucket you created. 
    2. Go to the Permissions tab. 
    3. In the Permissions section, click + Add.
      The Add members window opens.
    4. In the New members box, paste the snippet that follows.
      appsflyer-dcr@dcr-report.iam.gserviceaccount.com
    5. From the Role list, select Cloud storage > Storage Admin.


    6. Click Save.
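The permission steps above can also be scripted with the google-cloud-storage Python client. The sketch below only builds the IAM binding corresponding to steps 4-5; roles/storage.admin is the standard identifier for the Storage Admin role, and the helper name is ours. Treat this as an outline rather than a drop-in script:

```python
AF_DCR_SERVICE_ACCOUNT = "appsflyer-dcr@dcr-report.iam.gserviceaccount.com"

def build_dcr_iam_binding() -> dict:
    """Return the IAM policy binding that grants the AppsFlyer DCR
    service account the Storage Admin role on the bucket."""
    return {
        "role": "roles/storage.admin",
        "members": [f"serviceAccount:{AF_DCR_SERVICE_ACCOUNT}"],
    }
```

With the client library, you would fetch the bucket's policy via bucket.get_iam_policy(requested_policy_version=3), append this binding to policy.bindings, and call bucket.set_iam_policy(policy).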

Creating a DCR key folder

To ensure maximum security, the folder directly beneath the bucket (the "DCR key folder") must be named with the 8-character, alphanumeric DCR key assigned to your account (for example, 01bcc5fb). Note that this key is different from any other password or key associated with your AppsFlyer account.

The DCR key folder is generally created manually using the interface of your selected cloud service.

To get your account's DCR key, click the DCR key button at the top of the main DCR screen.


 Example

After creating the DCR key folder, your bucket/folder structure would look something like this:

af-dcr-example-bucket/01bcc5fb/

Creating an input folder path

The detailed requirements for creating each element of the input folder path are described in the tabs below.

Top-level input folder

Though it is not required, best practice is to create a top-level input folder directly beneath the DCR key folder. This folder will be dedicated to files you upload to the DCR.

The top-level input folder is generally created manually using the interface of your selected cloud service.

  • This practice is especially recommended when you are using the same bucket both for uploading data files (input) and receiving reports (output).
  • You can name this folder anything you want, so long as it complies with the DCR naming requirements. For ease of identification, it is usually named input/.

 Example

After creating the top-level input folder, your bucket/folder structure might look something like this:

af-dcr-example-bucket/01bcc5fb/input/

Second-level folder for each data source

You can regularly upload different data source files to the DCR for processing. Each of these data sources must be assigned a separate folder (a "data source folder").

So, for example, if you plan to upload 2 files to the DCR for processing every day: BI-data.csv and CRM-data.gzip, you would assign each of these data sources a folder. You could choose to call these folders BI-data/ and CRM-data/.

The data source folders are generally created manually using the interface of your selected cloud service.

 Example

After creating 2 data source folders, your bucket/folder structure might look something like this:

af-dcr-example-bucket/01bcc5fb/input/BI-data/
                                     CRM-data/

Nested subfolders for each date and version

We've finally arrived at the part of the folder structure where the real action happens: the folders in which AppsFlyer continually looks for new data files to read into the DCR.

Each time you want AppsFlyer to process a data source file and run a report based on it, you upload a new version of the file to the data source folder, within a series of nested subfolders indicating the date and version number (plus one extra subfolder to let AppsFlyer know where the data is):

  • Within each data source folder --> 1 subfolder for each date ("date folder")
    • Format: dt=yyyy-mm-dd/
    • Example: dt=2022-03-10/
  • Within each date folder --> 1 subfolder for each version on that date ("version folder")
    • Format: v=n/
    • Example: v=1/
    • Note: The version folder is required even if you only upload the file one time per day.
  • Within each version folder --> 1 subfolder to indicate the location of the data ("data folder")
    • Format: data/

In most cases, you would use API calls or other available programmatic means to create the date/version/data folders automatically each time the data source file is uploaded. For additional information, see the API references for your cloud service: AWS, GCS.
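On both S3 and GCS, "creating" these nested folders programmatically just means uploading objects under the right key prefix. As an illustration of the convention above, a small Python helper (the function name is ours) that builds the prefix for one upload:

```python
from datetime import date

def dcr_upload_prefix(data_source: str, day: date, version: int) -> str:
    """Build the dt=/v=/data/ key prefix for one upload of a data
    source, e.g. 'BI-data/dt=2022-03-10/v=1/data/'."""
    return f"{data_source}/dt={day.isoformat()}/v={version}/data/"
```

Uploading an object under, say, dcr_upload_prefix("BI-data", date(2022, 3, 10), 1) + "BI-data.csv" implicitly creates all three folder levels on either cloud service.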

Since the full folder structure is programmatically created at the time files are uploaded, a realistic example includes both folders and files. See this illustration in the Files tab, below.

Files

Data source files

Uploaded data source files must meet these name, file format, and location requirements:

  • Must comply with DCR naming requirements
  • CSV or GZIP format. The file underlying GZIP compression must be a CSV file.
  • Number of data source files per data folder:
    • CSV: maximum of 1
    • GZIP: maximum of 1 logical file. The file may be uploaded as a single part, or as multiple parts named filename_part01.gzip, filename_part02.gzip, and so on.

Data within the source files must meet these requirements:

  • Date and time:
    • Format: yyyy-MM-dd hh:mm:ss
    • Time zone: UTC
  • Numbers: maximum 2 digits following the decimal point
  • String length: maximum of 256 characters
  • Character limitations: none (all characters are valid)
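For example, a Python producer could format timestamps and numeric values as follows (assuming the format tokens above follow the common convention, in which yyyy-MM-dd hh:mm:ss means a zero-padded date followed by hours, minutes, and seconds):

```python
from datetime import datetime, timezone

# A DCR-compatible timestamp: UTC, formatted as 'yyyy-MM-dd hh:mm:ss'.
ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# Numbers: at most 2 digits after the decimal point.
amount = f"{1.2345:.2f}"  # "1.23"
```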

 

_SUCCESS files

Once the upload of a data source file to the data folder is complete, an empty file named _SUCCESS should be uploaded to the version folder. This alerts AppsFlyer that a new file is available to be processed. In most cases, you would use an API script to automatically generate and upload this file.

Important! The _SUCCESS file is uploaded to the version folder, outside the data folder.

The filename:

  • Must be in ALL CAPS
  • Must be preceded by an underscore (_)
  • Should not have a file extension

For multi-part files:

  • Only one _SUCCESS file should be uploaded for all file parts.
  • The _SUCCESS file should be uploaded only after all file part uploads are complete.
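The upload sequence above can be sketched as follows. This is an illustrative outline, not AppsFlyer code; upload_file stands in for your cloud SDK call (such as an S3 put_object or a GCS blob upload):

```python
def finish_upload(upload_file, version_prefix: str, part_names: list) -> None:
    """Upload all data file parts, then signal completion.

    upload_file(key, source) is a placeholder for your cloud SDK call;
    version_prefix is e.g. 'input/CRM-data/dt=2022-03-10/v=1/'.
    """
    # 1. Upload every part into the data/ subfolder.
    for part in part_names:
        upload_file(f"{version_prefix}data/{part}", part)
    # 2. Only after all parts are uploaded, place the empty _SUCCESS
    #    marker in the version folder itself (outside data/).
    upload_file(f"{version_prefix}_SUCCESS", None)
```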

 Example

After uploading source data files on 2 days (and programmatically creating date/version/data folders and _SUCCESS files), your bucket/folder structure might look something like this:

af-dcr-example-bucket/01bcc5fb/input/
    BI-data/
        dt=2022-03-10/
            v=1/
                _SUCCESS
                data/
                    BI-data.csv
        dt=2022-03-11/
            v=1/
                _SUCCESS
                data/
                    BI-data.csv
    CRM-data/
        dt=2022-03-10/
            v=1/
                _SUCCESS
                data/
                    CRM-data_part01.gzip
                    CRM-data_part02.gzip
            v=2/
                _SUCCESS
                data/
                    CRM-data_part01.gzip
                    CRM-data_part02.gzip
        dt=2022-03-11/
            v=1/
                _SUCCESS
                data/
                    CRM-data_part01.gzip
                    CRM-data_part02.gzip
            v=2/
                _SUCCESS
                data/
                    CRM-data_part01.gzip
                    CRM-data_part02.gzip

Creating an output folder path

The detailed requirements for creating each element of the output folder path are described in the tabs below.

Top-level output folder

Though it is not required, best practice is to create a top-level output folder directly beneath the DCR key folder. This folder will be dedicated to reports delivered by the DCR.

The top-level output folder is generally created manually using the interface of your selected cloud service.

  • This practice is especially recommended when you are using the same bucket both for uploading data files (input) and receiving reports (output).
  • You can name this folder anything you want, so long as it complies with the DCR naming requirements. For ease of identification, it is usually named output/.

 Example

After creating the top-level output folder, your bucket/folder structure might look something like this:

af-dcr-example-bucket/01bcc5fb/output/

Second-level folder for each report

You can regularly receive any number of custom-designed reports from the DCR. Each of these reports must be assigned a separate folder (a "report folder").

So, for example, if you will be receiving 2 reports from the DCR: a conversions report and a retargeting report, you would assign each of these reports a folder. You could choose to call these folders conversions/ and retargeting/.

The report folders are generally created manually using the interface of your selected cloud service.

 Example

After creating 2 report folders, your bucket/folder structure might look something like this:

af-dcr-example-bucket/01bcc5fb/output/conversions/
                                      retargeting/

Nested subfolders for each date and version (not customer-created)

Unlike in the input folder path, you do not create nested date/version folders in the output folder path. AppsFlyer will automatically create this folder structure each time a report is delivered.

Report file format

DCR reports are delivered in CSV format.
