Data Locker—for marketers

At a glance: Data Locker sends your report data to cloud storage for loading into your BI systems. Available storage destinations are an AppsFlyer-owned bucket in AWS, storage owned by you in AWS or GCS, and Snowflake. Data Locker supports multiple destinations, so you can send all data to multiple destinations, segregate data by destination, or combine both. For each destination, select Parquet or CSV format.

Data Locker

In Data Locker select your apps, media sources, events, and reports to include in the data AppsFlyer delivers to your selected cloud storage options. Then, load data programmatically from the storage into your systems. 

Data Locker—features

Feature Description
Storage options (cloud)

Data Locker can send your data to any of the following cloud storage destinations: an AppsFlyer-owned bucket in AWS, a bucket owned by you in AWS or GCS, or Snowflake.

You can set more than 1 destination in Data Locker. This means that you can send all or some of your data to multiple destinations.

Examples

  • Segregate data by report type. Send raw data to GCS and aggregate data to Snowflake.
  • Segregate data by app and send the data per app group to different buckets. 
Multi-app 

Send data of one, several, or all apps in your account. When you add apps to the account, they can be automatically included.

Availability window

14 days

Data segregation

Available data segregation options

  • [Default] Unified: Data of all apps combined. The row-level app ID field is used to identify the app in data files. 
  • Segregated by app: Data of each app is in a separate folder. The folder name consists of the app ID. 
Data format options
  • CSV
  • Parquet
Data freshness

Freshness depends on the report type 

  • Hourly: Data is generated continuously. For example, installs and in-app event data are written within hours of the event arriving in AppsFlyer.
  • Daily: Reports like uninstalls are generated daily and are ready on the following day.
  • Versioned: If the same report is generated multiple times for the same time period, each run is written as a separate version.
BigQuery and Google Data Studio

If you write your data to GCS, BigQuery can directly load your Data Locker files without any intermediate processing. You can use other tools over BigQuery, like Google Data Studio, to visualize your data.

Reports available via Data Locker

Data storage architecture

Overview

The structure of your data in storage depends on whether the data is sent to cloud storage or to a data warehouse. The folder structure described here applies to cloud storage (buckets). For data warehouse storage (Snowflake), references to folders apply to views. See Snowflake.

Data is written to your selected storage option. In the case of cloud storage, the storage is owned by AppsFlyer on AWS or owned by you on AWS or GCS. You can switch storage options at any time or send some or all of your data to multiple storage options. 

Data in the storage is organized in a hierarchical folder structure, according to report type, date, and time. The following figure contains an example of this structure:

DLFolderOVerview.png

Data of a given report is contained in the hour (h) folders associated with that report:

  • The number of hour folders depends on the report data freshness (hourly, daily or versioned).
  • Data is provided in GZ compressed files having Parquet or CSV format.
  • Data files consist of columns (fields). 
  • The schema (field) structure of the user journey reports is identical and depends on the fields selected by you. Other reports each have their own explicit fields, AKA schemaless reports. See Data Locker marketer reports for the reports available and links to the report specifications. 

Folder structure

Folder Description 
Subscription ID

DataLockerFolders.png

  • The top-level folder in the bucket depends on the storage owner and provider. In general, the top-level folder is your Subscription ID but in some cases, for example, if you use Cyberduck the ID is set in the bookmark and doesn't display in the folder structure. 
  • The data-locker-hourly folder contains the report topics. Folders above this level depend on bucket ownership and cloud service provider.

 Examples of folder structure based on bucket owner and cloud provider

  • AppsFlyer bucket: <af-ext-reports>/<unique_identifier>/<data-locker-hourly>
  • Your AWS bucket: <af-datalocker-your bucket prefix>/<generated-home-folder>/<subscription-id>
  • Your GCS bucket: <your bucket name>/<generated-home-folder>/<subscription-id>
Topic (t) Report type relates to the subject matter of the report. 
Date (dt)

This is the related data date. In the case of raw data, it means the date the event occurred. In the case of aggregated data, the reporting date itself. 

Time (h or version)

Date folders are divided into hourly (h) or version folders depending on the report type. 

Hourly folders

The h folders relate to the time the data was received by AppsFlyer. For example, install events received between 14:00 and 15:00 UTC are written to the h=14 folder. Note! There is a delay of about 1–3 hours between the time the data arrives in AppsFlyer and the time the h folder is written to Data Locker. For example, the h=14 folder is written 1 hour later, at 15:00 UTC.

Hourly folder characteristics:

  • There are 24 h folders numbered 0–23. For example, h=0, h=1, and so on. 
  • A late folder, h=late, contains events of the preceding day that arrive after midnight, meaning during 00:00–02:00 UTC of the following day. For example, if a user installs an app on Monday 08:00 UTC and the event arrives on Tuesday 01:00 UTC, the event is written to Monday's late folder.
  • Data arriving after 02:00 UTC is written to the folder of the actual arrival date and time. 
  • Ensure that data in the h=late folder is consumed. It isn't contained in any other folder. 
  • _temporary folder: In some cases, we generate a temporary folder within an h folder. Disregard temporary folders and subfolders. Example: /t=impressions/dt=2021-04-11/h=18/_temporary.
  • Note:
    • Raw data reports having a daily data freshness are stored in the h=23 folder. The uninstall report is usually in the h=2 folder but can be in any folder. 
    • Cohort and Incrementality reports are stored directly in the dt folder. 
    • Versioned reports adhere to a different convention described in this section. 
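
The hourly and late folder rules above can be sketched as a small function. This is an illustration only, not AppsFlyer's actual routing logic: the function name `hourly_folder` and the constant are hypothetical, and the sketch assumes that "events of the preceding day" means the event date is exactly one UTC day before the arrival date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: late arrivals are accepted during 00:00-02:00 UTC (from the rules above).
LATE_WINDOW_END_HOUR = 2


def hourly_folder(event_time: datetime, arrival_time: datetime) -> str:
    """Return the 'dt=.../h=...' folder an event is expected to land in.

    Both arguments must be timezone-aware datetimes.
    """
    event_utc = event_time.astimezone(timezone.utc)
    arrival_utc = arrival_time.astimezone(timezone.utc)
    previous_day = arrival_utc.date() - timedelta(days=1)

    # Event from the preceding day that arrives shortly after midnight: late folder.
    if event_utc.date() == previous_day and arrival_utc.hour < LATE_WINDOW_END_HOUR:
        return f"dt={previous_day.isoformat()}/h=late"

    # Everything else goes to the folder of the actual arrival date and hour.
    return f"dt={arrival_utc.date().isoformat()}/h={arrival_utc.hour}"
```

For the Monday 08:00 install that arrives on Tuesday 01:00 UTC, the sketch returns Monday's h=late folder. The same event arriving at 03:00 UTC would map to Tuesday's h=3 folder.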

Hourly report considerations for apps that don't use UTC time.

To make sure that you get all the data for a given calendar day you must consume the folders according to the day defined by the app timezone as detailed: 

  • Eastern hemisphere timezone: To get all the data of a given calendar date, you must consume folders according to UTC time and date. Example: Your app timezone is UTC+10 (Sydney, Australia). To get all the hourly data related to Tuesday (Sydney), you must consume the following folders: Monday h=14–23 and late, Tuesday h=0–13 and h=14–15. Why must you consume Tuesday h=14–15? Some data can arrive late, so the h=14–15 folders can contain late-arriving events. You must filter by event_time to keep only the events that fall within the app calendar day.
  • Western hemisphere timezone: To get all the data of a given calendar date, you must consume folders according to UTC time and date. Example: Your app timezone is UTC-7 (Los Angeles). To get all the hourly data related to Tuesday (Los Angeles), you must consume the following folders: Tuesday h=7–23 and late, Wednesday h=0–6 and h=7–8. Why must you consume Wednesday h=7–8? Some data can arrive late, so the h=7–8 folders can contain late-arriving events. You must filter by event_time to keep only the events that fall within the app calendar day.
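
As a rough sketch of the two examples above, the following function lists the (date, hour) folders to consume for one app calendar day. The function name and the `late_buffer_hours` parameter are hypothetical. It assumes a fixed whole-hour UTC offset (no daylight saving handling) and a 2-hour buffer for late-arriving events, as in the examples.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import List, Tuple


def folders_for_app_day(
    local_day: date, utc_offset_hours: int, late_buffer_hours: int = 2
) -> List[Tuple[str, str]]:
    """List (dt, h) folder pairs covering one calendar day in the app timezone."""
    app_tz = timezone(timedelta(hours=utc_offset_hours))
    # Start and end of the app calendar day, expressed in UTC.
    start = datetime.combine(local_day, time(0), tzinfo=app_tz).astimezone(timezone.utc)
    end = start + timedelta(days=1)

    folders: List[Tuple[str, str]] = []
    current = start
    while current < end + timedelta(hours=late_buffer_hours):
        dt = current.date().isoformat()
        folders.append((dt, str(current.hour)))
        # The late folder of a UTC date follows its h=23 folder.
        if current.hour == 23 and current < end:
            folders.append((dt, "late"))
        current += timedelta(hours=1)
    return folders
```

For example, `folders_for_app_day(date(2021, 4, 13), 10)` starts at the 2021-04-12 h=14 folder and ends at the 2021-04-13 h=15 folder. After loading these folders, rows still need to be filtered by event_time so that only events within the app calendar day remain.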

Version folders

Some reports have a versioned option. This means that the most updated data for a given day is provided multiple times. Because data can continue to update due to late-arriving or more accurate data, the same report has multiple versions, where the most recent version is the most accurate.

The reports for a given day are contained in the versions folder of that day. Each version is contained in a separate folder whose name is set using an Epoch timestamp that uniquely identifies the report. 

Your data import processes must consider that data can be written retroactively. For example, on January 14, data can be written to the January 1 folder. If the bucket is owned by you, consider using cloud service notifications (available in both AWS and GCS) to trigger your import process.
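
Because the most recent version is the most accurate, an import process typically needs to pick the folder with the highest Epoch timestamp. The following is a minimal sketch under an assumption: the folder name ends with the numeric timestamp (for example, a hypothetical `version=1610582400`). Check the actual folder names in your storage before relying on it.

```python
import re
from typing import Iterable, Optional


def latest_version(folder_names: Iterable[str]) -> Optional[str]:
    """Return the folder name carrying the highest trailing Epoch timestamp."""
    best_name: Optional[str] = None
    best_timestamp = -1
    for name in folder_names:
        # Assumption: the timestamp is the run of digits at the end of the name.
        match = re.search(r"(\d+)/?$", name)
        if match is None:
            continue  # not a version folder, for example _temporary
        timestamp = int(match.group(1))
        if timestamp > best_timestamp:
            best_name, best_timestamp = name, timestamp
    return best_name
```

Comparing the timestamps as integers rather than as strings avoids ordering errors if the number of digits differs between folders.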

App segregation

Data is provided in unified data files containing the data of all apps selected or segregated into folders by app. The segregation is within the h folder as described in the table that follows.
Segregation type Description 
[Default] Unified

Data for all apps is provided in unified data files. When consuming the data, use the row-level app_id field to distinguish between apps.

Example of data files in the h=2 folder

UnifiedByApp.png

The data file naming convention is unique_id.gz.

  • Your data loading process must: 
    • Load data after the _SUCCESS flag is set.
    • Load all files in the folder having a .gz extension. Don't build your import process using part numbering logic. 
Segregated by app

The folder contains subfolders per app. Data files for a given app are contained within the app folder. In the figure that follows, the h=19 folder contains app folders. Each app folder contains the associated data files. Note! The data files don't contain the app_id. You must determine the app_id from the folder name.

DLSegregateByApp.png

In each app folder the naming convention is unique_id.gz: 

  • Your data loading process must: 
    • Load data after the _SUCCESS flag is set.
    • Load all files in the folder having a .gz extension. Don't build your import process using part numbering logic. 

Limitation: This option is not available for People-Based Attribution reports.
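
The loading rules for both segregation types can be sketched as follows. This is an illustration that works on a local copy of an h folder (for example, one downloaded with the AWS CLI). The function name is hypothetical, and the sketch assumes that, in the segregated option, the app folder name is the app ID itself.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def ready_data_files(hour_folder: Path) -> List[Tuple[Optional[str], Path]]:
    """Return (app_id, file) pairs for an h folder, or [] if it is not complete.

    app_id is None for unified files and the app folder name for segregated files.
    """
    # Never read a folder before the _SUCCESS flag exists.
    if not (hour_folder / "_SUCCESS").exists():
        return []

    found: List[Tuple[Optional[str], Path]] = []
    # Take every .gz file; do not rely on part numbering.
    for path in sorted(hour_folder.rglob("*.gz")):
        relative = path.relative_to(hour_folder)
        if not path.is_file() or "_temporary" in relative.parts:
            continue  # disregard temporary folders and their contents
        app_id = relative.parts[0] if len(relative.parts) > 1 else None
        found.append((app_id, path))
    return found
```

An empty result with the flag present is valid: the _SUCCESS flag is also set when there is no data for the folder.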

Data files

Content Details
Completion flag

The last file (completion) flag is set when all the data for a given h folder has been written. 

  • Don't read data in a folder before verifying that the _SUCCESS flag exists.

  • The _SUCCESS flag is set even in cases where there is no data to write to a given folder and the folder is empty. 

  • Note! In the segregation by app option, the flag is set in the h folder and not the individual app folders. See the figures in the previous section. 
File types
  • Part files are zipped using GZ.
  • After unzipping, the data files are in Parquet or CSV format according to your settings.
Column sequence (CSV files) 

In the case of CSV files, the sequence of fields in reports is always the same. When we add new fields these are added to the right of the existing fields. 

In this regard: 

  • The column structure of user journey reports is identical. This means you can have similar data loading procedures for different report types. You select the fields contained in the reports. The field meaning is detailed in the raw data dictionary
  • Reports having an FF notation in the report availability section don't adhere to the common column structure. 
Field population considerations

Blank or empty fields: Some fields are populated with null or are empty. This means that in the context of a given report there is no data to report. Typically null means this field is not populated in the context of a given report and app type. Blank "" means the field is relevant in its context but no data was found to populate it with. 

In the case of the restricted media source, the content of restricted fields is set to null. 

Overall, regard null and blank as one and the same thing: there is no data available.

Time zone and currency

App-specific time zone and currency settings have no effect on data written to Data Locker. The following apply: 

  • Time zone: Date and hour data are in UTC.
  • Currency: The field event_revenue_usd is in USD.

Values with commas: Values that contain commas are enclosed in double quotes `"`, for example, `"iPhone6,1"`.
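
The CSV considerations above (new columns added on the right, null and blank treated alike, quoted values with commas) can be handled with a reader along these lines. This is a sketch under assumptions that are not confirmed by this article: the file has a header row, is UTF-8 encoded, and represents null as the literal text `null`. The function name and the `device_model` column in the usage note are hypothetical.

```python
import csv
import gzip
import io
from typing import Dict, Iterator, Optional


def read_csv_rows(gz_bytes: bytes) -> Iterator[Dict[str, Optional[str]]]:
    """Yield rows from a GZ-compressed CSV file as dicts keyed by column name."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    # The csv module handles values such as "iPhone6,1" that contain commas.
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # Treat blank and null alike: no data available.
        yield {
            key: (None if value in ("", "null") else value)
            for key, value in row.items()
        }
```

Reading fields by name rather than by position means that columns added later to the right of the existing ones don't break the import. For example, a row `com.example,"iPhone6,1",` under the header `app_id,device_model,event_revenue_usd` yields the device model as a single value and the revenue as None.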

Data files depend on segregation type

Storage options

 Caution!

If you are using the marketer-owned storage option: 

  • Verify that you comply with data privacy regulations like GDPR and ad network/SRN data retention policies.
  • Don't use the marketer-owned storage solution to send data to third parties. 
  • Data is written to a storage owner of your choice as follows:
    • AppsFlyer storage
    • Customer storage—AWS or GCS
  • You can change the storage selection at any time.
  • If you change the storage, the following happens:
    • We start writing to the newly selected storage within one hour.
    • We continue writing to the existing storage during a transition period of 7 days. The transition period expiry time displays in the user interface. Use the transition period to update your data loading processes. You can restart the transition period or revert to the AppsFlyer bucket if needed. 
    • Changing storage: You can migrate from one storage option to another by using the multi-storage option and sending data to multiple destinations simultaneously. Once you have completed the migration and testing, delete the storage option you no longer need. 
Bucket name
  • AppsFlyer-owned storage (AWS): Set by AppsFlyer
  • Marketer-owned storage (GCS, AWS, Snowflake):
    • GCS: No restriction
    • AWS: Set by you. Must have the prefix af-datalocker-. Example: af-datalocker-your-bucket-name
Storage ownership
  • AppsFlyer-owned storage: AppsFlyer
  • Marketer-owned storage: Marketer
Storage platform
  • AppsFlyer-owned storage: AWS
  • Marketer-owned storage: AWS, GCS, Snowflake
Credentials to access data by you
  • AppsFlyer-owned storage: Available in the Data Locker user interface to your AppsFlyer account admins
  • Marketer-owned storage: Not known to AppsFlyer. Use credentials provided by the cloud provider.
Data retention
  • AppsFlyer-owned storage: Data is deleted after 14 days
  • Marketer-owned storage: Marketer responsibility
Data deletion requests
  • AppsFlyer-owned storage: AppsFlyer responsibility
  • Marketer-owned storage: Marketer responsibility
Security
  • AppsFlyer-owned storage: AppsFlyer controls the storage. The customer has read access.
  • Marketer-owned storage: The marketer controls the storage.
    • AWS: AppsFlyer requires GetObject, ListBucket, DeleteObject, PutObject permission to the bucket. The bucket should be dedicated to AppsFlyer use. Don't use it for other purposes.
    • GCS: See GCS configuration article.
Storage capacity
  • AppsFlyer-owned storage: Managed by AppsFlyer
  • Marketer-owned storage: Managed by the marketer
Access control using VPC endpoints with bucket policies
  • AppsFlyer-owned storage: Not applicable
  • Marketer-owned storage: [Optional] In AWS, if you implement VPC endpoint security at the bucket level, you must allowlist AppsFlyer servers.

Notice to security officers in the case of customer-controlled storage

Consider:

  • The bucket or destination is for the sole use of AppsFlyer. There should be no other entity writing to a given destination.
  • You can delete data in the destination 25 hours after we write the data.
  • Data written to the destination is a copy of data already in our servers. The data continues to be in our servers in accordance with our retention policy. 
  • For technical reasons, we sometimes delete and rewrite the data. For this reason, we need delete and list permissions. Neither permission is a security risk for you. In the case of list, we are the sole entity writing to the bucket. In the case of delete, we are able to regenerate the data.
  • For additional information, you can contact our security team via hello@appsflyer.com or your CSM.  

Multiple-connections principles (more than one destination)

In Data Locker you can send some or all of your data to more than one destination (defined in the connection settings). For example, you can send App A data to AWS, and App B data to GCS.

Each connection consists of a complete set of Data Locker settings, including a destination. Connection settings are independent of one another.

In managing your connections, consider:

  • In Data Locker settings, connections are shown in tabs. Each connection has its own settings tab from which you can manage the connection. The default tab is “Data Locker.”
  • To create a new connection:
    1. Click Add connection.
    2. Provide a name for the connection and choose the storage type.
    3. Click Save.
      Once saved, the connection displays next to the default “Data Locker” tab. The icon of each tab represents the storage type.
  • To see connection details, duplicate a connection, or delete a connection, click ⋮ (options).

Procedures

Set up Data Locker

Use this procedure to set up Data Locker. Any changes to Data Locker settings take up to 3 hours to take effect. 

Prerequisites

To set up marketer-owned storage:

If you are setting up Data Locker using marketer-owned storage, meaning a bucket owned by you, complete one or more of the following procedures now. 

Note! If you don't have a Data Locker subscription and you access Cohorts analytics or SKAN data, you must complete a marketer-owned storage procedure. 

To set up Data Locker:

  1. An admin needs to perform the setup. 
  2. In AppsFlyer, go to Integration > Data Locker. 
  3. [Optional] If you already have an active Data Locker destination and want to add a destination, click +. 
  4. Select a cloud service data destination. Do one of the following:
    • Select AppsFlyer AWS bucket (option available to Data Locker subscribers only). Continue to step 5.
    • Select Your AWS bucket.
      1. Enter your AWS bucket name. Don't enter the prefix af-datalocker-.
      2. Click Test.
      3. Verify that an error message indicating that the bucket path is invalid isn't displayed.
    • Select Your GCS bucket, enter your GCS bucket name, and click Test.
  5. Select folder structure (data segregation):
    • [Default] Unified
    • Segregated by app
  6. Select file format:
    • [Default] Parquet
    • CSV
  7. Select the required apps. Select all to automatically include apps added in the future. 
  8. Click Apply.
  9. [Optional] Media Sources: Select one or more media sources to include in reports.
    • Default=All. This means that media sources added in the future are automatically added.
  10. Select the required report types. You must select at least 1.
  11. [Optional] In-app events: Select the in-app events to include. If you have more than 100 in-app event types, you can't search for them. Enter their names exactly to select them.
    • Default=All. This means that in-app events added in the future are automatically added.
  12. Click Apply.
  13. [Optional] Fields: Select the fields to include in the reports. Note: Sometimes we make additional fields available. Take this into account in your data import process.
  14. Click Save Configuration. One of the following occurs:
    • If you selected AppsFlyer AWS bucket:
      • A dedicated AWS bucket is created. The bucket credentials display.
      • The bucket is accessible using the credentials. The credentials provide you with read-only access to the bucket.
    • If you selected a Customer bucket: Data will be written to your bucket within 3 hours. 

Reset credentials

An admin can reset the AppsFlyer bucket credentials at any time. Note! If you reset the credentials, you must update your data import scripts with the updated credentials.

To reset the credentials of AppsFlyer-owned storage:

  1. In AppsFlyer, go to Integration > Data Locker. 
  2. Select the AppsFlyer-owned destination.
  3. In the Credentials section, click Reset credentials.
    A confirmation window displays.
  4. Click Reset.
  5. Wait (about 20 seconds) until the Credentials successfully reset message displays.
    The updated credentials are available.

Additional information

Traits and Limitations

Traits
Trait Remarks 
Ad networks Not for use by ad networks
Agencies Not for use by agencies
App-specific time zone Not Applicable. Data Locker folders are divided into hours using UTC. The actual events contain times in UTC. Convert the times to any other time zone as needed. Irrespective of your app time-zone the delay from event occurrence until it is recorded in Data Locker remains the same.
App-specific currency  Not supported
Size limitations Not applicable
Data freshness Data is updated according to the specific report data freshness detailed in this article
Historical data Not supported. If you need historical data, some reports, but not all, are available via Pull API.
User access Only account users with required permissions can configure Data Locker. 
Single app/multiple app Multi-app support. Data Locker is at the account level

Troubleshooting

  • Symptom: Unable to retrieve data using AWS CLI
  • Error message: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
  • Cause: The AWS credentials being used are not the correct credentials for the AppsFlyer bucket. This can be caused by having multiple or invalid credentials on your machine. 
  • Solution:
    1. Use a method other than the CLI, like Cyberduck, to access the bucket. Do this to verify that the credentials you are using work. If you are able to connect using Cyberduck, this indicates an issue with the credentials cache.
    2. Refresh the AWS credentials cache.
      Screenshot from AWS

      mceclip0.png

AWS data retrieval

Use your preferred AWS data retrieval tool: AWS CLI or one of the other tools described in the sections that follow. Note! The exact instructions are suitable for AppsFlyer-owned buckets. Adjust the instructions as needed if you are connecting to your own bucket.

AWS CLI

Before you begin:

  • Install the AWS CLI on your computer.
  • In AppsFlyer, go to Data Locker, and retrieve the information contained in the credentials panel.

To use AWS CLI:

  1. Open the terminal. To do so in Windows, press <Windows>+<R>, enter cmd, and click OK.
    The command line window opens.
  2. Enter aws configure.
  3. Enter the AWS Access Key as it appears in the credentials panel.
  4. Enter your AWS Secret Key as it appears in the credentials panel.
  5. For the default region name, enter eu-west-1.
  6. For the default output format, press Enter to keep the default (None).

Use the CLI commands that follow as needed.

In the following commands, the value of {home-folder} can be found in the credentials panel.

To list folders in your bucket:


aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/

Listing files and folders

There are three types of folders in your Data Locker bucket:

  • Report Type t=
  • Date dt=
  • Hour h=

To list all the reports of a specific report type:

aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/

To list all the reports of a specific report type for a specific day:

aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/dt=2019-01-17

To list all the reports of a specific report, in a specific hour of a specific day:

aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/dt=2019-01-17/h=23

To download all files for a specific date:

aws s3 cp s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/dt=2020-08-01/ ~/Downloads/dt=2020-08-01/ --recursive
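
If you script these commands, the S3 locations can be built from the report type, date, and hour instead of being typed by hand. The helper below is a hypothetical sketch that only assembles the path strings used in the commands above for an AppsFlyer-owned bucket. It doesn't call AWS.

```python
from datetime import date
from typing import Optional, Union

# Bucket name used in the commands above for AppsFlyer-owned storage.
BUCKET = "af-ext-reports"


def data_locker_uri(
    home_folder: str,
    topic: Optional[str] = None,
    day: Optional[date] = None,
    hour: Union[int, str, None] = None,
) -> str:
    """Build an s3:// prefix down to the report type, date, or hour level."""
    parts = [f"s3://{BUCKET}", home_folder, "data-locker-hourly"]
    if topic is not None:
        parts.append(f"t={topic}")
        if day is not None:
            parts.append(f"dt={day.isoformat()}")
            if hour is not None:
                # hour can be 0-23 or the string "late"
                parts.append(f"h={hour}")
    return "/".join(parts) + "/"
```

The returned string can be passed to `aws s3 ls` or `aws s3 cp`. Deeper levels are added only when the level above them is given, which mirrors the t=, dt=, h= folder hierarchy.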

Cyberduck

Before you begin:

  • Install the Cyberduck client.
  • In AppsFlyer, go to Data Locker and retrieve the information contained in the credentials panel.

To configure Cyberduck:

  1. In Cyberduck, click Action.
  2. Select New Bookmark. The window opens.
  3. In the first field (marked [1] in the screenshot below) select Amazon S3.

    DataDuckSmall2.png

  4. Complete the fields as follows:
    • Nickname: Free text
    • Server: s3.amazonaws.com
    • Access Key ID: Copy the AWS Access Key as it appears in the credentials panel in AppsFlyer
    • Secret Access Key: Copy the Bucket Secret key as it appears in the credentials panel in AppsFlyer.
    • Path: {Bucket Name}/{Home Folder} For example: af-ext-reports/1234-abc-ffffffff
  5. Close the window. To do so, click the X in the upper-right corner of the window.
  6. Select the connection.
    The data directories are displayed.

Amazon S3 browser

Before you begin:

  • Install the Amazon S3 Browser.
  • In AppsFlyer, go to Data Locker and retrieve the information contained in the credentials panel.

To configure the Amazon S3 Browser:

  1. In the S3 browser, click Accounts > Add New Account.
    The Add New Account window opens.

    mceclip0.png

  2. Complete the fields as follows:
    • Account Name: free text. 
    • Access Key ID: copy the AWS Access Key as it appears in the credentials panel. 
    • Secret Access Key: copy the Bucket Secret key as it appears in the credentials panel.
    • Select Encrypt Access Keys with a password and enter a password. Make a note of this password.
    • Select Use secure transfer. 
  3.  Click Save changes.
  4. Click Buckets > Add External Bucket.
    The Add External Bucket window opens.

    mceclip2.png

  5. Enter the Bucket name. The Bucket name has the following format: {Bucket Name}/{Home Folder}. The values needed for bucket name and home folder appear in the credentials window. 
  6. Click Add External bucket.
    The bucket is created and displays in the left panel of the window.
    You can now access the Data Locker files. 