Data Locker for Advertisers

At a glance: Data Locker writes your report data to cloud storage for loading into your BI systems. Storage options let you choose between an AppsFlyer-owned bucket on AWS and a bucket that you own on AWS or GCS. Data is provided in either Parquet or CSV format.

6133DataLockerForAdvertisers.png

What's new in Data Locker

Starting September 3, 2021, you can select between CSV and Parquet data formats. If you currently get your data in CSV format and want to test Parquet, contact your CSM or hello@AppsFlyer.com. We'll deliver your data in both formats during a limited test period.

Related reading: Selecting the right raw data delivery tool 

Data Locker

In Data Locker, select the apps, media sources, events, and reports to include in the data written to cloud storage. Then load the data programmatically from the storage into your BI systems.

Data Locker—features

Feature Description
Storage options (cloud)

Several storage options are available, and you can switch between them at any time. The options differ in cloud service provider and bucket ownership. Options available: an AppsFlyer-owned bucket on AWS, your own bucket on AWS, or your own bucket on GCS.

Multi-app 

Send data of one, several, or all apps in your account. When you add apps to the account, they can be automatically included.

Data segregation

 

Available data segregation options

  • [Default] Unified: Data of all apps combined. The row-level app ID field is used to identify the app in data files. 
  • Segregated by app: Data of each app is in a separate folder. The folder name consists of the app ID. 
Data format options
  • CSV
  • Parquet
Data freshness

Freshness depends on the report type 

  • Hourly: Data generated continuously; for example, installs and in-app event data are written within hours of the event arriving in AppsFlyer. 
  • Daily: Some reports, for example, uninstalls, are generated daily and are ready on the following day. 
Reports unique to Data Locker
  • Unconverted data: Click and impression data of UA and retargeting campaigns. About clicks and impressions
  • SKAN raw data is available in storage owned by you without the need for a Data Locker subscription. 
Example data files

Clicks, Installs, In-app events 

BigQuery and Google Data Studio

If you write your data to GCS storage, BigQuery can directly load your Data Locker files without any intermediate processing. You can use other tools over BigQuery, like Google Data Studio, to visualize your data.

Reports—user journey

For a description of the report types, see user journey reports.

 
Category Report type (topic) Data freshness* Organic/Non-organic Unique to Data Locker
User acquisition Clicks  6-hour lag N/A
Retargeting Clicks 6-hour lag N/A
User acquisition Impressions 6-hour lag N/A
Retargeting Impressions 6-hour lag N/A
User acquisition Installs 6-hour lag Both  
User acquisition In-app events  6-hour lag Both  
User acquisition Attributed ad revenue Daily+2 Non-organic  
User acquisition Organic ad revenue Daily+2 Organic  
Retargeting Retargeting ad revenue Daily+2 Non-organic  
Retargeting Conversions 6-hour lag Non-organic  
Retargeting In-app events 6-hour lag Non-organic  
Retargeting Sessions 6-hour lag Both
User acquisition Sessions 6-hour lag Both
User acquisition Uninstalls Daily-uninstall Non-organic  
User acquisition Organic uninstalls Daily-uninstall Organic  
Reinstalls Reinstalls 6-hour lag Non-organic  
Reinstalls Organic reinstalls 6-hour lag Organic  

6-hour lag:

  • Data is separated into arrival hour folders.
  • That is the hour that the event was made available to Data Locker.
  • Lag time isn't related to the app-specific time zone. 

Daily-uninstall:

  • Uninstall data is prepared daily.
  • Usually available by 10:00-12:00 UTC.
  • Most often written to the h=2 folder, meaning that the h=2 folder contains uninstalls reported on the previous day. However, the data may be written to a later folder, so your import process must read all the folders in the uninstall folder for that date, meaning h=0–23 and h=late. For example, the report for data generated during Monday is in the Tuesday h=2 folder. The data is available after 10:00 UTC on Tuesday.

Daily+2: Ad revenue data is available after 2 days, meaning that data generated during Monday becomes available in the Monday h=23 folder after 06:00 UTC on Wednesday.
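
The Daily-uninstall and Daily+2 rules above can be turned into a small lookup that tells an import job which dt folder to read and when to start looking. The following is a minimal sketch, not an AppsFlyer API: the function name and the freshness labels are invented for illustration, and the hour folders are taken to be numbered 0–23 plus late, as described in the folder structure section.

```python
from datetime import date, datetime, time, timedelta, timezone

# Every hour folder plus the late folder (h=0..23 and h=late).
ALL_FOLDERS = [str(h) for h in range(24)] + ["late"]


def daily_report_location(data_day: date, freshness: str):
    """Return (dt folder date, h folders to read, earliest UTC time to check).

    `freshness` is a label made up for this sketch:
    "daily-uninstall" or "daily+2".
    """
    if freshness == "daily-uninstall":
        # Monday's uninstalls land in Tuesday's folders, usually h=2,
        # but possibly a later one, so every folder is read.
        folder_day = data_day + timedelta(days=1)
        ready = datetime.combine(folder_day, time(10), tzinfo=timezone.utc)
        return folder_day, ALL_FOLDERS, ready
    if freshness == "daily+2":
        # Monday's ad revenue lands in Monday's h=23 folder,
        # available after 06:00 UTC on Wednesday.
        ready_day = data_day + timedelta(days=2)
        ready = datetime.combine(ready_day, time(6), tzinfo=timezone.utc)
        return data_day, ["23"], ready
    raise ValueError(f"unknown freshness label: {freshness}")
```

The returned time is only the earliest moment worth checking. The job should still confirm that the _SUCCESS flag exists before reading a folder.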

Reports—application

Protect360 reports
Report type (topic) Data freshness*
Blocked installs 6-hour lag
Blocked in-app events 6-hour lag
Blocked clicks 6-hour lag
[AG*] Post-attribution installs Daily
SKAN [Doesn't require a Data Locker subscription if you send them to your own bucket]
Data freshness: Daily 
Report type (topic)
[FF*] Postbacks
[FF*] Installs
[FF*] Redownloads
[FF*] In-app events
People-Based Attribution reports
Data freshness: Daily
Report type (topic)
[FF*] Website visits
[FF*] Website events
[FF*] Website-assisted installs
[FF*] Conversion Paths
 * Key to abbreviations

[FF] Report fields are fixed by AppsFlyer. They are not related to the fields selected for inclusion in reports.

[AG] Agency transparency not supported.

6-hour lag:

  • Data is separated into arrival hour folders.
  • That is the hour that the event was made available to Data Locker.
  • Lag time isn't related to the app-specific time zone.

Daily:

  • Reports are written to the h=23 folder.
  • These reports are typically available by 10:00-12:00 UTC in the h=23 folder of the preceding day.
  • For example, the report for data generated during Monday is in the Monday h=23 folder. The data is available after 10:00 UTC on Tuesday. 

Data storage architecture

Overview

Data is written to your selected storage option. The storage is owned either by AppsFlyer on AWS or owned by you on AWS or GCS. You can switch from one storage option to another at any time. The change occurs within hours. 

Data in the storage is organized in a hierarchical folder structure, according to report type, date, and time. The following figure contains an example of this structure:

DLFolderOVerview.png

Data of a given report is contained in the hour (h) folders associated with that report:

  • The number of hour folders depends on the report data freshness (hourly or daily).
  • Data is provided in GZ compressed files having Parquet or CSV format.
  • Data files consist of columns (fields). 
  • The column structure of core measurement reports is identical. This means you can have similar data loading procedures for different report types. The actual fields (columns) contained in the data provided are selected by you. 
  • Reports designated with FF have their own unique column structure which can't be set by you.

Folder structure

Folder Description 
data-locker-hourly

DLHourly.png

  • The top-level folder in the bucket depends on the storage owner and provider.
  • The data-locker-hourly folder contains the report topics. Folders above this level depend on bucket ownership and cloud service provider.

 Examples of folder structure based on bucket owner and cloud provider

  • AppsFlyer bucket: <af-ext-reports>/<unique_identifier>/<data-locker-hourly>
  • Your AWS bucket: <af-datalocker-your folder name>/<data-locker-hourly>
  • Your GCS bucket: <data-locker-hourly>
t (topic) Report type relates to the subject matter of the report. 
dt (date)

This is the related data date. In most cases, this means the date the event occurred. 

h (hour)

The h folders relate to the time the data was received by AppsFlyer. For example, install events received between 14:00-15:00 UTC are written to the h=14 folder. Note! There is a lag of about 6 hours between the time the data arrives in AppsFlyer and the time the h folder is written to Data Locker. For example, the h=14 folder is written six hours later, at 21:00 UTC.

Folder characteristics:

  • There are 24 h folders numbered 0–23. For example, h=0, h=1, and so on. 
  • A late folder, h=late, contains events of the preceding day arriving after midnight. Meaning events arriving during 00:00–02:00 UTC of the following day. For example, if a user installs an app on Monday 08:00 UTC and the event arrives on Tuesday 01:00 UTC, the event is written to Monday's late folder. 
  • Data arriving after 02:00 UTC is written to the folder of the actual arrival date and time. 
  • Ensure that data in the h=late folder is consumed. It isn't contained in any other folder. 
  • _temporary folder: In some cases, we generate a temporary folder within an h folder. Disregard temporary folders and subfolders. Example: /t=impressions/dt=2021-04-11/h=18/_temporary.

Hourly report considerations for apps that don't use UTC time.

To make sure that you get all the data for a given calendar day you must consume the folders according to the day defined by the app timezone as detailed: 

  • Eastern hemisphere timezone: To get all the data of a given calendar date, you must consume folders according to UTC time and date. Example: Your app timezone is UTC+10 (Sydney, Australia). To get all the hourly data related to Tuesday (Sydney), you must consume the following folders: Monday h=14–23 and late, Tuesday h=0–13 and 14–15. Why must you consume Tuesday h=14–15? Some data can arrive late, so the h=14–15 folders can contain late-arriving events. You must filter event_time to align with the app calendar day relative to UTC.
  • Western hemisphere timezone: To get all the data of a given calendar date, you must consume folders according to UTC time and date. Example: Your app timezone is UTC-7 (Los Angeles). To get all the hourly data related to Tuesday (Los Angeles), you must consume the following folders: Tuesday h=7–23 and late, Wednesday h=0–6 and 7–8. Why must you consume Wednesday h=7–8? Some data can arrive late, so the h=7–8 folders can contain late-arriving events. You must filter event_time to align with the app calendar day relative to UTC.
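
The folder selection in the two examples above follows one rule: convert the start of the local calendar day to UTC, take 24 hourly folders from there, add the late folder whenever h=23 is included, and add two more hours for late arrivals. A minimal sketch of that rule follows. The function name and the whole-hour offset parameter are assumptions made for illustration.

```python
from datetime import date, datetime, timedelta, timezone


def folders_for_local_day(local_day: date, utc_offset_hours: int,
                          extra_hours: int = 2) -> list[tuple[str, str]]:
    """Return (dt, h) folder pairs to consume for one app-local calendar day.

    extra_hours covers the additional folders read for late-arriving events.
    """
    app_tz = timezone(timedelta(hours=utc_offset_hours))
    start_utc = datetime(local_day.year, local_day.month, local_day.day,
                         tzinfo=app_tz).astimezone(timezone.utc)
    end_utc = start_utc + timedelta(hours=24 + extra_hours)

    folders = []
    current = start_utc
    while current < end_utc:
        dt = current.date().isoformat()
        folders.append((dt, str(current.hour)))
        if current.hour == 23:
            # The late folder of a UTC day follows that day's h=23 folder.
            folders.append((dt, "late"))
        current += timedelta(hours=1)
    return folders
```

Calling folders_for_local_day(date(2021, 4, 13), 10) for a Tuesday in Sydney returns Monday h=14–23 and late, then Tuesday h=0–15. After loading, rows still need to be filtered on event_time, because the extra folders also contain events of the following local day.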

App segregation

Data is provided in unified data files containing the data of all apps selected or segregated into folders by app. The segregation is within the h folder as described in the table that follows.
Segregation type Description 
[Default] Unified

Data for all apps are provided in unified data files. When consuming the data, use the row-level app_id field to distinguish between apps.

Example of data files in the h=2 folder

UnifiedByApp.png

The data file naming convention is part-nnnnn.gz where: 

  • nnnnn is a part number. For example, part-00000, part-00001, part-00002, and so on. This naming structure may change in the future. 
  • Part numbers aren't necessarily consecutive.
  • Your data loading process must: 
    • Load data after the _SUCCESS flag is set.
    • Load all files in the folder having a .gz extension. Don't build your import process using part numbering logic. 
Segregated by app

The folder contains subfolders per app. Data files for a given app are contained within the app folder. In the figure that follows, the h=19 folder contains app folders. Each app folder contains the associated data files.

DLSegregateByApp.png

In each app folder, the naming convention is part-nnnnn-string.csv.gz, where:

  • nnnnn is a part number. For example, part-00000, part-00001, part-00002, and so on. This naming structure may change in the future. 
  • Part numbers aren't necessarily consecutive.
  • Your data loading process must: 
    • Load data after the _SUCCESS flag is set.
    • Load all files in the folder having a .gz extension. Don't build your import process using part numbering logic. 

Limitation: This option is not available for People-Based Attribution reports.
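
The loading rules above (wait for the _SUCCESS flag, load every .gz file, don't rely on part numbers) can be sketched as a filter over the object keys listed under one h folder. The function name is invented, and the key list is assumed to come from whatever listing call your storage client provides. The sketch also skips _temporary folders and works for both unified and segregated-by-app layouts, because the flag sits in the h folder.

```python
def loadable_files(keys: list[str], h_prefix: str) -> list[str]:
    """Return the data files under h_prefix, or [] if the folder isn't complete."""
    prefix = h_prefix.rstrip("/") + "/"
    in_folder = [key for key in keys if key.startswith(prefix)]

    # The completion flag is set in the h folder itself, also when the
    # data is segregated into app subfolders.
    if prefix + "_SUCCESS" not in in_folder:
        return []

    return sorted(
        key for key in in_folder
        if key.endswith(".gz") and "/_temporary/" not in key
    )
```

An empty result means either that the folder isn't complete yet or that it is complete but has no data, so a caller that needs to tell the two apart should check for the flag separately.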

Data files

Content Unified  Segregated by app 
Completion flag

The last file (completion) flag is set when all the data for a given h folder has been written. 

  • Don't read data in a folder before verifying that the _SUCCESS flag exists.

  • The _SUCCESS flag is set even when there is no data to write to a given folder and the folder is empty.

  • Note! In the segregation by app option, the flag is set in the h folder and not the individual app folders. See the figures in the previous section. 
File types
  • Part files are compressed using GZ.
  • After decompression, the data files are in Parquet or CSV format, according to your settings.
  • The same applies to both unified and segregated-by-app files.
Column sequence (CSV files) 

In the case of CSV files, the sequence of fields in reports is always the same. When we add new fields these are added to the right of the existing fields. 

In this regard: 

  • The column structure of user journey reports is identical. This means you can have similar data loading procedures for different report types. You select the fields contained in the reports.  
  • Reports having an FF notation in the report availability section don't adhere to the common column structure. 
  • The field meaning is detailed in the raw data dictionary.
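
Because new fields are appended to the right of existing ones, an import process that selects columns by name keeps working when fields are added. The following is a minimal sketch that assumes each CSV file starts with a header row, which this article doesn't state. The function name is invented. The field names app_id and event_time are taken from this article.

```python
import csv
import gzip
import io


def read_rows(gz_bytes: bytes, wanted: list[str]) -> list[dict[str, str]]:
    """Read a GZ-compressed CSV part file and keep only the wanted columns.

    Assumes the first row is a header row (an assumption of this sketch).
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8",
                   newline="") as handle:
        reader = csv.DictReader(handle)
        # Columns are looked up by name, so extra columns are ignored.
        return [{name: row.get(name) or "" for name in wanted} for row in reader]
```

If your files turn out to have no header row, the same idea applies with a fixed list of known column names passed to csv.DictReader through its fieldnames parameter.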
Field population considerations

Blank or empty fields: Some fields are populated with null or are empty. This means that in the context of a given report there is no data to report. Typically null means this field is not populated in the context of a given report and app type. Blank "" means the field is relevant in its context but no data was found to populate it with. 

In the case of restricted media sources, the content of restricted fields is set to null.

Overall, treat null and blank as equivalent: in both cases, there is no data available.

Time zone and currency

App-specific time zone and currency settings have no effect on data written to Data Locker. The following apply: 

  • Time zone: Date and hour data are in UTC.
  • Currency: The field event_revenue_usd is in USD.

Values with commas: These commas are contained between double quotes `"`, for example, `"iPhone6,1"`.
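
When parsing CSV files, a standard CSV parser handles the quoted commas, and a small helper can fold null and blank into one missing value. The sketch below makes several assumptions: null is taken to appear as the literal text null, the sample rows are invented, and device_model is a hypothetical column name used only to carry the "iPhone6,1" example.

```python
import csv
import io


def normalize(value):
    """Map null and blank to None. Both mean that no data is available."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped == "" or stripped.lower() == "null":
        return None
    return value


# Invented sample data. The quoted value contains a comma.
sample = (
    "app_id,device_model,event_revenue_usd\r\n"
    'com.example.app,"iPhone6,1",null\r\n'
    "com.example.app,,1.5\r\n"
)

rows = [
    {name: normalize(value) for name, value in row.items()}
    for row in csv.DictReader(io.StringIO(sample, newline=""))
]
```

Splitting lines on commas by hand would break the "iPhone6,1" value into two fields, which is why the sketch relies on the csv module.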

Data files depend on segregation type

Storage options

 Caution!

If you are using the advertiser-owned storage option: 

  • Verify that you comply with data privacy regulations like GDPR and ad network/SRN data retention policies.
  • Don't use the advertiser-owned storage solution to send data to third parties. 
  • Data is written to the storage of your choice as follows:
    • AppsFlyer storage
    • Customer storage: AWS or GCS
  • You can change the storage selection at any time.
  • If you change the storage, the following happens:
    • We start writing to the newly selected storage within one hour.
    • We continue writing to the existing storage during a transition period of 7 days, so data is sent to both. The transition period expiry time displays in the user interface. Use the transition period to update your data loading processes. You can restart the transition period or revert to the AppsFlyer bucket if needed.
Comparison of AppsFlyer-owned storage (AWS) and advertiser-owned storage (GCS or AWS)

Bucket name
  • AppsFlyer-owned: Set by AppsFlyer
  • Advertiser-owned:
    • GCS: No restriction
    • AWS: Set by you. Must have the prefix af-datalocker-. Example: af-datalocker-your-bucket-name

Storage ownership
  • AppsFlyer-owned: AppsFlyer
  • Advertiser-owned: Advertiser

Storage platform
  • AppsFlyer-owned: AWS
  • Advertiser-owned: AWS or GCS

Credentials to access data by you
  • AppsFlyer-owned: Available in the Data Locker user interface to the Admin
  • Advertiser-owned: Not known to AppsFlyer. Use credentials provided by the storage provider.

Data retention
  • AppsFlyer-owned: Data is deleted after 30 days
  • Advertiser-owned: Advertiser responsibility

Data deletion requests
  • AppsFlyer-owned: AppsFlyer responsibility
  • Advertiser-owned: Advertiser responsibility

Security
  • AppsFlyer-owned: AppsFlyer controls the storage. The customer has read access.
  • Advertiser-owned: The advertiser controls the storage.
    • AWS: AppsFlyer requires GetObject, ListBucket, DeleteObject, PutObject permission to the bucket. The bucket should be dedicated to AppsFlyer use. Don't use it for other purposes.
    • GCS: See GCS configuration article.

Storage capacity
  • AppsFlyer-owned: Managed by AppsFlyer
  • Advertiser-owned: Managed by the advertiser

Access control using VPC endpoints with bucket policies
  • AppsFlyer-owned: Not applicable
  • Advertiser-owned: [Optional] In AWS, if you implement VPC endpoint security at the bucket level, you must allowlist AppsFlyer servers.

SKAN reports
  • AppsFlyer-owned: Require a Data Locker subscription
  • Advertiser-owned: Available if you have a raw data subscription. Meaning, there is no need for a Data Locker subscription.
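
For an advertiser-owned AWS bucket, the four permissions listed under Security can be sketched as an S3 bucket policy document. This is an illustration only, not the configuration AppsFlyer prescribes. The function name is invented, and the principal ARN is a placeholder because this article doesn't give the identity that AppsFlyer writes with. Take the real value from the AWS setup instructions.

```python
import json

REQUIRED_PREFIX = "af-datalocker-"
# Hypothetical placeholder: the real AppsFlyer principal isn't given here.
PLACEHOLDER_PRINCIPAL = "arn:aws:iam::000000000000:root"


def bucket_policy(bucket_name: str,
                  principal_arn: str = PLACEHOLDER_PRINCIPAL) -> dict:
    """Build a bucket policy granting the four permissions named in the text."""
    if not bucket_name.startswith(REQUIRED_PREFIX):
        raise ValueError(f"bucket name must start with {REQUIRED_PREFIX}")
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Object-level actions apply to the objects in the bucket.
                "Sid": "DataLockerObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {   # ListBucket applies to the bucket itself.
                "Sid": "DataLockerList",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
        ],
    }


policy_json = json.dumps(bucket_policy("af-datalocker-your-bucket-name"), indent=2)
```

The split into two statements reflects how S3 scopes actions: object actions take the bucket/* resource, while ListBucket takes the bucket resource.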

Notice to security officers in the case of customer-controlled storage

Consider:

  • The bucket is for the sole use of AppsFlyer. There should be no other entity writing to the bucket.
  • You can delete data in the bucket 25 hours after we write the data.
  • Data written to the bucket is a copy of data already in our servers. The data continues to be in our servers in accordance with our retention policy. 
  • For technical reasons, we sometimes delete and rewrite the data. For this reason, we need delete and list permissions. Neither permission is a security risk for you. In the case of list, we are the sole entity writing to the bucket. In the case of delete, we are able to regenerate the data.
  • For additional information, you can contact our security team via hello@appsflyer.com or your CSM.

Procedures

Set up Data Locker

Use this procedure to set up Data Locker. Any changes to Data Locker settings take up to 3 hours to take effect. 

Prerequisite for setting up advertiser-owned storage:

If you are setting up Data Locker using advertiser-owned storage, meaning a bucket owned by you, complete one of the following procedures now. 

Note! If you don't have a Data Locker subscription and you only access SKAdNetwork data, you must complete an advertiser-owned storage procedure. 

AppsFlyerAdmin_us-en.png To set up Data Locker:

  1. The Admin needs to perform the setup. 
  2. In AppsFlyer, go to Integration > Data Locker. 
  3. Select a cloud service data destination. Do one of the following:
    • Select AppsFlyer AWS bucket. Continue to step 4. 
    • Select Your AWS bucket.
      1. Enter your AWS bucket name. Don't enter the prefix af-datalocker-.
      2. Click Test.
      3. Verify that an error message indicating that the bucket path is invalid isn't displayed.
    • Select Your GCS bucket, enter your GCS bucket name, and then click Test.
  4. Select folder structure (data segregation):
    • [Default] Unified.
    • Segregated by app.
  5. Select file format:
    • [Default] Parquet.
    • CSV
  6. Select the required apps. Select all to automatically include apps added in the future. 
  7. Click Apply
  8. [optional] Media Sources: Select one or more Media Sources to include in reports.
    • Default=All. This means that media sources added in the future are automatically added.
  9. Select the required report types. You must select at least 1. 
  10. [optional] In-app events: Select the in-app events to include. If you have more than 100 in-app event types, you can't search for them. Enter their names exactly to select them. 
    • Default=All. This means that in-app events added in the future are automatically added.
  11. Click Apply
  12. [Optional] Fields: Select the fields to include in the reports. Note: Sometimes we make additional fields available. Take this into account in your data import process.
  13. Click Save Configuration. One of the following occurs:
    • If you selected AppsFlyer AWS bucket:
      • A dedicated AWS bucket is created. The bucket credentials display.
      • The bucket is accessible using the credentials. The credentials provide you with read-only access to the bucket.
    • If you selected a Customer bucket: Data will be written to your bucket within 3 hours. 

Reset credentials

The Admin can reset the AppsFlyer bucket credentials at any time. Note! If you reset the credentials, you must update your data import scripts with the updated credentials.

AppsFlyerAdmin_us-en.png To reset the credentials:

  1. In AppsFlyer, go to Integration > Data Locker. 
  2. In the Credentials section, click Reset credentials.
    A confirmation window displays.
  3. Click Reset.
  4. Wait (about 20 seconds) until the Credentials successfully reset message displays.
    The updated credentials are available.

Additional information

Traits and Limitations

Traits
Trait Remarks 
Ad networks Not for use by ad networks
Agencies Not for use by agencies
App-specific time zone Not applicable. Data Locker folders are divided into hours using UTC. The actual events contain times in UTC. Convert the times to any other time zone as needed. Irrespective of your app time zone, the lag from event occurrence until it is recorded in Data Locker remains the same, that is, 6 hours.
App-specific currency  Not supported
Size limitations Not applicable
Data freshness Data is updated according to the specific report data freshness detailed in this article. 
Historical data Not supported. Event data is sent after configuring Data Locker. If you need historical data use Pull API. 
Team member access Team members cannot configure Data Locker. 
Single app/multiple app Multi-app support. Data Locker is at the account level

Troubleshooting

  • Symptom: Unable to retrieve data using AWS CLI
  • Error message: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
  • Cause: The AWS credentials being used are not the correct credentials for the AppsFlyer bucket. This can be caused by having multiple or invalid credentials on your machine.
  • Solution:
    1. Use a method other than the CLI, like Cyberduck, to access the bucket. Do this to verify that the credentials you are using work. If you are able to connect using Cyberduck, this indicates an issue with the credentials cache.
    2. Refresh the AWS credentials cache.
      Screenshot from AWSmceclip0.png 

AWS data retrieval

Use your preferred AWS data retrieval tool: AWS CLI or one of the tools described in the sections that follow. Note! The exact instructions are suitable for AppsFlyer-owned buckets. Adjust the instructions as needed if you are connecting to your own bucket.

AWS CLI

Before you begin:

  • Install the AWS CLI on your computer.
  • In AppsFlyer, go to Data Locker, and retrieve the information contained in the credentials panel.

To use AWS CLI:

  1. Open the terminal. To do so in Windows, press <Windows>+<R>, enter cmd, and click OK.
    The command line window opens.
  2. Enter aws configure.
  3. Enter the AWS Access Key as it appears in the credentials panel.
  4. Enter your AWS Secret Key as it appears in the credentials panel.
  5. When prompted for the default region name, enter eu-west-1.
  6. When prompted for the default output format, press Enter to accept the default (None).

Use the CLI commands that follow as needed.

In the following commands, the value of {home-folder} can be found in the credentials panel.

To list folders in your bucket:


aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/

Listing files and folders

There are three types of folders in your Data Locker bucket:

  • Report Type t=
  • Date dt=
  • Hour h=

To list all the reports of a specific report type:

aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/

To list all the reports of a specific report type for a specific day:

aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/dt=2019-01-17/

To list all the reports of a specific report type for a specific hour of a specific day:

aws s3 ls s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/dt=2019-01-17/h=23/

To download a specific file:

aws s3 cp s3://af-ext-reports/{home-folder}/data-locker-hourly/t=installs/dt=2020-08-01/h=9/part-00000.gz ~/Downloads/

Cyberduck

Before you begin:

  • Install the Cyberduck client.
  • In AppsFlyer, go to Data Locker and retrieve the information contained in the credentials panel.

To configure Cyberduck:

  1. In Cyberduck, click Action.
  2. Select New Bookmark. The window opens.
  3. In the first field (marked [1] in the screenshot that follows), select Amazon S3.

    DataDuckSmall2.png

  4. Complete the fields as follows:
    • Nickname: Free text
    • Server: s3.amazonaws.com
    • Access Key ID: Copy the AWS Access Key as it appears in the credentials panel in AppsFlyer
    • Secret Access Key: Copy the Bucket Secret key as it appears in the credentials panel in AppsFlyer.
    • Path: {Bucket Name}/{Home Folder} For example: af-ext-reports/1234-abc-ffffffff
  5. Close the window using the X in the upper-right corner.
  6. Select the connection.
    The data directories are displayed.

Amazon S3 browser

Before you begin:

  • Install the Amazon S3 Browser.
  • In AppsFlyer, go to Data Locker and retrieve the information contained in the credentials panel.

To configure the Amazon S3 Browser:

  1. In the S3 Browser, click Accounts > Add New Account.
    The Add New Account window opens.

    mceclip0.png

  2. Complete the fields as follows:
    • Account Name: free text. 
    • Access Key ID: copy the AWS Access Key as it appears in the credentials panel. 
    • Secret Access Key: copy the Bucket Secret key as it appears in the credentials panel.
    • Select Encrypt Access Keys with a password and enter a password. Make a note of this password.
    • Select Use secure transfer. 
  3.  Click Save changes.
  4. Click Buckets > Add External Bucket.
    The Add External Bucket window opens.

    mceclip2.png

  5. Enter the Bucket name. The Bucket name has the following format: {Bucket Name}/{Home Folder}. The values needed for bucket name and home folder appear in the credentials window. 
  6. Click Add External bucket.
    The bucket is created and displays in the left panel of the window.
    You can now access the Data Locker files. 