Data Clean Room—Working with sources

Premium

At a glance: Set up the data sources you upload to the Data Clean Room (DCR) for enrichment with attribution/in-app event data and DCR report creation.

Introduction

Many DCR reports are designed to match attribution data or in-app event data with data from your custom sources. This article explains how to set up custom sources for use with the DCR, including how to:
  • Format source files
  • Create a source
  • Upload source files to trigger report processing
  • Work with existing sources

Source format

File format

Uploaded data source files must meet these name, file format, and location requirements:

  • Must comply with DCR naming requirements
  • Format: CSV or GZIP. A GZIP file must contain a compressed CSV file.
  • Number of data source files per data folder:
    • CSV: Maximum of 1
    • GZIP: Maximum of 1 single-part file. Multi-part GZIP files are supported when named as follows: filename_part01.gzip, filename_part02.gzip, etc.
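Before uploading, you might check a data folder's contents against these rules locally. The following is a minimal Python sketch, not an AppsFlyer tool: `check_data_folder_files` is a hypothetical helper, files are recognized by the `.csv` and `.gzip` extensions used in the naming example above, and it assumes a data folder never mixes CSV and GZIP files (the rules above do not say either way).

```python
import re

# Multi-part naming from the example above: <name>_part01.gzip, <name>_part02.gzip, ...
PART_RE = re.compile(r"(?P<base>.+)_part(?P<num>\d+)\.gzip")


def check_data_folder_files(filenames: list[str]) -> list[str]:
    """Return the rule violations for the files in one data folder (empty list = OK).

    Assumption: a data folder holds either CSV or GZIP files, never both."""
    problems = []
    csv_files = [f for f in filenames if f.endswith(".csv")]
    gzip_files = [f for f in filenames if f.endswith(".gzip")]
    others = [f for f in filenames if not f.endswith((".csv", ".gzip"))]

    if not filenames:
        problems.append("no data file")
    if others:
        problems.append(f"unsupported file type: {others}")
    if csv_files and gzip_files:
        problems.append("mixed CSV and GZIP files")
    if len(csv_files) > 1:
        problems.append("more than one CSV file")
    if len(gzip_files) > 1:
        # More than one GZIP file is only allowed as parts of a single multi-part file.
        matches = [PART_RE.fullmatch(f) for f in gzip_files]
        if not all(matches):
            problems.append("multiple GZIP files must be named <name>_partNN.gzip")
        elif len({m.group("base") for m in matches}) != 1:
            problems.append("multi-part GZIP files must share one base name")
    return problems
```

For example, `["users_part01.gzip", "users_part02.gzip"]` passes, while two unrelated `.csv` files are reported as a violation.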

Data format

Data within the source files must meet these requirements:

  • Date and time:
    • Format: yyyy-MM-dd hh:mm:ss
    • Time zone: UTC
  • Numbers: maximum 2 digits following the decimal point
  • String length: maximum of 256 characters
  • Character limitations:
    • For field names (column headers): no spaces or special characters
    • All other data: no limitations (all characters are valid)
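A minimal Python sketch of preparing values to meet these requirements, using only the standard library. The helper names are hypothetical. Two assumptions are worth flagging: the hour in the date format is treated as a 24-hour clock (`%H`), since the format has no AM/PM marker, and underscores are treated as allowed in field names.

```python
import re
from datetime import datetime, timezone
from decimal import Decimal, ROUND_HALF_UP

# Assumption: letters, digits, and underscores are allowed in field names;
# anything else (including spaces) is treated as a special character.
FIELD_NAME_RE = re.compile(r"[A-Za-z0-9_]+")
MAX_STRING_LENGTH = 256


def is_valid_field_name(name: str) -> bool:
    """True if a column header contains no spaces or special characters."""
    return FIELD_NAME_RE.fullmatch(name) is not None


def to_dcr_datetime(value: datetime) -> str:
    """Convert a timezone-aware datetime to UTC text such as 2022-12-15 14:30:00.

    Assumption: hours are written on a 24-hour clock (%H)."""
    if value.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so it can be converted to UTC")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def to_dcr_number(value) -> str:
    """Round a number to at most 2 digits after the decimal point."""
    return str(Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def is_valid_string(value: str) -> bool:
    """True if the string is within the 256-character limit."""
    return len(value) <= MAX_STRING_LENGTH
```

The half-up rounding mode is a choice made for this sketch; the requirement itself only limits the number of decimal digits.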

Creating a source

Creating a source consists of all four steps described below. They are presented as separate steps only for ease of reading.

Follow these steps to create a source:

#1: Name the source

  1. Go to the Sources tab of the Data Clean Room.
  2. Click the + New source button.
    The New source page opens.
  3. Enter the name of the source in the upper-left corner.
    • This can be any unique name that will help you identify the source in the DCR platform. It does not need to match the file name.
    • Important! Ensure that the source name is different from all other sources in your account or you will not be able to save the source.
    • Source name requirements:
      • Length: 2-80 characters
      • Valid characters:
        • letters (A-Z, a-z)
        • numbers (0-9), except as the first character of the name
      • Invalid characters:
        • spaces
        • all other symbols or special characters
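The source name rules can be expressed as a single regular expression. This Python sketch only illustrates the rules; it is not the DCR's own validation. `is_valid_source_name` is a hypothetical helper, and the uniqueness check assumes names are compared exactly (case-sensitively), which the rules above do not specify.

```python
import re

# 2-80 characters, letters and digits only, first character not a digit.
# Since only letters and digits are valid, the first character must be a letter.
SOURCE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{1,79}")


def is_valid_source_name(name: str, existing_names: set[str]) -> bool:
    """Check the naming rules and uniqueness against other sources in the account.

    Assumption: uniqueness is compared exactly (case-sensitively)."""
    return SOURCE_NAME_RE.fullmatch(name) is not None and name not in existing_names
```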

#2: Specify the source location

The source location consists of a cloud storage bucket (known as a connector) and the underlying folder path from which the DCR reads the source file each time it is updated. 

Once you have specified the connector, AppsFlyer can automatically generate the required folders as part of the source creation process.

  • Allowing AppsFlyer to generate the folders is the simplest option. However, you can create them manually instead, following the instructions for manual folder creation.

If AppsFlyer generates the folders, the only additional information required is the name you want to give the source folder. (This is the top-level folder in which you update the source each time you want to use it to run a new report version.) You can also indicate whether you want the source folder to be created under a parent folder, typically named input.

To specify the source location:

  1. Select the connector in which the source folder will be (or has been) created.
    • If there are no connectors defined in your account, the New connector dialog will open, prompting you to create one.
    • If you have existing connectors but want to use a new one, click the + New connector link.
  2. Enter the source folder name.
    • By default, the displayed source folder name:
      • Is based on the name you gave the source. You can change the folder name to meet your needs, so long as it complies with the DCR naming requirements.
      • Indicates that it will be generated within a parent folder named input. This folder serves as the parent folder for all sources you upload to the DCR.
        • The input folder is not required, and you can remove it or name it something different, so long as it complies with the DCR naming requirements.
        • Although this folder is not required, having an input folder (or an equivalent folder with a different name) is considered best practice. It is especially recommended when you use the same connector both for uploading data files (input) and for receiving reports (output).

 Important!

If you manually created the folder path, make sure the connector and path you enter in the Source location section match the path you manually created.

#3: Define the source structure

For all sources that you upload to the DCR for processing, AppsFlyer needs to know how each data field should be used in order to create reports. Defining the source structure consists of loading a prototype source file and categorizing each field (column) as one of the following types:

  • Identifier: Field that identifies a unique app user (examples might include CUID, AppsFlyer ID, etc.)
    • The primary purpose of identifiers in the context of the DCR is to join data sources so that corresponding user-level data can be matched.
  • Dimension: An attribute by which you categorize app users (examples might include geo, install date, campaign, etc.)
  • Metric: Numeric data you have collected with respect to an app user (examples might include revenue, number of app opens, LTV, etc.)
    • A data field identified as a metric can contain only numeric values.
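One way to keep track of a source structure outside the DCR is a small data class. This is a hypothetical local model, not an AppsFlyer API object. It enforces that each field belongs to at most one category (as on the DCR screen, where a field must be removed from one category before it can be assigned to another) and flags metric values that do not parse as numbers.

```python
from dataclasses import dataclass, field


@dataclass
class SourceStructure:
    """Hypothetical local model of a source structure (not an AppsFlyer API object)."""

    identifiers: list[str] = field(default_factory=list)  # e.g. CUID, AppsFlyer ID
    dimensions: list[str] = field(default_factory=list)   # e.g. geo, install date, campaign
    metrics: list[str] = field(default_factory=list)      # numeric only, e.g. revenue

    def __post_init__(self) -> None:
        # Each field may belong to at most one category.
        all_fields = self.identifiers + self.dimensions + self.metrics
        duplicates = sorted({name for name in all_fields if all_fields.count(name) > 1})
        if duplicates:
            raise ValueError(f"fields in more than one category: {duplicates}")

    def non_numeric_metric_values(self, rows: list[dict[str, str]]) -> dict[str, list[str]]:
        """Return, per metric, the values in rows that do not parse as numbers.

        A missing or empty value is also flagged."""
        bad: dict[str, list[str]] = {}
        for metric in self.metrics:
            for row in rows:
                value = row.get(metric, "")
                try:
                    float(value)
                except ValueError:
                    bad.setdefault(metric, []).append(value)
        return bad
```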

Upload a prototype source file

For purposes of defining the source structure: 

  • You can upload a prototype version of the source from a local file.
    • If you select this option, AppsFlyer always creates the source folder path automatically.

- or -

  • You can upload a prototype version of the source file directly from its connector.
    • If you select this option, there's one additional choice to make:
      • Allow AppsFlyer to automatically create the source folder structure; or
      • Create the source folder structure manually

To upload your prototype source file, follow the instructions for the relevant option: Local file, Connector (automatic creation), or Connector (manual creation).

To upload a local file:
  1. In the Source structure section, click the load fields from file button.
  2. In the window that opens, select Upload a local file.
  3. Specify the CSV or GZIP file you want to upload, then click OK.

Categorize fields

After you load the file, AppsFlyer analyzes it and displays all of its data fields (columns) in the Available fields list.
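To preview the field list before uploading, you can read the header row of the prototype file yourself. A minimal Python sketch, assuming the first row of the CSV holds the field names, the text is UTF-8, and files use the `.csv` and `.gzip` extensions; `read_field_names` is a hypothetical helper.

```python
import csv
import gzip
from pathlib import Path


def read_field_names(path: str) -> list[str]:
    """Return the header row (field names) of a .csv file or a .gzip-compressed CSV.

    Assumptions: the first row holds the field names and the text is UTF-8."""
    p = Path(path)
    if p.suffix == ".gzip":
        handle = gzip.open(p, "rt", encoding="utf-8", newline="")
    elif p.suffix == ".csv":
        handle = open(p, encoding="utf-8", newline="")
    else:
        raise ValueError(f"expected a .csv or .gzip file, got {p.name}")
    with handle:
        return next(csv.reader(handle), [])
```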

To categorize the fields:

  1. Select one or more fields in the Available fields list on the left and use the buttons in the middle of the screen to categorize them as identifiers, dimensions, or metrics.
    • Once you categorize a field, it is displayed in the relevant category list on the right side of the screen.
    • You can use the search bar to search for fields in the lists.
    • To remove a field from a category it's been assigned to, select it in the relevant category list and use the Remove button to return it to the Available fields list.
  2. Repeat this process until you have categorized each field you want to include in DCR reports.
    • There is no requirement to categorize every field in the Available fields list. However, a field must be categorized in order to use it later in a report.
  3. If you edit the source file before saving the source and want to use fields from the edited file, click the Reload fields link at the bottom of the Available fields list.
    • Note that reloading the source will overwrite the field names in the Available fields list. Any fields that you previously categorized will remain in the Identifiers, Dimensions, or Metrics lists.
    • If a previously categorized field is not found in the reloaded source file, it will still display in the relevant category list, but it will be marked with an error icon.
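The reload behavior described above amounts to a set comparison between the reloaded header row and the fields you have already categorized. A hypothetical Python sketch of that comparison (not AppsFlyer's implementation):

```python
def reload_fields(new_header: list[str], identifiers: list[str],
                  dimensions: list[str], metrics: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical model of Reload fields.

    Returns (available, missing):
    - available: fields in the reloaded file that are not yet categorized
      (the new contents of the Available fields list), in file order;
    - missing: categorized fields absent from the reloaded file
      (the ones marked with an error icon)."""
    categorized = set(identifiers) | set(dimensions) | set(metrics)
    available = [name for name in new_header if name not in categorized]
    missing = sorted(categorized - set(new_header))
    return available, missing
```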

 Note

If you decide to use additional fields from this source after saving it, you can do so by editing the source structure.

#4: Save the source

To save the source:
  1. [Optional] Click the test source button to check the source fields for format or validity errors.
  2. Click Save to save the source.

    The source is created and a confirmation message is displayed.

    • If you uploaded the source from a local file, saving the source triggers the automatic creation of the folder structure, and the displayed confirmation message includes a link to the source folder.

    The new source is displayed in the list of all existing sources in the Sources tab of the Data Clean Room.

Uploading source files to trigger report processing

Each time you want AppsFlyer to process a data source file and run a report based on it, you upload a new version of the file to the source folder, within a series of nested subfolders indicating the date and version number (plus one extra subfolder to let AppsFlyer know where the data is).

AppsFlyer continually scans for new versions of source files for the current date and the 3 days prior. Each time a new version of the source files is found (including its _SUCCESS file, as detailed below), a new version of the report is triggered.
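For example, the date folders that fall inside this scan window can be computed as follows. This Python sketch assumes the current date is evaluated in UTC (consistent with the UTC requirement for data) and uses the hypothetical helper name `scanned_date_folders`. By this reading, a file uploaded to an older date folder would fall outside the window.

```python
from datetime import date, datetime, timedelta, timezone


def scanned_date_folders(today: date, days_back: int = 3) -> list[str]:
    """Date-folder names for today and the preceding days_back days, newest first."""
    return [f"dt={(today - timedelta(days=n)).isoformat()}/" for n in range(days_back + 1)]


# Assumption: "current date" is the UTC date.
today_utc = datetime.now(timezone.utc).date()
print(scanned_date_folders(today_utc))
```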

Nested subfolders for each date and version

The structure of nested subfolders is as follows:

  • Within the source folder --> 1 subfolder for each date ("date folder")
    • Format: dt=yyyy-mm-dd/
    • Example: dt=2022-12-15/
  • Within each date folder --> 1 subfolder for each version on that date ("version folder")
    • Format: v=n/
    • Example: v=1/
    • Note: The version folder is required even if you only upload the file one time per day.
  • Within each version folder --> 1 subfolder to indicate the location of the data ("data folder")
    • Format: data/
    • The data folder is the location to which the source file is uploaded.

In most cases, you use API calls or other available programmatic means to create the date/version/data folders automatically each time the data source file is uploaded. For additional information, see the API reference for your cloud service: AWS, GCS.

_SUCCESS files

Once the upload of a source file to the data folder is complete, an empty file named _SUCCESS should be uploaded to the version folder. This alerts AppsFlyer that a new file is available to be processed. In most cases, you use an API script to automatically generate and upload this file.

Important! The _SUCCESS file must be uploaded to the version folder, outside the data folder.

The filename for the _SUCCESS file:

  • Must be in ALL CAPS
  • Must be preceded by an underscore (_)
  • Should not have a file extension

For multi-part GZIP files:

  • Only one _SUCCESS file should be uploaded for all file parts.
  • The _SUCCESS file should be uploaded only after all file part uploads are complete.
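The upload order matters: every data file (or every part of a multi-part GZIP file) first, then a single empty _SUCCESS file in the version folder. The Python sketch below shows that order using a local directory as a stand-in for the connector's bucket, so it runs without cloud credentials; `upload_version` is a hypothetical helper. With a real bucket you would replace the file copies with your cloud SDK's upload calls, for example `put_object` in boto3 for AWS S3 or `Blob.upload_from_string` in the Google Cloud Storage client library.

```python
import shutil
from datetime import date
from pathlib import Path


def upload_version(bucket_root: Path, source_folder: str, day: date, version: int,
                   local_files: list[Path]) -> Path:
    """Place one version of a source, then signal it with _SUCCESS.

    bucket_root is a local directory standing in for the connector's bucket."""
    version_dir = bucket_root / source_folder / f"dt={day.isoformat()}" / f"v={version}"
    data_dir = version_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # 1. Upload every data file first (all parts, for a multi-part GZIP source).
    for local_file in local_files:
        shutil.copy2(local_file, data_dir / local_file.name)

    # 2. Only then write one empty _SUCCESS file: all caps, leading underscore,
    #    no extension, in the version folder rather than in data/.
    success_file = version_dir / "_SUCCESS"
    success_file.write_bytes(b"")
    return success_file
```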

Example (after uploading files)

After uploading source files on 2 days (and programmatically creating date/version/data folders and _SUCCESS files), your bucket/folder structure might look something like this:

[Image: example bucket/folder structure after uploading source files on 2 days]

Working with existing sources

There are several ways to work with existing sources, all initiated from the Sources tab of the Data Clean Room, as described in the following sections.

Editing the source name

To edit the source name:

  1. Go to the Sources tab of the Data Clean Room.
  2. In the list of sources, hover over the row of the source you want to edit.
  3. Click the edit button that displays on the right side of the row.
  4. On the Edit source page, edit the name of the source.
  5. Click the Save button to save the source with the new name or Cancel to undo your changes.

Editing the source location

To edit the source location:

  1. Go to the Sources tab of the Data Clean Room.
  2. In the list of sources, hover over the row of the source you want to edit.
  3. Click the edit button that displays on the right side of the row.
  4. On the Edit source page, scroll down to the Source location section.
  5. Click the edit button next to the current source location.
  6. Make the necessary changes in the Source location dialog.
  7. Click Apply to implement your changes.
  8. Click the Save button to save the source with the new location/file format or Cancel to undo your changes.

When a source location is edited, AppsFlyer creates a folder with the new name on the connector.

  • AppsFlyer looks for subsequent versions of the source file and accompanying _SUCCESS files in the new folder.
  • All versions of the source file uploaded before the location change remain in the previous source folder.

Editing the source structure

To edit the source structure:

  1. Go to the Sources tab of the Data Clean Room.
  2. In the list of sources, hover over the row of the source you want to edit.
  3. Click the edit button that displays on the right side of the row.
  4. On the Edit source page, the fields that were previously categorized as identifiers, dimensions, or metrics will display in the relevant category lists on the right side of the screen.
  5. You can move a previously categorized field to a different category without reloading fields from the source file. To do this:
    1. First, select it in the relevant category list and use the Remove button to return it to the Available fields list.
    2. Next, select it in the Available fields list and use the buttons in the middle of the screen to categorize it as an identifier, dimension, or metric.
  6. To work with fields that have not yet been categorized, you must reload them from the source location or from a local file. To do so, click the Reload fields link at the bottom of the Available fields list and make your selection.
  7. AppsFlyer analyzes the file, and a list of all previously uncategorized data fields (columns) is displayed in the Available fields list.
    • Fields that were previously categorized as identifiers, dimensions, or metrics will still display in the relevant category lists on the right side of the screen.
    • If a previously categorized field is not found in the reloaded source file, it will still display in the relevant category list, but it will be marked with an error icon.
  8. Select one or more fields in the Available fields list on the left and use the buttons in the middle of the screen to categorize them as identifiers, dimensions, or metrics.
  9. Once you have made all necessary changes, click the Save button to save the source with the updated structure or Cancel to undo your changes.

 Important!

Don't forget to make corresponding changes reflecting the new source structure in any reports for which this source is used:

  • Fields that were removed, uncategorized, or changed from their previous categories are automatically removed from any reports in which they are used.
  • Newly added or categorized fields are not automatically included in existing reports until you edit report definitions to include them.
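To plan the corresponding report updates, you can compare the old and new structures. In this hypothetical Python sketch, each structure maps a field name to its category; the first list approximates the fields the DCR removes from reports, and the second lists the fields you would need to add to report definitions yourself. A re-categorized field appears in both lists, because it is removed in its old category and must be re-added in its new one.

```python
def structure_changes(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare two source structures, each mapping field name -> category.

    Returns (dropped, to_add):
    - dropped: fields removed, uncategorized, or moved to another category,
      which are removed from reports automatically;
    - to_add: fields added or (re)categorized, which reports do not pick up
      until you edit the report definitions."""
    dropped = sorted(name for name, category in old.items() if new.get(name) != category)
    to_add = sorted(name for name, category in new.items() if old.get(name) != category)
    return dropped, to_add
```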

Deleting a source

  1. Go to the Sources tab of the Data Clean Room.
  2. In the list of sources, hover over the row of the source you want to delete.
  3. Click the delete button that displays on the right side of the row.
  4. In the dialog, confirm that you want to delete the source.
    • You cannot delete a source that is being used by a report. If this is the case, a message will list the reports in which the source is being used. In order to delete the source, you can either:
      • Delete the reports in which it is being used; or
      • Remove the source fields from the definitions of the reports in which they are used.