Data Collaboration Platform (DCP) - Create and manage sources

At a glance: Create and manage sources to securely share your first-party data with other collaborators.

About DCP sources 

In the Data Collaboration Platform (DCP), a source is a dataset, typically uploaded to AppsFlyer from your cloud storage. These sources form the foundation of any collaboration, supplying the data that your collaborators can use for audience creation and activation. This article covers everything you need to know about creating and managing your sources.

 Before you begin

Before you create your sources, you should first:

  • Set up the cloud services from which the AppsFlyer DCR will retrieve the source data. If these connections aren't set up, you will be prompted to set them up during source creation. 

Source data requirements

It's recommended to prepare your source data to maximize match rates between collaborator datasets and optimize collaboration outcomes. In addition, all sources must meet the following requirements.

Data format (relevant to all sources)

Data within sources must meet these requirements:

  • Date (only): yyyy-mm-dd (for example, 2023-04-18)
  • Date and time:
    • Format: yyyy-MMM-dd hh:mm:ss (for example, 2023-APR-18 15:30:35)
    • Time zone: UTC
  • Numbers: maximum 2 digits following the decimal point
  • String length: maximum of 256 characters
  • Character limitations:
    • For field names (column headers): no spaces or special characters
    • All other data: no limitations (all characters are valid)
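The format rules above can be checked before upload. The following is a minimal sketch, not an official AppsFlyer validator. The function names are hypothetical, and two points are assumptions: that "no spaces or special characters" in field names means letters, digits, and underscores only, and that the month abbreviation is accepted in any letter case.

```python
import re
from datetime import date

# Month abbreviations as in the documented example (2023-APR-18).
MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
DATETIME_RE = re.compile(
    r"([0-9]{4})-([A-Za-z]{3})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})"
)
# At most 2 digits after the decimal point.
NUMBER_RE = re.compile(r"-?[0-9]+(\.[0-9]{1,2})?")
# Assumption: only letters, digits, and underscores are allowed in field names.
FIELD_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def _is_real_date(year: int, month: int, day: int) -> bool:
    """Return True if the values form a real calendar date."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def is_valid_date(value: str) -> bool:
    """Check the yyyy-mm-dd format, for example 2023-04-18."""
    m = DATE_RE.fullmatch(value)
    return bool(m) and _is_real_date(*map(int, m.groups()))


def is_valid_datetime(value: str) -> bool:
    """Check the yyyy-MMM-dd hh:mm:ss format, for example 2023-APR-18 15:30:35."""
    m = DATETIME_RE.fullmatch(value)
    if not m:
        return False
    year, mon, day, hour, minute, second = m.groups()
    if mon.upper() not in MONTHS:  # assumption: case-insensitive month
        return False
    month = MONTHS.index(mon.upper()) + 1
    return (_is_real_date(int(year), month, int(day))
            and int(hour) < 24 and int(minute) < 60 and int(second) < 60)


def is_valid_number(value: str) -> bool:
    """Check that a number has no more than 2 decimal digits."""
    return NUMBER_RE.fullmatch(value) is not None


def is_valid_string(value: str) -> bool:
    """Check the 256-character limit."""
    return len(value) <= 256


def is_valid_field_name(name: str) -> bool:
    """Check a column header against the assumed character set."""
    return FIELD_NAME_RE.fullmatch(name) is not None
```

These checks do not convert time zones; values are expected to be in UTC already.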

Table columns (relevant only to sources in data warehouses)

In addition to data shared for processing, source tables in BigQuery or Snowflake must include 2 additional columns – one for date and one for version:

  • Date:
    • Column header: dt
    • Column type: date
    • Data format: yyyy-mm-dd (for example, 2023-04-18)
    • Additional: BigQuery tables must be partitioned by this column
  • Version:
    • Column header: v
    • Column type: string
    • Data format: number (for example, 1, 2, 3, 10)
    • Important! A new version of a report is triggered each time the DCR detects a new value in this column. To ensure the completeness of your report, be sure to populate the source table with a complete set of data whenever the column value is changed.
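As an illustration of the dt and v column requirements, the sketch below checks rows that have already been read from a table into Python dictionaries. The function name and the row representation are assumptions made for this example. It does not query BigQuery or Snowflake.

```python
import re

DT_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
VERSION_RE = re.compile(r"[0-9]+")


def check_required_columns(rows: list[dict]) -> list[str]:
    """Return a list of problems found in the dt and v columns."""
    problems = []
    for index, row in enumerate(rows):
        for column in ("dt", "v"):
            if column not in row:
                problems.append(f"row {index}: missing column {column}")
        dt = row.get("dt")
        if dt is not None and not DT_RE.fullmatch(str(dt)):
            problems.append(f"row {index}: dt is not in yyyy-mm-dd format")
        v = row.get("v")
        # The version column is a string that holds a number, such as "1" or "10".
        if v is not None and not (isinstance(v, str) and VERSION_RE.fullmatch(v)):
            problems.append(f"row {index}: v must be a numeric string")
    return problems
```

An empty result means every row carries both columns in the documented format.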

File name and format (relevant only to sources in cloud storage buckets)

Source files stored in Amazon S3 or GCS must meet these file name and format requirements:

  • File name must comply with DCR naming requirements
  • CSV or GZIP format
    • The file underlying GZIP compression must be a CSV file.
  • Number of data source files per data folder:
    • CSV: Maximum of 1
    • GZIP: Maximum of 1 single-part file. Multi-part GZIP files are supported when named as follows: filename_part01.csv.gz, filename_part02.csv.gz, etc.
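The multi-part naming pattern can be generated rather than typed by hand. This is a small sketch with hypothetical helper names. It assumes the two-digit part number shown in the example above; how more than 99 parts should be numbered is not stated in this article.

```python
import gzip


def multipart_names(base_name: str, parts: int) -> list[str]:
    """Build names such as filename_part01.csv.gz, filename_part02.csv.gz."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return [f"{base_name}_part{n:02d}.csv.gz" for n in range(1, parts + 1)]


def gzip_csv(csv_text: str) -> bytes:
    """Compress CSV text so that the file underlying the GZIP is a CSV."""
    return gzip.compress(csv_text.encode("utf-8"))
```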

Create a source

To create a source in DCP, follow the steps below:

Step 1: Access DCP Sources

  1. In AppsFlyer, from the side menu, select Collaborate > Data Clean Room.
     


  2. Click + New source (on the main page or in the Sources tab).
     


  3. Proceed with the New source walkthrough steps:

Step 2: Set source name

Enter the source name. This can be any unique name that will help you identify the source. You can also add an optional description about the source to help easily identify what it contains (e.g., "All purchases from 2025").

Requirements and guidelines

  • Make sure the source name is unique among all other sources in your account. Otherwise, you won't be able to save the source.
  • For cloud integrations, the name doesn't need to match the file name.
  • Source name requirements:
    • Length: 2-80 characters
    • Valid characters:
      • Letters (A-Z, a-z)
      • Numbers (0-9); a number cannot be the first character of the name
      • Underscore (_)
    • Invalid characters:
      • spaces
      • all other symbols or special characters
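The source name rules above translate into a single regular expression. This is a minimal sketch with a hypothetical function name. It assumes an underscore is allowed as the first character, because the rules only exclude numbers from that position. Uniqueness within your account cannot be checked locally.

```python
import re

# First character: letter or underscore. Remaining 1-79 characters: letters,
# digits, or underscores. Total length is therefore 2-80 characters.
SOURCE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,79}")


def is_valid_source_name(name: str) -> bool:
    """Check a proposed source name against the documented rules."""
    return SOURCE_NAME_RE.fullmatch(name) is not None
```

For example, is_valid_source_name("purchases_2025") is accepted, while "2025_purchases" and "all purchases" are rejected.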

Step 3: Set source location

To specify the source location:

  1. Select the connection in which the source will be (or has been) created.
    • If there are no connections defined in your account, the New connection dialog will open, prompting you to create one. Follow these instructions to create it.
    • If you have existing connections but want to use a new one, click + New connection and follow these instructions to create it.
  2. Continue with the relevant instructions below, based on where the data for your source is located.

Note: For best performance when there’s a large amount of data, a subset of your data may be stored separately. This data will be updated whenever the source is updated and deleted when the source is deleted.

Source locations in BigQuery

To complete specifying the source location for BigQuery sources:

  1. Select the dataset in which the source table is located.
  2. Select the table in which the source data is located.

The lists from which you make these selections contain the available datasets and tables, respectively, in the BigQuery project you specified when creating the connection.

Source locations in Snowflake

To complete specifying the source location for Snowflake sources:

  1. Select the share containing the source data.
  2. Select the schema in which the source table is located.
  3. Select the table in which the source data is located.

The lists from which you make these selections contain the shares, schemas, and tables, respectively, in the Snowflake account you specified when creating the connection.

Source locations in cloud storage buckets

Source locations in Amazon S3 or GCS consist of the cloud storage bucket specified by the connection and the underlying folder path from which the DCR reads the source file each time it is updated. 

Once you have specified the connection, AppsFlyer can automatically generate the required underlying folder path as part of the source creation process.

  • Allowing AppsFlyer to generate the folders makes the process easy. However, you can choose to manually create them instead, according to the instructions detailed here.

If AppsFlyer generates the folders, the only additional information required is the name you want to give the source folder. (This is the top-level folder in which you update the source each time you want to use it for running a new report version.) You can also indicate whether you want the source folder to be created underneath a parent folder (often named input).

To complete specifying a source location in a cloud storage bucket, enter the source folder name.

  • By default, the displayed source folder name:
    • Is based on the name you gave the source. You can change the folder name to meet your needs, so long as it complies with the DCR naming requirements.
    • Indicates that it will be generated within a parent folder named input. This folder serves as the parent folder for all sources you upload to the DCR.
      • The input folder is not required, and you can remove it or name it something different, so long as it complies with the DCR naming requirements.
      • Although this folder is not required, having an input folder (or an equivalent folder of a different name) is considered best practice. It is even more highly recommended when you are using the same cloud storage bucket both for uploading data files (input) and receiving reports (output).

 Important!

If you manually created the folder path, make sure the connection and path you enter in the Source location section match the path you manually created.

 

Source location as local files 

You can also upload source data using a local file. However, it's strongly recommended to set a cloud service as your source location (crucial for larger data sets and automatic updates). A local file is primarily used for testing and for getting familiar with the platform's functionality.

To upload source data using a local file: 

When selecting your source location, click Local file system, then click Next. The option to upload local files from your device is added.


 Note

  • Supported file types: .CSV and .GZ files (max size 5 GB).
  • Data uploaded from a local file will be saved for 180 days.
  • Updates must be performed manually. 

Step 4: Set source update method

When setting up a source in the Data Collaboration Platform (DCP), you must choose how you'd like updates to that source to be handled. DCP supports two sync methods: Snapshot and Append. Each method suits different use cases and data flows:

  • Snapshot - This method uploads the full dataset each time the source is updated, completely replacing the old dataset each time with the new one. Choose this method only if your file uploads always contain the complete and up-to-date dataset.
  • Append - This method uploads only new or changed data, not the full dataset. Old datasets are not removed or updated. It is ideal for recurring uploads like daily or weekly reports. Benefits include smaller uploads (may drive lower cloud storage costs), faster data upload performance, easier record management, and more.

Important!

Please determine which method is best before selecting. Once an option is selected and Generate folders is clicked, a file path along with instructions for handling data uploads is created within your selected cloud service. To prevent folder duplications and errors, you can change the sync method only once after the initial selection. 
 

 

Snapshot upload requirements

Use this method to upload a complete, self-contained file that fully replaces the prior dataset each time the source is updated.

Supported cloud services

  • Amazon S3
  • Google Cloud Storage (GCS)
  • BigQuery
  • Snowflake

 

Follow these guidelines for the Snapshot update method:

  1. Source folder path

    When you register a cloud‑storage source in DCP, a root path (e.g. input/my_source_folder/) is defined for that source. Each time you refresh the dataset, you upload a full snapshot—a file that contains all current rows—to this folder. This guarantees that DCP always processes the most up‑to‑date and complete view of the data.

  2. Uploading full snapshot files

Guideline | Why it matters | How to comply
Complete dataset | DCP must replace the prior state in full | Generate an export that includes every record, not just changes
Date-partitioned folder (dt=) | Enables date-range filtering & retention management | Place each snapshot under dt=YYYY-MM-DD/
Version sub-folder (v=) | Allows multiple snapshots on the same day | First upload under v=1/; retries use v=2/, v=3/, etc.
data/ sub-folder | Organizes your files | Store Parquet / CSV / Avro files in this folder
_SUCCESS marker | Signals DCP ingestion to begin | Upload an empty file named _SUCCESS inside the v= folder after all data files land

Supported file type and size:
  • Supported file types: .csv or .gz
  • Maximum file size: 5 GB

Important!

The full snapshot file (BI-data.csv in the screenshot below) must be placed under:

/v=1/data/

Then, once the data file is uploaded, place an empty _SUCCESS file in:

/v=1/
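The upload order described above (data file first, _SUCCESS marker last) can be rehearsed on a local directory before syncing it to your bucket. The sketch below uses only the local file system. The function name and parameters are hypothetical, and the upload to Amazon S3 or GCS is left out.

```python
from pathlib import Path


def stage_snapshot(source_dir: Path, snapshot_date: str, version: int,
                   file_name: str, payload: bytes) -> Path:
    """Write one snapshot under dt=<date>/v=<version>/ and return the version folder."""
    version_dir = source_dir / f"dt={snapshot_date}" / f"v={version}"
    data_dir = version_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # 1. Write the full snapshot file into the data/ sub-folder.
    (data_dir / file_name).write_bytes(payload)

    # 2. Only after the data file is in place, create the empty _SUCCESS marker
    #    directly inside the v= folder.
    (version_dir / "_SUCCESS").touch()
    return version_dir
```

Creating the marker last matters because it signals that ingestion can begin, so it should never exist before the data files are complete.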

 

Folder anatomy:

s3://af-dcr-xyz/abcd9876/input/source_name1/dt=2025-mm-dd/v=1/data/
└ bucket ┘ └ tenant ┘ └ingestion┘ └source┘└snapshot date┘└ver┘└files┘
Segment | Description | Purpose
af-dcr-xyz | S3 / GCS bucket | Top-level container for all DCP data
abcd9876/ | Tenant / workspace ID | Keeps each customer's data isolated
input/ | Ingestion area | Where raw uploads land before processing
source_name1/ | Source name | Logical dataset registered in DCP
dt=2025-mm-dd/ | Snapshot date | Represents the dataset's full state on that day
v=1/ | Version (optional) | Supports retries or additional snapshots
data/ | Files | Parquet / CSV / Avro snapshot files
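To make the folder anatomy concrete, the following sketch splits a snapshot folder URI into the segments listed above. The type and function names are hypothetical. It assumes the exact seven-segment layout shown in the diagram, including an input folder, and accepts the s3 and gs URI schemes.

```python
from typing import NamedTuple


class SnapshotLocation(NamedTuple):
    bucket: str
    tenant: str
    ingestion: str
    source: str
    snapshot_date: str
    version: str


def parse_snapshot_uri(uri: str) -> SnapshotLocation:
    """Split bucket/tenant/input/source/dt=.../v=.../data/ into named parts."""
    scheme, separator, rest = uri.partition("://")
    if not separator or scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported URI: {uri}")
    parts = [p for p in rest.split("/") if p]
    if len(parts) != 7 or parts[6] != "data":
        raise ValueError(f"unexpected folder layout: {uri}")
    bucket, tenant, ingestion, source, dt, v, _ = parts
    if not dt.startswith("dt=") or not v.startswith("v="):
        raise ValueError(f"missing dt= or v= segment: {uri}")
    return SnapshotLocation(bucket, tenant, ingestion, source, dt[3:], v[2:])
```

A layout without the optional input folder has fewer segments and would need a different check.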

 

Example of a two-day snapshot structure:

source_name1/
  dt=2025-08-10/
    v=1/
      data/
        snapshot.parquet
      _SUCCESS
  dt=2025-08-11/
    v=1/
      data/
        snapshot.parquet
      _SUCCESS

 

After uploading a complete snapshot of the source files for 2 days (and programmatically creating the date, version, and data folders and the _SUCCESS files), your bucket/folder structure might look something like this:
 

Snapshot example bucket.png

 

Append upload requirements

Use this method to upload only the data that changed or was added since the previous upload.

Why use Append?

  • Efficiency – smaller uploads
  • Data integrity – preserves full change history for time‑series analysis
  • Privacy compliance – simplifies record deletion & consent management

Supported cloud services: Amazon S3 and Google Cloud Storage (GCS)
Note: BigQuery and Snowflake are not supported for the Append method and must use the Snapshot method.

 

Follow these guidelines for the Append update method:

1. Source folder setup

Step | Description
Create / assign bucket | Use an existing S3 / GCS bucket or let AppsFlyer provision one for you
Define root path | A tenant-scoped path (e.g. s3://af-dcr-xyz/abcd1234/) is configured as the input directory
Security | Grant the service user write permission only on this path

2. Uploading append (delta) files 

Guideline | Why it matters | How to comply
Upload cadence | Keeps transfer cost & ops overhead low | Upload daily, or bundle 7-day folders and upload weekly
One folder per day | Enables date filtering & avoids scanning large batches | Place each day's delta in dt=YYYY-MM-DD/
Unique file names | Prevents accidental overwrites | Give files unique names modeled like this: events_2025-08-04.parquet, events_2025-08-05.parquet
Required columns | Enables tracking of changes | The file must include a date column (to associate records with a reporting timeframe) and a version column (to track data revisions)

Note: Files missing these columns will result in a loading error in DCP.

 

Folder anatomy:

s3://af-dcr-xyz/abcd1234/input/source_name/dt=2025-08-10/
└ bucket ┘ └ tenant ┘ └ ingestion ┘ └ source ┘ └ day partition ┘
Segment | Purpose
af-dcr-xyz | Customer's S3/GCS bucket
abcd1234/ | Tenant or workspace ID
input/ | Raw uploads awaiting ingestion
source_name/ | Logical DCP source (e.g., CRM_Events)
dt=2025-08-10/ | Daily partition folder

 

Example weekly delta folder structure:

source_name/
  dt=2025-08-04/
  dt=2025-08-05/
  dt=2025-08-06/
  dt=2025-08-07/
  dt=2025-08-08/
  dt=2025-08-09/
  dt=2025-08-10/
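The weekly structure above can be generated from a start date. This is a small sketch with a hypothetical function name. It only builds folder names and does not create or upload anything.

```python
from datetime import date, timedelta


def week_partitions(first_day: date, days: int = 7) -> list[str]:
    """Return one dt=YYYY-MM-DD/ folder name per day, starting at first_day."""
    return [
        f"dt={(first_day + timedelta(days=offset)).isoformat()}/"
        for offset in range(days)
    ]
```

Calling week_partitions(date(2025, 8, 4)) yields the seven folder names from dt=2025-08-04/ through dt=2025-08-10/.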

 

Rule‑of‑thumb cheat‑sheet:

Requirement | Why | Example
Weekly batch (optional) | Fewer pushes, same granularity | Upload one batch containing seven sibling dt= folders
Day partitions | Speeds up queries & UI filters | s3://your-bucket/source/ dt=2025-08-04/ ... dt=2025-08-10/

Key takeaway: Even when uploading weekly, always organize data into day-specific dt=YYYY‑MM‑DD folders.

Step 5: Map source fields

Map source fields to DCP fields, test it, and save the configured data:

  1. Load your source fields
  2. Configure the loaded fields
  3. Verify EU user inclusion
  4. Test the source data
  5. Save the source

1. Load your source fields

Source fields load automatically. In case manual loading is needed, follow the instructions below according to the source location:

Data warehouse sources

To load fields from a source located in a data warehouse (BigQuery or Snowflake), click Load fields from source.

 Important!

If the selected source table does not include the required date and version columns, you will receive an error.

Cloud storage bucket sources

To load fields from a source located in a cloud storage bucket (Amazon S3 or GCS), you must upload a prototype source file.

For purposes of mapping the source fields to DCP fields: 

  • You can upload a prototype version of the source from a local file.
    • If you select this option, AppsFlyer always creates the source folder path automatically.

  - or -

  • You can upload a prototype version of the source file directly from the connected cloud bucket.

To upload your prototype source file, follow the instructions in the relevant tab below:

Local file | Connection (automatic creation) | Connection (manual creation)
  1. In the Map source fields section, click Load fields from source.
  2. In the window that opens, select Upload a local file.
  3. Specify the CSV or GZIP file you want to upload, then click OK.

2. Configure the source fields

After loading the source fields, each source field (column) is presented with a DCP field. Review each source field and map it to the appropriate DCP field from the drop-down list beside it. Consider the following:

 Considerations

  • When both parties share their source data, at least one field must be set as an identifier to enable user-level data to match across the corresponding sources. An identifier is a field that uniquely identifies an app user (for example, CUID, AppsFlyer ID, or hashed email).
  • Although configuring each of the uploaded source fields (columns) isn't mandatory, it is important for categorization, effective data interpretation, aiding audience creation, suggesting insights, and effectively facilitating validations.

To remove a field:

  • Hover over the right side of the field you want to remove and click the dustbin icon that appears when hovering.

To add fields manually:

This option allows you to include a field in the audience that isn't currently present in the data source.

  1. Click + New field. An empty field is added.
  2. Enter the name of the field and select its type.
     


Field types include: 

  • Text
  • Identifier  
    • Android ID
    • AppsFlyer ID
    • CUID
    • Email
    • Hashed email
    • ID5 Id
    • IDFA
    • IP address
    • MediaMath Id
    • Mobile Ad Id
    • Phone number
    • Platform ID
    • SHA256 hashed ID
    • SHA256 phone number
  • Boolean
  • Date
  • Time
  • DMA
    • Ad personalization enabled
    • Ad user data enabled
    • EU DMA Applies
  • Number
    • Double
    • List of Numbers
    • Long
    • Number

 

Reload source fields

If the configuration of one of your data files has changed, you can reload the source fields to reflect the changes.

 Note

Reloading the source resets the field names in the source to match the column names in the updated file. This overwrites the field names in the list and their types.

To reload updated fields from a file:

  1. Click Reload fields.
  2. Select the file location.
    • For a local file: Upload the file.
    • For a file from your cloud service: Click Load from cloud bucket and follow the instructions.
  3. Click OK. The updated fields are now displayed.

3. Verify EU user inclusion

  • Select Yes or No to the question: Does your source include European users, to which EU DMA regulations apply?

Learn more about privacy regulations for the EU Digital Markets Act.

4. Test the source data

  • [Optional] Click Test to check for errors in the format or validity of the source fields.

5. Save the source

  • Click Save to save the source.

After confirming, the new source is added under the Sources tab.

 Note

If you uploaded the source from a local file, saving the source triggers the automatic creation of the folder structure, and the displayed confirmation message includes a link to the source folder.

Manage your sources

The sources you've created are displayed in the Sources tab. From here, you can edit the source name and structure, share it with a collaborator, and delete it—provided it's not already being used in an audience.

Edit a source

  1. Go to the Sources tab of the Data Clean Room.
  2. In the list of sources, hover over the source you want to edit, and click the edit icon at the end of the row.
  3. On the Edit source page, edit the relevant fields, as detailed below.
  4. Click Save.

Edit the source name

When editing the source name, make sure to follow these naming requirements.

 Edit the source location

  1. From the Edit source page > Source location, select a different data connection.
  2. Select the relevant location details.
  3. [Optional] Test the source.
  4. Click Save.

Edit the field mapping

  1. Go to the field mapping, and make the necessary changes: Change the field or update its mapped DCP field.
  2. Click Save.

 Important!

Don't forget to make corresponding changes reflecting the new source structure in any reports for which this source is used:

  • Fields that were removed, unmapped, or changed from their previous mapping are automatically removed from any reports in which they are used.
  • Newly added or mapped fields are not automatically included in existing reports until you edit report definitions to include them.

Delete a source

You can delete any source except when it's being used in an audience. If this is the case, a notification will specify the audiences using that source. To enable the deletion of the source, you must first delete the audience linked to it. As for sources shared with you, they can only be deleted by the source owner.

  1. Go to the Sources tab of the Data Clean Room.
  2. In the list of sources, hover over the row of the source you want to delete.
  3. Click the delete icon on the right side of the row.
  4. In the dialog, click Delete to confirm.

Sharing permissions

Granting permissions to your collaborators to view and use your source data is done when creating collaborations. Learn more about sharing permissions.

Additional information

This section provides some additional references and useful information.

Manually create a storage bucket folder structure (Optional)

In general, it's easiest to allow AppsFlyer to automatically generate the required folder structure as part of the source creation process. However, if you wish to create these folders manually, you can do so as follows.

Create a DCR key folder

To ensure maximum security, the folder directly beneath the bucket (the "DCR key folder") must be named with the 8-character, alphanumeric DCR key assigned to your account (for example, 01bcc5fb). Note that this is different from any other password or key associated with your AppsFlyer account.

The DCR key folder is generally created manually using the interface of your selected cloud service.

To get your account's DCR key:

  • Click the 3-dot (actions) menu on the upper right and select DCR key.


After creating the DCR key folder, your bucket/folder structure would look something like this:

dcr_file_structure_dcr_key_folder.png

Top-level input folder

Though it is not required, best practice is to create a top-level input folder directly beneath the DCR key folder. This folder will be dedicated to files you upload to the DCR.

The top-level input folder is generally created manually using the interface of your selected cloud service.

  • This practice is even more highly recommended when you are using the same bucket both for uploading data files (input) and receiving reports (output).
  • You can name this folder anything you want, so long as it complies with the DCR naming requirements. For ease of identification, it is usually named input/.

After creating the top-level input folder, your bucket/folder structure might look something like this:

dcr_file_structure_input_folder.png

Second-level folder for each data source

You can regularly upload different data source files to the DCR for processing. Each of these data sources must be assigned a separate folder ("data source folders").

So, for example, if you plan to upload 2 files to the DCR for processing every day: BI-data.csv and CRM-data.gzip, you would assign each of these data sources a folder. You could choose to call these folders BI-data/ and CRM-data/.

The data source folders are generally created manually using the interface of your selected cloud service.

After creating 2 data source folders, your bucket/folder structure might look something like this:

dcr_file_structure_source_folders.png

Under each data source folder, nested subfolders by date and version must be created each time the source is updated.
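If you create the folders manually, the path beneath the bucket can be assembled from the pieces described in this section. The sketch below is illustrative only. The function name is hypothetical, and it assumes the DCR key may contain both upper- and lower-case letters, which this article does not specify.

```python
import re
from typing import Optional

# The DCR key is described as 8 alphanumeric characters (for example, 01bcc5fb).
DCR_KEY_RE = re.compile(r"[A-Za-z0-9]{8}")


def source_folder_prefix(dcr_key: str, source_folder: str,
                         input_folder: Optional[str] = "input") -> str:
    """Build <dcr_key>/<input_folder>/<source_folder>/ beneath the bucket."""
    if not DCR_KEY_RE.fullmatch(dcr_key):
        raise ValueError("the DCR key must be 8 alphanumeric characters")
    parts = [dcr_key]
    if input_folder:  # the top-level input folder is optional
        parts.append(input_folder)
    parts.append(source_folder)
    return "/".join(parts) + "/"
```

The date and version subfolders are then created under this prefix each time the source is updated.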

Privacy regulations

This section outlines important information about current privacy regulations.

Understanding Google's EU user consent policy and its implications

As part of Google's enforcement of the Digital Markets Act (DMA), Google updated its EU user consent policy as of March 6, 2024. As a Google App Attribution Partner, AppsFlyer made the necessary changes to support Google's policy requirements, while ensuring that advertisers maximize the value from their Google Ads marketing channels.

 Note

Adding consent fields

When setting up a source intended for audience activation on Google and confirming Yes to the question Does your source involve European users subject to EU DMA regulations?, ensure that the additional consent fields in the table below are included in the source file. This enables AppsFlyer to transfer the necessary information to Google during the activation process.

Additional consent fields and their response values:

Field name | Response value | Field explained
eea | true/false | Is the user located in the EEA (European Economic Area), to which the DMA applies?
ad_personalization | true/false* | Did the user give Google consent to use their data for personalized advertising?
ad_user_data | true/false* | Did the user give consent to send their user data to Google?

* When "true": AppsFlyer includes the user identifiers you've sent for users who gave consent.
  When "false": AppsFlyer doesn't include them, as they weren't sent to AppsFlyer.
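Before uploading a source intended for activation on Google, you may want to confirm that every row carries the three consent fields. This is a minimal sketch with a hypothetical function name. It assumes the values are written as the lowercase strings true and false, which this article implies but does not state.

```python
CONSENT_FIELDS = ("eea", "ad_personalization", "ad_user_data")


def consent_problems(row: dict[str, str]) -> list[str]:
    """Return a list of problems with the consent fields in one source row."""
    problems = []
    for field in CONSENT_FIELDS:
        value = row.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in ("true", "false"):  # assumption: lowercase literals
            problems.append(f"{field}: expected true or false, got {value!r}")
    return problems
```

A row such as {"eea": "true", "ad_personalization": "false", "ad_user_data": "true"} produces no problems.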

Potential impact on the audience size

The actual and estimated audience size sent to Google may vary based on the number of users granting or denying consent.