At a glance: Create and manage sources to securely share your first-party data with other collaborators.
About DCP sources
In the Data Collaboration Platform (DCP), a source is a dataset, typically uploaded to AppsFlyer from your cloud storage. These sources form the foundation of any collaboration, supplying the data that your collaborators can use for audience creation and activation. This article contains everything you need to know about creating and managing your sources.
Before you begin
Before you create your sources, you should first:
- Set up the cloud services from which the AppsFlyer DCR will retrieve the source data. If these connections aren't set up, you will be prompted to set them up during source creation.
Source data requirements
It's recommended to prepare your source data to maximize match rates between collaborator datasets and optimize collaboration outcomes. In addition, all sources must meet the following requirements.
Data format (relevant to all sources)
Data within sources must meet these requirements:
- Date (only): yyyy-mm-dd (for example, 2023-04-18)
- Date and time:
  - Format: yyyy-MMM-dd hh:mm:ss (for example, 2023-APR-18 15:30:35)
  - Time zone: UTC
- Numbers: maximum 2 digits following the decimal point
- String length: maximum of 256 characters
- Character limitations:
  - For field names (column headers): no spaces or special characters
  - All other data: no limitations (all characters are valid)
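The format rules above can be checked before upload. The following is a minimal sketch, not an AppsFlyer tool: the function names are hypothetical, month abbreviations are assumed to be uppercase English as in the example, numbers are assumed to be plain decimals with an optional minus sign, and underscores are assumed to be allowed in field names (as in field names such as ad_user_data).

```python
import re
from datetime import datetime

# Assumption: month abbreviations are uppercase English, as in 2023-APR-18.
MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
DATETIME_RE = re.compile(
    r"([0-9]{4})-([A-Z]{3})-([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})")
# Assumption: plain decimal notation with an optional leading minus sign.
NUMBER_RE = re.compile(r"-?[0-9]+(\.[0-9]{1,2})?")
# Assumption: underscores are allowed in field names.
FIELD_NAME_RE = re.compile(r"[A-Za-z0-9_]+")
MAX_STRING_LENGTH = 256


def is_valid_date(value: str) -> bool:
    """Date only: yyyy-mm-dd, and a real calendar date."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_valid_datetime(value: str) -> bool:
    """Date and time: yyyy-MMM-dd hh:mm:ss (the value is expected in UTC)."""
    match = DATETIME_RE.fullmatch(value)
    if not match or match.group(2) not in MONTHS:
        return False
    year, month, day, hour, minute, second = match.groups()
    try:
        datetime(int(year), MONTHS[month], int(day),
                 int(hour), int(minute), int(second))
    except ValueError:
        return False
    return True


def is_valid_number(value: str) -> bool:
    """At most 2 digits after the decimal point."""
    return NUMBER_RE.fullmatch(value) is not None


def is_valid_string(value: str) -> bool:
    """At most 256 characters."""
    return len(value) <= MAX_STRING_LENGTH


def is_valid_field_name(name: str) -> bool:
    """Column header without spaces or special characters."""
    return FIELD_NAME_RE.fullmatch(name) is not None
```

These checks only cover formatting; they cannot tell whether a timestamp was actually recorded in UTC.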
Table columns (relevant only to sources in data warehouses)
In addition to data shared for processing, source tables in BigQuery or Snowflake must include 2 additional columns – one for date and one for version:
- Date:
  - Column header: dt
  - Column type: date
  - Data format: yyyy-mm-dd (for example, 2023-04-18)
  - Additional: BigQuery tables must be partitioned by this column
- Version:
  - Column header: v
  - Column type: string
  - Data format: number (for example, 1, 2, 3, 10)
  - Important! A new version of a report is triggered each time the DCR detects a new value in this column. To ensure the completeness of your report, be sure to populate the source table with a complete set of data whenever the column value is changed.
File name and format (relevant only to sources in cloud storage buckets)
Source files stored in Amazon S3 or GCS must meet these file name and format requirements:
- File name must comply with DCR naming requirements
- CSV or GZIP format
- The file underlying GZIP compression must be a CSV file.
- Number of data source files per data folder:
  - CSV: Maximum of 1
  - GZIP: Maximum of 1 single-part file. Multi-part GZIP files are supported when named as follows: filename_part01.csv.gz, filename_part02.csv.gz, etc.
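The per-folder file rules above can be expressed as a small check. This is an illustrative sketch only: the function name is hypothetical, file extensions are assumed to be .csv and .gz, part numbers are assumed to be two digits as in the example names, and the DCR naming requirements are not checked here.

```python
import re
from typing import List

# Assumption: multi-part files share one base name and use two-digit part numbers.
PART_RE = re.compile(r"(?P<base>.+)_part(?P<num>[0-9]{2})\.csv\.gz")


def is_valid_data_folder(file_names: List[str]) -> bool:
    """Return True if the files in one data folder follow the count rules."""
    if len(file_names) == 1:
        # A single CSV file or a single-part GZIP file.
        name = file_names[0].lower()
        return name.endswith(".csv") or name.endswith(".gz")
    if len(file_names) < 2:
        return False  # an empty folder has nothing to ingest
    # More than one file is only allowed for a multi-part GZIP upload.
    matches = [PART_RE.fullmatch(name) for name in file_names]
    if not all(matches):
        return False
    return len({match.group("base") for match in matches}) == 1
```

For example, `is_valid_data_folder(["filename_part01.csv.gz", "filename_part02.csv.gz"])` passes, while two separate CSV files in the same folder do not.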
Create a source
To create a source in DCP, follow the steps below:
Step 1: Access DCP Sources
- In AppsFlyer, from the side menu, select Collaborate > Data Clean Room.
- Click + New source (on the main page or in the Sources tab).
- Proceed with the New source walkthrough steps:
Step 2: Set source name
Enter the source name. This can be any unique name that will help you identify the source. You can also add an optional description about the source to help easily identify what it contains (e.g., "All purchases from 2025").
Requirements and guidelines
- Make sure the source name is unique among all other sources in your account. Otherwise, you won't be able to save the source.
- For cloud integrations, the name doesn't need to match the file name.
- Source name requirements:
  - Length: 2-80 characters
  - Valid characters:
    - letters (A-Z, a-z)
    - numbers (0-9), cannot be the first character of a name
    - underscore "_"
  - Invalid characters:
    - spaces
    - all other symbols or special characters
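The character and length rules for source names map directly onto a regular expression. The sketch below is only an illustration (the function name is hypothetical), and it does not check uniqueness within your account, which only AppsFlyer can verify.

```python
import re

# First character: a letter or underscore (numbers cannot come first).
# Remaining 1-79 characters: letters, numbers, or underscores (2-80 in total).
SOURCE_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{1,79}")


def is_valid_source_name(name: str) -> bool:
    """Check a source name against the length and character rules."""
    return SOURCE_NAME_RE.fullmatch(name) is not None
```

For example, `purchases_2025` passes, while `2025_purchases` (starts with a number) and `all purchases` (contains a space) do not.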
Step 3: Set source location
To specify the source location:
- Select the connection in which the source will be (or has been) created.
- If there are no connections defined in your account, the New connection dialog will open, prompting you to create one. Follow these instructions to create it.
- If you have existing connections but want to use a new one, click + New connection and follow these instructions to create it.
- Continue with the relevant instructions below, based on where the data for your source is located.
Note: For best performance when there’s a large amount of data, a subset of your data may be stored separately. This data will be updated whenever the source is updated and deleted when the source is deleted.
Source locations in BigQuery
To complete specifying the source location for BigQuery sources:
- Select the dataset in which the source table is located.
- Select the table in which the source data is located.
The lists from which you make these selections contain the available datasets and tables, respectively, in the BigQuery project you specified when creating the connection.
Source locations in Snowflake
To complete specifying the source location for Snowflake sources:
- Select the share containing the source data.
- Select the schema in which the source table is located.
- Select the table in which the source data is located.
The lists from which you make these selections contain the shares, schemas, and tables, respectively, in the Snowflake account you specified when creating the connection.
Source locations in cloud storage buckets
Source locations in Amazon S3 or GCS consist of the cloud storage bucket specified by the connection and the underlying folder path from which the DCR reads the source file each time it is updated.
Once you have specified the connection, AppsFlyer can automatically generate the required underlying folder path as part of the source creation process.
- Allowing AppsFlyer to generate the folders makes the process easy. However, you can choose to manually create them instead, according to the instructions detailed here.
If AppsFlyer generates the folders, the only additional information required is the name you want to give the source folder. (This is the top-level folder in which you update the source each time you want to use it for running a new report version.) You can also indicate whether you want the source folder to be created underneath a parent folder, often named input.
To complete specifying a source location in a cloud storage bucket, enter the source folder name.
- By default, the displayed source folder name:
- Is based on the name you gave the source. You can change the folder name to meet your needs, so long as it complies with the DCR naming requirements.
- Indicates that it will be generated within a parent folder named input. This folder serves as the parent folder for all sources you upload to the DCR.
- The input folder is not required, and you can remove it or name it something different, so long as it complies with the DCR naming requirements.
- Although this folder is not required, having an input folder (or an equivalent folder of a different name) is considered best practice. It is even more highly recommended when you are using the same cloud storage bucket both for uploading data files (input) and receiving reports (output).
Important!
If you manually created the folder path, make sure the connection and path you enter in the Source location section match the path you manually created.
Source location as local files
You can also upload source data using a local file. However, it's strongly recommended to set a cloud service as your source location (crucial for larger data sets and automatic updates). A local file is primarily used for testing purposes and getting familiar with the platform functionalities.
To upload source data using a local file:
When selecting your source location, click on Local file system and Next, which adds the option to upload local files from your device.
Note
- Supported file types: .CSV and .GZ files (max size 5 GB).
- Data uploaded from a local file will be saved for 180 days.
- Updates must be performed manually.
Step 4: Set source update method
When setting up a source in the Data Collaboration Platform (DCP), you must choose how you'd like updates to that source to be handled. DCP supports two sync methods: Snapshot and Append. Each method suits different use cases and data flows:
- Snapshot - This method uploads the full dataset each time the source is updated, completely replacing the old dataset each time with the new one. Choose this method only if your file uploads always contain the complete and up-to-date dataset.
- Append - This method uploads only new or changed data, not the full dataset. Old datasets are not removed or updated. It is ideal for recurring uploads like daily or weekly reports. Benefits include smaller uploads (may drive lower cloud storage costs), faster data upload performance, easier record management, and more.
Important!
Please determine which method is best before selecting. Once an option is selected and Generate folders is clicked, a file path along with instructions for handling data uploads is created within your selected cloud service. To prevent folder duplications and errors, you can change the sync method only once after the initial selection.
Snapshot upload requirements
Use this method to upload a complete, self-contained file that fully replaces the prior dataset each time the source is updated.
Supported cloud services
- Amazon S3
- Google Cloud Storage (GCS)
- BigQuery
- Snowflake
Follow these guidelines for the Snapshot update method:
- Source folder path
  When you register a cloud-storage source in DCP, a root path (e.g. input/my_source_folder/) is defined for that source. Each time you refresh the dataset, you upload a full snapshot (a file that contains all current rows) to this folder. This guarantees that DCP always processes the most up-to-date and complete view of the data.
- Uploading full snapshot files
| Guideline | Why it matters | How to comply |
|---|---|---|
| Complete dataset | DCP must replace the prior state in full | Generate an export that includes every record, not just changes |
| Date-partitioned folder (dt=) | Enables date-range filtering & retention management | Place each snapshot under dt=YYYY-MM-DD/ |
| Version sub-folder (v=) | Allows multiple snapshots on the same day | First upload under v=1/; retries use v=2/, v=3/, etc. |
| data/ sub-folder | Organizes your files | Store Parquet / CSV / Avro files in this folder |
| _SUCCESS marker | Signals DCP ingestion to begin | Upload an empty file named _SUCCESS inside the v= folder after all data files land |
| Supported file type and size | | |
Important!
The full snapshot file (BI-data.csv in the screenshot below) must be placed under:
/v=1/data/
Then, once the data file is uploaded, place an empty _SUCCESS file in:
/v=1/
Folder anatomy:
s3://af-dcr-xyz/abcd9876/input/source_name1/dt=2025-mm-dd/v=1/data/
└ bucket ┘ └ tenant ┘ └ingestion┘ └source┘└snapshot date┘└ver┘└files┘
| Segment | Description | Purpose |
|---|---|---|
| af-dcr-xyz | S3 / GCS bucket | Top-level container for all DCP data |
| abcd9876/ | Tenant / workspace ID | Keeps each customer’s data isolated |
| input/ | Ingestion area | Where raw uploads land before processing |
| source_name1/ | Source name | Logical dataset registered in DCP |
| dt=2025-mm-dd/ | Snapshot date | Represents the dataset’s full state on that day |
| v=1/ | Version (optional) | Supports retries or additional snapshots |
| data/ | Files | Parquet / CSV / Avro snapshot files |
Example of a two-day snapshot structure:
source_name1/
  dt=2025-08-10/
    v=1/
      data/
        snapshot.parquet
      _SUCCESS
  dt=2025-08-11/
    v=1/
      data/
        snapshot.parquet
      _SUCCESS
After you upload a complete snapshot of the source files for 2 days (programmatically creating the date, version, and data folders and the _SUCCESS files), your bucket/folder structure should match the two-day example above.
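The snapshot layout and the upload order (data file first, empty _SUCCESS marker last) can be sketched as follows. This is an illustration under stated assumptions, not an AppsFlyer client: it writes to a local directory with pathlib instead of calling a cloud SDK, and the function names snapshot_keys and write_snapshot are hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Tuple


def snapshot_keys(source_folder: str, day: date, version: int) -> Tuple[str, str]:
    """Return (data folder, _SUCCESS marker) paths for one snapshot version."""
    version_dir = f"{source_folder}/dt={day.isoformat()}/v={version}"
    return f"{version_dir}/data/", f"{version_dir}/_SUCCESS"


def write_snapshot(root: Path, source_folder: str, day: date, version: int,
                   file_name: str, payload: bytes) -> Path:
    """Write the data file first, then the empty _SUCCESS marker."""
    data_dir, marker = snapshot_keys(source_folder, day, version)
    target_dir = root / data_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / file_name).write_bytes(payload)
    marker_path = root / marker
    marker_path.touch()  # empty file, created only after the data file exists
    return marker_path


if __name__ == "__main__":
    # Local stand-in for the bucket; a real upload would target S3 or GCS.
    with tempfile.TemporaryDirectory() as tmp:
        marker = write_snapshot(Path(tmp), "input/source_name1",
                                date(2025, 8, 10), 1,
                                "snapshot.csv", b"id,amount\n1,9.99\n")
        print(marker.relative_to(tmp))
```

Writing the marker last matters because the marker is what signals ingestion to begin; a retry on the same day would call write_snapshot with version=2.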
Append upload requirements
Use this method to upload only the data that changed or was added since the previous upload.
Why use Append?
- Efficiency – smaller uploads
- Data integrity – preserves full change history for time‑series analysis
- Privacy compliance – simplifies record deletion & consent management
Supported cloud services: AWS S3 and Google Cloud Storage (GCS)
Note: BigQuery and Snowflake are not supported for the Append method and must use the Snapshot method.
Follow these guidelines for the Append update method:
1. Source folder setup
| Step | Description |
|---|---|
| Create / assign bucket | Use an existing S3 / GCS bucket or let AppsFlyer provision one for you |
| Define root path | A tenant‑scoped path (e.g. s3://af‑dcr‑xyz/abcd1234/) is configured as the input directory |
| Security | Grant the service user write permission only on this path |
2. Uploading append (delta) files
| Guideline | Why it matters | How to comply |
|---|---|---|
| Upload cadence | Keeps transfer cost & ops overhead low | Upload daily, or bundle 7-day folders and upload weekly |
| One folder per day | Enables date filtering & avoids scanning large batches | Place each day's delta in dt=YYYY-MM-DD/ |
| Unique file names | Prevents accidental overwrites | Give files unique names modeled like this: events_2025-08-04.parquet, events_2025-08-05.parquet |
| Required columns | Enables tracking of changes | The file must include: |
Folder anatomy:
s3://af-dcr-xyz/abcd1234/input/source_name/dt=2025-08-10/
└ bucket ┘ └ tenant ┘ └ ingestion ┘ └ source ┘ └ day partition ┘
| Segment | Purpose |
|---|---|
| af-dcr-xyz | Customer’s S3/GCS bucket |
| abcd1234/ | Tenant or workspace ID |
| input/ | Raw uploads awaiting ingestion |
| source_name/ | Logical DCP source (e.g., CRM_Events) |
| dt=2025-08-10/ | Daily partition folder |
Example weekly delta folder structure:
source_name/
  dt=2025-08-04/
  dt=2025-08-05/
  dt=2025-08-06/
  dt=2025-08-07/
  dt=2025-08-08/
  dt=2025-08-09/
  dt=2025-08-10/
Rule‑of‑thumb cheat‑sheet:
| Requirement | Why | Example |
|---|---|---|
| Weekly batch (optional) | Fewer pushes, same granularity | Upload one batch containing seven sibling dt= folders |
| Day partitions | Speeds up queries & UI filters | s3://your‑bucket/source/ dt=2025‑08‑04/ ... dt=2025‑08‑10/ |
Key takeaway: Even when uploading weekly, always organize data into day-specific dt=YYYY‑MM‑DD folders.
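To illustrate the day-partition rule for a weekly batch, the sketch below generates one object key per day with a unique, date-stamped file name. It rests on assumptions not stated in this article: delta files are placed directly inside each dt= folder, and the function name and the events file stem are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def daily_partition_keys(source_folder: str, first_day: date, days: int,
                         file_stem: str = "events") -> List[str]:
    """Build one key per day: <source>/dt=YYYY-MM-DD/<stem>_YYYY-MM-DD.parquet."""
    keys = []
    for offset in range(days):
        stamp = (first_day + timedelta(days=offset)).isoformat()
        # Assumption: the delta file sits directly inside the dt= folder.
        keys.append(f"{source_folder}/dt={stamp}/{file_stem}_{stamp}.parquet")
    return keys


if __name__ == "__main__":
    # One weekly batch: seven sibling dt= folders.
    for key in daily_partition_keys("input/source_name", date(2025, 8, 4), 7):
        print(key)
```

Including the date in each file name keeps names unique across days, which prevents accidental overwrites when a week of folders is uploaded together.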
Step 5: Map source fields
Map source fields to DCP fields, test it, and save the configured data:
- Load your source fields
- Configure the loaded fields
- Verify EU user inclusion
- Test the source data
- Save the source
1. Load your source fields
Source fields load automatically. In case manual loading is needed, follow the instructions below according to the source location:
Data warehouse sources
To load fields from a source located in a data warehouse (BigQuery or Snowflake), click Load fields from source.
Important!
If the selected source table does not include the required date and version columns, you will receive an error.
Cloud storage bucket sources
To load fields from a source located in a cloud storage bucket (Amazon S3 or GCS), you must upload a prototype source file.
For purposes of mapping the source fields to DCP fields:
- You can upload a prototype version of the source from a local file.
- If you select this option, AppsFlyer always creates the source folder path automatically.
- or -
- You can upload a prototype version of the source file directly from the connected cloud bucket.
- If you select this option, there's one additional choice to make:
- Allow AppsFlyer to automatically generate the cloud folder for the source file, or
- Create the cloud folder for the source file manually
To upload your prototype source file, follow the relevant instructions below.
To upload the source file from a local file:
- In the Map source fields section, click Load fields from source.
- In the window that opens, select Upload a local file.
- Specify the CSV or GZIP file you want to upload, then click OK.
To load the file from the connected cloud bucket and allow AppsFlyer to generate the cloud folder for the source file:
- In the Map source fields section, click Load fields from source.
- In the window that opens, select Load from cloud bucket.
- Click the Generate folders link.
- AppsFlyer automatically generates the required folder structure and cloud source folder (on the connection you specified, with the source folder name you specified).
- After the source folder structure has been created, a confirmation message is displayed, including a link to the source folder. Click the provided link to upload your prototype file to the source folder.
- Once the file has finished uploading, click OK.
To upload the source file from a structure you created manually:
- In the Map source fields section, click Load fields from source.
- In the window that opens, select Load from cloud bucket.
- DO NOT click Generate folders. Instead, upload the file directly to the source folder you created for it.
- Once the file has finished uploading, click OK.
2. Configure the source fields
After loading the source fields, each source field (column) is presented with a DCP field. Review each source field and map it to the appropriate DCP field from the drop-down list beside it. Consider the following:
Considerations
- When both parties share their source data, at least one field must be set as an identifier to enable user-level data to match across the corresponding sources. An identifier is a field that uniquely identifies an app user (for example, CUID, AppsFlyer ID, or hashed email).
- Although configuring each of the uploaded source fields (columns) isn't mandatory, it is important for categorization, effective data interpretation, aiding audience creation, suggesting insights, and effectively facilitating validations.
To remove a field:
- Hover over the right side of the field you want to remove and click the dustbin icon that appears when hovering.
To add fields manually:
This option allows you to include a field in the audience that isn't currently present in the data source.
- Click + New field. An empty field is added.
- Enter the name of the field and select its type.
Field types include:
- Text
- Identifier
  - Android ID
  - AppsFlyer ID
  - CUID
  - Hashed email
  - ID5 Id
  - IDFA
  - IP address
  - MediaMath Id
  - Mobile Ad Id
  - Phone number
  - Platform ID
  - SHA256 hashed ID
  - SHA256 phone number
- Boolean
- Date
- Time
- DMA
  - Ad personalization enabled
  - Ad user data enabled
  - EU DMA Applies
- Number
  - Double
  - List of Numbers
  - Long
  - Number
Reload source fields
If the configuration of one of your data files has changed, you can update the source file to reflect the changes.
Note
Reloading the source resets the field names in the source to match the column names in the updated file. This overwrites the field names in the list and their types.
To reload updated fields from a file:
- Click Reload fields.
- Select the file location.
- For a local file: Upload the file.
- For a file from your cloud service: Click Load from cloud bucket and follow the instructions.
- Click OK. The updated fields are now displayed.
3. Verify EU user inclusion
- Select Yes or No to the question: Does your source include European users, to which EU DMA regulations apply?
Learn more about privacy regulations for the EU Digital Markets Act.
4. Test the source data
- [Optional] Click Test to check for errors in the format or validity of the source fields.
5. Save the source
- Click Save to save the source.
After confirming, the new source is added under the Sources tab.
Note
If you uploaded the source from a local file, saving the source triggers the automatic creation of the folder structure, and the displayed confirmation message includes a link to the source folder.
Manage your sources
The sources you've created are displayed in the Sources tab. From here, you can edit the source name and structure, share it with a collaborator, and delete it—provided it's not already being used in an audience.
Edit a source
- Go to the Sources tab of the Data Clean Room.
- In the list of sources, hover over the source you want to edit, and click the edit icon at the end of the row.
- On the Edit source page, edit the relevant fields, as detailed below.
- Click Save.
Edit the source name
When editing the source name, make sure to follow these naming requirements.
Edit the source location
- From the Edit source page > Source location, select a different data connection.
- Select the relevant location details.
- [Optional] Test the source.
- Click Save.
Edit the field mapping
- Go to the field mapping, and make the necessary changes: Change the field or update its mapped DCP field.
- Click Save.
Important!
Don't forget to make corresponding changes reflecting the new source structure in any reports for which this source is used:
- Fields that were removed, unmapped, or changed from their previous mapping are automatically removed from any reports in which they are used.
- Newly added or mapped fields are not automatically included in existing reports until you edit report definitions to include them.
Delete a source
You can delete any source except when it's being used in an audience. If this is the case, a notification will specify the audiences using that source. To enable the deletion of the source, you must first delete the audience linked to it. As for sources shared with you, they can only be deleted by the source owner.
- Go to the Sources tab of the Data Clean Room.
- In the list of sources, hover over the row of the source you want to delete.
- Click the delete icon shown on the right side of the row.
- In the dialog, click Delete to confirm.
Sharing permissions
Granting permissions to your collaborators to view and use your source data is done when creating collaborations. Learn more about sharing permissions.
Additional information
This section provides some additional references and useful information.
Manually create a storage bucket folder structure (Optional)
In general, it's easiest to allow AppsFlyer to automatically generate the required folder structure as part of the source creation process. However, if you wish to create these folders manually, you can do so as follows.
Create a DCR key folder
To ensure maximum security, the folder directly beneath the bucket (the "DCR key folder") must be named with the 8-character, alphanumeric DCR key assigned to your account (for example, 01bcc5fb). Note that this is different from any other password or key associated with your AppsFlyer account.
The DCR key folder is generally created manually using the interface of your selected cloud service.
To get your account's DCR key:
- Click the 3-dot (actions) menu on the upper right and select DCR key.
After creating the DCR key folder, your bucket/folder structure would look something like this:
Top-level input folder
Though it is not required, best practice is to create a top-level input folder directly beneath the DCR key folder. This folder will be dedicated to files you upload to the DCR.
The top-level input folder is generally created manually using the interface of your selected cloud service.
- This practice is even more highly recommended when you are using the same bucket both for uploading data files (input) and receiving reports (output).
- You can name this folder anything you want, so long as it complies with the DCR naming requirements. For ease of identification, it is usually named
input/.
After creating the top-level input folder, your bucket/folder structure might look something like this:
Second-level folder for each data source
You can regularly upload different data source files to the DCR for processing. Each of these data sources must be assigned a separate folder ("data source folders").
So, for example, if you plan to upload 2 files to the DCR for processing every day: BI-data.csv and CRM-data.gzip, you would assign each of these data sources a folder. You could choose to call these folders BI-data/ and CRM-data/.
The data source folders are generally created manually using the interface of your selected cloud service.
After creating 2 data source folders, your bucket/folder structure might look something like this:
Under each data source folder, nested subfolders by date and version must be created each time the source is updated.
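The manual folder hierarchy described above (DCR key folder, optional input folder, data source folder) can be sketched as a path builder. This is a hedged illustration only: the function name is hypothetical, the DCR key check reflects only the "8-character, alphanumeric" description, and the DCR naming requirements for folder names are not checked.

```python
import re
from typing import Optional

# The DCR key is described as an 8-character alphanumeric value (e.g. 01bcc5fb).
DCR_KEY_RE = re.compile(r"[A-Za-z0-9]{8}")


def source_folder_path(dcr_key: str, source_folder: str,
                       input_folder: Optional[str] = "input") -> str:
    """Build <DCR key>/<input folder>/<source folder>/ beneath the bucket."""
    if not DCR_KEY_RE.fullmatch(dcr_key):
        raise ValueError("DCR key must be 8 alphanumeric characters")
    parts = [dcr_key]
    if input_folder:  # the input folder is optional but recommended
        parts.append(input_folder)
    parts.append(source_folder)
    return "/".join(parts) + "/"
```

For the two example sources, `source_folder_path("01bcc5fb", "BI-data")` and `source_folder_path("01bcc5fb", "CRM-data")` give sibling folders under the same input folder; the date and version subfolders are then added beneath each one on every update.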
Privacy regulations
This section outlines important information about current privacy regulations.
Understanding Google's EU user consent policy and its implications
As part of Google’s enforcement of the Digital Markets Act (DMA), Google updated its EU user consent policy as of March 6, 2024. As a Google App Attribution Partner, AppsFlyer made the necessary changes to support their policy requirements, while ensuring that advertisers maximize the value from their Google Ads marketing channels.
Note
- Google’s enforcement of the Digital Markets Act (DMA) applies to all platforms.
Adding consent fields
When setting up a source intended for audience activation on Google and confirming Yes to the question Does your source involve European users subject to EU DMA regulations? ensure that the additional consent fields in the table below are included in the source file. This enables AppsFlyer to transfer the necessary information to Google during the activation process.
Additional consent fields and their response values:
| Field name | Response value | Field explained |
|---|---|---|
| eea | true/false | Is the user located in the EEA (European Economic Area), to which the DMA applies? |
| ad_personalization | *true/false | Did the user give Google consent to use their data for personalized advertising? |
| ad_user_data | *true/false | Did the user give consent to send their user data to Google? |
| * When “true”, AppsFlyer includes the user identifiers you've sent for users who gave consent. When “false”, AppsFlyer doesn't include them, as they weren't sent to AppsFlyer. | | |
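Before uploading a source intended for Google activation, you may want to confirm that the consent columns are present and hold only true/false values. The sketch below is an assumption-laden illustration: the function name is hypothetical, the source is assumed to be a CSV with a header row, and values are compared case-insensitively.

```python
import csv
import io
from typing import List

CONSENT_FIELDS = ("eea", "ad_personalization", "ad_user_data")
ALLOWED_VALUES = {"true", "false"}


def consent_problems(csv_text: str) -> List[str]:
    """List missing consent columns or rows whose consent values are invalid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {field}"
                for field in CONSENT_FIELDS if field not in header]
    if problems:
        return problems
    # Line 1 is the header, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        for field in CONSENT_FIELDS:
            # Assumption: values are compared case-insensitively.
            if (row[field] or "").strip().lower() not in ALLOWED_VALUES:
                problems.append(f"row {row_number}: {field}={row[field]!r}")
    return problems
```

An empty list means the three consent columns exist and every row carries a true/false value for each of them.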
Potential impact on the audience size
The actual and estimated audience size sent to Google may vary based on the number of users granting or denying consent.