At a glance: Set up and manage sources to securely share your first-party data with other collaborators.
About DCP
The Data Collaboration Platform (DCP) functions as the central point for data collaboration, including audience creation and activation. DCP relies on the advanced technology of the Data Clean Room (DCR) to ensure data privacy and security for the collaboration and audience management processes.
Overview
This article explains how to create and manage your sources.
Before you begin
Before you create your sources, you should first:
- [Required] Set up the cloud services from which the DCR will retrieve the data. Two types of cloud services are supported:
- Data warehouses: BigQuery and Snowflake
- Cloud storage buckets: Amazon S3 (AWS) and GCS
- [Optional] Create inbound connections in the AppsFlyer platform to connect these cloud services to the DCR. If these connections were not set up previously, you will be prompted to set them up during source creation.
Source data requirements
Sources must meet these requirements in order to prevent errors in source creation.
Data format (relevant to all sources)
Data within sources must meet these requirements:
- Date (only): yyyy-mm-dd (for example, 2023-04-18)
- Date and time:
  - Format: yyyy-MMM-dd hh:mm:ss (for example, 2023-APR-18 15:30:35)
  - Time zone: UTC
- Numbers: maximum 2 digits following the decimal point
- String length: maximum of 256 characters
- Character limitations:
  - For field names (column headers): no spaces or special characters
  - All other data: no limitations (all characters are valid)
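The format rules above can be expressed as simple checks. The sketch below is illustrative, not official AppsFlyer tooling; the field-name pattern assumes underscores are allowed (documented field names such as ad_personalization contain them), so confirm against the DCR naming requirements.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")       # yyyy-mm-dd
DATETIME_FMT = "%Y-%b-%d %H:%M:%S"                 # yyyy-MMM-dd hh:mm:ss (UTC)

def is_valid_date(value: str) -> bool:
    """Check a date-only value such as 2023-04-18."""
    if not DATE_RE.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_valid_datetime(value: str) -> bool:
    """Check a date-and-time value such as 2023-APR-18 15:30:35."""
    try:
        datetime.strptime(value, DATETIME_FMT)  # %b matches month abbreviations
        return True
    except ValueError:
        return False

def is_valid_number(value: str) -> bool:
    """Allow at most 2 digits after the decimal point."""
    return re.fullmatch(r"-?\d+(\.\d{1,2})?", value) is not None

def is_valid_string(value: str) -> bool:
    """Enforce the 256-character maximum."""
    return len(value) <= 256
```

Running a quick check on a source row before uploading can save a failed source-creation round trip.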
Table columns (relevant only to sources in data warehouses)
In addition to data shared for processing, source tables in BigQuery or Snowflake must include 2 additional columns – one for date and one for version:
- Date:
  - Column header: dt
  - Column type: date
  - Data format: yyyy-mm-dd (for example, 2023-04-18)
  - Additional: BigQuery tables must be partitioned by this column
- Version:
  - Column header: v
  - Column type: string
  - Data format: number (for example, 1, 2, 3, 10)
  - Important! A new version of a report is triggered each time the DCR detects a new value in this column. To ensure the completeness of your report, be sure to populate the source table with a complete set of data whenever the column value is changed.
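A pre-upload sanity check for these required columns can be sketched as follows, assuming the table schema is represented as a simple mapping of column name to type string (this representation is an assumption for illustration, not an AppsFlyer API):

```python
# Required warehouse columns per the DCR source requirements above.
REQUIRED_COLUMNS = {"dt": "date", "v": "string"}

def missing_required_columns(schema: dict[str, str]) -> list[str]:
    """Return required columns that are absent or have the wrong type."""
    problems = []
    for name, expected_type in REQUIRED_COLUMNS.items():
        if schema.get(name) != expected_type:
            problems.append(name)
    return problems
```

For example, a schema containing only a dt column would be flagged as missing v, which would otherwise surface as an error when loading fields from the source.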
File name and format (relevant only to sources in cloud storage buckets)
Source files stored in Amazon S3 or GCS must meet these file name and format requirements:
- File name must comply with DCR naming requirements
- CSV or GZIP format
- The file underlying GZIP compression must be a CSV file.
- Number of data source files per data folder:
- CSV: Maximum of 1
- GZIP: Maximum of 1 single-part file. Multi-part GZIP files are supported when named as follows: filename_part01.gzip, filename_part02.gzip, etc.
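The multi-part naming pattern can be generated and checked programmatically. This is a sketch based on the pattern shown above ("filename" stands in for your own base name); the two-digit, zero-padded part number is inferred from the examples:

```python
import re

# Matches names like CRM-data_part01.gzip
PART_RE = re.compile(r"^.+_part\d{2}\.gzip$")

def part_names(base: str, parts: int) -> list[str]:
    """Generate sequential multi-part GZIP file names."""
    return [f"{base}_part{i:02d}.gzip" for i in range(1, parts + 1)]

def is_part_name(filename: str) -> bool:
    """Check whether a file name follows the multi-part pattern."""
    return PART_RE.match(filename) is not None
```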
Create a source
Creating a source involves a guided walkthrough with a few simple steps. To start the process:
- In AppsFlyer, from the side menu, select Collaborate > Data Clean Room.
- From the top-right menu, click + New source.
- Proceed with the New source walkthrough steps:
Step 1: Set source name
Enter the source name. This can be any unique name that will help you identify the source.
Requirements and guidelines
- Make sure the source name is unique among all other sources in your account. Otherwise, you won't be able to save the source.
- For cloud integrations, the name doesn't need to match the file name.
- Source name requirements:
- Length: 2-80 characters
- Valid characters:
  - Letters (A-Z, a-z)
  - Numbers (0-9); a number cannot be the first character of a name
- Invalid characters:
  - Spaces
  - All other symbols or special characters
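The naming rules above reduce to a single pattern: 2-80 characters, letters and numbers only, starting with a letter (since a number can't be first and nothing else is valid). A minimal sketch:

```python
import re

# 2-80 chars, letters/digits only, first character must be a letter.
SOURCE_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]{1,79}$")

def is_valid_source_name(name: str) -> bool:
    """Check a proposed source name against the documented rules."""
    return SOURCE_NAME_RE.match(name) is not None
```

Note that uniqueness within your account still has to be checked in the platform itself.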
Step 2: Set source location
To specify the source location:
- Select the connection in which the source will be (or has been) created.
- If there are no connections defined in your account, the New connection dialog will open, prompting you to create one. Follow these instructions to create it.
- If you have existing connections but want to use a new one, click + New connection and follow these instructions to create it.
- Continue with the relevant instructions below, based on where the data for your source is located.
Source locations in BigQuery
To complete specifying the source location for BigQuery sources:
- Select the dataset in which the source table is located.
- Select the table in which the source data is located.
The lists from which you make these selections contain the available datasets and tables, respectively, in the BigQuery project you specified when creating the connection.
Source locations in Snowflake
To complete specifying the source location for Snowflake sources:
- Select the share containing the source data.
- Select the schema in which the source table is located.
- Select the table in which the source data is located.
The lists from which you make these selections contain the shares, schemas, and tables, respectively, in the Snowflake account you specified when creating the connection.
Source locations in cloud storage buckets
Source locations in Amazon S3 or GCS consist of the cloud storage bucket specified by the connection and the underlying folder path from which the DCR reads the source file each time it is updated.
Once you have specified the connection, AppsFlyer can automatically generate the required underlying folder path as part of the source creation process.
- Allowing AppsFlyer to generate the folders makes the process easy. However, you can choose to manually create them instead, according to the instructions detailed here.
If AppsFlyer generates the folders, the only additional information required is the name you want to give the source folder. (This is the top-level folder in which you update the source each time you want to use it for running a new report version.) You can also indicate whether you want the source folder to be created underneath a parent folder – often named input.
To complete specifying a source location in a cloud storage bucket, enter the source folder name.
- By default, the displayed source folder name:
- Is based on the name you gave the source. You can change the folder name to meet your needs, so long as it complies with the DCR naming requirements.
- Indicates that it will be generated within a parent folder named input. This folder serves as the parent folder for all sources you upload to the DCR.
- The input folder is not required, and you can remove it or name it something different, so long as it complies with the DCR naming requirements.
- Although this folder is not required, having an input folder (or an equivalent folder with a different name) is considered best practice. It is especially recommended when you use the same cloud storage bucket both for uploading data files (input) and receiving reports (output).
Important!
If you manually created the folder path, make sure the connection and path you enter in the Source location section match the path you manually created.
Local file system
You can also upload source data from a local file. However, this isn't the recommended way to create your source data; it's intended primarily for testing, so you can familiarize yourself with the platform's functionality.
Note
Data uploaded from a local file is automatically removed after being stored for a maximum of 7 days.
Step 3: Configure source structure
Prepare and organize the source data, test it, and save the configured data:
- Load the source fields
- Configure the loaded fields
- Verify EU user inclusion
- Test the source data
- Save the source
1. Load the source fields
Use the instructions below according to the source location:
Data warehouse sources
To load fields from a source located in a data warehouse (BigQuery or Snowflake), click Load fields from source.
Important!
If the selected source table does not include the required date and version columns, you will receive an error.
Cloud storage bucket sources
To load fields from a source located in a cloud storage bucket (Amazon S3 or GCS), you must upload a prototype source file.
For purposes of defining the source structure:
- You can upload a prototype version of the source from a local file.
  - If you select this option, AppsFlyer always creates the source folder path automatically.

- or -

- You can upload a prototype version of the source file directly from its connection.
  - If you select this option, there's one additional choice to make:
    - Allow AppsFlyer to automatically create the source folder structure; or
    - Create the source folder structure manually
To upload your prototype source file, follow the relevant set of instructions below.

To upload the file from your local file system:
- In the Source structure section, click Load fields from source.
- In the window that opens, select Upload a local file.
- Specify the CSV or GZIP file you want to upload, then click OK.
To load the file from its connection and allow AppsFlyer to create the source folder structure:
- In the Source structure section, click Load fields from source.
- In the window that opens, select Load from connection.
- Click the Generate folders link.
- AppsFlyer automatically creates the required folder structure and source folder (on the connection you specified, with the source folder name you specified).
- After the source folder structure has been created, a confirmation message is displayed, including a link to the source folder. Click the provided link to upload your prototype file to the source folder.
- Once the file has finished uploading, click OK.
To upload the source file from a structure you created manually:
- In the Source structure section, click Load fields from source.
- In the window that opens, select Load from connection.
- DO NOT click Generate folders. Instead, upload the file directly to the source folder you created for it.
- Once the file has finished uploading, click OK.
2. Configure the source fields
After loading the source fields, each field (column) is presented with a field type. Review each field and match it with the appropriate data type from the drop-down list beside it. Consider the following:
Considerations
- When both parties share their source data, at least one field must be set as an identifier to enable user-level data to match across the corresponding sources. An identifier is a field that uniquely identifies an app user (for example, CUID, AppsFlyer ID, or hashed email).
- Although configuring each of the uploaded source fields (columns) isn't mandatory, it is important for categorization, effective data interpretation, aiding audience creation, suggesting insights, and effectively facilitating validations.
To remove a field:
- Hover over the right side of the field you want to remove, and click the dustbin icon that appears.
To add fields manually:
This option allows you to include a field in the audience that isn't currently present in the data source.
- Click + New field. An empty field is added.
- Enter the name of the field and select its type.
Reload source fields
If the configuration of one of your data files has changed, you can update the source file to reflect the changes.
Note
Reloading the source resets the field names in the list to match those in the updated file. This overwrites the existing field names and their types.
To reload updated fields from a file:
- Click Reload fields.
- Select the file location.
- For a local file: Upload the file.
- For a file from your cloud service: Click Load from cloud bucket and follow the instructions.
- Click OK. The updated files are now displayed.
3. Verify EU user inclusion
- Select Yes or No to the question: Does your source include European users, to which EU DMA regulations apply?
Learn more about privacy regulations for the EU Digital Markets Act.
4. Test the source data
- [Optional] Click Test to check for errors in the format or validity of the source fields.
5. Save the source
- Click Save to save the source.
After confirming, the new source is added under the Sources tab.
Note
If you uploaded the source from a local file, saving the source triggers the automatic creation of the folder structure, and the displayed confirmation message includes a link to the source folder.
Manage your sources
The sources you've created are displayed in the Sources tab. From here, you can edit the source name and structure, share it with a collaborator, and delete it—provided it's not already being used in an audience.
Edit a source
- Go to the Sources tab of the Data Clean Room.
- In the list of sources, hover over the source you want to edit, and click the edit icon at the end of the row.
- On the Edit source page, edit the relevant fields, as detailed below.
- Click Save.
Edit the source name
When editing the source name, make sure to follow these naming requirements.
Edit the source location
- From the Edit source page > Source location, select a different data connection.
- Select the relevant location details.
- [Optional] Test the source.
- Click Save.
Edit the source structure
- Go to the field name and type and make the necessary changes: Change the field name or update its type.
- Click Save.
Important!
Don't forget to make corresponding changes reflecting the new source structure in any reports for which this source is used:
- Fields that were removed, uncategorized, or changed from their previous categories are automatically removed from any reports in which they are used.
- Newly added or categorized fields are not automatically included in existing reports until you edit report definitions to include them.
Delete a source
You can delete any source except one that is being used in an audience; in that case, a notification specifies the audiences using that source. To delete the source, you must first delete the audiences linked to it. Sources shared with you can only be deleted by the source owner.
- Go to the Sources tab of the Data Clean Room.
- In the list of sources, hover over the row of the source you want to delete.
- Click the delete icon showing on the right side of the row.
- In the dialog, click Delete to confirm.
Share a source
To share your source with the collaborator and provide them permissions:
- Go to the Sources tab of the Data Clean Room.
- From the list of sources, hover over the source you want to share, and click the sharing icon at the end of the row.
- Enter the collaborator's email address, and click Next.
- Select the relevant sharing permissions, as detailed below.
- Click Save & send.
Reference
Manually creating a storage bucket folder structure (relevant only if you choose to do so)
In general, it's easiest to allow AppsFlyer to automatically generate the required folder structure as part of the source creation process. However, if you wish to create these folders manually, you can do so as follows.
Create a DCR key folder
To ensure maximum security, the folder directly beneath the bucket (the "DCR key folder") must be named with the 8-character, alphanumeric DCR key assigned to your account (for example, 01bcc5fb). Note that this is different from any other password or key associated with your AppsFlyer account.
The DCR key folder is generally created manually using the interface of your selected cloud service.
To get your account's DCR key:
- Click DCR key at the top of the main DCR page.
After creating the DCR key folder, your bucket/folder structure would look something like this:
Top-level input folder
Though it is not required, best practice is to create a top-level input folder directly beneath the DCR key folder. This folder will be dedicated to files you upload to the DCR.
The top-level input folder is generally created manually using the interface of your selected cloud service.
- This practice is especially recommended when you use the same bucket both for uploading data files (input) and receiving reports (output).
- You can name this folder anything you want, so long as it complies with the DCR naming requirements. For ease of identification, it is usually named input/.
After creating the top-level input folder, your bucket/folder structure might look something like this:
Second-level folder for each data source
You can regularly upload different data source files to the DCR for processing. Each of these data sources must be assigned a separate folder ("data source folders").
So, for example, if you plan to upload 2 files to the DCR for processing every day: BI-data.csv and CRM-data.gzip, you would assign each of these data sources a folder. You could choose to call these folders BI-data/ and CRM-data/.
The data source folders are generally created manually using the interface of your selected cloud service.
After creating 2 data source folders, your bucket/folder structure might look something like this:
Under each data source folder, nested subfolders by date and version must be created each time the source is updated.
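Putting the pieces together, the upload path for a source file could be assembled as below. This is a hypothetical sketch: the DCR key, parent input folder, and source folder come from the sections above, but the exact names of the date and version subfolders are assumptions here, so follow the DCR naming requirements for the real convention.

```python
from datetime import date

def source_upload_prefix(dcr_key: str, source_folder: str,
                         dt: date, version: int,
                         parent: str = "input") -> str:
    """Build a bucket key prefix: <dcr-key>/<parent>/<source>/<date>/<version>/.

    The date/version subfolder names are illustrative assumptions,
    not a documented AppsFlyer convention.
    """
    return f"{dcr_key}/{parent}/{source_folder}/{dt.isoformat()}/{version}/"
```

For example, `source_upload_prefix("01bcc5fb", "BI-data", date(2023, 4, 18), 1)` yields a prefix under the DCR key folder, the input parent, and the BI-data source folder.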
Sharing permissions
Grant the collaborator permission to use your source data. View permissions granted to you by a collaborator on the source they shared with you.
Grant permissions for your source
Turn on any of the following permissions to allow the collaborator to access and use your source data:
Permission | Description |
---|---|
Query data | Query the source data using the DCR Dynamic Query |
Build audience | Segment and combine datasets to create an audience you would like to target, using the audience builder tool. |
Partner connections | Activate the created audience on any of your media partner platforms. |
| Download the created audience as a CSV file or send it to your cloud services. |
Add expiration | Set an expiration date for the data sharing. |
View permissions on collaborator’s source
To view the permissions granted to you by a collaborator on the source they shared with you:
- From the Sources tab of the Data Clean Room, go to the sources that were shared with you. You can use the Shared with me filter at the top of the page.
- Under the Permissions column, hover over the row of the source whose detailed permissions you want to view.
Privacy regulations: EU Digital Markets Act
Understanding Google's EU user consent policy and its implications
As part of Google’s enforcement of the Digital Markets Act (DMA), Google updated its EU user consent policy as of March 6, 2024. As a Google App Attribution Partner, AppsFlyer made the necessary changes to support these policy requirements, while ensuring that advertisers maximize the value of their Google Ads marketing channels.
Note
- Google’s enforcement of the Digital Markets Act (DMA) applies to all platforms.
- Read more about this:
Adding consent fields
When setting up a source intended for audience activation on Google and confirming Yes to the question Does your source include European users, to which EU DMA regulations apply?, ensure that the additional consent fields in the table below are included in the source file. This enables AppsFlyer to transfer the necessary information to Google during the activation process.
Additional consent fields and their response values:
Field name | Response value | Field explained |
---|---|---|
eea | true/false | Is the user located in the EEA (European Economic Area), to which the DMA applies? |
ad_personalization | *true/false | Did the user give Google consent to use their data for personalized advertising? |
ad_user_data | *true/false | Did the user give consent to send their user data to Google? |

* When “true”, AppsFlyer includes the user identifiers you've sent for users who gave consent. When “false”, AppsFlyer doesn't include them, as they weren't sent to AppsFlyer.
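A source row containing these consent fields could look like the sketch below. The cuid identifier column is a hypothetical example; use whichever identifier field your source actually defines.

```python
import csv
import io

# Example rows with the documented consent fields; cuid is an
# illustrative identifier column, not a required field name.
rows = [
    {"cuid": "user-001", "eea": "true",
     "ad_personalization": "true", "ad_user_data": "true"},
    {"cuid": "user-002", "eea": "true",
     "ad_personalization": "false", "ad_user_data": "false"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["cuid", "eea",
                                         "ad_personalization", "ad_user_data"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()  # this is the CSV content you would upload
```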
Potential impact on the audience size
The actual and estimated audience size sent to Google may vary based on the number of users granting or denying consent.