Cost ETL—granular campaign cost in a bucket

At a glance: Cost ETL, part of AppsFlyer Xpend, provides advertisers with campaign cost data having the best granularity available per media source. Data, updated 4 times daily, is delivered to your AWS S3 bucket ready for loading into advertiser BI systems. 

5896_CostETL_graphic.png

Cost ETL principles

Campaign cost data is written:

  • To your bucket for viewing, transferring, and loading cost data into your systems.
  • To the bucket four times per day (data freshness: intraday).
  • For the current day and the previous 6 days (referred to as 7 days in this article), as well as days 14, 29, and 88. The retroactive data allows for updates and corrections in the data reporting. 

Data is provided with guaranteed primary dimensions:

  • Geo: breakdown by country
  • Channel: media source channel, for example, in the case of Google—YouTube

You can customize other dimensions and the metrics included in the reports to your needs.

View sample file. Note: The sample provided is an Excel file; Cost ETL files are sent to your bucket as parquet files. 

Implementation

Report dimensions

  • Reports are for all apps included in Cost ETL, per day, per batch. 
  • Each time data is written to the bucket, meaning 4 times a day, all available data is written, including the history of the previous 6 days, and days 14, 29, and 88. Take this into consideration in your data loading process.
  • Reports structures are detailed in the file fields table. The structures are: 
    • Summary report is less granular (detailed) to enable easier and faster consumption.
    • Dimension reports have a primary dimension that is guaranteed. This means that the dimension is available for all media sources contained in that report. In contrast, if a media source does not provide the primary dimension data, that data is not included in the specific dimension report. Secondary dimensions are included when available. They are not guaranteed.
    • The primary (guaranteed) dimensions available are:
      • Geo: Data grouped by country
      • Channel: Media source channel, for example, YouTube in the case of Google and Instagram in the case of Facebook.
    • You should use the dimensions and metrics that best fit your business needs. This may differ depending on the media source.

Directory and filename structure

  • Data written to the S3 bucket has the directory and file structure described. View sample fileNote: The sample provided is an Excel file; Cost ETL files are sent to your bucket as parquet files. 
  • When Cost ETL completes writing to a directory, a flag is set, by creating a file, success.txt. This is always the most recent timestamp in the directory. 
  • Each time data is written, it includes data for the current day and the previous 6 days (referred to as 7 days in this article), as well as days 14, 29, and 88.
  • The number of folders/files is as follows:
    • Summary: 4 batch folders per day.
      • Each batch folder contains parquet files with 7 days of data.
    • Dimensions: Each guaranteed dimension contains 4 batch folders per day.
      • Each batch folder contains parquet files containing data with numbering starting from 1.

Example directory structure

cost_etl_structure_4.jpg

Directory structure

/<advertiser bucket name>/cost_etl/version/dt=<yyyy-mm-dd>/b=<n>/
<dimension>/<file name>

Variable

Content

advertiser_bucket_name

As defined in the Cost ETL configuration: af-xpend-cost-etl-<af-account-id>-[your bucket name suffix]

cost_etl Always cost_etl 
version Cost ETL version
date

Cost date

Format: yyyy-mm-dd

batch Number 1-4
dimension

Data dimension:

  • geo
  • channel
file Numbered starting from 00001
Directory structure

File name structure 

part-<number>

Example

For the first data pull of June 23, 2020, the directory and file name structure is as follows:

/bucket-name/cost_etl/v1/dt=2020-06-23/b=1/geo/part-00001

File fields

Field

Remarks

Always populated

Format

Included in summary file

date

Date cost incurred reported by the media source

Yes

String yyyy-mm-dd

app_id

App id in the AppsFlyer platform

Yes

String

media_source

Media source responsible for display the ad

Yes

String

agency

Agency responsible for placing the ad

No

String

-

campaign

Component of the advertising hierarchy

No

String

-

campaign_id

Component of the advertising hierarchy

No

String

-

adset

Component of the advertising hierarchy

No

String

-

adset_id

Component of the advertising hierarchy

No

String

-

ad

Component of the advertising hierarchy

No

String

-

ad_id

Component of the advertising hierarchy

No

String

ad_account

  • Ad account from which cost data was pulled
  • Relevant for networks where connection is done using oAuth (logging in with your credentials), for example in Google or Facebook

No

String

-

currency

Currency of advertiser spend as defined for the app in AppsFlyer

Yes

3 character string compliant with ISO-4217

original_currency

Currency of cost as reported by the network before any conversions

Yes

3 character string compliant with ISO-4217

-

timezone

  • Timezone of the data as it is stored in AppsFlyer
  • Sometimes networks report data in a different time zone compared to what is defined in AppsFlyer. This data point helps explain cost discrepancies between AppsFlyer and networks

Yes

String 

-

geo

Dimension in the advertising hierarchy

No

2 character string compliant with ISO 3166

channel

Dimension in the advertising hierarchy

No

String

- 

keywords

Word(s) used by the user for online search

Yes

String 

-

site_id

Publisher ID

No

String 

-

campaign_objective

Component of campaign properties. Learn more

No

String 

-

cost_model

Component of campaign properties. Learn more

No

String 

-

af_cost_model

Cost model mapped and normalized by AppsFlyer. Component of campaign properties. Learn more

No

String 

-

bid_strategy

Component of campaign properties. Learn more

No

String 

-

af_bid_strategy

Bid strategy mapped and normalized by AppsFlyer. Component of campaign properties. Learn more

No

String 

-

bid_amount

Component of campaign properties. Learn more

No

Integer

-

original_bid_amount

Component of campaign properties. Learn more

No

Integer

-

Fields/Dimensions

Field

Remarks

Always populated

Format

Included in summary file

impressions

Counted by AppsFlyer

Yes.

If no value is available for a particular metric, it is populated with 0.

Integer

clicks

Counted by AppsFlyer

Integer

reported_impressions

Counted by the Media source

Integer

reported_clicks

Counted by the Media source

Integer

installs

Counted by AppsFlyer

Integer

reported_conversions

Counted by the Media source

Integer

 

re_engagements

Counted by AppsFlyer

Integer

re_attributions

Counted by AppsFlyer

Integer

cost

Amount of spend (including agency fees where relevant)

Value

original_cost

Cost as reported by the network, in the currency reported by the network before any currency conversion (with agency fees calculated by AppsFlyer added where relevant)

Value 

-

impressions_discrepancy

Difference between those counted by ad network and AppsFlyer

Integer

-

clicks_discrepancy

Difference between those counted by ad network and AppsFlyer

Integer

-

installs_discrepancy

Difference between those counted by ad network and AppsFlyer

Integer

-

fees

Fees an agency charges in addition to the usual ad cost. Counted by AppsFlyer

 

Integer

-

cost_without_fees

Cost minus the agency fee. Counted by AppsFlyer

 

Integer

-

original_cost_without_fees

Original cost as reported by the ad network, without agency fees

 

Integer

-

ctr

Click-through rate. Counted by AppsFlyer

 

 

 

 

No

Integer

-

cvr

Conversion rate. Counted by AppsFlyer

Integer 

-

ecpm

Counted by AppsFlyer

Integer

cpi

Counted by AppsFlyer

Integer

ccvr

Cost per conversion

Integer

cvvr

Completed video view rate. Counted by AppsFlyer

Integer

-

reported_cvr

Reported conversion rate

Integer

ecpc

Counted by AppsFlyer

Integer

-

video_25p_views

Video played 25%. Reported by  ad network

Integer

video_50p_views

Video played 50%. Reported by  ad network

Integer

video_75p_views

Video played 75%. Reported by  ad network

Integer

video_completions

Reported by  ad network

Integer

Metrics

Set up Cost ETL

Before you start:

  • Setting up Cost ETL consists of setting up your AWS bucket (and giving AppsFlyer permission to write data in it), and setting up Cost ETL in AppsFlyer.
  • You will need both AWS admin privileges and access to the AppsFlyer UI to complete Cost ETL setup.
  • Keep tabs to both AWS and AppsFlyer open during setup. 
  • Note: KMS bucket encryption is not supported by AppsFlyer. 

To set up your AWS bucket and Cost ETL:

  1. Sign in to the AWS console.
  2. Go to the S3 service.
  3. To create the bucket:
    1. Click Create bucket.
    2. Complete the Bucket name as follows: Start with the mandatory prefix af-xpend-cost-etl-<af-account-id>- and then add a suffix as free text.
    3. Click Create bucket.
  4. In AppsFlyer, go to Integrations > Cost ETL.
  5. Turn Activate Cost ETL on. 
  6. Go to Amazon S3 settings.

  7. Select your S3 bucket region from the dropdown.
    If your region isn't displayed, contact your CSM. 
  8. Enter your Amazon S3 bucket name.
  9. Click Next.
    The bucket policy code snippet displays.

  10. Copy the bucket policy code snippet and paste it into your AWS settings.
    1. In AWS, Select the bucket you created for Cost ETL. 
    2. Go to the Permissions tab.
    3. In the Bucket policy section, click Edit.
    4. The Bucket policy window opens.
    5. Paste the bucket policy snippet into the window.
  11. In your AppsFlyer Cost ETL settings, click Next.
    The Validate bucket step displays. 

  12. Click Validate.
    Verify that Validation successful displays. 
  13. Click Finish.
  14. Select one or more or all apps.  Select all to automatically include apps you add in the future.
  15. Click Apply
  16. Select at least one Guaranteed dimension: Channel and/or Geo.
  17. Select at least one additional dimension. 

  18. Select at least one metric to be included in the reports. 

  19. Click Apply.

Best practices

Override data

When pulling and analyzing your data, it is recommended to pull data for a specific date and batch, or override all previous data for the days that the current batch provides. Otherwise, you may see the same data repeated.

For example, batch 1 on February 20 contains data for Feb 14-20. But, batches written on February 19 also contained data for Feb 14 to Feb 19. Override the data of the previous days received on February 19 with the data received in the most recent February 20 batch.

Geo versus channel

Not all networks provide data for all dimensions together. Geo and channel data in Facebook are the most common examples. This is why two separate data sets are provided. The geo data set is guaranteed to have geo data and the channel data set is guaranteed to have channel data.

In many cases and for many media sources, the data in the geo and channel sets will be identical. As such, consume one of the data sets (geo or channel), according to what best suits your needs.

Aggregate data

Cost ETL provides flexible and granular data as deep as can be extracted from the ad network. To extract actionable insight from such potentially huge amounts of data, it is recommended to aggregate the data in a way that best suits your business needs. For example, if you need to understand cost data at the campaign and country level, use those dimensions.

Standardization across networks

Not all networks provide data with the same granularity. For example, Facebook does not provide site ID cost data, while Twitter does not provide geo cost data. Be aware of such cases as you aggregate Cost ETL data, and make sure you look at similar data as you compare networks.

Compare data

Cost ETL provides data for all your cost data. Some campaigns provided in Cost ETL do not appear in AF dashboards (for example, campaigns that have not converted yet in AppsFlyer, but have generated installs). To compare the data, find a specific campaign ID in the overview dashboard and compare it to its cost data in Cost ETL. 

Was this article helpful?