https://wiki.extranet.weborama.com/xdevice/start

The Weborama XDevice solution has two important parts:

  1. xdevice data ingestion

  2. upload support

To upload identifiers such as idfa instead of the appnexus id, we need to add support in the existing uploader so that it recognizes new options in the current data_transfer integration.

After this, we need xdevice data to translate weborama ids to idfa and upload them to the supported DSPs.

For now, we collect anything from actions on a computer browser.

From Collect Frontend

We can collect data through the ThirdParty, Exchange and WAMFactory services. Each service handles custom segments, and we intercept this operation to ingest data.

In this example, we use the ThirdParty service ( d.A=tp ).

Code Block
http://wam.solution.weborama.fr/fcgi-bin/dispatch.fcgi?d.A=tp&d.k=wam_segments&d.v=12345&g.ism=1&g.did=12345678987654321234567898765432&g.dty=1&g.xcrm=123456789&d.a=123

The parameters we use are:

  • d.a = 123 for account_id

  • g.xcrm = 123456789 for crm identifier

  • g.ism = 1 if mobile

  • g.did = 12345678987654321234567898765432 for device id

  • g.dty = 1 for Apple device, 2 for Google device (if 0 or not present, we will try to use the User Agent to detect the device family type; however, the request may be rejected by the xdevice daemon)
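As an illustration of the parameters above, here is a minimal Python sketch that rebuilds the example URL from named arguments. The function name `build_collect_url` is hypothetical, and the d.k and d.v values are simply copied from the example because their meaning is not described here.

```python
from urllib.parse import urlencode

# Endpoint copied from the example above.
COLLECT_ENDPOINT = "http://wam.solution.weborama.fr/fcgi-bin/dispatch.fcgi"

def build_collect_url(account_id, crm_id, device_id, device_type=0):
    """Build a ThirdParty collect URL (hypothetical helper, not part of the collect code)."""
    params = {
        "d.A": "tp",            # ThirdParty service
        "d.k": "wam_segments",  # copied from the example
        "d.v": "12345",         # copied from the example
        "g.ism": 1,             # 1 if mobile
        "g.did": device_id,     # device id
        "g.dty": device_type,   # 1 = Apple, 2 = Google, 0 = detect from User Agent
        "g.xcrm": crm_id,       # crm identifier
        "d.a": account_id,      # account_id
    }
    return COLLECT_ENDPOINT + "?" + urlencode(params)

url = build_collect_url(123, "123456789", "12345678987654321234567898765432", device_type=1)
print(url)
```

With these arguments the sketch prints the same URL as the example.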

AFFICHE_W will be created by translating the device id. It will begin with @ (because it is mobile).

This operation is not bijective (device id → AFFICHE_W).

We will push this message into the RabbitMQ queue:

Code Block
$VAR1 = {
    "weborama" => '--0m5@rSKJD-85',
    "meta:device" => [
        1,
        "12345678987654321234567898765432",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12A366 Safari/600.1.4"
    ],
    "meta:idcrm" => [
        123,
        "123456789"
    ]
};

meta:device contains 3 values plus 2 optional ones, in this order: device_type (default: 0), device_id, user_agent, ip, 2-letter country code.

If the 2-letter country code is present, we use it in the “country” index. If it is not present, we use the ip to perform a geolocation query.

meta:idcrm contains 2 values : account_id, id_crm
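A small Python sketch of how the positional meta:device array could be read. The helper names are hypothetical, and the real daemon code is not shown on this page. Because the CSV examples further down carry only two values, the sketch pads any missing trailing value instead of rejecting the array.

```python
FIELDS = ("device_type", "device_id", "user_agent", "ip", "country")

def parse_meta_device(values):
    """Name the positional values of a meta:device array (illustrative only)."""
    if len(values) > len(FIELDS):
        raise ValueError("meta:device carries at most 5 values")
    padded = list(values) + [None] * (len(FIELDS) - len(values))
    device = dict(zip(FIELDS, padded))
    device["device_type"] = int(device["device_type"] or 0)  # 0 = auto-detect
    return device

def country_source(device):
    """Say where the "country" index value would come from."""
    if device["country"]:
        return "country_code"    # the 2-letter code is used directly
    if device["ip"]:
        return "ip_geolocation"  # fall back to a geolocation query on the ip
    return None                  # neither value is available

device = parse_meta_device([1, "12345678987654321234567898765432", "Mozilla/5.0", "127.0.0.1", "FR"])
print(country_source(device))
```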

Full example:

Code Block
$VAR1 = {
    'weborama' => '--0m5@rSKJD-85',
    'meta:device' => [
        1,
        '12345678987654321234567898765432',
        'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12A366 Safari/600.1.4',
        '127.0.0.1',
        'FR',  
    ],
    'meta:idcrm' => [
        123,
        '123456789'
    ]
};

The entry will be rejected if:

  1. we can't auto-detect the mobile id type from the user agent in auto-detect mode

  2. we can't detect the country from the ip, if it is present.

From CSV Files

We have a script to read CSV files and push data into RabbitMQ.

This script is wdx-populate-queue-from-csvfile.

We can pass these parameters:

  • infile *

  • fields *

  • account_id

  • device_type

  • max : maximum number of lines pushed to the queue

  • separator : separator used in the CSV file

  • debug

  • dry-run

(* = mandatory)

  • File with weborama and idcrm

Code Block
____________,f2a41f08a33d882aec55cd3765458d8c
--0m5@rSKJD-85,991ea637768c911b372d688578cd61da
-10ro0dwJa7-29,90e378638058327c640624f384735412
-1HZvzUinwv-34,2f6e599bee6b3b6e61e87d48b3030cf9
-1bMwPVUjDX-82,6b3ed343287a89a56531f76553845c81
-2Ab-h@t05j-52,9bf9f683a47e8b616d668f512aca25d7

Push in RabbitMQ queue:

Code Block
tpeczenyj@aub-daemon-01:~$ wdx-populate-queue-from-csvfile --infile xdevice-weboid-idcrm.in --account_id 123 --fields weborama,idcrm -d 

Debug mode :

Code Block
line 1 : {'weborama' => '____________','meta:idcrm' => ['123','f2a41f08a33d882aec55cd3765458d8c']} fail
line 2 : {'weborama' => '--0m5@rSKJD-','meta:idcrm' => ['123','991ea637768c911b372d688578cd61da']} success
line 3 : {'weborama' => '-10ro0dwJa7-','meta:idcrm' => ['123','90e378638058327c640624f384735412']} success
line 4 : {'weborama' => '-1HZvzUinwv-','meta:idcrm' => ['123','2f6e599bee6b3b6e61e87d48b3030cf9']} success
line 5 : {'weborama' => '-1bMwPVUjDX-','meta:idcrm' => ['123','6b3ed343287a89a56531f76553845c81']} success
line 6 : {'weborama' => '-2Ab-h@t05j-','meta:idcrm' => ['123','9bf9f683a47e8b616d668f512aca25d7']} success

And finally the result :

Code Block
{'summary' => {'fields' => ['weborama','idcrm'],'stats' => {'rejected' => 1,'total' => 6,'elapsed' => 0,'queued' => 5},'infile' => 'xdevice-weboid-idcrm.in'}}

We will have 5 messages in RabbitMQ like this:

Code Block
$VAR1 = {
    'weborama' => '--0m5@rSKJD-',
    'meta:idcrm' => [123, '991ea637768c911b372d688578cd61da']
};

  • File with idcrm and device_id

Code Block
991ea637768c911b372d688578cd61da,AEBE52E7-03EE-455A-B3C4-E57283966231
991ea637768c911b372d688578cd61db,AEBE52E703EE455AB3C4E57283966232
991ea637768c911b372d688578cd61dc,AEBE52E7-03EE-455A-B3C4-E57283966233
991ea637768c911b372d688578cd61dd,AEBE52E703EE455A-B3C4-E57283966234
991ea637768c911b372d688578cd61de,AEBE52E7-03EE-455AB3C4E57283966235

Push in RabbitMQ queue:

Code Block
tpeczenyj@aub-daemon-01:~$ wdx-populate-queue-from-csvfile --infile xdevice-idcrm-idfa.in --account_id 123 --device_type 1 --fields idcrm,device_id -d 

Debug mode :

Code Block
line 1 : {'meta:device' => ['1','AEBE52E7-03EE-455A-B3C4-E57283966231'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61da']} success
line 2 : {'meta:device' => ['1','AEBE52E703EE455AB3C4E57283966232'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61db']} success
line 3 : {'meta:device' => ['1','AEBE52E7-03EE-455A-B3C4-E57283966233'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61dc']} success
line 4 : {'meta:device' => ['1','AEBE52E703EE455A-B3C4-E57283966234'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61dd']} success
line 5 : {'meta:device' => ['1','AEBE52E7-03EE-455AB3C4E57283966235'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61de']} success

And finally the result :

Code Block
{'summary' => {'fields' => ['idcrm','device_id'],'stats' => {'rejected' => 0,'total' => 5,'elapsed' => 0,'queued' => 5},'infile' => 'xdevice-idcrm-idfa.in'}}

We will have 5 messages in RabbitMQ like this:

Code Block
$VAR1 = {
    'meta:idcrm' => [123, '991ea637768c911b372d688578cd61da'],
    'meta:device' => [1, 'AEBE52E7-03EE-455A-B3C4-E57283966231']
};
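The two CSV cases above can be sketched in Python as follows. This is not the code of wdx-populate-queue-from-csvfile. The function names are hypothetical, and the validity rule is an assumption inferred from the debug output: a 14-character AFFICHE_W loses its last two characters, and the all-underscore placeholder is rejected.

```python
import csv
import io

def coerce_weborama_id(value):
    """Return a 12-char weborama id, or None when the value is rejected (assumed rule)."""
    if len(value) == 14:  # AFFICHE_W form: drop the 2-char suffix
        value = value[:-2]
    if len(value) != 12 or set(value) == {"_"}:
        return None
    return value

def build_messages(text, fields, account_id, device_type=0, separator=","):
    """Turn CSV text into queue messages; returns (queued, rejected_count)."""
    queued, rejected = [], 0
    for row in csv.reader(io.StringIO(text), delimiter=separator):
        record = dict(zip(fields, row))
        message = {}
        if "weborama" in record:
            webo_id = coerce_weborama_id(record["weborama"])
            if webo_id is None:
                rejected += 1
                continue
            message["weborama"] = webo_id
        if "idcrm" in record:
            message["meta:idcrm"] = [account_id, record["idcrm"]]
        if "device_id" in record:
            message["meta:device"] = [device_type, record["device_id"]]
        queued.append(message)
    return queued, rejected

sample = (
    "____________,f2a41f08a33d882aec55cd3765458d8c\n"
    "--0m5@rSKJD-85,991ea637768c911b372d688578cd61da\n"
)
queued, rejected = build_messages(sample, ["weborama", "idcrm"], account_id=123)
print(queued, rejected)
```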

XDevice daemon

A daemon picks messages (provided by Collect or a CRM file) from this queue and stores them in CouchBase.

  • meta:idcrm

With the account id, we can retrieve the account name and scope. The account name is used as a key. The scope is used as a namespace.

  • meta:device

If device type = 1 or 2, we assign the device id to idfa or gaid respectively.

If device type = 0, we try to detect the device type from user_agent.

We validate the data for idfa and gaid.

And finally we store data into CouchBase.
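The device handling described above could look like the following Python sketch. The detection and validation rules are not documented on this page, so both are assumptions: the User Agent check is a simple keyword match, and an id is considered valid when 32 hexadecimal digits remain after removing hyphens (the examples below show the idfa stored without hyphens).

```python
import re

PROVIDER_BY_TYPE = {1: "idfa", 2: "gaid"}

def detect_type_from_user_agent(user_agent):
    """Hypothetical detection rule: Apple devices -> 1, Android -> 2, otherwise 0."""
    if not user_agent:
        return 0
    if re.search(r"iPhone|iPad|iPod", user_agent):
        return 1
    if "Android" in user_agent:
        return 2
    return 0

def normalize_device_id(device_id):
    """Strip hyphens; return None unless 32 hex digits remain (assumed validation)."""
    compact = device_id.replace("-", "")
    return compact if re.fullmatch(r"[0-9A-Fa-f]{32}", compact) else None

def resolve_device(device_type, device_id, user_agent=None):
    """Return (provider, normalized_id), or None when the entry would be rejected."""
    if device_type == 0:
        device_type = detect_type_from_user_agent(user_agent)
    provider = PROVIDER_BY_TYPE.get(device_type)
    normalized = normalize_device_id(device_id)
    if provider is None or normalized is None:
        return None
    return provider, normalized

print(resolve_device(1, "AEBE52E7-03EE-455A-B3C4-E57283966231"))
```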

Examples :

Message 1 :

Code Block
$VAR1 = {
    'weborama' => '--0m5@rSKJD-',
    'meta:idcrm' => [123, '991ea637768c911b372d688578cd61da']
};

This creates a new entry in CouchBase:

Code Block
UUID1 = {
    'weborama' => [
        {
            'created_at' => 1466675859,
            'id' => '@12345678987'
        }
    ],
    'laredoute' => [
        {
            'created_at' => 1466675859,
            'id' => '991ea637768c911b372d688578cd61da'
        }
    ]
};

Message 2 :

Code Block
$VAR1 = {
    'meta:idcrm' => [123, '991ea637768c911b372d688578cd61da'],
    'meta:device' => [1, 'AEBE52E7-03EE-455A-B3C4-E57283966231']
};

This is merged into the previous entry in CouchBase:

Code Block
UUID1 = {
    'weborama' => [
        {
            'created_at' => 1466675859,
            'id' => '@12345678987'
        }
    ],
    'idfa' => [
        {
            'created_at' => 1466723456,
            'id' => 'AEBE52E703EE455AB3C4E57283966231'
        }
    ],
    'laredoute' => [
        {
            'created_at' => 1466723456,
            'id' => '991ea637768c911b372d688578cd61da'
        }
    ]
};
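The create-then-merge behaviour of the two messages can be sketched in Python with a plain dict standing in for CouchBase. How the daemon actually finds the entry to merge into is not described here. The sketch assumes it looks for any entry that already holds one of the incoming identifiers, and that identifiers present in the message get their created_at refreshed, as the merged example suggests. `merge_identifiers` is a hypothetical name.

```python
def merge_identifiers(store, identifiers, now):
    """Merge {provider: id} pairs into the entry sharing one of them, or create one.

    `store` maps an entry key to {provider: [{"created_at": ts, "id": value}]}.
    """
    def has_identifier(entry, provider, value):
        return any(item["id"] == value for item in entry.get(provider, []))

    key = next(
        (k for k, entry in store.items()
         if any(has_identifier(entry, p, v) for p, v in identifiers.items())),
        None,
    )
    if key is None:
        key = "UUID%d" % (len(store) + 1)  # placeholder key, as in the examples
        store[key] = {}
    entry = store[key]
    for provider, value in identifiers.items():
        items = entry.setdefault(provider, [])
        for item in items:
            if item["id"] == value:
                item["created_at"] = now  # refreshed, as in the merged example
                break
        else:
            items.append({"created_at": now, "id": value})
    return key

store = {}
crm = "991ea637768c911b372d688578cd61da"
merge_identifiers(store, {"weborama": "@12345678987", "laredoute": crm}, 1466675859)
merge_identifiers(store, {"laredoute": crm, "idfa": "AEBE52E703EE455AB3C4E57283966231"}, 1466723456)
print(store)
```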

How to start this daemon ?

Code Block
wdx-xdevice-daemon --pidfile /var/run/weborama-daemon/wdx-xdevice-daemon.pid --max_count=32 --sleep_time=30 --bulk=10000

How to Search ?

We can use this script to search into CouchBase :

Code Block
$ $HOME/workspace/weborama-crossdevice/bin/xdevice-search.pl -p weborama --id='--0m5@rSKJF-' -s 'pool'

Parameters :

  • p : provider

  • i : identifier

  • s : scope (to find the scope, you can check the accounts.xdevice_conf_id value and then the scope column in the table xdevice_confs)

It will return:

Code Block
$VAR1 = {
          'weborama' => [
                          {
                            'created_at' => 1478101127,
                            'id' => '--0m5@rSKJF-'
                          }
                        ],
          'idfa' => [
                      {
                        'created_at' => 1478101127,
                        'id' => 'AEBE52E703EE455AB3C4E57283966233'
                      }
                    ]
        };

From CRM Files (first version)

[Deprecated - waiting p2p of RabbitMQ queue to be deleted]

The current version of xdevice supports deterministic files in the following format.

Example: webo.axa.dat

Code Block
weborama|axa_idcrm
____________|2334f2229795dee8ae6625076c22fee2
____________|8c6ab5b8299b2b9fb75597e7219dc6ce
____________|92f30cf04442e3ac67bee70a076b96ae
____________|f2a41f08a33d882aec55cd3765458d8c
____________|f58b6c11e7083ec5a5068c7e4bc8743c
--0m5@rSKJD-85|991ea637768c911b372d688578cd61da
-10ro0dwJa7-29|90e378638058327c640624f384735412
-1HZvzUinwv-34|2f6e599bee6b3b6e61e87d48b3030cf9
-1bMwPVUjDX-82|6b3ed343287a89a56531f76553845c81
-2Ab-h@t05j-52|9bf9f683a47e8b616d668f512aca25d7

This is the most basic kind of file: a CSV using | as separator, with two fields, weborama and axa_idcrm. The fields can hold any kind of data.

The header of this file contains the 'provider' names, and the remaining lines are 'identifiers'. For example, '--0m5@rSKJD-85' is an AFFICHE_W cookie that will be validated and coerced to a valid weborama id.

Important: we need to identify the provider of the data using a label. This label can carry an _id suffix, but it is not mandatory.

For the current uploader integration, we need two specific providers: weborama and idfa. The others can be named freely, but each name should be unique and should not change.

Example: idfa.axa.dat

Code Block
idfa|axa_idcrm
AEBE52E7-03EE-455A-B3C4-E57283966239|991ea637768c911b372d688578cd61da

When we ingest these two files, it becomes possible to link the weborama id '--0m5@rSKJD-' (we remove the last two chars to coerce it to a real webo id) to the idfa AEBE52E7-03EE-455A-B3C4-E57283966239, using the axa_idcrm 991ea637768c911b372d688578cd61da as pivot.

Example of data ingestion:

Code Block
tpeczenyj@aub-daemon-01:~$ xdevice-load-file.pl -i webo.axa.dat --use_header -m compute -d 
opening csv webo.axa.dat
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
inserted 6 with success
inserted 7 with success
inserted 8 with success
inserted 9 with success
inserted 10 with success
statistics for file webo.axa.dat
total : 10
valid : 5
stored : 5
stored : 100.00 % of valid lines
tpeczenyj@aub-daemon-01:~$ xdevice-load-file.pl -i idfa.axa.dat --use_header -m compute -d 
opening csv idfa.axa.dat
inserted 1 with success
statistics for file idfa.axa.dat
total : 1
valid : 1
stored : 1
stored : 100.00 % of valid lines

To check data ingestion:

Code Block
tpeczenyj@aub-daemon-01:~$ xdevice-search.pl -p weborama -i '--0m5@rSKJD-'
$VAR1 = {
          'weborama' => [
                          {
                            'created_at' => 1466675859,
                            'id' => '--0m5@rSKJD-'
                          }
                        ],
          '4VRu5o09WHamwg4YxsEv0g' => [
                                        {
                                          'created_at' => 1466675868,
                                          'id' => '991ea637768c911b372d688578cd61da'
                                        }
                                      ],
          'idfa' => [
                      {
                        'created_at' => 1466675868,
                        'id' => 'AEBE52E7-03EE-455A-B3C4-E57283966239'
                      }
                    ]
        };

Note the provider '4VRu5o09WHamwg4YxsEv0g': it is a hash of axa_idcrm, and hashing is mandatory to store data in the production Couchbase (located outside Weborama).

To ingest data from a different provider (like La Poste), we need to use one extra parameter called to_hash.

Example:

Code Block
tpeczenyj@aub-daemon-01:~$ xdevice-load-file.pl -i webo.laposte.dat --use_header -m compute --to_hash laposte_idcrm
...

To remember:

  1. The script xdevice-load-file.pl comes from the latest version of Weborama::CrossDevice and is sensitive to the WEBO_ENV variable.

  2. The script was written with axa data in mind. To process other file formats, use the command line options (see below).

  3. The config mode option is mandatory in the current version. You must specify 'compute'; otherwise the read_only configuration is used by default and nothing is processed.

  4. If the file does not have a header, specify the fields with --fields a,b,c

  5. If the file has a header but it is wrong, you can specify the fields and ask to skip the first line with --skip_first_line

  6. By default we do not use the header. Force it with the --use_header option

  7. The debug option is important to understand what is happening. If we need to show more info, please open a redmine ticket
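The header rules in points 4 to 6 can be summarised with a short Python sketch. It only mirrors the option descriptions on this page and is not the script's actual implementation. `resolve_fields` is a hypothetical name, and the default fields and separator are those given in the usage text.

```python
DEFAULT_FIELDS = ["weborama", "axa_idcrm"]

def resolve_fields(lines, fields=None, use_header=False,
                   skip_first_line=False, separator="|"):
    """Return (fields, data_lines) according to the header options."""
    if use_header:
        # --use_header: ignore the fields list and read it from the first line
        return lines[0].split(separator), lines[1:]
    if skip_first_line:
        # a header exists but is wrong: drop it and rely on --fields
        lines = lines[1:]
    return list(fields or DEFAULT_FIELDS), lines

lines = ["idfa|axa_idcrm", "AEBE52E7-03EE-455A-B3C4-E57283966239|991ea637768c911b372d688578cd61da"]
print(resolve_fields(lines, use_header=True))
```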

Code Block
USAGE: xdevice-load-file.pl [-dhims] [long options...]

    -m --config_mode=String  config mode ex: read_only, test or compute
    -d --debug               if true, will show debug messages
    --fields=[Strings]       input fields in order (default, weborama,
                             axa_idcrm )
    --hash                   if false, will ignore to_hash option
    -i --infile=String       no doc for infile
    -s --separator=String    should contains the separator char. default is
                             '|'
    --skip_first_line        if true and we are not using use_header, will
                             ignore the first line. useful when there is
                             some header and we need ignore and force with
                             --fields
    --suffix=String          suffix extension to add when finish process
                             file. default is "processed"
    --to_hash=[Strings]      input fields to hash ( convert axa_idcrm to
                             4VRu5o09WHamwg4YxsEv0g )
    --use_header             if true, will ignore the fields list and read
                             it from the first line of the file
                                                                        
    --usage                  show a short help message
    -h                       show a compact help message
    --help                   show a long help message
    --man                    show the manual

Uploader Integration

XDevice is generic. The current focus is to use idfa instead of the appnexus id. To do this:

  1. Create one data_transfer with delivering_type 'mobile' (the default is cookie)

  2. The uploader, by default, processes only data_transfers with cookies

  3. There is one special destination, XDeviceAppNexus, which can process data_transfers with mobile

  4. Without any data_transfer, this destination just skips data

  5. If there is any data, the uploader translates segments and pushes the pairs “weborama_id”, “segments” to RabbitMQ, in the queue mobile-mobile_appnexus_add on vhost wam

  6. A new daemon consumes this queue, translates the webo id to idfa and uploads. It should run in Google Cloud on `daemon-euw1-zb-00.cross-device.gce.out.weborama.fr`
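The translation step of the last point can be sketched in Python as follows. A dict stands in for the cross-device lookup, and the function and counter names are hypothetical. The real worker reads from RabbitMQ and uploads to AppNexus, which this sketch does not attempt.

```python
def translate_batch(pairs, idfa_by_weborama_id):
    """Turn (weborama_id, segments) pairs into (idfa, segments) pairs.

    Returns the translated pairs plus simple counters (illustrative names).
    """
    translated = []
    counters = {"translated": 0, "no_idfa": 0}
    for weborama_id, segments in pairs:
        idfa = idfa_by_weborama_id.get(weborama_id)
        if idfa is None:
            counters["no_idfa"] += 1  # user skipped: no idfa known for this id
            continue
        translated.append((idfa, segments))
        counters["translated"] += 1
    return translated, counters

lookup = {"--0m5@rSKJD-": "AEBE52E703EE455AB3C4E57283966231"}
pairs = [("--0m5@rSKJD-", [12345]), ("-10ro0dwJa7-", [12345])]
print(translate_batch(pairs, lookup))
```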

Daemon

On Google Cloud, start/stop the monit conf `wdx-uploader-appnexus-add-xdevice`

This will start the daemon `wdx-uploader-daemon` with section `xdevice_appnexus`

Code Block
wdx-uploader-daemon --section=xdevice_appnexus --pidfile=/var/run/weborama-daemon/wdx-uploader-appnexus-add-xdevice/wdx-uploader-appnexus-add-xdevice.pid

Logs

The worker in Google Cloud should write to /var/log/daemon.log

A line like this:

Code Block
AppNexus job_id='hpKaCHfhyAInlTuWyK65hjGGTZ7Imi1466677881'

means we uploaded data to AppNexus, and this is the id of the job. To check the status of this job, run the following (mind the WEBO_ENV and use the real job_id):

Code Block
perl -MData::Dumper -Maliased=Weborama::DSP::AppNexus::Data  -E 'say(Dumper(Data->new->check_job_status("hpKaCHfhyAInlTuWyK65hjGGTZ7Imi1466677881")));'

The uploader should print something like this on aub-daemon-01:

Code Block
Jun 23 10:40:55 aub-daemon-01 wtd-uploader.pl[43737]: [XDeviceAppNexus|add] Uploader status for 'x-appnexus': 'SUCCESS' [uploaded=0, skipped: unknown_id=0, no_mapping=50079, reject=0, failures=0]
Jun 23 10:40:55 aub-daemon-01 wtd-uploader.pl[43737]: [XDeviceAppNexus|add] {'statistics for x-appnexus' => {}}

This means there is no data to upload.
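To read such status lines automatically, a small Python sketch like the one below could extract the counters. The regular expression is written against the sample line above only, so treat it as an assumption about the log format rather than a specification.

```python
import re

STATUS_RE = re.compile(
    r"Uploader status for '(?P<dsp>[^']+)': '(?P<status>[^']+)' \[(?P<counters>[^\]]*)\]"
)

def parse_status_line(line):
    """Return dsp, status and integer counters from an uploader status line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    counters = {name: int(value)
                for name, value in re.findall(r"(\w+)=(\d+)", match["counters"])}
    return {"dsp": match["dsp"], "status": match["status"], "counters": counters}

line = ("Jun 23 10:40:55 aub-daemon-01 wtd-uploader.pl[43737]: [XDeviceAppNexus|add] "
        "Uploader status for 'x-appnexus': 'SUCCESS' "
        "[uploaded=0, skipped: unknown_id=0, no_mapping=50079, reject=0, failures=0]")
parsed = parse_status_line(line)
print(parsed)
```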

ELK Metrics

The uploader sends ELK metrics. The worker in Google Cloud does not.

HLL Metrics

The package Weborama::DataExchange::Tools exposes this script:

Code Block
tpeczenyj@aub-daemon-01:~/xdevice$ wdx-hllmetrics.pl --dsp_all --today
W::HLLMetrics from 2016-06-23T00:00:00 to 2016-06-24T00:00:00 env prod
------------------------------------------------------------
        dsp_name all_uploaded no_ext_id   unmapped  reject failure      total
        ...
        appnexus    8,904,132 7,755,653 17,229,834       0       0 33,889,619
               %      26.27 %   22.89 %    50.84 %  0.00 %  0.00 %           
        ...
xdevice-appnexus            0         0 33,889,619       0       0 33,889,619
               %       0.00 %    0.00 %   100.00 %  0.00 %  0.00 %           

Each successful upload in the worker increases all_uploaded on xdevice-appnexus; each user skipped for lack of an idfa increases no_ext_id. In case of any failure, we increase the failure counter.

total is global. unmapped is computed as total minus the sum of all other fields, which saves space in HLL metrics.
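As a worked check of that calculation, the Python sketch below recomputes the appnexus row of the report from its stored counters. The function name `hll_row` is hypothetical, and the numbers are the ones shown in the sample output.

```python
def hll_row(total, all_uploaded, no_ext_id, reject=0, failure=0):
    """Derive unmapped and the percentage line from the stored counters."""
    unmapped = total - (all_uploaded + no_ext_id + reject + failure)
    counts = {
        "all_uploaded": all_uploaded,
        "no_ext_id": no_ext_id,
        "unmapped": unmapped,
        "reject": reject,
        "failure": failure,
    }
    percents = {name: round(100.0 * value / total, 2) for name, value in counts.items()}
    return counts, percents

counts, percents = hll_row(total=33889619, all_uploaded=8904132, no_ext_id=7755653)
print(counts["unmapped"], percents["unmapped"])
```

For the xdevice-appnexus row, where every stored counter is 0, the same calculation gives unmapped equal to total, hence the 100.00 % shown in the report.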

Other info

  • In case of any trouble in the uploader, consider stopping this specific destination from the command line

  • Remove will be supported in a later version

  • AlexisD is working on the monit conf of the xdevice uploader worker in the cloud

  • AppNexus only allows us to upload once per minute. We will store files in the cloud in /data/uploader/appnexus until we can upload (using the distributed lock)

  • The data ingestion should be done inside Weborama and stored in Couchbase (located in the cloud)

  • Bruno should provide the La Poste files to ingest
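The once-per-minute constraint mentioned in the list above can be illustrated with a minimal Python sketch. It keeps the last upload time in local memory, whereas the real worker relies on a distributed lock that is not described here. The class name `MinuteGate` and its interface are hypothetical.

```python
class MinuteGate:
    """Allow one upload per interval; callers keep files spooled otherwise."""

    def __init__(self, interval=60.0):
        self.interval = interval   # seconds between two uploads
        self.last_upload = None    # timestamp of the last accepted upload

    def try_acquire(self, now):
        """Return True if an upload may start at time `now` (in seconds)."""
        if self.last_upload is not None and now - self.last_upload < self.interval:
            return False
        self.last_upload = now
        return True

gate = MinuteGate()
print([gate.try_acquire(t) for t in (0.0, 30.0, 60.0)])
```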