...
https://wiki.extranet.weborama.com/xdevice/start
The Weborama XDevice solution has two important parts:
xdevice data ingestion
upload support
To upload identifiers such as idfa instead of the appnexus id, we need to add support to the existing uploader so that it recognizes new options in the current data_transfer integration.
After this, we need xdevice data to translate weborama ids to idfa and upload them to the supported dsps.
For now, we collect everything that comes from actions in a desktop browser.
From Collect Frontend
We can collect data through the ThirdParty, Exchange and WAMFactory services. Each service handles custom segments, and we intercept this operation to ingest data.
In this example, we use the ThirdParty service ( d.A=tp ).
Code Block |
---|
http://wam.solution.weborama.fr/fcgi-bin/dispatch.fcgi?d.A=tp&d.k=wam_segments&d.v=12345&g.ism=1&g.did=12345678987654321234567898765432&g.dty=1&g.xcrm=123456789&d.a=123 |
The parameters used here are:
d.a = 123 for account_id
g.xcrm = 123456789 for crm identifier
g.ism = 1 if mobile
g.did = 12345678987654321234567898765432 for device id
g.dty = 1 for Apple device, 2 for Google device (if 0 or not present, we will try to use the User Agent to detect the device family type; however, the request may be rejected by the xdevice daemon)
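As an illustration of the parameters listed above, here is a minimal Python sketch that pulls them out of the example request. It only uses the standard library; the function name `parse_collect_url` and the keys of the returned dict are hypothetical and are not part of the collect frontend.

```python
from urllib.parse import urlsplit, parse_qs

# Example request taken from the text above.
URL = ("http://wam.solution.weborama.fr/fcgi-bin/dispatch.fcgi"
       "?d.A=tp&d.k=wam_segments&d.v=12345&g.ism=1"
       "&g.did=12345678987654321234567898765432&g.dty=1"
       "&g.xcrm=123456789&d.a=123")


def parse_collect_url(url):
    """Extract the xdevice-related parameters from a collect URL."""
    query = parse_qs(urlsplit(url).query)

    def first(name, default=None):
        values = query.get(name)
        return values[0] if values else default

    return {
        "service": first("d.A"),                 # tp = ThirdParty
        "account_id": int(first("d.a")),         # note: d.a and d.A differ
        "id_crm": first("g.xcrm"),
        "is_mobile": first("g.ism") == "1",
        "device_id": first("g.did"),
        "device_type": int(first("g.dty", "0")),  # 0 = auto-detect
    }


params = parse_collect_url(URL)
print(params)
```

Query parameter names are case-sensitive here, so `d.A` (service) and `d.a` (account id) are read as two distinct keys.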
AFFICHE_W is created by translating the device id. It begins with @ (because it is a mobile id).
This translation (device id → AFFICHE_W) is not bijective.
We then push this message into the RabbitMQ queue:
Code Block |
---|
$VAR1 = {
"weborama" => "--0m5@rSKJD-85",
"meta:device" => [
1,
"12345678987654321234567898765432",
"Mozilla/5.0 (iPhone; CPU iPhone OS 8_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12A366 Safari/600.1.4"
],
"meta:idcrm" => [
123,
"123456789"
]
}; |
meta:device contains 3 mandatory values plus 2 optional ones, in this order: device_type (default: 0), device_id, user_agent, ip, 2-letter country code.
If the 2-letter country code is present, we use it for the “country” index. If it is not present, we use the ip to perform a geolocation query.
meta:idcrm contains 2 values: account_id, id_crm.
Full example:
Code Block |
---|
$VAR1 = {
'weborama' => '--0m5@rSKJD-85',
'meta:device' => [
1,
'12345678987654321234567898765432',
'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0_2 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12A366 Safari/600.1.4',
'127.0.0.1',
'FR',
],
'meta:idcrm' => [
123,
'123456789'
]
}; |
The entry will be rejected if:
1. we can't auto-detect the mobile id type from the user agent in auto-detect mode
2. we can't detect the country from the ip, if it is present.
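The message layout described above can be sketched in Python as follows. This is only an illustration of the field order and of the country rule; the function names `build_message` and `country_source` are hypothetical, and the real geolocation lookup is not shown.

```python
def build_message(weborama_id, account_id, id_crm, device_id, user_agent,
                  device_type=0, ip=None, country=None):
    """Assemble the message layout described above as a plain dict."""
    # Three mandatory values, in order.
    device = [device_type, device_id, user_agent]
    # The two optional values are positional, so the ip slot must exist
    # whenever a country code is given.
    if ip is not None or country is not None:
        device.append(ip)
    if country is not None:
        device.append(country)
    return {
        "weborama": weborama_id,
        "meta:device": device,
        "meta:idcrm": [account_id, id_crm],
    }


def country_source(message):
    """Say where the "country" index would come from for this message."""
    device = message["meta:device"]
    if len(device) >= 5 and device[4]:
        return "country_code"      # explicit 2-letter code wins
    if len(device) >= 4 and device[3]:
        return "ip_geolocation"    # otherwise fall back to the ip
    return None                    # nothing to derive a country from
```

For example, a message built with both `ip="127.0.0.1"` and `country="FR"` reports `country_code`, while the same message without the country code reports `ip_geolocation`.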
From CSV Files
We have a script that reads CSV files and pushes data into RabbitMQ.
This script is wdx-populate-queue-from-csvfile. We can pass these parameters:
infile *
fields *
account_id
device_type
max: maximum number of lines pushed to the queue
separator: field separator used in the CSV file
debug
dry-run
* = mandatory
File with weborama and idcrm
Code Block |
---|
____________,f2a41f08a33d882aec55cd3765458d8c
--0m5@rSKJD-85,991ea637768c911b372d688578cd61da
-10ro0dwJa7-29,90e378638058327c640624f384735412
-1HZvzUinwv-34,2f6e599bee6b3b6e61e87d48b3030cf9
-1bMwPVUjDX-82,6b3ed343287a89a56531f76553845c81
-2Ab-h@t05j-52,9bf9f683a47e8b616d668f512aca25d7 |
Push in RabbitMQ queue:
Code Block |
---|
tpeczenyj@aub-daemon-01:~$ wdx-populate-queue-from-csvfile --infile xdevice-weboid-idcrm.in --account_id 123 --fields weborama,idcrm -d |
Debug mode :
Code Block |
---|
line 1 : {'weborama' => '____________','meta:idcrm' => ['123','f2a41f08a33d882aec55cd3765458d8c']} fail
line 2 : {'weborama' => '--0m5@rSKJD-','meta:idcrm' => ['123','991ea637768c911b372d688578cd61da']} success
line 3 : {'weborama' => '-10ro0dwJa7-','meta:idcrm' => ['123','90e378638058327c640624f384735412']} success
line 4 : {'weborama' => '-1HZvzUinwv-','meta:idcrm' => ['123','2f6e599bee6b3b6e61e87d48b3030cf9']} success
line 5 : {'weborama' => '-1bMwPVUjDX-','meta:idcrm' => ['123','6b3ed343287a89a56531f76553845c81']} success
line 6 : {'weborama' => '-2Ab-h@t05j-','meta:idcrm' => ['123','9bf9f683a47e8b616d668f512aca25d7']} success |
And finally the result :
Code Block |
---|
{'summary' => {'fields' => ['weborama','idcrm'],'stats' => {'rejected' => 1,'total' => 6,'elapsed' => 0,'queued' => 5},'infile' => 'xdevice-weboid-idcrm.in'}} |
We will have 5 messages in RabbitMQ like this:
Code Block |
---|
$VAR1 = {
"weborama" => "--0m5@rSKJD-",
"meta:idcrm" => [123, "991ea637768c911b372d688578cd61da"]
}; |
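The transformation from CSV lines to queue messages can be sketched as follows, in Python. This is not the code of wdx-populate-queue-from-csvfile: the function names are hypothetical, and the `looks_real` check is only an assumption that rejects the all-underscore placeholder seen in the sample, since the real validation rule is not documented here.

```python
import csv
import io

SAMPLE = """\
____________,f2a41f08a33d882aec55cd3765458d8c
--0m5@rSKJD-85,991ea637768c911b372d688578cd61da
-10ro0dwJa7-29,90e378638058327c640624f384735412
"""


def looks_real(weborama_id):
    # ASSUMPTION: the real validation rule is not documented here.
    # We only reject the all-underscore placeholder seen in the sample.
    return bool(weborama_id) and set(weborama_id) != {"_"}


def rows_to_messages(text, account_id, separator=","):
    """Yield (message, accepted) pairs, one per CSV line."""
    for weborama, idcrm in csv.reader(io.StringIO(text), delimiter=separator):
        # AFFICHE_W cookies carry two trailing characters that are not
        # part of the weborama id, as the debug output shows.
        weborama_id = weborama[:-2] if len(weborama) == 14 else weborama
        message = {"weborama": weborama_id,
                   "meta:idcrm": [account_id, idcrm]}
        yield message, looks_real(weborama_id)


results = list(rows_to_messages(SAMPLE, account_id=123))
stats = {
    "total": len(results),
    "queued": sum(1 for _, ok in results if ok),
    "rejected": sum(1 for _, ok in results if not ok),
}
print(stats)
```

The `stats` dict mirrors the total / queued / rejected counters of the summary printed by the script.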
File with idcrm and device_id
Code Block |
---|
991ea637768c911b372d688578cd61da,AEBE52E7-03EE-455A-B3C4-E57283966231
991ea637768c911b372d688578cd61db,AEBE52E703EE455AB3C4E57283966232
991ea637768c911b372d688578cd61dc,AEBE52E7-03EE-455A-B3C4-E57283966233
991ea637768c911b372d688578cd61dd,AEBE52E703EE455A-B3C4-E57283966234
991ea637768c911b372d688578cd61de,AEBE52E7-03EE-455AB3C4E57283966235 |
Push in RabbitMQ queue:
Code Block |
---|
tpeczenyj@aub-daemon-01:~$ wdx-populate-queue-from-csvfile --infile xdevice-idcrm-idfa.in --account_id 123 --device_type 1 --fields idcrm,device_id -d |
Debug mode :
Code Block |
---|
line 1 : {'meta:device' => ['1','AEBE52E7-03EE-455A-B3C4-E57283966231'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61da']} success
line 2 : {'meta:device' => ['1','AEBE52E703EE455AB3C4E57283966232'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61db']} success
line 3 : {'meta:device' => ['1','AEBE52E7-03EE-455A-B3C4-E57283966233'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61dc']} success
line 4 : {'meta:device' => ['1','AEBE52E703EE455A-B3C4-E57283966234'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61dd']} success
line 5 : {'meta:device' => ['1','AEBE52E7-03EE-455AB3C4E57283966235'],'meta:idcrm' => ['123','991ea637768c911b372d688578cd61de']} success |
And finally the result :
Code Block |
---|
{'summary' => {'fields' => ['idcrm','device_id'],'stats' => {'rejected' => 0,'total' => 5,'elapsed' => 0,'queued' => 5},'infile' => 'xdevice-idcrm-idfa.in'}} |
We will have 5 messages in RabbitMQ like this:
Code Block |
---|
$VAR1 = {
"meta:idcrm" => [123, "991ea637768c911b372d688578cd61da"],
"meta:device" => [1, "AEBE52E7-03EE-455A-B3C4-E57283966231"]
}; |
XDevice daemon
A daemon picks messages (provided by Collect or by CRM files) from this queue and stores them in CouchBase.
meta:idcrm
With the account id, we can retrieve the account name and the scope. The account name is used as a key; the scope is used as a namespace.
meta:device
If device type = 1 or 2, we assign the device id to idfa or gaid respectively.
If device type = 0, we try to detect the device type from user_agent.
We validate the data for idfa and gaid.
And finally we store data into CouchBase.
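The meta:device handling described above could look like the following Python sketch. Several points are assumptions: the user-agent detection is a rough stand-in, and the validation rule (32 hexadecimal digits once the hyphens are removed) is inferred from the sample device ids and from the hyphen-less idfa shown in the stored entries, not taken from the daemon itself.

```python
import re

DEVICE_FAMILIES = {1: "idfa", 2: "gaid"}   # 1 = Apple, 2 = Google
HEX32 = re.compile(r"[0-9A-Fa-f]{32}")     # ASSUMED validation rule


def detect_family(user_agent):
    # Very rough stand-in for the real user-agent detection.
    ua = (user_agent or "").lower()
    if "iphone" in ua or "ipad" in ua:
        return "idfa"
    if "android" in ua:
        return "gaid"
    return None


def normalize_device(device_type, device_id, user_agent=None):
    """Return (provider, normalized_id), or None when the entry is rejected."""
    # Types 1 and 2 are explicit; type 0 falls back to the user agent.
    provider = DEVICE_FAMILIES.get(device_type) or detect_family(user_agent)
    if provider is None:
        return None
    compact = device_id.replace("-", "")
    if not HEX32.fullmatch(compact):
        return None
    return provider, compact
```

With this sketch, all five hyphenation variants from the CSV sample normalize to the same 32-character form, which is why they are all accepted.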
Examples :
Message 1 :
Code Block |
---|
$VAR1 = {
"weborama" => "--0m5@rSKJD-",
"meta:idcrm" => [123, "991ea637768c911b372d688578cd61da"]
}; |
This creates a new entry in CouchBase:
Code Block |
---|
UUID1 = {
'weborama' => [
{
'created_at' => 1466675859,
'id' => '@12345678987'
}
],
'laredoute' => [
{
'created_at' => 1466675859,
'id' => '991ea637768c911b372d688578cd61da'
}
]
}; |
Message 2 :
Code Block |
---|
$VAR1 = {
"meta:idcrm" => [123, "991ea637768c911b372d688578cd61da"],
"meta:device" => [1, "AEBE52E7-03EE-455A-B3C4-E57283966231"]
}; |
This is merged into the existing entry in CouchBase:
Code Block |
---|
UUID1 = {
'weborama' => [
{
'created_at' => 1466675859,
'id' => '@12345678987'
}
],
'idfa' => [
{
'created_at' => 1466723456,
'id' => 'AEBE52E703EE455AB3C4E57283966231'
}
],
'laredoute' => [
{
'created_at' => 1466723456,
'id' => '991ea637768c911b372d688578cd61da'
}
]
}; |
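The create-then-merge behaviour of the two messages above can be illustrated with a small in-memory model in Python. The class `XDeviceStore` is hypothetical and only stands in for the CouchBase bucket: it keeps one identifier per provider and does not handle the case where a message would join two existing entries.

```python
import uuid


class XDeviceStore:
    """Tiny in-memory stand-in for the CouchBase bucket."""

    def __init__(self):
        self.entries = {}   # uuid -> {provider: [{"created_at", "id"}]}
        self.index = {}     # (provider, id) -> uuid

    def upsert(self, identifiers, now):
        """Store {provider: id} pairs, merging into an entry that shares one."""
        key = next((self.index[pair] for pair in identifiers.items()
                    if pair in self.index), None)
        if key is None:
            # No identifier is known yet: create a fresh entry.
            key = str(uuid.uuid4())
            self.entries[key] = {}
        entry = self.entries[key]
        for provider, ident in identifiers.items():
            # Identifiers present in the message get the new timestamp.
            entry[provider] = [{"created_at": now, "id": ident}]
            self.index[(provider, ident)] = key
        return key

    def search(self, provider, ident):
        key = self.index.get((provider, ident))
        return self.entries.get(key)


store = XDeviceStore()
first = store.upsert({"weborama": "--0m5@rSKJD-",
                      "laredoute": "991ea637768c911b372d688578cd61da"},
                     now=1466675859)
second = store.upsert({"laredoute": "991ea637768c911b372d688578cd61da",
                       "idfa": "AEBE52E703EE455AB3C4E57283966231"},
                      now=1466723456)
merged = store.search("idfa", "AEBE52E703EE455AB3C4E57283966231")
```

Because both messages share the same laredoute idcrm, `first` and `second` are the same key, and searching by idfa returns the entry that also holds the weborama id.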
How to start this daemon ?
Code Block |
---|
wdx-xdevice-daemon --pidfile /var/run/weborama-daemon/wdx-xdevice-daemon.pid --max_count=32 --sleep_time=30 --bulk=10000 |
How to Search ?
We can use this script to search into CouchBase :
Code Block |
---|
$ $HOME/workspace/weborama-crossdevice/bin/xdevice-search.pl -p weborama --id='--0m5@rSKJF-' -s 'pool' |
Parameters :
p : provider
i : identifier
s : scope (to find the scope, you can check the accounts.xdevice_conf_id value and then the scope column in the table xdevice_confs)
It will return:
Code Block |
---|
$VAR1 = {
'weborama' => [
{
'created_at' => 1478101127,
'id' => '--0m5@rSKJF-'
}
],
'idfa' => [
{
'created_at' => 1478101127,
'id' => 'AEBE52E703EE455AB3C4E57283966233'
}
]
};
|
From CRM Files (first version)
[Deprecated - waiting p2p of RabbitMQ queue to be deleted]
The current version of xdevice supports deterministic files in this kind of format.
Example: webo.axa.dat
Code Block |
---|
weborama|axa_idcrm
____________|2334f2229795dee8ae6625076c22fee2
____________|8c6ab5b8299b2b9fb75597e7219dc6ce
____________|92f30cf04442e3ac67bee70a076b96ae
____________|f2a41f08a33d882aec55cd3765458d8c
____________|f58b6c11e7083ec5a5068c7e4bc8743c
--0m5@rSKJD-85|991ea637768c911b372d688578cd61da
-10ro0dwJa7-29|90e378638058327c640624f384735412
-1HZvzUinwv-34|2f6e599bee6b3b6e61e87d48b3030cf9
-1bMwPVUjDX-82|6b3ed343287a89a56531f76553845c81
-2Ab-h@t05j-52|9bf9f683a47e8b616d668f512aca25d7 |
This is the most basic kind of file: a csv using | as separator with two fields: weborama and axa_idcrm, but can be any kind of data.
The header of this file names the 'providers', and the remaining lines are 'identifiers'. For example, '--0m5@rSKJD-85' is an AFFICHE_W cookie that will be validated and coerced to a valid weborama id.
Important: we need to identify the provider of the data with a label. This label can carry an _id suffix, but it is not mandatory.
For the current uploader integration, we need two specific providers: weborama and idfa. The others can be named freely, but each name should be unique and should not change.
Example: idfa.axa.dat
Code Block |
---|
idfa|axa_idcrm
AEBE52E7-03EE-455A-B3C4-E57283966239|991ea637768c911b372d688578cd61da |
When we ingest these two files, it becomes possible to link the weborama id '--0m5@rSKJD-' (we remove the last two chars to coerce it to a real webo id) to the idfa AEBE52E7-03EE-455A-B3C4-E57283966239, using the axa_idcrm 991ea637768c911b372d688578cd61da as pivot.
Example of data ingestion:
Code Block |
---|
tpeczenyj@aub-daemon-01:~$ xdevice-load-file.pl -i webo.axa.dat --use_header -m compute -d
opening csv webo.axa.dat
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
weboid ____________ is not real at /usr/local/perl-dists/perls/perl-5.12.2/bin/xdevice-load-file.pl line 18.
inserted 6 with success
inserted 7 with success
inserted 8 with success
inserted 9 with success
inserted 10 with success
statistics for file webo.axa.dat
total : 10
valid : 5
stored : 5
stored : 100.00 % of valid lines
tpeczenyj@aub-daemon-01:~$ xdevice-load-file.pl -i idfa.axa.dat --use_header -m compute -d
opening csv idfa.axa.dat
inserted 1 with success
statistics for file idfa.axa.dat
total : 1
valid : 1
stored : 1
stored : 100.00 % of valid lines |
To check data ingestion:
Code Block |
---|
tpeczenyj@aub-daemon-01:~$ xdevice-search.pl -p weborama -i '--0m5@rSKJD-'
$VAR1 = {
'weborama' => [
{
'created_at' => 1466675859,
'id' => '--0m5@rSKJD-'
}
],
'4VRu5o09WHamwg4YxsEv0g' => [
{
'created_at' => 1466675868,
'id' => '991ea637768c911b372d688578cd61da'
}
],
'idfa' => [
{
'created_at' => 1466675868,
'id' => 'AEBE52E7-03EE-455A-B3C4-E57283966239'
}
]
};
|
Note the provider '4VRu5o09WHamwg4YxsEv0g': it is a hash of axa_idcrm, and hashing is mandatory to store data in the production couchbase (located outside weborama).
To ingest data from a different provider (like La Poste), we need to use one extra parameter called to_hash.
Example:
Code Block |
---|
tpeczenyj@aub-daemon-01:~$ xdevice-load-file.pl -i webo.laposte.dat --use_header -m compute --to_hash laposte_idcrm
... |
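To illustrate what to_hash does to the provider labels, here is a hedged Python sketch. The hash function is an assumption: the 22-character form of '4VRu5o09WHamwg4YxsEv0g' is consistent with an unpadded base64 MD5 digest, but the document does not state the real algorithm, so the output of this sketch is not expected to match production values. The function names are hypothetical.

```python
import base64
import hashlib


def hash_provider(label):
    # ASSUMPTION: MD5 + unpadded base64. The document does not state the
    # real algorithm, so the output is not expected to match production.
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def apply_to_hash(fields, to_hash):
    """Replace the provider labels listed in to_hash by their hash."""
    return [hash_provider(f) if f in to_hash else f for f in fields]


fields = apply_to_hash(["weborama", "laposte_idcrm"], ["laposte_idcrm"])
```

Only the labels named in to_hash are replaced; weborama stays readable because the uploader integration looks it up by name.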
To remember:
The script xdevice-load-file.pl comes from the latest version of Weborama::CrossDevice and it is sensitive to the WEBO_ENV var.
The script was written with axa data in mind. To process other file formats, we can use the command line options (see below).
The config mode option is mandatory in the current version. You must specify 'compute', or the read_only configuration will be used by default and nothing will be processed.
If the file does not have a header, please specify the fields with --fields a,b,c
If the file has a header but it is wrong, you can specify the fields and ask to skip the first line with --skip_first_line
By default we do not use the header. Please force it with the --use_header option
The debug option is important to understand what is happening. If we need to show more info, please open a redmine ticket
Code Block |
---|
USAGE: xdevice-load-file.pl [-dhims] [long options...]
-m --config_mode=String config mode ex: read_only, test or compute
-d --debug if true, will show debug messages
--fields=[Strings] input fields in order (default, weborama,
axa_idcrm )
--hash if false, will ignore to_hash option
-i --infile=String no doc for infile
-s --separator=String should contains the separator char. default is
'|'
--skip_first_line if true and we are not using use_header, will
ignore the first line. useful when there is
some header and we need ignore and force with
--fields
--suffix=String suffix extension to add when finish process
file. default is "processed"
--to_hash=[Strings] input fields to hash ( convert axa_idcrm to
4VRu5o09WHamwg4YxsEv0g )
--use_header if true, will ignore the fields list and read
it from the first line of the file
--usage show a short help message
-h show a compact help message
--help show a long help message
--man show the manual |
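The interaction between --use_header, --fields and --skip_first_line described in the usage text can be summarized by this Python sketch. It is a reading of the help text, not the script's code; the function name `resolve_fields` is hypothetical.

```python
DEFAULT_FIELDS = ["weborama", "axa_idcrm"]   # default from the usage text


def resolve_fields(lines, separator="|", fields=None,
                   use_header=False, skip_first_line=False):
    """Return (fields, data_lines) following the option rules above."""
    lines = list(lines)
    if use_header:
        # The header wins: the fields list is ignored.
        return lines[0].split(separator), lines[1:]
    if skip_first_line:
        # A header exists but is wrong: drop it and trust --fields.
        lines = lines[1:]
    return list(fields or DEFAULT_FIELDS), lines
```

For a file whose first line is `weborama|axa_idcrm`, calling it with `use_header=True` yields the header fields and the remaining lines; calling it with `fields=[...]` and `skip_first_line=True` yields the forced fields and the same data lines.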
Uploader Integration
XDevice is generic. The focus now is to use idfa instead of the appnexus id, and to do this we need to:
Create one data_transfer with delivering_type 'mobile' (the default is cookie)
The uploader, by default, only processes data_transfers with cookies
There is one special destination, XDeviceAppNexus, which can process data_transfers with mobile.
Without any data_transfer, this destination will just skip data
If there is any data, the uploader will translate the segments and push the pairs “weborama_id”, “segments” to rabbitmq, in the queue mobile-mobile_appnexus_add on vhost wam
A new daemon consumes this queue, translates the webo id to idfa and uploads. It should run in google cloud on `daemon-euw1-zb-00.cross-device.gce.out.weborama.fr`
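The two halves of this flow can be sketched in Python with an in-memory queue standing in for RabbitMQ. Everything here is illustrative: the `users` key on a data_transfer, the function names and the lookup dict are hypothetical, segment translation is left out, and the counter names are borrowed from the HLL metrics columns.

```python
from collections import Counter, deque


def enqueue_mobile_transfers(data_transfers, queue):
    """Uploader side: only 'mobile' data_transfers reach the xdevice queue."""
    for transfer in data_transfers:
        # 'cookie' is the default delivering_type and is skipped here.
        if transfer.get("delivering_type", "cookie") != "mobile":
            continue
        for weborama_id, segments in transfer["users"]:
            queue.append((weborama_id, segments))


def consume(queue, webo_to_idfa, stats):
    """Worker side: translate each weborama id to an idfa, or skip it."""
    batch = []
    while queue:
        weborama_id, segments = queue.popleft()
        idfa = webo_to_idfa.get(weborama_id)
        if idfa is None:
            stats["no_ext_id"] += 1      # no idfa known for this user
            continue
        batch.append((idfa, segments))
        stats["all_uploaded"] += 1
    return batch


queue = deque()
enqueue_mobile_transfers(
    [{"delivering_type": "mobile",
      "users": [("--0m5@rSKJD-", [1, 2]), ("-10ro0dwJa7-", [3])]},
     {"users": [("-1HZvzUinwv-", [4])]}],   # cookie transfer: ignored
    queue)
stats = Counter()
batch = consume(queue, {"--0m5@rSKJD-": "AEBE52E703EE455AB3C4E57283966231"}, stats)
```

In this run one user is translated and uploaded, and the other is counted as skipped because no idfa is known for it.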
Daemon
On google cloud, start/stop the monit conf `wdx-uploader-appnexus-add-xdevice`.
This will start the daemon `wdx-uploader-daemon` with the section `xdevice_appnexus`:
Code Block |
---|
wdx-uploader-daemon --section=xdevice_appnexus --pidfile=/var/run/weborama-daemon/wdx-uploader-appnexus-add-xdevice/wdx-uploader-appnexus-add-xdevice.pid |
Logs
The worker in google cloud should write to /var/log/daemon.log.
A line like this:
Code Block |
---|
AppNexus job_id='hpKaCHfhyAInlTuWyK65hjGGTZ7Imi1466677881' |
means that we uploaded data to appnexus, and this is the id of the job. To check the status of this job, run the following (mind the WEBO_ENV and use the real job_id):
Code Block |
---|
perl -MData::Dumper -Maliased=Weborama::DSP::AppNexus::Data -E 'say(Dumper(Data->new->check_job_status("hpKaCHfhyAInlTuWyK65hjGGTZ7Imi1466677881")));' |
The uploader should print something like this on aub-daemon-01
Code Block |
---|
Jun 23 10:40:55 aub-daemon-01 wtd-uploader.pl[43737]: [XDeviceAppNexus|add] Uploader status for 'x-appnexus': 'SUCCESS' [uploaded=0, skipped: unknown_id=0, no_mapping=50079, reject=0, failures=0]
Jun 23 10:40:55 aub-daemon-01 wtd-uploader.pl[43737]: [XDeviceAppNexus|add] {'statistics for x-appnexus' => {}} |
This means there is no data to upload.
ELK Metrics
The uploader sends elk metrics. The worker in google cloud does not.
HLL Metrics
The package Weborama::DataExchange::Tools exposes this script:
Code Block |
---|
tpeczenyj@aub-daemon-01:~/xdevice$ wdx-hllmetrics.pl --dsp_all --today
W::HLLMetrics from 2016-06-23T00:00:00 to 2016-06-24T00:00:00 env prod
------------------------------------------------------------
dsp_name all_uploaded no_ext_id unmapped reject failure total
...
appnexus 8,904,132 7,755,653 17,229,834 0 0 33,889,619
% 26.27 % 22.89 % 50.84 % 0.00 % 0.00 %
...
xdevice-appnexus 0 0 33,889,619 0 0 33,889,619
% 0.00 % 0.00 % 100.00 % 0.00 % 0.00 %
|
Each successful upload in the worker increases all_uploaded on xdevice-appnexus; each user skipped for lack of an idfa increases no_ext_id. Any failure increases the failure counter.
total is global. unmapped is simply total minus the sum of all the other fields, which saves space in the hll metrics.
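The unmapped calculation can be checked against the figures in the table above with a few lines of Python. The function names are hypothetical; only the arithmetic comes from the text.

```python
def hll_row(total, all_uploaded=0, no_ext_id=0, reject=0, failure=0):
    """Build one metrics row; unmapped is derived, never stored."""
    row = {"all_uploaded": all_uploaded, "no_ext_id": no_ext_id,
           "reject": reject, "failure": failure}
    row["unmapped"] = total - sum(row.values())
    row["total"] = total
    return row


def as_percent(row):
    """Share of each counter in the total, rounded to two decimals."""
    total = row["total"]
    if total == 0:
        return {k: 0.0 for k in row if k != "total"}
    return {k: round(100.0 * v / total, 2)
            for k, v in row.items() if k != "total"}


appnexus = hll_row(33889619, all_uploaded=8904132, no_ext_id=7755653)
xdevice = hll_row(33889619)
print(appnexus["unmapped"])   # 17229834, as in the appnexus line
print(xdevice["unmapped"])    # 33889619: nothing uploaded yet
```

With no upload, skip or failure recorded, the whole total falls into unmapped, which is why the xdevice-appnexus line shows 100.00 % there.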
Other info
In case of any trouble in the uploader, consider stopping this specific destination on the command line
Remove will be supported in a later version
AlexisD is working on the monit conf of the xdevice uploader worker in the cloud
AppNexus only allows us to upload once per minute. We store files in the cloud in /data/uploader/appnexus until we can upload (using the distributed lock)
The data ingestion should be done inside weborama, with the data stored in Couchbase (located in the cloud)
Bruno should provide the La Poste files to ingest
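The once-per-minute constraint on AppNexus uploads mentioned in the list above can be pictured with this Python sketch. It is a single-process stand-in: the class `MinuteGate` is hypothetical, the real worker relies on a distributed lock whose implementation is not described here, and the pending list stands in for the files spooled under /data/uploader/appnexus.

```python
class MinuteGate:
    """Allow at most one upload per interval; callers spool files meanwhile."""

    def __init__(self, interval=60.0):
        self.interval = interval
        self.last_upload = None

    def try_upload(self, pending, now):
        """Return the files to upload now, or [] if we must keep waiting."""
        if not pending:
            return []
        too_soon = (self.last_upload is not None
                    and now - self.last_upload < self.interval)
        if too_soon:
            return []              # files stay spooled for a later attempt
        self.last_upload = now
        batch = list(pending)
        pending.clear()
        return batch


gate = MinuteGate()
pending = ["segments-0001"]
sent_first = gate.try_upload(pending, now=0.0)
pending.append("segments-0002")
sent_early = gate.try_upload(pending, now=30.0)
pending.append("segments-0003")
sent_later = gate.try_upload(pending, now=60.0)
```

The second attempt comes only 30 seconds after the first, so nothing is sent and both later files go out together once the minute has elapsed.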