...
bucket name: s3://weborama-adloox<publisher_name>
path: /seed_urls/<yyyymm>/<dd>/<hh>, e.g. /seed_urls/202106/08/02/
Frequency: every hour.
File name: input_yyyymm-dd-hh.csv.gz
File format: compressed CSV
File content: one URL per row.
Note: the protocol (e.g. https://) must be included in every URL in the source file; URLs without a protocol will be dropped.
E.g.:
https://ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059
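The input rules above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the helper names `seed_key`, `has_protocol` and `build_seed_file` are hypothetical, the check accepts only `http` and `https` as protocols (an assumption), and rows are written as plain newline-terminated lines. The source does not say which timezone the hour in the path refers to, so the caller chooses the timestamp. Uploading the result to the bucket is left out.

```python
import gzip
import io
from datetime import datetime
from urllib.parse import urlparse


def seed_key(ts):
    """Return (path, file name) for the hour of `ts`, following the layout above."""
    path = f"/seed_urls/{ts:%Y%m}/{ts:%d}/{ts:%H}/"
    name = f"input_{ts:%Y%m}-{ts:%d}-{ts:%H}.csv.gz"
    return path, name


def has_protocol(url):
    """True if the URL starts with a protocol (assumed here: http or https)."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def build_seed_file(urls):
    """Return (gzip bytes, kept URLs): one URL per row, URLs without protocol skipped."""
    kept = [u.strip() for u in urls if has_protocol(u.strip())]
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write("".join(u + "\n" for u in kept).encode("utf-8"))
    return buf.getvalue(), kept


if __name__ == "__main__":
    path, name = seed_key(datetime(2021, 6, 8, 2))
    payload, kept = build_seed_file([
        "https://ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059",
        "ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059",  # no protocol
    ])
    print(path + name, len(kept), "URL(s) kept")
```

Filtering on the sender side only mirrors the drop rule stated in the note; it lets the publisher see which URLs would be dropped before the file is delivered.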
Output Data:
bucket name: s3://weborama-adloox<publisher_name>
path: contextual/<yyyymm>/<dd>/<hh>, e.g. contextual/202106/08/02/
File name: <yyyymmdd>-url-profiles-<contextual_owner>-<target_id>.json.gz
...
{"segments":[{"id":"c3315","score":4},{"id":"c0364","score":4}],"id_type":"MoonFish","lang":"fr","url":"ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059","url_type":"FullUrl"}
{"segments":[{"id":"c3322","score":3},{"id":"c0363","score":3},{"id":"c0002","score":1}],"id_type":"MoonFish","lang":"fr","url":"www.passeportsante.net/fr/Actualites/Dossiers/DossierComplexe.aspx?doc=vegetaux-proteines-le-quinoa","url_type":"FullUrl"}
{"segments":[{"id":"c3657","score":3},{"id":"c3660","score":3}],"id_type":"MoonFish","lang":"es","url":"http://okdiario.com/look/casa-real/cuenta-atras-cita-mas-incomoda-reyes-1185754/fotos/9","url_type":"FullUrl"}
Frequency: every 2 hours.
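As an illustration of how an output file might be consumed, here is a minimal Python sketch. It assumes the file holds one JSON object per line, as the sample records suggest, and that the file has already been downloaded locally. The helper names `read_profiles` and `segment_ids` and the `min_score` threshold are hypothetical; the meaning of `score` is not described here, so the threshold is only an example filter.

```python
import gzip
import json


def read_profiles(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def segment_ids(record, min_score=0):
    """Return the ids of the record's segments whose score is at least min_score."""
    return [s["id"] for s in record.get("segments", []) if s["score"] >= min_score]


def profiles_by_url(path, min_score=0):
    """Map each record's `url` field to its list of segment ids."""
    return {rec["url"]: segment_ids(rec, min_score) for rec in read_profiles(path)}
```

Note that in the sample records the `url` field sometimes carries a protocol and sometimes does not, so a consumer joining the output back to its seed URLs may need to normalise both sides first.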
...