Input Data:

  • bucket name: s3://weborama-adloox

  • path: /seed_urls/<yyyymm>/<dd>/<hh>, for example: /seed_urls/202106/08/02/

  • Frequency: every hour.

  • File name: input_yyyymm-dd-hh.csv.gz

  • File format: compressed CSV

  • File content: one URL per row.

E.g.:

ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059

www.passeportsante.net/fr/Actualites/Dossiers/DossierComplexe.aspx?doc=vegetaux-proteines-le-quinoa
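The path and file-name conventions above can be combined into a single object key. The following is a minimal sketch only: the function name `seed_file_key` is hypothetical, and it assumes that the placeholders in `input_yyyymm-dd-hh.csv.gz` are filled with the same year-month, day, and hour values as the folder path. The time zone of the hour is not stated on this page, so the caller is assumed to pass the hour in whatever zone the feed uses.

```python
from datetime import datetime


def seed_file_key(hour: datetime) -> str:
    """Build the object key of the hourly seed file (hypothetical helper).

    Assumes the file-name placeholders mirror the folder placeholders.
    """
    folder = f"seed_urls/{hour:%Y%m}/{hour:%d}/{hour:%H}"
    name = f"input_{hour:%Y%m}-{hour:%d}-{hour:%H}.csv.gz"
    return f"{folder}/{name}"


# Matches the example folder /seed_urls/202106/08/02/ given above.
print(seed_file_key(datetime(2021, 6, 8, 2)))
# prints: seed_urls/202106/08/02/input_202106-08-02.csv.gz
```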

Output Data:

  • File name, e.g.: 2021060908-url-profiles-ax-6.json.gz

  • File format: compressed JSON

  • File content: one URL profile per row, as a JSON object.

{"segments":[{"id":"c3315","score":4},{"id":"c0364","score":4}],"id_type":"MoonFish","lang":"fr","url":"ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059","url_type":"FullUrl"}

{"segments":[{"id":"c3322","score":3},{"id":"c0363","score":3},{"id":"c0002","score":1}],"id_type":"MoonFish","lang":"fr","url":"www.passeportsante.net/fr/Actualites/Dossiers/DossierComplexe.aspx?doc=vegetaux-proteines-le-quinoa","url_type":"FullUrl"}

{"segments":[{"id":"c3657","score":3},{"id":"c3660","score":3}],"id_type":"MoonFish","lang":"es","url":"http://okdiario.com/look/casa-real/cuenta-atras-cita-mas-incomoda-reyes-1185754/fotos/9","url_type":"FullUrl"}

  • Frequency: every 2 hours.

N.B.: the output URLs may be normalized, so they may not match the seed URLs exactly.
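Since each output file holds one JSON object per row, it can be read line by line without loading the whole file. The sketch below is illustrative only: `iter_profiles` is a hypothetical helper name, the file is assumed to be UTF-8 encoded, and the demo builds a small in-memory gzip payload from one of the sample rows above instead of downloading a real file.

```python
import gzip
import io
import json


def iter_profiles(source):
    """Yield one dict per non-empty row of a gzip-compressed JSON-lines file.

    `source` may be a file path or a binary file-like object.
    """
    with gzip.open(source, "rt", encoding="utf-8") as rows:
        for row in rows:
            row = row.strip()
            if row:
                yield json.loads(row)


# Demo on an in-memory payload built from a sample row of this page.
sample_row = (
    '{"segments":[{"id":"c3315","score":4},{"id":"c0364","score":4}],'
    '"id_type":"MoonFish","lang":"fr",'
    '"url":"ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059",'
    '"url_type":"FullUrl"}'
)
payload = gzip.compress((sample_row + "\n").encode("utf-8"))

profiles = list(iter_profiles(io.BytesIO(payload)))
print(profiles[0]["url"], [s["id"] for s in profiles[0]["segments"]])
```

Because the returned URLs may be normalized, joining profiles back to seed URLs by exact string equality is likely to miss some rows.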

  • Metadata: to map segment IDs to segment names, we provide an hourly metadata file.

    • path: contextual/metadata/<yyyymm>/<dd>/<hh>

    • File name: metadata.json.gz

    • File Content:

E.g.:

{
"c3328": "Makeup",
"c0053": "Asterix - BD",
"c3322": "Personal care",
"c0052": "Parc d'attraction",
"c3287": "Tennis",
"c3324": "Martial arts",
"c3292": "Basketball",
"c3290": "Beauty products",
"c3295": "Baseball",
"c0501": "Lacoste DE",
"c0109": "Fonction publique",
"c3297": "Sports",
"c3335": "Motorcycles and bicycles",
"c3340": "Motor Sport",
"c0357": "SIC",
"c0359": "actualité religieuse",
"c0350": "Opérations navales",
"c0110": "Justice tribunal",
"c0351": "Aeronautique",
"c3226": "Eyewear",
"c0353": "Mécanique et Maintenance",
"c0356": "Restauration",
"c0355": "Soutien",
"c3350": "Jewelry",
"c2306": "Vehicule électrique-Prix",
"c0006": "Concurrents Marionnaud",
"c2307": "Véhicule électrique-Autonomie",
"c0005": "Soin Visage & Corps",
"c0008": "LifeEvent Mariage",
"c3635": "Emission de TV",
"c0007": "Recouvrement",
"c0009": "Moving",
"c3519": "Lacoste ES",
"c0361": "activités seniors",
"c0360": "actualité monde",
"c0363": "Formation",
"c3352": "Loans",
"c0362": "Générique - Emploi",
"c0002": "Beauté",
"c3634": "Moments de vie Immo",
"c0364": "Première expérience",
"c3633": "Achat immobilier",
"c2303": "Golf",
"c0004": "Capillaire",
"c3632": "Concurrents BienIci",
"c0003": "Parfums",
"c3631": "Agences Immobilières",
"c2301": "Gambling",
"c3360": "Luxury",
"c3649": "Golf",
"c3648": "Footwear",
"c3647": "Fashion trend",
"c3409": "Thématiques Milan",
"c3640": "Vente Immobiliere",
"c0011": "Concurrents Bayard",
"c3364": "Hair products and styling",
"c0010": "Intent Credit",
"c3242": "Champions League",
"c0013": "Univers Jeunesse",
"c0012": "Marques Bayard",
"c3248": "Clothing",
"c3367": "Health and Care Products",
"c0014": "Titres phares des revues",
"c0260": "Periscom-Noviacare",
"c0381": "Tourisme GE – Weekend Nature",
"c3370": "High School",
"c3417": "Pièces de musique classique",
"c3659": "Video games",
"c3658": "Health and Care Products",
"c3416": "Compositeurs",
"c3657": "Accessories",
"c3415": "Musique Classique",
"c3652": "Deodorant",
"c0383": "Tourisme GE – Weekend Culture",
"c3377": "Accessories",
"c3256": "American Football",
"c3376": "Horse racing",
"c3651": "Fragrance",
"c0385": "Tourisme GE – Weekend Oeno",
"c3650": "Eyewear",
"c3374": "Horse riding",
"c0384": "Tourisme GE – Weekend BienEtre",
"c3656": "High Fashion",
"c3259": "Business Administration & Management",
"c0144": "Véhicule électrique",
"c3379": "Hunting & Fishing",
"c3378": "Good deals",
"c3653": "Clothing",
"c3384": "Fragrance",
"c3382": "Hiking & Mountaineering",
"c3308": "Rugby",
"c3307": "Outdoor activities",
"c3669": "Fashion trend",
"c3668": "Underwear",
"c3388": "Swimming",
"c3663": "Beauty products",
"c3387": "Sports equipments & Outdoor gear",
"c3662": "Beauty treatments",
"c0393": "Etudiant",
"c3386": "Electrical car",
"c3661": "Personal care",
"c3660": "Jewelry",
"c3385": "Deodorant",
"c3667": "Luxury",
"c3666": "Sports",
"c3665": "Deodorant",
"c3269": "Bodybuilding",
"c3268": "Beauty treatments",
"c3664": "Tennis",
"c3389": "Dance",
"c3391": "Soccer",
"c3270": "Careers and occupational training",
"c3390": "Cycling",
"c3273": "Bicycle",
"c3272": "Video games",
"c0049": "postes dans l'armée",
"c3316": "Real estate",
"c3673": "Total Mobility",
"c3315": "Public administrations",
"c3314": "Running",
"c0047": "berlines",
"c0051": "BMW & marques concurrentes",
"c0050": "emploi jeune"
}
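To show how the metadata file is meant to be used, the following sketch resolves the segment IDs of one URL profile into segment names. It is an illustration under assumptions: `label_segments` is a hypothetical helper, the metadata is assumed to have already been decompressed and parsed into a dict, and falling back to the raw ID when a segment is missing from the metadata is a design choice made here, not documented behaviour.

```python
import json


def label_segments(profile, names):
    """Return (segment name, score) pairs for a URL profile.

    Unknown segment IDs fall back to the raw ID (assumption, see above).
    """
    return [
        (names.get(segment["id"], segment["id"]), segment["score"])
        for segment in profile["segments"]
    ]


# Excerpt of the metadata shown above, parsed from JSON.
names = json.loads(
    '{"c3315": "Public administrations", "c0364": "Première expérience"}'
)

# First sample profile from the output data section.
profile = {
    "segments": [{"id": "c3315", "score": 4}, {"id": "c0364", "score": 4}],
    "id_type": "MoonFish",
    "lang": "fr",
    "url": "ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059",
    "url_type": "FullUrl",
}

print(label_segments(profile, names))
# prints: [('Public administrations', 4), ('Première expérience', 4)]
```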
