Taxonomies

Generic Taxonomy

Weborama's generic taxonomy has around 200 behavioral segment clusters covering most areas of interest. Here is a sample of these clusters:

  • Holidays
  • Auto parts
  • Car Rental
  • Literature
  • News
  • Running
  • Philosophy
  • ….

Custom Taxonomies

Weborama can create custom taxonomies for its clients using its taxonomy engine, BigFish. Here are some examples of custom taxonomies that can be created with BigFish:

  • Vertical Taxonomies: e.g. an Automotive taxonomy
  • Brand Taxonomies: e.g. L’Oréal, P&G
  • Competition Taxonomies: e.g. UBS vs HSBC, Goldman Sachs

  • Data Extraction Frequency:

          Profiles built on the standard taxonomy or on a custom taxonomy are extracted every day to a dedicated S3 bucket per region.

  • Data Format:

          The profiles are extracted as compressed JSON files (.gz).
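As a rough illustration of consuming one extracted file, the sketch below reads a gzip-compressed file and decodes the profiles in it. It assumes that each line of the file holds one complete JSON object; the page does not state this, so check it against a real file first. The function name `read_profiles` is hypothetical.

```python
import gzip
import json
from typing import Iterator


def read_profiles(path: str) -> Iterator[dict]:
    """Yield one profile dict per non-empty line of a gzip-compressed file.

    Assumption (not confirmed by this page): the file is laid out as
    JSON Lines, i.e. one complete JSON object per line.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the function is a generator, a large file can be processed profile by profile without loading it fully into memory.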

  • Output Data Hierarchy:

Daily extracted profiles use the following data hierarchy:

1) One S3 bucket per region:

European profiles are stored in and exported to a European region; US profiles are stored in and exported to a US region.

2) Different types of profiles:

  1. generic: profiles built with the Weborama standard taxonomy.
  2. extra: profiles built with a custom taxonomy.

3) One folder per country:

4) One folder per taxonomy:

s3://webo-profiles-us/wb/US/gt/

5) Data partitioned per year, month, day:

6) Profiles for the selected taxonomy:

  • Data Hierarchy: s3://<bucket-name>/<owner>/<country>/<taxonomy_name>/<year>/<month>/<day>/part-*.gz
  • E.g.: s3://webo-profiles-eu/wb/DE/gt/2017/11/22/part-*.gz
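The hierarchy above can be turned into a small helper that builds the prefix for one day of profiles. This is only a sketch: the function name `profile_prefix` is hypothetical, and zero-padding of month and day to two digits is an assumption, since the example date (2017/11/22) does not show how single-digit values are written.

```python
from datetime import date


def profile_prefix(bucket: str, owner: str, country: str, taxonomy: str, day: date) -> str:
    """Build the S3 prefix that holds one day of profiles for one taxonomy."""
    return (
        f"s3://{bucket}/{owner}/{country}/{taxonomy}/"
        # Assumption: month and day are zero-padded to two digits.
        f"{day.year:04d}/{day.month:02d}/{day.day:02d}/"
    )


print(profile_prefix("webo-profiles-eu", "wb", "DE", "gt", date(2017, 11, 22)))
# prints: s3://webo-profiles-eu/wb/DE/gt/2017/11/22/
```

The part-*.gz files for that day would then be listed under the returned prefix.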

      

  • User Profile Data:

Here is the output format of a sample user profile:

{

"userId": "0000084c-b403-3b4a-b2ed-71d8de0465e4",

"owner": "wb",

"country": "DE",

"latitude": 50.100006103515625,

"longitude": 8.600006103515625,

"date": "20171122",

"qindexes": {

"mapping": {

"c:gt:Major Appliances (White goods)": 10,

"c:gt:Halloween": 5,

"c:gt:Comics": 8,

"c:gt:Comedy": 1,

"c:gt:Law": 17,

"c:gt:Cloud computing": 1,

"c:gt:Hiking \u0026 Mountaineering": 16,

"c:gt:Air Transport": 17,

"c:gt:Tourism": 10,

"c:gt:X": 2,

"c:gt:Healthcare and medicine": 4,

"c:gt:Occult": 17,

"c:gt:Astrology": 12,

"c:gt:Fine dining \u0026 local produce": 11,

"c:gt:Sunny Destination": 9,

"c:gt:Interior design": 4,

"c:gt:Winter Holidays": 6,

"c:gt:Psychotherapy": 16,

"c:gt:Footwear": 1,

"c:gt:Top Business schools": 9,

"c:gt:American Football": 8,

"c:gt:Bicycle": 7,

"c:gt:Catch-up TV": 10,

"c:gt:Computer hardware \u0026 devices": 12,

"c:gt:Fantasy": 6,

"c:gt:Banking": 14,

"c:gt:Accessories": 1,

"c:gt:Gambling": 10,

"c:gt:Doctor": 9,

"c:gt:Pharmacy": 6,

"c:gt:Car brands": 1,

"c:gt:ISP \u0026 Browsers": 13,

"c:gt:Business Administration \u0026 Management": 14,

"c:gt:Insurance": 1,

"c:gt:Energy": 15,

"c:gt:Auto parts": 1,

"c:gt:Fashion trend": 7,

"c:gt:Holiday rentals": 11,

"c:gt:Architect": 1,

"c:gt:Merchant": 11,

"c:gt:Entrepreneur": 16,

"c:gt:Holidays": 16,

"c:gt:Diet and nutrition": 1,

"c:gt:DIY": 11,

"c:gt:Public administrations": 11,

"c:gt:Clothing": 5,

"c:gt:Asset management": 9,

"c:gt:Fruits and vegetables": 5,

"c:gt:Painting": 1,

"c:gt:A-Levels": 11,

"c:gt:High Fashion": 1,

"c:gt:Going out": 1,

"c:gt:Weather": 10,

"c:gt:Lawyer": 8,

"c:gt:Cooking": 7,

"c:gt:Good deals": 5,

"c:gt:Graphic design": 8,

"c:gt:Martial arts": 7,

"c:gt:Advertising": 7,

"c:gt:Dating": 16,

"c:gt:Outdoor activities": 1,

"c:gt:Charity": 8,

"c:gt:Gardening": 13,

"c:gt:Anatomy": 15,

"c:gt:Auto Elec - Electrical car": 15,

"c:gt:Labour law": 15,

"c:gt:Furniture": 3,

"c:gt:Kitchen Appliances": 11,

"c:gt:Films": 7,

"c:gt:Loans": 10,

"c:gt:Art": 1,

"c:gt:Arts \u0026 crafts": 13,

"c:gt:Literature": 16,

"c:gt:Tv channels": 9,

"c:gt:News": 8,

"c:gt:Real estate": 12,

"c:gt:Consumer Electronics (Brown goods)": 1,

"c:gt:Building and civil engineering": 14,

"c:gt:DIY Equipment": 6,

"c:gt:Cameras": 13,

"c:gt:Phone": 6,

"c:gt:Retirement period": 11,

"c:gt:Finance": 14,

"c:gt:Baseball": 13,

"c:gt:Higher education": 12,

"c:gt:Alcohol": 10,

"c:gt:Rail Transport": 11,

"c:gt:Horse riding": 1,

"c:gt:Sports": 11,

"c:gt:Farmer": 8,

"c:gt:Savoury food": 6,

"c:gt:Jewelry": 7,

"c:gt:Software": 2,

"c:gt:Astronomy": 13,

"c:gt:Nature": 9,

"c:gt:Beauty treatments": 1,

"c:gt:Fast food": 2,

"c:gt:Family": 16,

"c:gt:Eyewear": 7,

"c:gt:Ecology": 3,

"c:gt:Video games": 7,

"c:gt:Air conditioning": 1,

"c:gt:Cars": 10,

"c:gt:Motorcycles and bicycles": 1,

"c:gt:Bodybuilding": 2,

"c:gt:Weapons": 16,

"c:gt:Hunting \u0026 Fishing": 11,

"c:gt:Classical music \u0026 instruments": 1,

"c:gt:Fauna": 10,

"c:gt:Running": 8,

"c:gt:Marriage - civil union": 16,

"c:gt:TV Shows": 8,

"c:gt:Social networks": 10,

"c:gt:Politics": 17,

"c:gt:History": 17,

"c:gt:Back to school": 1,

"c:gt:Careers and occupational training": 14,

"c:gt:Teaching": 12,

"c:gt:Children": 9,

"c:gt:Toys and games": 11

}

}

}
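The keys under qindexes.mapping in the sample above all follow the pattern c:<taxonomy>:<segment>, with an integer score as the value. A minimal sketch of splitting those keys and ranking segments by score could look like this; the names `SegmentScore`, `parse_qindexes` and `top_segments` are hypothetical, and skipping keys whose prefix is not "c" is an assumption.

```python
from typing import NamedTuple


class SegmentScore(NamedTuple):
    taxonomy: str
    segment: str
    score: int


def parse_qindexes(profile: dict) -> list[SegmentScore]:
    """Split each "c:<taxonomy>:<segment>" key and pair it with its score."""
    scores = []
    for key, value in profile["qindexes"]["mapping"].items():
        # maxsplit=2 keeps any colon inside the segment name intact.
        prefix, taxonomy, segment = key.split(":", 2)
        if prefix != "c":
            continue  # assumption: only "c" (cluster) keys are of interest
        scores.append(SegmentScore(taxonomy, segment, int(value)))
    return scores


def top_segments(profile: dict, n: int = 3) -> list[SegmentScore]:
    """Return the n segments with the highest affinity score."""
    return sorted(parse_qindexes(profile), key=lambda s: s.score, reverse=True)[:n]


# Shortened copy of the sample profile, keeping three of its segments.
sample = {
    "userId": "0000084c-b403-3b4a-b2ed-71d8de0465e4",
    "qindexes": {"mapping": {"c:gt:Law": 17, "c:gt:Comics": 8, "c:gt:Comedy": 1}},
}
print(top_segments(sample, 2))
```

A key with fewer than three colon-separated parts raises a ValueError here, which makes malformed input visible rather than silently dropped.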

The data format contains these attributes:

  1. userId: unique identifier of a Webo profile.
  2. owner: owner of the built profiles, e.g. “wb”.
  3. country: country of the user profile, e.g. “GB”.
  4. latitude: geographic latitude in decimal degrees.
  5. longitude: geographic longitude in decimal degrees.
  6. date: date in the ISO 8601 basic format YYYYMMDD.
  7. qindexes: contains, under “mapping”, the affinity score of the user for each segment.
    1. The key of each segment is constructed from three colon-separated parts:
      1. Static prefix: “c” for “cluster”, which means a taxonomy segment.
      2. Taxonomy_name: for instance “gt”, which corresponds to the Weborama standard taxonomy.
      3. Segment_name: for instance “Running”.
    2. The value of each key is the affinity score of the user with the corresponding segment. The scores are quantiles between 1 and 20. A score of 1 means a low affinity with the segment and a score of 20 means a strong affinity with the segment.
  8. Format

We use a compressed JSON format. Here is an example of

...