Generic Taxonomy
Weborama's generic taxonomy has around 200 behavioral segments covering most areas of interest. Here is a sample list of these segments:
- Holidays
- Auto parts
- Car Rental
- Literature
- News
- Running
- Philosophy
- …
Custom Taxonomies
Weborama enables its clients to create custom taxonomies using its taxonomy engine, BigFish. Here are some examples of custom taxonomies that you can create via BigFish:
- Vertical taxonomies: e.g. an Automotive taxonomy
- Brand taxonomies: e.g. L’Oréal, P&G
- Competition taxonomies: e.g. UBS vs HSBC, Goldman Sachs
- …
- Data Extraction frequency:
Profiles built on the standard taxonomy or on a custom taxonomy are extracted every day to a dedicated S3 bucket per region.
- Data Format:
The profiles are extracted as compressed JSON files (.gz).
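As a rough illustration of reading one extracted file, the sketch below assumes that a part file is gzip-compressed (as the .gz extension in the paths suggests), that it holds one JSON profile per line, and that it has already been downloaded to local disk. The one-profile-per-line layout and the function name iter_profiles are assumptions made for this example, not something this document specifies.

```python
import gzip
import json
from typing import Iterator


def iter_profiles(path: str) -> Iterator[dict]:
    """Yield one profile dict per non-empty line of a gzip-compressed file.

    Assumption: the file is newline-delimited JSON (one profile per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

If the files turn out to contain a single JSON array instead, json.load on the opened handle would be the equivalent starting point.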
- Output Data Hierarchy:
Daily extracted profiles use the following data hierarchy:
1) One S3 bucket per region:
European profiles are stored in and exported to a European region; US profiles are stored in and exported to a US region.
2) Different types of profiles:
- generic: profiles are built with the Weborama standard taxonomy.
- extra: profiles are built with a custom taxonomy.
3) One folder per country:
- Data Hierarchy: s3://<bucket-name>/<owner>/<country>/
4) One folder per taxonomy:
- Data Hierarchy: s3://<bucket-name>/<owner>/<country>/<taxonomy_name>/
- E.g: s3://webo-profiles-eu/wb/DE/gt/
- E.g: s3://webo-profiles-us/wb/US/gt/
5) Data partitioned by year, month and day:
- Data Hierarchy: s3://<bucket-name>/<owner>/<country>/<taxonomy_name>/<year>/<month>/<day>/
- E.g: s3://webo-profiles-eu/wb/DE/gt/2017/11/22/
6) Profiles for the selected taxonomy:
- Data Hierarchy: s3://<bucket-name>/<owner>/<country>/<taxonomy_name>/<year>/<month>/<day>/part-*.gz
- E.g: s3://webo-profiles-eu/wb/DE/gt/2017/11/22/part-*.gz
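The hierarchy above can be sketched as a small helper that assembles the daily prefix from its parts. This is only an illustration: the function name profile_prefix is hypothetical, and zero-padding of single-digit months and days is an assumption, since the only example shown (2017/11/22) does not settle it.

```python
from datetime import date


def profile_prefix(bucket: str, owner: str, country: str,
                   taxonomy: str, day: date) -> str:
    """Build the S3 prefix holding one day of profiles for one taxonomy."""
    return (
        f"s3://{bucket}/{owner}/{country}/{taxonomy}/"
        # Assumption: month and day are zero-padded to two digits.
        f"{day.year:04d}/{day.month:02d}/{day.day:02d}/"
    )


print(profile_prefix("webo-profiles-eu", "wb", "DE", "gt", date(2017, 11, 22)))
# s3://webo-profiles-eu/wb/DE/gt/2017/11/22/
```

The part files for that day would then be the objects under this prefix that match part-*.gz.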
- User Profile data:
Here is the output format for a sample user profile:
{
"userId": "0000084c-b403-3b4a-b2ed-71d8de0465e4",
"owner": "wb",
"country": "DE",
"latitude": 50.100006103515625,
"longitude": 8.600006103515625,
"date": "20171122",
"qindexes": {
"mapping": {
"c:gt:Major Appliances (White goods)": 10,
"c:gt:Halloween": 5,
"c:gt:Comics": 8,
"c:gt:Comedy": 1,
"c:gt:Law": 17,
"c:gt:Cloud computing": 1,
"c:gt:Hiking \u0026 Mountaineering": 16,
"c:gt:Air Transport": 17,
"c:gt:Tourism": 10,
"c:gt:X": 2,
"c:gt:Healthcare and medicine": 4,
"c:gt:Occult": 17,
"c:gt:Astrology": 12,
"c:gt:Fine dining \u0026 local produce": 11,
"c:gt:Sunny Destination": 9,
"c:gt:Interior design": 4,
"c:gt:Winter Holidays": 6,
"c:gt:Psychotherapy": 16,
"c:gt:Footwear": 1,
"c:gt:Top Business schools": 9,
"c:gt:American Football": 8,
"c:gt:Bicycle": 7,
"c:gt:Catch-up TV": 10,
"c:gt:Computer hardware \u0026 devices": 12,
"c:gt:Fantasy": 6,
"c:gt:Banking": 14,
"c:gt:Accessories": 1,
"c:gt:Gambling": 10,
"c:gt:Doctor": 9,
"c:gt:Pharmacy": 6,
"c:gt:Car brands": 1,
"c:gt:ISP \u0026 Browsers": 13,
"c:gt:Business Administration \u0026 Management": 14,
"c:gt:Insurance": 1,
"c:gt:Energy": 15,
"c:gt:Auto parts": 1,
"c:gt:Fashion trend": 7,
"c:gt:Holiday rentals": 11,
"c:gt:Architect": 1,
"c:gt:Merchant": 11,
"c:gt:Entrepreneur": 16,
"c:gt:Holidays": 16,
"c:gt:Diet and nutrition": 1,
"c:gt:DIY": 11,
"c:gt:Public administrations": 11,
"c:gt:Clothing": 5,
"c:gt:Asset management": 9,
"c:gt:Fruits and vegetables": 5,
"c:gt:Painting": 1,
"c:gt:A-Levels": 11,
"c:gt:High Fashion": 1,
"c:gt:Going out": 1,
"c:gt:Weather": 10,
"c:gt:Lawyer": 8,
"c:gt:Cooking": 7,
"c:gt:Good deals": 5,
"c:gt:Graphic design": 8,
"c:gt:Martial arts": 7,
"c:gt:Advertising": 7,
"c:gt:Dating": 16,
"c:gt:Outdoor activities": 1,
"c:gt:Charity": 8,
"c:gt:Gardening": 13,
"c:gt:Anatomy": 15,
"c:gt:Auto Elec - Electrical car": 15,
"c:gt:Labour law": 15,
"c:gt:Furniture": 3,
"c:gt:Kitchen Appliances": 11,
"c:gt:Films": 7,
"c:gt:Loans": 10,
"c:gt:Art": 1,
"c:gt:Arts \u0026 crafts": 13,
"c:gt:Literature": 16,
"c:gt:Tv channels": 9,
"c:gt:News": 8,
"c:gt:Real estate": 12,
"c:gt:Consumer Electronics (Brown goods)": 1,
"c:gt:Building and civil engineering": 14,
"c:gt:DIY Equipment": 6,
"c:gt:Cameras": 13,
"c:gt:Phone": 6,
"c:gt:Retirement period": 11,
"c:gt:Finance": 14,
"c:gt:Baseball": 13,
"c:gt:Higher education": 12,
"c:gt:Alcohol": 10,
"c:gt:Rail Transport": 11,
"c:gt:Horse riding": 1,
"c:gt:Sports": 11,
"c:gt:Farmer": 8,
"c:gt:Savoury food": 6,
"c:gt:Jewelry": 7,
"c:gt:Software": 2,
"c:gt:Astronomy": 13,
"c:gt:Nature": 9,
"c:gt:Beauty treatments": 1,
"c:gt:Fast food": 2,
"c:gt:Family": 16,
"c:gt:Eyewear": 7,
"c:gt:Ecology": 3,
"c:gt:Video games": 7,
"c:gt:Air conditioning": 1,
"c:gt:Cars": 10,
"c:gt:Motorcycles and bicycles": 1,
"c:gt:Bodybuilding": 2,
"c:gt:Weapons": 16,
"c:gt:Hunting \u0026 Fishing": 11,
"c:gt:Classical music \u0026 instruments": 1,
"c:gt:Fauna": 10,
"c:gt:Running": 8,
"c:gt:Marriage - civil union": 16,
"c:gt:TV Shows": 8,
"c:gt:Social networks": 10,
"c:gt:Politics": 17,
"c:gt:History": 17,
"c:gt:Back to school": 1,
"c:gt:Careers and occupational training": 14,
"c:gt:Teaching": 12,
"c:gt:Children": 9,
"c:gt:Toys and games": 11
}
}
}
The data format contains these attributes:
- userId: unique identifier of a Webo profile.
- owner: owner of the built profiles, e.g. “wb”.
- country: country of the user profile, e.g. “GB”.
- latitude: latitude geographic coordinate in decimal degrees.
- longitude: longitude geographic coordinate in decimal degrees.
- date: date in the ISO 8601 basic format YYYYMMDD.
- qindexes: contains, under “mapping”, the affinity score of the user with each segment.
  - The key of each segment is constructed from three parts separated by “:”:
    - Static prefix: “c” for “cluster”, which means a taxonomy segment.
    - Taxonomy_name: for instance “gt”, which corresponds to the Weborama standard taxonomy.
    - Segment_name: for instance “Running”.
  - The value of each key is the affinity score of the user with the corresponding segment. The scores are quantiles between 1 and 20. A score of 1 means a low affinity with the segment and a score of 20 means a strong affinity with the segment.
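To make the key structure concrete, here is a minimal sketch that splits each qindexes key into its prefix, taxonomy name and segment name, then lists the segments a user scores highly on. The names SegmentScore, parse_qindexes and strong_segments are hypothetical, and the threshold of 15 is an arbitrary value chosen for the example, not a Weborama recommendation.

```python
from typing import List, NamedTuple


class SegmentScore(NamedTuple):
    prefix: str    # static prefix, "c" for cluster
    taxonomy: str  # taxonomy name, e.g. "gt"
    segment: str   # segment name, e.g. "Running"
    score: int     # quantile between 1 (low) and 20 (strong)


def parse_qindexes(profile: dict) -> List[SegmentScore]:
    """Split every key of qindexes.mapping into its three parts."""
    parsed = []
    for key, score in profile["qindexes"]["mapping"].items():
        # Split on the first two colons only, so a segment name
        # that itself contains ":" stays in one piece.
        prefix, taxonomy, segment = key.split(":", 2)
        parsed.append(SegmentScore(prefix, taxonomy, segment, score))
    return parsed


def strong_segments(profile: dict, min_score: int = 15) -> List[str]:
    """Return segment names scoring at least min_score, strongest first."""
    kept = [s for s in parse_qindexes(profile) if s.score >= min_score]
    kept.sort(key=lambda s: (-s.score, s.segment))
    return [s.segment for s in kept]


# A four-segment excerpt of the sample profile shown earlier.
sample = {
    "qindexes": {
        "mapping": {
            "c:gt:Comedy": 1,
            "c:gt:Law": 17,
            "c:gt:Holidays": 16,
            "c:gt:Running": 8,
        }
    }
}
print(strong_segments(sample))
# ['Law', 'Holidays']
```

Splitting with a maximum of two splits is a defensive choice; it only matters if a taxonomy ever defines a segment name containing a colon.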
- Format
We are using JSON compressed format. Here is an example of
- Frequency