Push Input Corpus Data to an AWS S3 bucket:
E.g:
bigsea-europe-prod-corpora-seed
or
weborama-myclient-corpora-seed
The files should then be pushed to a dedicated folder in that bucket:
E.g: weborama-myclient-corpora-seed/telegramme/
The file format:
Compressed JSON
input_corpus_telegramme_2021.gz
The file naming for a daily delivery: input_corpus_telegramme_20220106.gz
The file naming for an hourly delivery: input_corpus_telegramme_2022010608.gz
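The naming convention above can be sketched as a small helper. This is a minimal illustration, not the official tooling: the function name `input_corpus_filename` and its parameters are hypothetical, and the pattern is inferred only from the two example names above.

```python
from datetime import datetime


def input_corpus_filename(source: str, when: datetime, hourly: bool = False) -> str:
    """Build a name such as input_corpus_telegramme_20220106.gz (hypothetical helper)."""
    # Daily deliveries use yyyymmdd; hourly deliveries append the two-digit hour.
    stamp = when.strftime("%Y%m%d%H" if hourly else "%Y%m%d")
    return f"input_corpus_{source}_{stamp}.gz"


daily = input_corpus_filename("telegramme", datetime(2022, 1, 6))
hourly = input_corpus_filename("telegramme", datetime(2022, 1, 6, 8), hourly=True)
print(daily)   # input_corpus_telegramme_20220106.gz
print(hourly)  # input_corpus_telegramme_2022010608.gz
```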
The file content:
{
"content_id": "12682665",
"title": "Le maire\u00a0de Belz a souhait\u00e9 ses voeux 2021 en vid\u00e9o\u00a0",
"content": "Covid oblige, impossible d\u2019organiser les v\u0153ux de la municipalit\u00e9 de Belz qui r\u00e9unissent des centaines de personnes aux Ast\u00e9ries. D\u00e8s lors, le maire, Bruno Goasmat, est pass\u00e9 au mode distanciel\u00a0avec des v\u0153ux en ligne.Apr\u00e8s un tour d\u2019horizon de la situation et des…",
"description": "bla bla",
"keywords": "\"Voeux\", \"Bruno Goasmat\""
}
The mandatory fields are: content_id, title, and at least one of content or description.
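A producer can check the mandatory fields before writing the compressed file. The sketch below rests on two assumptions that this page does not confirm: that "Content or Description" means at least one of the two must be non-empty, and that the file holds one JSON object per line. The helper names `is_valid_record` and `write_corpus` are hypothetical.

```python
import gzip
import json


def is_valid_record(record: dict) -> bool:
    """Check content_id, title, and content or description (assumed rule)."""
    has_id = bool(record.get("content_id"))
    has_title = bool(record.get("title"))
    has_body = bool(record.get("content")) or bool(record.get("description"))
    return has_id and has_title and has_body


def write_corpus(records: list, path: str) -> int:
    """Write the valid records as gzip-compressed JSON and return how many were written."""
    written = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            if is_valid_record(record):
                # Assumption: one JSON object per line; non-ASCII is escaped as \uXXXX.
                out.write(json.dumps(record) + "\n")
                written += 1
    return written
```

Invalid records are skipped silently here; a real pipeline would probably log or reject them instead.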
Output Document Profiles:
The output will be delivered to a dedicated AWS S3 bucket:
bigsea-europe-prod-document-profiles-exported/telegramme/inputCorpus/
weborama-myclient-document-profiles-exported/inputCorpus/
The file path will be:
bigsea-europe-prod-document-profiles-exported/telegramme/inputCorpus/2021/11/23/14/
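The example path suggests one folder per hour, laid out as year/month/day/hour under the export prefix. A minimal sketch of that layout, with a hypothetical helper name `output_prefix` and the structure inferred only from the single example above:

```python
from datetime import datetime


def output_prefix(base: str, when: datetime) -> str:
    """Return the hourly folder under base, e.g. .../inputCorpus/2021/11/23/14/."""
    # Normalise the trailing slash, then append the yyyy/mm/dd/hh partition.
    return f"{base.rstrip('/')}/{when:%Y/%m/%d/%H}/"


prefix = output_prefix(
    "bigsea-europe-prod-document-profiles-exported/telegramme/inputCorpus/",
    datetime(2021, 11, 23, 14),
)
print(prefix)
```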
The file format will be:
Compressed JSON
The file name format will be:
<yyyymmddhh>-doc-profiles-<owner>-<targetID>.json.gz
E.g: 2021112314-doc-profiles-tlgic-36.json.gz
The file content:
{"content_id":"12807040","segments":[{"id":"Classical music & instruments_c30258","score":4,"ttl":2592000},{"id":"Halloween_c30115","score":1,"ttl":2592000},{"id":"Films_c30302","score":1,"ttl":2592000},{"id":"Cinemas_c30257","score":1,"ttl":2592000},{"id":"Popular Events_c30178","score":1,"ttl":2592000},{"id":"Art_c30309","score":3,"ttl":2592000},{"id":"Music_c30163","score":2,"ttl":2592000},{"id":"Family_c30295","score":3,"ttl":2592000},{"id":"Going out_c30317","score":2,"ttl":2592000},{"id":"Theatre_c30217","score":2,"ttl":2592000}],"id_type":"MoonFishLabel","lang":"fr"}
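A consumer of the exported profiles might split the file name into its parts and rank the segments of a record by score. The sketch below is illustrative only: the helper names are hypothetical, the regular expression is derived from the name pattern above, and it assumes each line of the file is one JSON object shaped like the sample record.

```python
import json
import re

# <yyyymmddhh>-doc-profiles-<owner>-<targetID>.json.gz
NAME_RE = re.compile(r"^(\d{10})-doc-profiles-(.+)-([^-]+)\.json\.gz$")


def parse_profile_filename(name: str) -> dict:
    """Split an export file name into hour, owner and target ID."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return {"hour": match.group(1), "owner": match.group(2), "target_id": match.group(3)}


def top_segments(line: str, limit: int = 3) -> list:
    """Return the (id, score) pairs of the highest-scoring segments in one record."""
    profile = json.loads(line)
    # sorted() is stable, so segments with equal scores keep their original order.
    ranked = sorted(profile["segments"], key=lambda s: s["score"], reverse=True)
    return [(s["id"], s["score"]) for s in ranked[:limit]]


print(parse_profile_filename("2021112314-doc-profiles-tlgic-36.json.gz"))
```

Applied to the sample record above with the default limit, `top_segments` would return the "Classical music & instruments", "Art" and "Family" segments, in that order.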