
Push Input Corpus Data to an AWS S3 bucket:

E.g.:

bigsea-europe-prod-corpora-seed

or

weborama-myclient-corpora-seed

Then push the files to a dedicated folder inside that bucket.

E.g.: weborama-myclient-corpora-seed/telegramme/

The file format:

Gzip-compressed JSON

E.g.: input_corpus_telegramme_2021.gz

File name for a daily delivery: input_corpus_telegramme_20220106.gz

File name for an hourly delivery: input_corpus_telegramme_2022010608.gz
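The naming convention above can be sketched in Python. This is only an illustration: the helper name input_corpus_filename is hypothetical, and it assumes the numeric suffix is yyyymmdd for daily deliveries and yyyymmddhh for hourly ones, which matches the examples but is not stated explicitly. The time zone to use is not specified in this page.

```python
from datetime import datetime


def input_corpus_filename(source: str, when: datetime, hourly: bool = False) -> str:
    """Build an input corpus file name (hypothetical helper).

    Assumes the suffix is yyyymmdd for daily and yyyymmddhh for hourly deliveries.
    """
    stamp = when.strftime("%Y%m%d%H" if hourly else "%Y%m%d")
    return f"input_corpus_{source}_{stamp}.gz"


if __name__ == "__main__":
    # Daily and hourly names for the "telegramme" folder used in the examples.
    print(input_corpus_filename("telegramme", datetime(2022, 1, 6)))
    print(input_corpus_filename("telegramme", datetime(2022, 1, 6, 8), hourly=True))
```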

The file content:

{
"content_id": "12682665",
"title": "Le maire\u00a0de Belz a souhait\u00e9 ses voeux 2021 en vid\u00e9o\u00a0",
"content": "Covid oblige, impossible d\u2019organiser les v\u0153ux de la municipalit\u00e9 de Belz qui r\u00e9unissent des centaines de personnes aux Ast\u00e9ries. D\u00e8s lors, le maire, Bruno Goasmat, est pass\u00e9 au mode distanciel\u00a0avec des v\u0153ux en ligne.Apr\u00e8s un tour d\u2019horizon de la situation et des…",
"description": "bla bla",
"keywords": "\"Voeux\", \"Bruno Goasmat\""
}

The mandatory fields are: content_id, title, and either content or description.
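A minimal sketch of how a producer might check the mandatory fields and write the compressed file. The function names are hypothetical. It assumes that "Content or Description" means at least one of the two must be present, and that the file holds one JSON object per line; neither point is confirmed on this page.

```python
import gzip
import json


def missing_fields(record: dict) -> list:
    """Return the names of mandatory fields that are absent or empty."""
    missing = [name for name in ("content_id", "title") if not record.get(name)]
    # Assumption: at least one of content / description is required.
    if not (record.get("content") or record.get("description")):
        missing.append("content or description")
    return missing


def write_corpus(path: str, records: list) -> int:
    """Write records as gzip-compressed JSON, one object per line (assumed layout)."""
    written = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            problems = missing_fields(record)
            if problems:
                raise ValueError(f"record {record.get('content_id')!r} is missing: {problems}")
            # json.dumps escapes non-ASCII characters as \uXXXX by default,
            # as in the sample above.
            out.write(json.dumps(record) + "\n")
            written += 1
    return written
```

Rejecting the whole batch on the first invalid record is a design choice made here for simplicity; skipping and logging invalid records would work as well.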

Output Document Profiles:

The document profiles will be delivered to a dedicated AWS S3 bucket:

bigsea-europe-prod-document-profiles-exported/telegramme/inputCorpus/

weborama-myclient-document-profiles-exported/inputCorpus/

The file path will be:

bigsea-europe-prod-document-profiles-exported/telegramme/inputCorpus/2021/11/23/14/

The file format will be:

Gzip-compressed JSON.

The file name will be:

<yyyymmddhh>-doc-profiles-<owner>-<targetID>.json.gz

E.g.: 2021112314-doc-profiles-tlgic-36.json.gz
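To locate a given delivery, the path and name patterns above can be combined as in the sketch below. The helper doc_profiles_key is hypothetical, and it assumes the yyyy/mm/dd/hh folder and the yyyymmddhh prefix of the file name refer to the same hour, as they do in the examples.

```python
from datetime import datetime


def doc_profiles_key(prefix: str, when: datetime, owner: str, target_id: int) -> str:
    """Build the object key of a document profiles file (hypothetical helper).

    Assumes the dated folder and the file name prefix describe the same hour.
    """
    folder = when.strftime("%Y/%m/%d/%H")
    name = f"{when.strftime('%Y%m%d%H')}-doc-profiles-{owner}-{target_id}.json.gz"
    return f"{prefix.rstrip('/')}/{folder}/{name}"


if __name__ == "__main__":
    # Values taken from the examples on this page.
    print(doc_profiles_key("telegramme/inputCorpus", datetime(2021, 11, 23, 14), "tlgic", 36))
```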

{"content_id":"12807040","segments":[{"id":"Classical music & instruments_c30258","score":4,"ttl":2592000},{"id":"Halloween_c30115","score":1,"ttl":2592000},{"id":"Films_c30302","score":1,"ttl":2592000},{"id":"Cinemas_c30257","score":1,"ttl":2592000},{"id":"Popular Events_c30178","score":1,"ttl":2592000},{"id":"Art_c30309","score":3,"ttl":2592000},{"id":"Music_c30163","score":2,"ttl":2592000},{"id":"Family_c30295","score":3,"ttl":2592000},{"id":"Going out_c30317","score":2,"ttl":2592000},{"id":"Theatre_c30217","score":2,"ttl":2592000}],"id_type":"MoonFishLabel","lang":"fr"}
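A consumer could read such a file along the lines of the sketch below. The function names are hypothetical, and the code assumes one JSON object per line with the fields shown in the sample (content_id, segments with id, score and ttl). The meaning and unit of ttl are not described on this page, so the sketch does not interpret it.

```python
import gzip
import json


def read_profiles(path: str):
    """Yield one profile dict per non-empty line of a gzip-compressed file (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if line:
                yield json.loads(line)


def top_segments(profile: dict, min_score: int = 1) -> list:
    """Return segments with score >= min_score, highest score first."""
    kept = [seg for seg in profile.get("segments", []) if seg["score"] >= min_score]
    # sorted() is stable, so segments with equal scores keep their original order.
    return sorted(kept, key=lambda seg: seg["score"], reverse=True)
```

For the sample above, top_segments(profile, min_score=3) would keep the three segments whose score is 3 or higher.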
