...

  • Bucket name: s3://weborama-<publisher_name>

  • Path: /seed_urls/<yyyymm>/<dd>/<hh>/, for example: /seed_urls/202106/08/02/

  • Frequency: every hour.

  • File name: input_yyyymm-dd-hh.csv.gz

  • File format: gzip-compressed CSV

  • File content: one URL per row.
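
The layout above can be sketched in Python as follows. This is a minimal illustration, not the official upload tooling: the function names `seed_file_location` and `build_seed_payload` are hypothetical, the file name pattern is read as `input_<yyyymm>-<dd>-<hh>.csv.gz`, and the time zone of the timestamp is an assumption, since the specification does not state one.

```python
import csv
import gzip
import io
from datetime import datetime


def seed_file_location(publisher_name: str, ts: datetime) -> tuple[str, str]:
    """Return (bucket, key) for the hourly seed file covering `ts`.

    Hypothetical helper: the time zone of `ts` is not specified in the
    document, so the caller must decide which one applies.
    """
    bucket = f"weborama-{publisher_name}"
    yyyymm, dd, hh = f"{ts:%Y%m}", f"{ts:%d}", f"{ts:%H}"
    # Assumed reading of "input_yyyymm-dd-hh.csv.gz".
    key = f"seed_urls/{yyyymm}/{dd}/{hh}/input_{yyyymm}-{dd}-{hh}.csv.gz"
    return bucket, key


def build_seed_payload(urls) -> bytes:
    """Serialize URLs as a gzip-compressed CSV with one URL per row."""
    buf = io.BytesIO()
    # newline="" lets the csv module control line endings itself.
    with gzip.open(buf, "wt", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh)
        for url in urls:
            writer.writerow([url])
    return buf.getvalue()
```

For a publisher named `acme` (a made-up name) and the hour 2021-06-08 02:00, `seed_file_location` yields the bucket `weborama-acme` and a key under `seed_urls/202106/08/02/`. The resulting bytes can then be uploaded with whatever S3 client is already in use.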

Note

Every URL in the source file must include its protocol (for example, https://); URLs without one will be dropped.

E.g.:

https://ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059

https://www.passeportsante.net/fr/Actualites/Dossiers/DossierComplexe.aspx?doc=vegetaux-proteines-le-quinoa
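
Since URLs without a protocol are dropped, it can help to check the list before building the file. The sketch below is only an illustration: `has_protocol` and `split_by_protocol` are hypothetical names, and restricting the accepted protocols to `http` and `https` is an assumption, as the note does not list which ones are accepted.

```python
from urllib.parse import urlsplit

# Assumption: only http and https count as valid protocols.
ALLOWED_SCHEMES = {"http", "https"}


def has_protocol(url: str) -> bool:
    """Return True if the URL starts with an allowed protocol and has a host."""
    parts = urlsplit(url.strip())
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)


def split_by_protocol(urls):
    """Separate URLs into (kept, would_be_dropped) lists, preserving order."""
    kept, dropped = [], []
    for url in urls:
        (kept if has_protocol(url) else dropped).append(url)
    return kept, dropped
```

Both example URLs above pass this check, whereas the same address written without `https://` would land in the second list.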

...