...
Bucket name: s3://weborama-<publisher_name>.
Path: /seed_urls/<yyyymm>/<dd>/<hh>, for example: /seed_urls/202106/08/02/
Frequency: every hour.
File name: input_yyyymm-dd-hh.csv.gz
File format: gzip-compressed CSV
File content: one URL per row.
Note |
---|
The protocol (e.g. https://) must be included in every URL in the source file; URLs without it will be dropped |
E.g.:
https://ville-data.com/nombre-d-habitants/La-Chataigneraie-85-85059
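The layout above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function names are hypothetical, the leading slash of the path is dropped on the assumption that it maps to an S3 object key, "protocol" is assumed to mean http or https, and the time zone of the hour is not specified here, so the caller supplies the timestamp. Uploading to the bucket is left out.

```python
import csv
import gzip
import io
from datetime import datetime
from urllib.parse import urlsplit


def seed_object_key(ts: datetime) -> str:
    """Build the object key of the hourly file for timestamp `ts`."""
    # Assumption: the documented path maps to an S3 key without the leading slash.
    return ts.strftime("seed_urls/%Y%m/%d/%H/input_%Y%m-%d-%H.csv.gz")


def has_protocol(url: str) -> bool:
    """Return True when the URL carries an explicit scheme and host."""
    # Assumption: only http and https count as valid protocols.
    parts = urlsplit(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_seed_file(urls) -> bytes:
    """Return gzip-compressed CSV bytes with one URL per row.

    URLs without a protocol are skipped locally, since they would be
    dropped downstream anyway.
    """
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")
    for url in urls:
        url = url.strip()
        if has_protocol(url):
            writer.writerow([url])
    return gzip.compress(text.getvalue().encode("utf-8"))
```

For the example hour above, `seed_object_key(datetime(2021, 6, 8, 2))` gives `seed_urls/202106/08/02/input_202106-08-02.csv.gz`. Filtering before upload makes it easy to log which URLs lack a protocol instead of losing them silently.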
...