- urlPattern: The URL pattern to match against archived URLs. Wildcards can (and should) be used as described in the Common Crawl documentation, so that more than one URL is returned.
- yearFrom: The first year of archives to search. This is the year the URL was crawled, not necessarily the year the page was created or updated. Valid values range from 2008 to the current year.
- yearTo: The last year of archives to search. This is the year the URL was crawled, not necessarily the year the page was created or updated. Valid values range from 2008 to the current year.
- maxItemsPerMonth: The maximum number of URLs to return per month.
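To illustrate how urlPattern, yearFrom and yearTo relate to the public Common Crawl index, here is a minimal sketch that picks the crawls falling inside the year range and builds one index query per crawl. It is not how CommonCrawl.Find is implemented. It assumes crawl ids of the form CC-MAIN-YYYY-WW (as listed in the index server's collinfo.json) and the index endpoint https://index.commoncrawl.org/<crawl-id>-index. The helper names collections_in_range and query_urls are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumed public Common Crawl index endpoint (hypothetical usage, not the tool's internals).
INDEX_SERVER = "https://index.commoncrawl.org"


def collections_in_range(collection_ids, year_from, year_to):
    """Keep crawl ids whose year lies in [year_from, year_to].

    Assumes ids start with 'CC-MAIN-YYYY', e.g. 'CC-MAIN-2019-04'.
    """
    selected = []
    for cid in collection_ids:
        match = re.match(r"CC-MAIN-(\d{4})", cid)
        if match and year_from <= int(match.group(1)) <= year_to:
            selected.append(cid)
    return selected


def query_urls(url_pattern, collection_ids):
    """Build one index query URL per crawl, asking for JSON output."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return [f"{INDEX_SERVER}/{cid}-index?{params}" for cid in collection_ids]


crawls = ["CC-MAIN-2009-2010", "CC-MAIN-2012", "CC-MAIN-2015-06",
          "CC-MAIN-2019-04", "CC-MAIN-2021-10"]
selected = collections_in_range(crawls, 2010, 2019)
for url in query_urls("https://en.wikipedia.org/*", selected):
    print(url)
```

Filtering by the crawl id rather than by page metadata mirrors the note above: the year is the crawl year, not the year the page was written.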
The following example extracts English Wikipedia URLs crawled between 2010 and 2019, returning at most 1000 URLs per month:
CommonCrawl.Find
· "https://en.wikipedia.org/*"
· 2010
· 2019
· 1000
as url
extract
url
url
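With these arguments the result is bounded by 10 years × 12 months × 1000 = 120,000 URLs; the real count is usually lower because Common Crawl does not publish a crawl every month. The sketch below shows one way a per-month cap like maxItemsPerMonth could be applied to index records. It is an assumption for illustration, not the action's actual logic: each record is taken to be a dict with a 14-digit "timestamp" field (YYYYMMDDhhmmss), as in Common Crawl's JSON index output, and cap_per_month is a hypothetical helper.

```python
from collections import defaultdict


def cap_per_month(records, max_items_per_month):
    """Keep at most max_items_per_month records for each crawl month.

    Records are kept in input order; the month is the 'YYYYMM' prefix
    of the assumed 14-digit 'timestamp' field.
    """
    counts = defaultdict(int)
    kept = []
    for rec in records:
        month = rec["timestamp"][:6]
        if counts[month] < max_items_per_month:
            counts[month] += 1
            kept.append(rec)
    return kept


sample = [
    {"timestamp": "20190101120000", "url": "https://en.wikipedia.org/wiki/A"},
    {"timestamp": "20190115120000", "url": "https://en.wikipedia.org/wiki/B"},
    {"timestamp": "20190120120000", "url": "https://en.wikipedia.org/wiki/C"},
    {"timestamp": "20190201120000", "url": "https://en.wikipedia.org/wiki/D"},
]
print([r["url"] for r in cap_per_month(sample, 2)])
```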