IBM WebSphere Portal 8.5: User assistance for administrators
Manage the content sources of a search collection
To work with the content sources of a search collection, click the collection name in the list of search collections. Manage Search lists the Content Sources and the Search collection status information of the selected search collection. A search collection can be configured to cover more than one content source. From the Content Sources window, you can refresh and view the status of a specific content source, and configure its schedulers and crawlers.
While a crawl on the content source is running, refreshing the status information updates the run time and the number of documents collected so far. You can also view status details of the content source, such as the number of documents, the run time of the last crawl, and the crawl schedules.
You can view the status of the schedulers and configure crawlers for a specific content source. The View Content Source Schedulers icon is displayed only if you defined scheduled crawls for this content source. If you click this icon, the portlet lists the scheduled crawls, together with the start date and time, repeat interval, next run date and time, and status of each. The status shows whether a scheduled crawl is enabled or disabled.
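The scheduler list relates a start date and time, a repeat interval, and a next run date and time. As a rough illustration of how these fields fit together, the following Python sketch assumes that runs occur at the start time and then at every whole multiple of the repeat interval after it. The function `next_run` is hypothetical and is not part of Portal Search; the product's actual scheduling rules may differ.

```python
from datetime import datetime, timedelta


def next_run(start: datetime, repeat: timedelta, now: datetime) -> datetime:
    """Return the first scheduled run at or after `now`.

    Hypothetical model (not the Portal Search implementation): runs occur at
    start, start + repeat, start + 2 * repeat, and so on.
    """
    if repeat <= timedelta(0):
        raise ValueError("repeat interval must be positive")
    if now <= start:
        # The schedule has not started yet, so the first run is the start time.
        return start
    elapsed = now - start
    # Ceiling division on timedelta values: number of whole intervals needed
    # to reach or pass `now`.
    periods = -(-elapsed // repeat)
    return start + periods * repeat


# Usage: a daily crawl that started on 1 January at 02:00, checked on
# 3 January at 10:00, next runs on 4 January at 02:00.
print(next_run(datetime(2024, 1, 1, 2), timedelta(days=1), datetime(2024, 1, 3, 10)))
```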
Note the following behavior when you crawl, modify, or delete a content source:
- For website content sources, documents that were indexed before and still exist in the content source are updated. Documents that were indexed before but no longer exist are retained in the search collection. Documents that are new in the content source are indexed and added to the collection.
- For WebSphere® Portal sites, the crawl adds all pages and portlets to the content source. It deletes from the content source any portlets and static pages that were removed from the portal. The crawl works similarly to the option Regather documents from Content Source.
- For IBM® Web Content Manager sites, Portal Search uses an incremental crawling method. In addition to added and updated content, the seedlist explicitly specifies deleted content. In contrast, clicking Regather documents from Content Source starts a full crawl; it does not continue from the last session, and it is therefore not incremental.
- For content sources created with the seedlist provider option, a crawl on a remote system that supports incremental crawling, such as IBM Connections, behaves like a crawl on a Web Content Manager site.
- If you modify a content source that belongs to a search scope, update the scope manually to make sure that it still covers that content source. This is especially important if you changed the name of the content source: edit the scope, check that the content source is still listed there, and add it again if it is not.
- If you delete a content source, the documents that were collected from it remain available for search by users in the All Scopes option, because the documents were included in the content source before it was deleted. These documents remain available until the expiration time. You can specify the expiration time in the Links expire after (days) menu on the General Parameters tab.
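The update rules in the list above differ mainly in how each crawl treats documents that disappeared from the source. The following Python sketch models a search collection as a dictionary from document ID to content and contrasts the three behaviors. The functions and the `Seedlist` class are hypothetical simplifications for illustration only; they do not correspond to any Portal Search API, and the portal case is reduced to a plain mirror of the site.

```python
from dataclasses import dataclass, field


@dataclass
class Seedlist:
    """Hypothetical incremental feed: added or updated documents plus explicit deletions."""
    upserts: dict[str, str] = field(default_factory=dict)
    deletes: set[str] = field(default_factory=set)


def crawl_website(collection: dict[str, str], site: dict[str, str]) -> dict[str, str]:
    # Website: existing documents are updated and new ones are added;
    # documents that no longer exist on the site are retained.
    result = dict(collection)
    result.update(site)
    return result


def crawl_portal(site: dict[str, str]) -> dict[str, str]:
    # Portal site (simplified): the content source mirrors the current portal,
    # so pages and portlets removed from the portal drop out.
    return dict(site)


def crawl_incremental(collection: dict[str, str], seedlist: Seedlist) -> dict[str, str]:
    # Seedlist-based incremental crawl: apply changes, then remove the
    # documents that the seedlist explicitly reports as deleted.
    result = dict(collection)
    result.update(seedlist.upserts)
    for doc_id in seedlist.deletes:
        result.pop(doc_id, None)
    return result


collection = {"a": "v1", "b": "v1"}
site = {"a": "v2", "c": "v1"}  # "a" changed, "b" removed, "c" is new
print(crawl_website(collection, site))  # "b" is kept
print(crawl_portal(site))               # "b" is gone
print(crawl_incremental(collection, Seedlist(upserts={"c": "v1"}, deletes={"b"})))
```

A website crawl alone cannot tell a removed page from one that was merely unreachable, which is one way to read why it retains old documents, whereas a seedlist can name deletions explicitly.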
If the content source is valid and accessible, Manage Search returns the message Content Source is OK. If the content source is invalid, inaccessible, or blocked, Manage Search returns an error message. When you create a new content source, Manage Search starts the Verify Address feature.