Re-index log file - Server info offering: Server software, environment, MySQL, PDF-converter, image functions, php.ini file, PHP integration, PHP security info. Each item holds a list of details. All text links, media links and thumbnails are active links. As stated in the chapter Introduction, this search engine uses some PHP libraries and . . .
. . .
'Tips & Tricks & Mods'

3.1 All options

It is possible to spider web pages from the command line, using the syntax:

php spider.php <options>

where <options> are:

-all    Re-index everything in the database.
-eall   Erase the database and afterwards re-index all.
-new    Index all new URLs in the database which have not yet been indexed.
-erase  Erase the . . .
. . .
multiple strings). For example, for spidering and indexing http://www.domain.com/test.html to depth 2, use:

php spider.php -u http://www.domain.com/test.html -d 2

If you want to re-index the same URL, use:

php spider.php -u http://www.domain.com/test.html -r

3.2 Multithreaded indexing

For command line operation, parallel indexing has no . . .
. . .
not yet been indexed <-new>. Simply start several threads and add individual IDs to the option parameter, like:

php spider.php -new1
php spider.php -new2

etc. The IDs will be added to the names of the corresponding log files, like:

db2_100524-21.47.56_ID1.html (log file of first thread)
db2_100524-21.48.12_ID2.html (log file of second thread) . . .
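Starting the threads could also be scripted. A minimal dry-run sketch (the `echo` only prints the commands it would run, and the thread count of 3 is an assumption for illustration, not a Sphider-plus default):

```shell
# Dry-run sketch: build one indexing command per thread, each with its own
# ID appended to the -new option. In real use, drop the final "echo" and
# launch each command in the background (with &) instead.
THREADS=3                      # assumed number of parallel threads
CMDS=""
for i in $(seq 1 "$THREADS"); do
  CMDS="$CMDS php spider.php -new$i;"
done
echo "$CMDS"
```

Each thread then writes its own `*_ID<n>.html` log file, as shown above.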
. . .
log file will be unreadable.

3.2.2 Re-index all

To be invoked by first preparing the database once with the command:

php spider.php <-preall>

This will reset all 'Last indexed' tables to '0000', but will not erase the content of all the other tables. So the check whether the content of a page has changed (MD5 sum) is still available for a fast . . .
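The MD5 comparison mentioned above might work roughly as follows; this is a sketch of the assumed logic, not the actual Sphider-plus code, and the sample page content and stored checksum are made up:

```shell
# Compare the checksum stored at the last index run with the checksum of
# the freshly fetched page body; an unchanged page can skip re-indexing.
stored_md5="5d41402abc4b2a76b9719d911017c592"   # checksum saved last time
page_content="hello"                            # freshly fetched page body
new_md5=$(printf '%s' "$page_content" | md5sum | cut -d' ' -f1)
if [ "$new_md5" = "$stored_md5" ]; then
  result="unchanged: skip re-indexing"
else
  result="changed: re-index page"
fi
echo "$result"
```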
. . .
re-indexing could be invoked by starting several threads and adding individual IDs to the option parameter, like:

php spider.php -erased1
php spider.php -erased2

etc. The IDs will be added to the names of the log files as described above.

3.2.3 Index erased sites

'Index all meanwhile erased sites' will index only those sites that had been . . .
. . .
indexing could be invoked by starting several threads and adding individual IDs to the option parameter, like:

php spider.php -erased1
php spider.php -erased2

etc. The IDs will be added to the names of the log files as described above.

4. Keeping pages, words and files from being indexed

4.1 robots.txt

The most common way to prevent pages . . .
. . .
tag to specify your preferred page version:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

inside the <head> section of all the duplicate content URLs:

http://www.example.com/product.php?item=swedish-fish&category=gummy-candy . . .
. . .
Sphider-plus will understand that the duplicates all refer to the canonical URL http://www.example.com/product.php?item=swedish-fish. The duplicate pages will be ignored and not indexed. Sphider-plus takes rel="canonical" as a directive, not a hint. The canonical link may also be a relative path, but is not allowed to refer to a different . . .
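A crawler's handling of rel="canonical" can be sketched as follows; this is assumed logic for illustration, not the actual Sphider-plus implementation, and the sample page fragment is made up:

```shell
# Extract the canonical URL from a page's <head>; duplicate pages are then
# mapped onto this single URL instead of being indexed separately.
html='<head><link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" /></head>'
canonical=$(printf '%s' "$html" | sed -n 's/.*rel="canonical" href="\([^"]*\)".*/\1/p')
echo "$canonical"
```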