Check a website for broken links using the HTTrack website copier.
Install HTTrack.
$ sudo apt-get install -y httrack
Check the website for broken links using 8 concurrent connections and the /tmp/spider-check temporary directory, ignoring robots.txt and logging to the /tmp/spider-check.log file.
$ httrack --spider --robots=0 --sockets=8 --path /tmp/spider-check --verbose https://sleeplessbeastie.eu/ | tee /tmp/spider-check.log
HTTrack3.49-2 launched on Sun, 04 Nov 2018 21:56:47 at https://sleeplessbeastie.eu/ (httrack -p0C0I0t -s0 -c8 -O /tmp/spider-check -v https://sleeplessbeastie.eu/ )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
Mirror launched on Sun, 04 Nov 2018 21:56:47 by HTTrack Website Copier/3.49-2 [XR&CO'2014]
mirroring https://sleeplessbeastie.eu/ with the wizard help..
21:59:18 Warning: Retry after error -4 (Incorrect length (0 Bytes, 790 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2016/07/awstats.simple.patch (from https://sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:18 Warning: Retry after error -4 (Incorrect length (0 Bytes, 772 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2016/07/awstats.advanced.patch (from https://sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:18 Warning: Retry after error -4 (Incorrect length (0 Bytes, 3592696 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2012/07/WAG120N-EU-ANNEXA-ETSI-1.00.16code.bin.7z (from https://sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:18 Warning: Retry after error -4 (Incorrect length (0 Bytes, 3589948 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2012/07/WAG120N-EU-ANNEXB-ETSI-1.00.16code.bin.7z (from https://sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:19 Error: "Not Found" (404) at link https://sleeplessbeastie.eu/2018/03/28/how-to-install-docker-on-debian-stretch/ (from https://sleeplessbeastie.eu/2018/04/16/how-to-setup-private-docker-registry/)
21:59:19 Error: "Not Found" (404) at link https://sleeplessbeastie.eu/privacy/ (from https://sleeplessbeastie.eu/2013/06/16/wordpress-to-jekyll-migration/)
21:59:19 Warning: Retry after error -4 (Incorrect length (0 Bytes, 790 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2016/07/awstats.simple.patch (from https://sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:19 Warning: Retry after error -4 (Incorrect length (0 Bytes, 772 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2016/07/awstats.advanced.patch (from https://sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:19 Warning: Retry after error -4 (Incorrect length (0 Bytes, 3592696 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2012/07/WAG120N-EU-ANNEXA-ETSI-1.00.16code.bin.7z (from https://sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:19 Warning: Retry after error -4 (Incorrect length (0 Bytes, 3589948 expected)) at link https://sleeplessbeastie.eu/wp-content/uploads/2012/07/WAG120N-EU-ANNEXB-ETSI-1.00.16code.bin.7z (from https://sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:20 Error: "Incorrect length (0 Bytes, 790 expected)" (-4) after 2 retries at link https://sleeplessbeastie.eu/wp-content/uploads/2016/07/awstats.simple.patch (from https://sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:20 Error: "Incorrect length (0 Bytes, 772 expected)" (-4) after 2 retries at link https://sleeplessbeastie.eu/wp-content/uploads/2016/07/awstats.advanced.patch (from https://sleeplessbeastie.eu/2016/07/04/how-to-add-menu-to-awstats-web-interface/)
21:59:20 Error: "Incorrect length (0 Bytes, 3592696 expected)" (-4) after 2 retries at link https://sleeplessbeastie.eu/wp-content/uploads/2012/07/WAG120N-EU-ANNEXA-ETSI-1.00.16code.bin.7z (from https://sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
21:59:20 Error: "Incorrect length (0 Bytes, 3589948 expected)" (-4) after 2 retries at link https://sleeplessbeastie.eu/wp-content/uploads/2012/07/WAG120N-EU-ANNEXB-ETSI-1.00.16code.bin.7z (from https://sleeplessbeastie.eu/2012/07/28/how-to-reboot-linksys-wag120n-router/)
HTTrack Website Copier/3.49-2 mirror complete in 2 minutes 33 seconds : 598 links scanned, 584 files written (11857590 bytes overall) [3024769 bytes received at 19769 bytes/sec], 11875420 bytes transferred using HTTP compression in 586 files, ratio 23%, 49.8 requests per connection
(6 errors, 8 warnings, 0 messages)
Done.
Thanks for using HTTrack!
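Inspect the /tmp/spider-check.log log file for details. The "(...)" pattern matches three characters in parentheses, which picks out lines carrying an HTTP status code such as (404).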
$ grep "(...)" /tmp/spider-check.log
21:59:19 Error: "Not Found" (404) at link https://sleeplessbeastie.eu/2018/03/28/how-to-install-docker-on-debian-stretch/ (from https://sleeplessbeastie.eu/2018/04/16/how-to-setup-private-docker-registry/)
21:59:19 Error: "Not Found" (404) at link https://sleeplessbeastie.eu/privacy/ (from https://sleeplessbeastie.eu/2013/06/16/wordpress-to-jekyll-migration/)
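To get a plain list of broken pages together with the pages that link to them, the log can be reduced a bit further. This is a minimal sketch, assuming each log message sits on a single line as in the grep output above; the list_broken_links function name is hypothetical and not part of HTTrack.

```shell
# Hypothetical helper: list_broken_links LOGFILE
# Prints "<status> <broken URL> <- <referring page>" for every error line
# that carries a three-digit HTTP status code, e.g. (404).
# Assumes one HTTrack message per line; retry errors such as (-4) are skipped.
list_broken_links() {
  sed -nE 's/.*Error:.*\(([0-9]{3})\) at link ([^ ]+) \(from ([^ )]+)\).*/\1 \2 <- \3/p' "$1"
}
```

Usage: $ list_broken_links /tmp/spider-check.log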
Get the summary status.
$ tail -3 /tmp/spider-check.log | head -1
(6 errors, 8 warnings, 0 messages)
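The same summary line can drive a scripted check, for example to fail a cron job or CI step when errors are reported. This is a minimal sketch; the spider_error_count and fail_on_errors helpers are hypothetical and assume the summary keeps the "(N errors, M warnings, K messages)" format shown above.

```shell
# Hypothetical helper: spider_error_count LOGFILE
# Prints the error count from the last "(N errors, M warnings, K messages)" line.
spider_error_count() {
  sed -nE 's/^.*\(([0-9]+) errors?, [0-9]+ warnings?, [0-9]+ messages?\).*$/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: fail_on_errors LOGFILE
# Returns 0 when no errors were reported, 1 when there were errors,
# and 2 when no summary line could be found.
fail_on_errors() {
  count=$(spider_error_count "$1")
  [ -n "$count" ] || { echo "no summary line found in $1" >&2; return 2; }
  [ "$count" -eq 0 ]
}
```

Usage: $ fail_on_errors /tmp/spider-check.log || echo "errors reported"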
Filter by MIME type (exclude everything, then allow only text/*) to skip these unimportant errors on downloadable files and focus on links.
$ httrack --spider --robots=0 --sockets=8 --path /tmp/spider-check --verbose https://sleeplessbeastie.eu/ '-mime:*' '+mime:text/*'
HTTrack3.49-2 launched on Sun, 04 Nov 2018 22:15:10 at https://sleeplessbeastie.eu/ -mime:* +mime:text/* (httrack -p0C0I0t -s0 -c8 -O /tmp/spider-check -v https://sleeplessbeastie.eu/ -mime:* +mime:text/* )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private
Mirror launched on Sun, 04 Nov 2018 22:15:10 by HTTrack Website Copier/3.49-2 [XR&CO'2014]
mirroring https://sleeplessbeastie.eu/ -mime:* +mime:text/* with the wizard help..
22:17:39 Error: "Not Found" (404) at link https://sleeplessbeastie.eu/2018/03/28/how-to-install-docker-on-debian-stretch/ (from https://sleeplessbeastie.eu/2018/04/16/how-to-setup-private-docker-registry/)
22:17:39 Error: "Not Found" (404) at link https://sleeplessbeastie.eu/privacy/ (from https://sleeplessbeastie.eu/2013/06/16/wordpress-to-jekyll-migration/)
HTTrack Website Copier/3.49-2 mirror complete in 2 minutes 29 seconds : 584 links scanned, 582 files written (11107549 bytes overall) [2931119 bytes received at 19671 bytes/sec], 11125379 bytes transferred using HTTP compression in 584 files, ratio 23%, 73.0 requests per connection
(2 errors, 0 warnings, 0 messages)
Done.
Thanks for using HTTrack!
Remove the log file and the temporary directory.
$ rm /tmp/spider-check.log
$ rm -rf /tmp/spider-check
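Instead of removing the temporary directory by hand, a wrapper can create it with mktemp -d and delete it once httrack finishes. This is a minimal sketch; the spider_check function is hypothetical, reuses the options from the filtered run above, and returns httrack's own exit status after cleanup. It does not clean up if the script is interrupted.

```shell
# Hypothetical wrapper: spider_check URL
# Runs the filtered HTTrack spider check in a throw-away directory,
# removes that directory afterwards and returns httrack's exit status.
spider_check() {
  url=$1
  workdir=$(mktemp -d) || return 1
  httrack --spider --robots=0 --sockets=8 --path "$workdir" --verbose "$url" '-mime:*' '+mime:text/*'
  rc=$?
  # Clean up the mirror/cache directory regardless of the result.
  rm -rf "$workdir"
  return "$rc"
}
```

Usage: $ spider_check https://sleeplessbeastie.eu/ | tee /tmp/spider-check.log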