SIGforum
What is the best automated way to save an entire website?

This topic can be found at:
https://sigforum.com/eve/forums/a/tpc/f/320601935/m/3610030044

April 03, 2018, 06:52 PM
deepocean
What is the best automated way to save an entire website?
What is the best automated way to locally save an entire website, including all the links, text, and images?

There is an Italian family-history site I use where a single search may yield 5-20 pages of results. Each result has an image box with text or a scanned image. Sometimes there are hundreds or thousands of results, and it takes a lot of time to click on each link and then right-click to save the image.

There has to be a better way to do this.
April 03, 2018, 06:56 PM
senza nome
Years and years ago I used something called Superbot.
April 03, 2018, 07:11 PM
Bytes
Depending on the size of the website, there are multiple ways of doing it, but I don't think any of them will help you. The methods I am talking about will download all of the HTML, CSS, JavaScript, images, etc. That will require time, patience, and disk space, and the result will not really be searchable. What is your ultimate goal?
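
For example, a minimal sketch of that kind of download with wget, with example.com standing in for the real site:

# Mirror the site for offline browsing: rewrite links to work locally,
# pull in the CSS/JS/images each page needs, stay under the start URL,
# and pause a second between requests to be polite to the server.
wget --mirror --convert-links --page-requisites --no-parent --wait=1 https://example.com/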
April 03, 2018, 07:13 PM
Skins2881
I'd like to save a thread from SIGforum, is that possible?



Jesse

Sic Semper Tyrannis
April 03, 2018, 07:18 PM
deepocean
quote:
Originally posted by Bytes:
Depending on the size of the website, there are multiple ways of doing it, but I don't think any of them will help you. The methods I am talking about will download all of the HTML, CSS, JavaScript, images, etc. That will require time, patience, and disk space, and the result will not really be searchable. What is your ultimate goal?


I would like to preserve the search results with the links and such in their original form. I looked again today, and it is all mostly text.
April 03, 2018, 07:29 PM
Bytes
This may be a bit techy, and if the site is big it will be time-consuming. Here are a few ways to do it. I'm also not sure how searchable the result would be. Most of these tools will let you run the website locally, which is kind of cool.
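
One simple way to "run the website locally" once a mirror is finished, assuming Python 3 is installed and the mirror landed in a folder named example.com, is to serve that folder and browse it at http://localhost:8000:

# Serve the mirrored files over a local web server
cd example.com
python -m http.server 8000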
April 03, 2018, 07:29 PM
btgoanna
Try this
https://www.httrack.com/

I think it does what you want
Careful with the settings: telling it to go down levels is fine, but if you tell it to follow external links, you may download more than expected. ;)
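
For reference, those same settings can be expressed on HTTrack's command line; a sketch, with example.com standing in for the real site:

# Mirror into ./mysite, go at most 3 levels deep (-r3), and keep the
# crawl on the site's own domain so external links are not followed
httrack "https://www.example.com/" -O ./mysite "+*.example.com/*" -r3 -v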



April 03, 2018, 07:52 PM
deepocean
quote:
Originally posted by btgoanna:
Try this
https://www.httrack.com/

I think it does what you want
Careful with the settings: telling it to go down levels is fine, but if you tell it to follow external links, you may download more than expected. ;)


I tried that in the past, but I could not get it to grab enough links. Maybe it had to do with my having to log in to the site?

Maybe it's me, but things which were simple years ago seem to be getting more and more complicated.
April 03, 2018, 07:54 PM
deepocean
quote:
Originally posted by Bytes:
This may be a bit techy, and if the site is big it will be time-consuming. Here are a few ways to do it. I'm also not sure how searchable the result would be. Most of these tools will let you run the website locally, which is kind of cool.


Thank you, I will try a few of those options and see which one works best. I am asking here so as not to compromise my computer in the process.
April 03, 2018, 09:44 PM
jbcummings
Perl and a bit of mental elbow grease would work. Am I showing my age again?


———-
Do not meddle in the affairs of wizards, for thou art crunchy and taste good with catsup.
April 03, 2018, 11:42 PM
deepocean
I tried a number of the suggested solutions and nothing worked. One problem is that although the site is free to use, it requires a username and password. I tried one program that allowed adding fields for the login and password, but it didn't let me perform a search and then save the result pages.

I thought perhaps a browser helper might work. I tried 3 or 4 options for that, to see if I could save the links one page at a time, but that didn't work.

I also tried the Windows 7 feature (the Problem Steps Recorder) that saves a screenshot every time you click, meant for documenting a problem. Using screenshots is an option, but it will not be as good as saving the original website structure.

As for the website I am hoping to save, it works like this: once I log in, I search by surname for the entire province, and a list of result links is shown. When I click on a link, there are additional links to source documents. There are several layers of links.

If I can find something that will save all of the links on a page I am looking at, that would be great for my purposes. If no simple solution exists, that will be OK. I really appreciate everyone who has taken the time to try to help.
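
For anyone who does want to script that, here is a rough sketch of "save all of the links on a page" in Python, assuming the requests and beautifulsoup4 packages are installed; the site URL, login field names, and search parameters below are hypothetical and would need to be read off the real site's forms:

import os
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE = "https://example.com"  # stand-in for the real site

session = requests.Session()

# Log in once; the session object keeps the auth cookie for later requests.
# The field names "username"/"password" are guesses -- inspect the real form.
session.post(BASE + "/login", data={"username": "me", "password": "secret"})

# Fetch one page of search results, then save every page it links to.
results = session.get(BASE + "/search", params={"surname": "Rossi"})
soup = BeautifulSoup(results.text, "html.parser")

os.makedirs("saved", exist_ok=True)
for i, link in enumerate(soup.find_all("a", href=True)):
    page = session.get(urljoin(BASE, link["href"]))
    with open(f"saved/page_{i}.html", "w", encoding="utf-8") as f:
        f.write(page.text)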
April 04, 2018, 06:53 AM
BillyBonesNY
In the past, I’ve used WebWhacker by Blue Squirrel.

You might try printing to file as PDF (built into newer Windows and MS Office) and then collating the pages into a single file.
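
That printing step can itself be automated: headless Chrome will render a page straight to PDF from the command line. A sketch, assuming Chrome is on the PATH (the binary is chrome.exe on Windows, google-chrome on many Linux systems) and with a stand-in URL; a login-protected page would still need cookies or a logged-in profile:

# Render one page to PDF without opening a browser window
chrome --headless --disable-gpu --print-to-pdf=record.pdf "https://example.com/record/123"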


----------------------------------------
http://lonesurvivorfoundation.org
April 04, 2018, 07:53 AM
steve495
quote:
Sometimes there are hundreds or thousands of results


Almost certainly this is a database-driven website and not a static website with hundreds or thousands of pages.

There are applications (SiteSucker is one I've used on a Mac) where you provide a primary "home page" link and the application follows all of the links on that page to "suck up" the website. It generally does a reasonably good job.

But those applications may or may not work well on database-driven sites. It all depends.

As mentioned above, the site you describe almost certainly includes only a few template pages and populates the pages by pulling information from a database.

It would help if you provided the URL to the site.

I could be wrong and SiteSucker might work. I just did a test on one of the sites that I managed and it did reasonably well, but that site is not at all like the one you describe.


Steve


Small Business Website Design & Maintenance - https://spidercreations.net | OpSpec Training - https://opspectraining.com | Grayguns - https://grayguns.com

Evil exists. You can not negotiate with, bribe or placate evil. You're not going to be able to have it sit down with Dr. Phil for an anger management session either.
April 04, 2018, 02:41 PM
deepocean
The site I am using has improved since I last looked a few years ago. Now each record has links to all related and supporting documents.

Given that the site is password protected and requires a login, and that the links are generated in response to a search, it appears the simplest thing within my ability will be to save individual pages as PDF files. Maybe as time goes on I will find a simpler solution, but short of writing some custom code, I think this will work best. It will also be good to have the records separated into PDF files to document my research.

Thank you to all who made suggestions here. I appreciate your kindness in taking the time to help me.