How to download entire contents of a website / blog?

Topic Author
Caduceus
Posts: 3527
Joined: Mon Sep 17, 2012 1:47 am

How to download entire contents of a website / blog?

Post by Caduceus »

As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?

I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
ceperry
Posts: 17
Joined: Tue Jul 12, 2016 8:45 am

Re: How to download entire contents of a website / blog?

Post by ceperry »

Look at the `wget` command. It is available for different platforms. https://www.gnu.org/software/wget/
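For anyone comfortable with a command line, here is a minimal sketch of a polite mirroring run of the `wget` mentioned above, wrapped in Python only so that each option can be annotated. The flags are standard GNU wget options. The script name and example URL are hypothetical, and the 2-second wait is an arbitrary, server-friendly default rather than a recommendation from the wget documentation.

```python
import shlex
import subprocess
import sys


def build_wget_command(url: str, wait_seconds: int = 2) -> list[str]:
    """Build a GNU wget invocation that mirrors one site for offline reading."""
    return [
        "wget",
        "--mirror",                # recurse through links and keep timestamps
        "--convert-links",         # rewrite links so the local copy works offline
        "--adjust-extension",      # save pages with an .html extension
        "--page-requisites",       # also fetch images/CSS needed to display each page
        "--no-parent",             # never climb above the starting directory
        f"--wait={wait_seconds}",  # pause between requests to go easy on the server
        "--random-wait",           # vary that pause so the crawl is less bursty
        url,
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python mirror.py https://example-blog.invalid/
    cmd = build_wget_command(sys.argv[1])
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

By default wget saves the mirror into a folder named after the site's host. The same flag list can be typed directly at a shell prompt; the wrapper adds nothing beyond the comments.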
drk
Posts: 3943
Joined: Mon Jul 24, 2017 10:33 pm

Re: How to download entire contents of a website / blog?

Post by drk »

You should contact the owner of the site to get permission to do this.

If you're comfortable (learning about) programming, the Python library Beautiful Soup works well for web-scraping like this, and there are a lot of beginner-level tutorials on the web.
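A minimal sketch of the core step every scraper performs, Beautiful Soup included: pull the links out of one page so the crawler knows where to go next. With Beautiful Soup this is roughly `soup.find_all("a", href=True)`; to keep the example runnable without installing anything, it swaps in Python's built-in `html.parser` instead. The class and function names are illustrative, and keeping only same-host links is an assumption about what "the whole blog" means.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links that stay on the same host as base_url."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.host = urlparse(base_url).netloc
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links, then drop "#comments"-style fragments so the
        # same page is not queued twice.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if urlparse(absolute).netloc == self.host and absolute not in self.links:
            self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```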
A useful razor: anyone asking about speculative strategies on Bogleheads.org has no business using them.
Topic Author
Caduceus
Posts: 3527
Joined: Mon Sep 17, 2012 1:47 am

Re: How to download entire contents of a website / blog?

Post by Caduceus »

I am not tech-savvy at all, so programming is out of the question. Is there any downloadable program (happy to pay for it) that will do this for me in a relatively straightforward way?

The content is all in the public domain, and I'm not re-publishing it. It's purely for reading and research.

I forgot to mention that apart from a blog, I'd like to save the entire contents of a Facebook group too, but that seems a bit more complicated? Basically, people have been posting old photographs, and various other posters then provide information about them (dates, memories, locations, etc.). It's a treasure trove of historical information that's been crowd-sourced over a decade or so. It'd be a pity if one day all of this just got lost.
fisher0815
Posts: 155
Joined: Tue Jun 08, 2021 3:10 am

Re: How to download entire contents of a website / blog?

Post by fisher0815 »

Caduceus wrote: Tue May 24, 2022 1:04 pm As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?

I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
https://www.httrack.com/
pshonore
Posts: 8212
Joined: Sun Jun 28, 2009 2:21 pm

Re: How to download entire contents of a website / blog?

Post by pshonore »

Google "webwhacker software". There are quite a few programs available.
JupiterJones
Posts: 3623
Joined: Tue Aug 24, 2010 3:25 pm
Location: Nashville, TN

Re: How to download entire contents of a website / blog?

Post by JupiterJones »

Caduceus wrote: Tue May 24, 2022 1:04 pm As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?

I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Plus a screenshot isn't really saving text. It's saving a picture of the text as it appears to you at that moment. You can't search the text for a keyword or resize the window and have the text reflow, for example, as you would an actual text file.

Saving the page as a "web archive" or "webpage, complete" or whatever your browser of choice calls it (look under your File menu) would actually preserve a page you're viewing so that you could load it back into your web browser at any future point, and it would look and act just like a webpage. But of course, you'd have to do that with every blog post page as you viewed it; this won't crawl the entire site and download everything you could possibly read on it.

This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?
"Stay on target! Stay on target!"
moshe
Posts: 565
Joined: Thu Dec 12, 2013 12:18 pm
Location: Boston, MA

Re: How to download entire contents of a website / blog?

Post by moshe »

fisher0815 wrote: Tue May 24, 2022 1:55 pm
Caduceus wrote: Tue May 24, 2022 1:04 pm As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?

I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
https://www.httrack.com/
+1

I use this program all the time to download translations that I do, in order to keep a personal archive in case the site ever goes dark.

Easy to use.

~Moshe
My money has no emotions. ~Moshe | | I'm the world's greatest expert on my own opinion. ~Bruce Williams
nisiprius
Advisory Board
Posts: 52211
Joined: Thu Jul 26, 2007 9:33 am
Location: The terrestrial, globular, planetary hunk of matter, flattened at the poles, is my abode.--O. Henry

Re: How to download entire contents of a website / blog?

Post by nisiprius »

On MacOS, I use an inexpensive app named SiteSucker.

It works reasonably well on straightforward websites, where the entire content of the whole website is not too gigantic (not more than a few hundred megabytes, say). I use it primarily to capture my own websites, which are either homemade HTML-and-Javascript-based or Wordpress-based.
Annual income twenty pounds, annual expenditure nineteen nineteen and six, result happiness; Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery.
LookinAround
Posts: 902
Joined: Tue Mar 27, 2018 5:41 am

Re: How to download entire contents of a website / blog?

Post by LookinAround »

Adobe Acrobat will convert a website to PDF, and it should also retain hyperlink functionality. In addition to a monthly subscription pricing model, they also offer a free trial to check it out.

https://www.adobe.com/acrobat/how-to/co ... o-pdf.html
ResearchMed
Posts: 16795
Joined: Fri Dec 26, 2008 10:25 pm

Re: How to download entire contents of a website / blog?

Post by ResearchMed »

fisher0815 wrote: Tue May 24, 2022 1:55 pm
Caduceus wrote: Tue May 24, 2022 1:04 pm As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?

I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
https://www.httrack.com/
I'd love to save our own website, albeit mostly for nostalgic reasons.

Is this safe? That is, will it leave the current website entirely unchanged?

What is the form of what is saved, and how is the original displayed?
It captures all subsections, such as "Home", "Rates", "Photos", etc.?

And how does it differ from nisiprius' suggestion, SiteSucker (which seems not to be free)?
[How does one determine the size of a website before trying to copy it, if there is a maximum?]
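There's no reliable way to learn a site's size in megabytes before crawling it, but one rough gauge, assuming the site publishes a sitemap (many blog platforms generate one automatically, often reachable from `/sitemap.xml`), is to count the pages it lists. A minimal sketch using only the standard library; the function name is hypothetical, and a sitemap can omit pages, so treat the count as an estimate.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_entries(xml_text: str) -> tuple[int, int]:
    """Return (page URLs, child sitemaps) listed in one sitemap document."""
    root = ET.fromstring(xml_text)
    pages = len(root.findall("sm:url", SITEMAP_NS))        # <urlset><url>...</url>
    children = len(root.findall("sm:sitemap", SITEMAP_NS))  # <sitemapindex><sitemap>...
    return pages, children
```

If the result is `(0, n)`, the file is a sitemap index that only points at `n` other sitemaps; you would fetch and count each of those in turn.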

Finally, IF one ever wanted to and the original site was closed, can the code be used easily to re-create the website so that "it's as if it never happened" (apologies to the disaster cleanup company!). This is unlikely in our case, but I like to try to learn the "just in case" possibilities when something could be anticipated, even if it's very unlikely.

Many thanks!

RM
This signature is a placebo. You are in the control group.
tunafish
Posts: 974
Joined: Mon Apr 26, 2021 9:47 am

Re: How to download entire contents of a website / blog?

Post by tunafish »

JupiterJones wrote: Tue May 24, 2022 2:18 pm
Caduceus wrote: Tue May 24, 2022 1:04 pm As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?

I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Plus a screenshot isn't really saving text. It's saving a picture of the text as it appears to you at that moment. You can't search the text for a keyword or resize the window and have the text reflow, for example, as you would an actual text file.

Saving the page as a "web archive" or "webpage, complete" or whatever your browser of choice calls it (look under your File menu) would actually preserve a page you're viewing so that you could load it back into your web browser at any future point, and it would look and act just like a webpage. But of course, you'd have to do that with every blog post page as you viewed it; this won't crawl the entire site and download everything you could possibly read on it.

This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?
Never depend on stuff on the web staying available. In a stroke of luck, years ago I saved an entire website that someone had made of a very extensive family tree that included a lot of my family. At the time (but apparently no longer), whatever browser I was using, probably Firefox, had a command in a pull-down menu to do this. That website has since disappeared from the web.

I used to participate in a Yahoo group for a medical condition. It had a wealth of information. Yahoo later wiped its groups off the web.
roamingzebra
Posts: 1214
Joined: Thu Apr 22, 2021 3:29 pm

Re: How to download entire contents of a website / blog?

Post by roamingzebra »

Web servers don't like to get hammered and can impose limitations or deny access in the face of too many perceived "bot attacks". For commands like Curl and Wget, I believe you can slow down the download process to avoid looking like a bot. It appears from the Httrack documentation that there are similar settings that can be used. Use them! Otherwise download restrictions can be placed on these websites down the line.

From: https://www.httrack.com/html/abuse.html
Downloading a site can overload it, if you have a fast pipe, or if you capture too many simultaneous cgi (dynamically generated pages).

Do not download too large websites: use filters
Do not use too many simultaneous connections
Use bandwidth limits
Use connection limits
Use size limits
Use time limits
Only disable robots.txt rules with great care
Try not to download during working hours
Check your mirror transfer rate/size
For large mirrors, first ask the webmaster of the site
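Several items on that list (honor robots.txt, avoid simultaneous connections, throttle) can be sketched with Python's standard `urllib.robotparser`. This is an illustration under assumptions: `USER_AGENT` is a made-up crawler name, `fetch` is whatever actually downloads a page (passed in so the example runs offline), and the 5-second fallback delay is arbitrary.

```python
import time
from urllib import robotparser

USER_AGENT = "personal-archiver"  # hypothetical crawler name, not a registered bot


def make_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_delay(rules: robotparser.RobotFileParser, default_seconds: float = 5.0) -> float:
    """Use the site's Crawl-delay if it sets one, else an arbitrary conservative default."""
    delay = rules.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default_seconds


def allowed_urls(rules, urls):
    """Keep only the URLs robots.txt permits for our user agent."""
    return [u for u in urls if rules.can_fetch(USER_AGENT, u)]


def fetch_politely(urls, rules, fetch, sleep=time.sleep):
    """Call fetch(url) one page at a time (a single connection), pausing between requests."""
    delay = polite_delay(rules)
    results = {}
    for i, url in enumerate(allowed_urls(rules, urls)):
        if i:
            sleep(delay)  # wait between requests, not before the first one
        results[url] = fetch(url)
    return results
```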
Target2019
Posts: 901
Joined: Sat Mar 03, 2007 4:30 pm

Re: How to download entire contents of a website / blog?

Post by Target2019 »

I installed Adobe Acrobat 9 from an old copy. On Windows 10 I can capture a website down to as many levels as necessary.

There is also this solution, just tried: https://cloudconvert.com/save-website-pdf
I haven't looked at the options.
Raybo
Posts: 2244
Joined: Tue Feb 20, 2007 10:02 am
Location: San Francisco

Re: How to download entire contents of a website / blog?

Post by Raybo »

As an owner of a website, if I saw someone doing this (I have metering software on my site), I’d block their IP address from further accessing my site. I check for this hourly and block IP addresses several times a week, mostly from hacker bots trying to download my database.

But, if someone asked me questions, I’d answer them. If someone wanted copies of entries, I’d probably provide them (depending on why, and with proper attribution). If someone wanted the entire contents of my website, I’m not sure how I’d respond, but I’d be much nicer to someone who asked first.

By the way, is this a free blog or do you pay for access? Copying an entire website without asking first is, in my opinion, sneaky, unethical, and a violation of common decency. Are there ads that support the site? If so, downloading the whole site might even take money out of the owner’s pocket.
No matter how long the hill, if you keep pedaling you'll eventually get up to the top.
LadyGeek
Site Admin
Posts: 95686
Joined: Sat Dec 20, 2008 4:34 pm
Location: Philadelphia

Re: How to download entire contents of a website / blog?

Post by LadyGeek »

For the record, discussions of dishonest behavior or bypassing the law are totally unacceptable.

As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.

Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.

Caduceus - Please provide website links that you have downloaded content from.

(This thread was temporarily removed for moderator review.)
Wiki To some, the glass is half full. To others, the glass is half empty. To an engineer, it's twice the size it needs to be.
calwatch
Posts: 1447
Joined: Wed Oct 02, 2013 1:48 am

Re: How to download entire contents of a website / blog?

Post by calwatch »

You can use wget fairly easily for blogs, since it will follow links. Another way to save your research would be to use something like the SingleFile extension for Chrome and turn on Auto Save > Auto Save All Tabs. This is great if you need a backup of whatever you are researching, but because it automatically saves everything you view, it may eat up gigabytes of space a day.
mkc
Moderator
Posts: 3291
Joined: Wed Apr 17, 2013 2:59 pm

Re: How to download entire contents of a website / blog?

Post by mkc »

Raybo wrote: Tue May 24, 2022 6:10 pm As an owner of a website, if I saw someone doing this (I have metering software on my site), I’d block their IP address from further accessing my site. I check for this hourly and block IP addresses several times a week, mostly from hacker bots trying to download my database.

But, if someone asked me questions, I’d answer them. If someone wanted copies of entries, I’d probably provide them (depending on why, and with proper attribution). If someone wanted the entire contents of my website, I’m not sure how I’d respond, but I’d be much nicer to someone who asked first.

By the way, is this a free blog or do you pay for access? Copying an entire website without asking first is, in my opinion, sneaky, unethical, and a violation of common decency. Are there ads that support the site? If so, downloading the whole site might even take money out of the owner’s pocket.
As someone who has had a non-public discussion forum content scraped and the content sold for the benefit of the scraper (who was a moderator of that forum at the time), I agree with the above and strongly recommend contacting the website/blog owner first.
Bogle7
Posts: 1984
Joined: Fri May 11, 2018 9:33 am
Location: In the Witness Protection Program

Re: How to download entire contents of a website / blog?

Post by Bogle7 »

I use SiteSucker
Old fart who does three index stock funds, baby.
GreendaleCC
Posts: 1093
Joined: Sun Dec 22, 2019 2:24 am

Re: How to download entire contents of a website / blog?

Post by GreendaleCC »

Caduceus wrote: Tue May 24, 2022 1:24 pm The content is all in the public domain, and I'm not re-publishing it. It's purely for reading and research.
Are you sure it’s “in the public domain,” or just publicly accessible?
ResearchMed
Posts: 16795
Joined: Fri Dec 26, 2008 10:25 pm

Re: How to download entire contents of a website / blog?

Post by ResearchMed »

LadyGeek wrote: Tue May 24, 2022 8:05 pm For the record, discussions of dishonest behavior or bypassing the law are totally unacceptable.

As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.

Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.

Caduceus - Please provide website links that you have downloaded content from.

(This thread was temporarily removed for moderator review.)
We would very much like to know how to do this for *our* entire website!
(Not just the code, but the displays, "as is". Plus, our website creator is long gone...)

I hope this thread stays open.

RM
This signature is a placebo. You are in the control group.
adamthesmythe
Posts: 5774
Joined: Mon Sep 22, 2014 4:47 pm

Re: How to download entire contents of a website / blog?

Post by adamthesmythe »

LookinAround wrote: Tue May 24, 2022 3:08 pm Adobe will convert a website to pdf and it should also retain hyperlink functionality. In addition to offering a monthly subscription pricing model they also offer a free trial to check out

https://www.adobe.com/acrobat/how-to/co ... o-pdf.html
I've done this once or twice with Acrobat. You can select how deep in the links it will go.
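The "how deep in the links" setting (wget's `--level` and HTTrack's mirror-depth option are the same idea) amounts to a breadth-first walk that stops following links a fixed number of clicks from the start page. A minimal sketch; `get_links` is a hypothetical stand-in for fetching a page and extracting its links, so the example can run against an in-memory map of a site.

```python
from collections import deque


def crawl(start_url, get_links, max_depth=2):
    """Breadth-first walk from start_url, following links at most max_depth clicks away.

    get_links(url) returns the URLs linked from that page (hypothetical helper).
    Returns {url: depth} for every page reached.
    """
    seen = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth == max_depth:
            continue  # keep this page, but don't follow its links any further
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen[link] = depth + 1
                queue.append(link)
    return seen
```

Each extra level can multiply the number of pages fetched, which is exactly why a deep setting can pull down a LOT of data.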

Be careful what you wish for, there might be a LOT of data.
Marseille07
Posts: 16054
Joined: Fri Nov 06, 2020 12:41 pm

Re: How to download entire contents of a website / blog?

Post by Marseille07 »

I've used this thing with mixed success: https://www.httrack.com/
ResearchMed
Posts: 16795
Joined: Fri Dec 26, 2008 10:25 pm

Re: How to download entire contents of a website / blog?

Post by ResearchMed »

Marseille07 wrote: Tue May 24, 2022 10:31 pm I've used this thing with mixed success: https://www.httrack.com/
What worked, and what didn't?

We've got a website with about 6 different sub-pages. Each one is rather long if one scrolls all the way through, but not ridiculously long.
I'd love to be able to somehow re-create the "full monty".... being able to have a link to it (not public, but something we could use) so that we could, for example, send it to someone to show just what our website had been like.
... or look at it intact, perhaps with some nostalgia... :wink:

Or would that just copy each long page?

RM
This signature is a placebo. You are in the control group.
Marseille07
Posts: 16054
Joined: Fri Nov 06, 2020 12:41 pm

Re: How to download entire contents of a website / blog?

Post by Marseille07 »

ResearchMed wrote: Tue May 24, 2022 10:36 pm What worked, and what didn't?

We've got a website with about 6 different sub-pages. Each one is rather long if one scrolls all the way through, but not ridiculously long.
I'd love to be able to somehow re-create the "full monty".... being able to have a link to it (not public, but something we could use) so that we could, for example, send it to someone to show just what our website had been like.
... or look at it intact, perhaps with some nostalgia... :wink:

Or would that just copy each long page?

RM
? Just try it.

The issue was that it was slow, because at the time they capped the number of parallel connections (if you fetch hundreds of resources concurrently, the web server can come under heavy load).
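A cap like that is essentially a bounded worker pool: however many pages are queued, only a fixed number of requests are in flight at once. A minimal sketch with `concurrent.futures`; `fetch` is a placeholder for the real download function, and the default of 2 connections is an arbitrary, server-friendly choice, not HTTrack's actual default.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_connections=2):
    """Fetch every URL with at most max_connections requests running at the same time."""
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # pool.map yields results in the same order as urls.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Raising `max_connections` makes the download faster and the server's load heavier, which is the trade-off described above.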

Otherwise it worked reasonably well.
KyleAAA
Posts: 9498
Joined: Wed Jul 01, 2009 5:35 pm

Re: How to download entire contents of a website / blog?

Post by KyleAAA »

IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
pshonore
Posts: 8212
Joined: Sun Jun 28, 2009 2:21 pm

Re: How to download entire contents of a website / blog?

Post by pshonore »

ResearchMed wrote: Tue May 24, 2022 10:36 pm
Marseille07 wrote: Tue May 24, 2022 10:31 pm I've used this thing with mixed success: https://www.httrack.com/
What worked, and what didn't?

We've got a website with about 6 different sub-pages. Each one is rather long if one scrolls all the way through, but not ridiculously long.
I'd love to be able to somehow re-create the "full monty".... being able to have a link to it (not public, but something we could use) so that we could, for example, send it to someone to show just what our website had been like.
... or look at it intact, perhaps with some nostalgia... :wink:

Or would that just copy each long page?

RM
I believe most "web whackers" just create a local copy of the "content pages" wherever you tell the webwhacker program to put it. To access it, just point your browser to wherever it is stored (for example, c:/mystoredwebpages/firstwebsite) instead of the normal URL for the website. Just for laughs, you can press Ctrl+U while viewing any webpage to see what the code looks like (usually an incredible amount of data that scrolls on and on). In the very early days of the web, all that code was created by a developer typing in HTML commands. Today most of it is generated by web development software.
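Opening the saved files directly usually works, but pages that use site-root links (such as `/css/style.css`) can break when opened from disk. One workaround, sketched here assuming Python 3.7 or later is installed, is to serve the saved folder on your own machine and browse to `http://127.0.0.1:8000/`. The folder path and port are examples only.

```python
import functools
import sys
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_local_server(folder: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve a downloaded mirror folder at http://127.0.0.1:<port>/ on this computer only."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=folder)
    # Binding to 127.0.0.1 keeps the copy private to this machine.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example usage: python serve_mirror.py C:/mystoredwebpages/firstwebsite
    server = make_local_server(sys.argv[1])
    print("Open http://127.0.0.1:8000/ in your browser; press Ctrl+C to stop.")
    server.serve_forever()
```

The one-line equivalent is `python -m http.server 8000 --bind 127.0.0.1 --directory FOLDER`. Because it only listens on 127.0.0.1, nobody else can see it; sending a link to someone else would still require real hosting.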
quantAndHold
Posts: 10141
Joined: Thu Sep 17, 2015 10:39 pm
Location: West Coast

Re: How to download entire contents of a website / blog?

Post by quantAndHold »

Is there specific verbiage on the website saying that it’s in the public domain? Because otherwise, it isn’t. I’m going to go out on a limb and guess that this is a website that has historical documents or photos or something. Those might be in the public domain, but the website as a whole (the layout, or anything that the author wrote) is copyrighted.

My recommendation would be to contact the author of the site. If you have a legitimate use for the material, they would probably be willing to share.
LadyGeek
Site Admin
Posts: 95686
Joined: Sat Dec 20, 2008 4:34 pm
Location: Philadelphia
Contact:

Re: How to download entire contents of a website / blog?

Post by LadyGeek »

ResearchMed wrote: Tue May 24, 2022 9:44 pm
LadyGeek wrote: Tue May 24, 2022 8:05 pm For the record, discussions of dishonest behavior or bypassing the law are totally unacceptable.

As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.

Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.

Caduceus - Please provide website links that you have downloaded content from.

(This thread was temporarily removed for moderator review.)
We would very much like to know how to do this for *our* entire website!
(Not just the code, but the displays, "as is". Plus, our website creator is long gone...)

I hope this thread stays open.

RM
The original site owners are very much around - which includes "our" website creator. Don't forget that this entire website (along with the entire internet) is archived at Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine. Enter a link in the Wayback machine to find the version which existed on a particular date.
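For pages the Wayback Machine has already captured, links to a particular date can be built directly. A minimal sketch of two URL formats: a browsable snapshot link, which redirects to the capture nearest the requested date, and the JSON availability API. The function names are illustrative, and whether a capture exists near the date you ask for depends entirely on what the archive happened to crawl.

```python
from urllib.parse import urlencode


def wayback_snapshot_url(page_url: str, timestamp: str) -> str:
    """Browsable link to the capture nearest timestamp (YYYYMMDDhhmmss; may be shortened, e.g. YYYYMMDD)."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


def wayback_availability_query(page_url: str, timestamp: str) -> str:
    """Availability API URL; it answers with JSON describing the closest archived snapshot, if any."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url, "timestamp": timestamp})
```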
Wiki To some, the glass is half full. To others, the glass is half empty. To an engineer, it's twice the size it needs to be.
ResearchMed
Posts: 16795
Joined: Fri Dec 26, 2008 10:25 pm

Re: How to download entire contents of a website / blog?

Post by ResearchMed »

LadyGeek wrote: Wed May 25, 2022 6:02 pm
ResearchMed wrote: Tue May 24, 2022 9:44 pm
LadyGeek wrote: Tue May 24, 2022 8:05 pm For the record, discussions of dishonest behavior or bypassing the law are totally unacceptable.

As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.

Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.

Caduceus - Please provide website links that you have downloaded content from.

(This thread was temporarily removed for moderator review.)
We would very much like to know how to do this for *our* entire website!
(Not just the code, but the displays, "as is". Plus, our website creator is long gone...)


I hope this thread stays open.

RM
The original site owners are very much around - which includes "our" website creator. Don't forget that this entire website (along with the entire internet) is archived at Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine. Enter a link in the Wayback machine to find the version which existed on a particular date.
[bolded emphasis added]


I'm not sure if there was some confusion, but my use of "OUR website" did indeed refer to the website that DH and I own for our vacation rental business. It was an active business until not too long ago.
(It has nothing to do with the BH website.)


The "owners" are still very much around: That would be "the two of us".
However, and unfortunately, our website creator is indeed long gone (or we'd have asked him to handle this for us).

And at some point, we'll take *our* :wink: website down and save the rather modest annual fee needed to keep it up. But I'd like to be able to re-create it "as it is now" in the future, as much as possible, if we wanted to do so (e.g., for nostalgic reasons or to show someone).

RM
This signature is a placebo. You are in the control group.
LadyGeek
Site Admin
Posts: 95686
Joined: Sat Dec 20, 2008 4:34 pm
Location: Philadelphia

Re: How to download entire contents of a website / blog?

Post by LadyGeek »

^^^ No problem. Thanks for the clarification. :)
Wiki To some, the glass is half full. To others, the glass is half empty. To an engineer, it's twice the size it needs to be.
AnEngineer
Posts: 2414
Joined: Sat Jun 27, 2020 4:05 pm

Re: How to download entire contents of a website / blog?

Post by AnEngineer »

KyleAAA wrote: Tue May 24, 2022 10:58 pm IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
That ignores any fair use considerations.
JupiterJones
Posts: 3623
Joined: Tue Aug 24, 2010 3:25 pm
Location: Nashville, TN

Re: How to download entire contents of a website / blog?

Post by JupiterJones »

KyleAAA wrote: Tue May 24, 2022 10:58 pm IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
Really? IANALE, but that strikes me as a bit odd, since you technically "download" a web page every time you view one. Heck, unless you're emptying your cache constantly, you're also saving a copy of it on your hard drive. From the standpoint of the web server, there's no technical difference between viewing a web page and saving a copy of it.

Granted, if you saved the entirety of a large website, there could be a large server hit. But a well-behaved and courteous bot wouldn't hit the website much harder than a human viewer would. There's no law against a (very patient) human viewing the entirety of a large website as far as I know.

Now posting any of what you downloaded on another site, well that's where I'd imagine copyright issues would come strongly into play!
"Stay on target! Stay on target!"
Raybo
Posts: 2244
Joined: Tue Feb 20, 2007 10:02 am
Location: San Francisco

Re: How to download entire contents of a website / blog?

Post by Raybo »

JupiterJones wrote: Wed May 25, 2022 7:17 pm
KyleAAA wrote: Tue May 24, 2022 10:58 pm IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
Really? IANALE, but that strikes me as a bit odd, since you technically "download" a web page every time you view one. Heck, unless you're emptying your cache constantly, you're also saving a copy of it on your hard drive. From the standpoint of the web server, there's no technical difference between viewing a web page and saving a copy of it.

Granted, if you saved the entirety of a large website, there could be a large server hit. But a well-behaved and courteous bot wouldn't hit the website much harder than a human viewer would. There's no law against a (very patient) human viewing the entirety of a large website as far as I know.

Now posting any of what you downloaded on another site, well that's where I'd imagine copyright issues would come strongly into play!
How is this different than checking a book out of a library and then photocopying every page “for home use only?” The book isn’t in the public domain just because it is available for free from the library.

Most websites have an explicit copyright notice on each page.

I still contend that doing this without asking is a sneaky thing to do and shouldn’t be done.
No matter how long the hill, if you keep pedaling you'll eventually get up to the top.
Marseille07
Posts: 16054
Joined: Fri Nov 06, 2020 12:41 pm

Re: How to download entire contents of a website / blog?

Post by Marseille07 »

ResearchMed wrote: Wed May 25, 2022 6:36 pm I'm not sure if there was some confusion, but my use of "OUR website" did indeed refer to the website that DH and I own for our vacation rental business. It was an active business until not too long ago.
(It has nothing to do with the BH website.)


The "owners" are still very much around: That would be "the two of us".
However, and unfortunately, our website creator is indeed long gone (or we'd have asked him to handle this for us).

And at some point, we'll take *our* :wink: website down and save the rather modest annual fee needed to keep it up. But I'd like to be able to re-create it "as it is now" in the future, as much as possible, if we wanted to do so (e.g., for nostalgic reasons or to show someone).

RM
My confusion is why you haven't tried one of the suggestions to archive your site.
22twain
Posts: 4030
Joined: Thu May 10, 2012 5:42 pm

Re: How to download entire contents of a website / blog?

Post by 22twain »

ResearchMed wrote: Wed May 25, 2022 6:36 pm However, and unfortunately, our website creator is indeed long gone (or we'd have asked him to handle this for us).

And at some point, we'll take *our* :wink: website down and save the rather modest annual fee needed to keep it up. But I'd like to be able to re-create it "as it is now" in the future, as much as possible, if we wanted to do so (e.g., for nostalgic reasons or to show someone).
Do you have the login credentials (username & password) for your site at the web-hosting provider? They should have a way to make a backup copy of all the files on your site and download it.

My personal hobby site is hosted by Namecheap. They use the "cPanel" interface for site owners (or their "web guru") to maintain their sites. I understand it's a very common interface. It can do backups and download them. If your provider uses that or something similar, you should be able to find someone who can use it with your authorization. You might even be able to do it yourself with help from your provider's tech support.
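Once a backup has been downloaded, it's worth checking what's inside before the live site is taken down. A minimal sketch, assuming the host hands you a `.tar.gz` archive (the format cPanel backups generally use); the function name and the 20-entry limit are illustrative.

```python
import tarfile


def list_backup(archive_path: str, limit: int = 20) -> list[str]:
    """Return up to `limit` file and folder names stored in a .tar.gz backup, without extracting it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()[:limit]
```

Seeing the site's HTML files and image folders in that listing is a reasonable sign the backup contains the pages themselves and not just settings.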

(I've never tried to use the backup feature myself. My site is all hand-coded HTML/CSS/PHP and image files, with no database. I keep a complete "mirror" of it on my local computer, with files and folders laid out exactly as on the web server. When I make changes to the site, I edit the "mirror" files locally, then upload them to the server using the "file manager" in the "cPanel" interface. So the "mirror" acts as my backup.)
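A local mirror like this already works as a backup; a dated zip of the mirror folder adds a cheap snapshot you can keep offline. A minimal sketch, assuming the mirror is an ordinary folder and the backup folder sits outside it (otherwise the zip would try to include itself); `backup_mirror` and the `site-backup-` file naming are hypothetical choices.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_mirror(mirror_dir, backup_dir, today=None):
    """Zip the whole local mirror into backup_dir/site-backup-YYYY-MM-DD.zip."""
    today = today or date.today()
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    base_name = Path(backup_dir) / f"site-backup-{today.isoformat()}"
    # make_archive adds the ".zip" extension and returns the path of the file it wrote.
    # root_dir makes paths inside the zip relative to the mirror (e.g. "css/style.css").
    return shutil.make_archive(str(base_name), "zip", root_dir=mirror_dir)
```

Running it after each round of edits leaves a trail of dated snapshots, so an accidental overwrite of the mirror is recoverable.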
Meet my pet, Peeve, who loves to convert non-acronyms into acronyms: FED, ROTH, CASH, IVY, ...
AnEngineer
Posts: 2414
Joined: Sat Jun 27, 2020 4:05 pm

Re: How to download entire contents of a website / blog?

Post by AnEngineer »

Raybo wrote: Wed May 25, 2022 11:07 pm
JupiterJones wrote: Wed May 25, 2022 7:17 pm
KyleAAA wrote: Tue May 24, 2022 10:58 pm IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
Really? IANALE, but that strikes me as a bit odd, since you technically "download" a web page every time you view one. Heck, unless you're emptying your cache constantly, you're also saving a copy of it on your hard drive. From the standpoint of the web server, there's no technical difference between viewing a web page and saving a copy of it.

Granted, if you saved the entirety of a large website, there could be a large server hit. But a well-behaved and courteous bot wouldn't hit the website much harder than a human viewer would. There's no law against a (very patient) human viewing the entirety of a large website as far as I know.

Now posting any of what you downloaded on another site, well that's where I'd imagine copyright issues would come strongly into play!
How is this different than checking a book out of a library and then photocopying every page “for home use only?” The book isn’t in the public domain just because it is available for free from the library.
The difference is that in order to view a website you have to make a copy. (This type of thing has actually been litigated in the past, see https://en.wikipedia.org/wiki/MAI_Syste ... uter,_Inc. where a computer repair company was successfully sued for turning on a client's computer because loading programs into RAM was considered copyright infringement. The law in that particular case has since changed.)

This is more akin to borrowing an e-book from the library and not deleting it after the lending period is over.
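On the point quoted above that a well-behaved, courteous bot need not hit a site much harder than a human reader: the usual convention is the site's robots.txt file, which can mark paths off-limits and, on some sites, request a `Crawl-delay` (a widely used but non-standard directive). A minimal sketch using Python's built-in `urllib.robotparser`; the `polite_plan` name, the `MyArchiver` user-agent string, and the 2-second fallback delay are assumptions for illustration.

```python
import urllib.robotparser


def polite_plan(robots_lines, urls, user_agent="MyArchiver", default_delay=2.0):
    """Filter URLs by robots.txt rules and pick a pause between requests.

    robots_lines: the lines of the site's robots.txt (already downloaded).
    Returns (allowed_urls, seconds_to_wait_between_requests).
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    delay = parser.crawl_delay(user_agent)  # None if robots.txt sets no Crawl-delay
    return allowed, float(delay) if delay is not None else default_delay
```

With a robots.txt of `User-agent: *`, `Crawl-delay: 5`, `Disallow: /private/`, a post under `/2012/` stays in the list, anything under `/private/` is dropped, and the suggested pause is 5 seconds. `RobotFileParser` can also fetch the file itself via `set_url()` and `read()`.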
JupiterJones
Posts: 3623
Joined: Tue Aug 24, 2010 3:25 pm
Location: Nashville, TN

Re: How to download entire contents of a website / blog?

Post by JupiterJones »

AnEngineer wrote: Thu May 26, 2022 7:39 am
Raybo wrote: Wed May 25, 2022 11:07 pm How is this different than checking a book out of a library and then photocopying every page “for home use only?” The book isn’t in the public domain just because it is available for free from the library.
The difference is that in order to view a website you have to make a copy. [...]

This is more akin to borrowing an e-book from the library and not deleting it after the lending period is over.
Right. When you borrow a book, you take physical possession of it and prevent anyone else from also borrowing and reading it until you give it back. The act of going to a website is fundamentally asking for your own copy of the underlying code of a webpage. That's literally how the HTTP protocol works. And it doesn't prevent anyone else from asking for and getting their own copy either. Nothing is "borrowed" or consumed at any point.

In fact, I would say that it's not quite like not deleting a borrowed e-book after the lending period, because there is no lending period for a web page.

But again, IANAL, so I'm only speaking to the rationality/sensibility of the moral argument for saving the copy of a web page (that you must necessarily have in order to view it) for later reference, not the legality.
Last edited by JupiterJones on Thu May 26, 2022 4:11 pm, edited 1 time in total.
"Stay on target! Stay on target!"
KyleAAA
Posts: 9498
Joined: Wed Jul 01, 2009 5:35 pm

Re: How to download entire contents of a website / blog?

Post by KyleAAA »

AnEngineer wrote: Wed May 25, 2022 7:15 pm
KyleAAA wrote: Tue May 24, 2022 10:58 pm IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
That ignores any fair use considerations.
It doesn't. It's unlikely fair use would allow using 100% of a copyrighted work, especially not for purposes of writing a book (a commercial endeavor). Also, different countries have different laws.
AnEngineer
Posts: 2414
Joined: Sat Jun 27, 2020 4:05 pm

Re: How to download entire contents of a website / blog?

Post by AnEngineer »

KyleAAA wrote: Thu May 26, 2022 12:33 pm
AnEngineer wrote: Wed May 25, 2022 7:15 pm
KyleAAA wrote: Tue May 24, 2022 10:58 pm IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
That ignores any fair use considerations.
It doesn't. It's unlikely fair use would allow using 100% of a copyrighted work.
I wasn't claiming that fair use definitely allows the copying here, merely that it needs to be considered.

But fair use does allow using 100% of a copyrighted work in some cases. Here is one: https://www.copyright.gov/fair-use/summ ... ir2014.pdf.
tlk59
Posts: 77
Joined: Mon Nov 27, 2017 12:25 pm

Re: How to download entire contents of a website / blog?

Post by tlk59 »

I haven't been an Evernote user in quite a while, but they do have a Web Clipper feature. (I haven't used it myself; I just prefer to link/bookmark to the live site usually.)

For the side point about how to preserve info from your own site... you'd have to set it up this way from the beginning, but using a CMS (content management system) would be a good way to go. It's basically a database for your content, which keeps it from being tangled up with HTML markup, etc.
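To illustrate the separation described above: in a CMS the posts live as plain records (here a SQLite table), and the HTML is produced from a shared template only when a page is built, so preserving the site mostly means keeping the database. A toy sketch; the `posts` table, its columns, and `render_posts` are hypothetical, and real CMS products are far more elaborate.

```python
import sqlite3
from html import escape
from string import Template

# One shared page layout; the content is filled in at render time.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)


def render_posts(conn):
    """Turn every row of the (hypothetical) posts table into a full HTML page."""
    pages = {}
    rows = conn.execute("SELECT slug, title, body FROM posts ORDER BY slug")
    for slug, title, body in rows:
        # Blank lines in the stored text become separate <p> elements.
        paragraphs = "".join(f"<p>{escape(p)}</p>" for p in body.split("\n\n"))
        pages[slug] = PAGE.substitute(title=escape(title), body=paragraphs)
    return pages


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
    conn.execute(
        "INSERT INTO posts VALUES (?, ?, ?)",
        ("rates", "Rates & availability", "Weekly rates vary by season.\n\nBook early."),
    )
    for slug, html in render_posts(conn).items():
        print(slug, html)
```

Because the text never contains layout markup, the same rows could be re-rendered with a completely different design years later.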
sycamore
Posts: 6359
Joined: Tue May 08, 2018 12:06 pm

Re: How to download entire contents of a website / blog?

Post by sycamore »

JupiterJones wrote: Thu May 26, 2022 12:12 pm ...The act of going to a website is fundamentally asking for your own copy of the underlying code of a webpage. That's literally how the HTML protocol works. And it doesn't prevent anyone else from asking for and getting their own copy either. Nothing is "borrowed" or consumed at any point.
Nitpick: instead of HTML, I think you meant HTTP.

And the concept of a "webpage" gets complicated rather quickly. Not that anyone other than a software geek really cares.
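To make that point concrete: the HTML file returned for one URL usually refers to other files (stylesheets, images, scripts, sometimes on other hosts), each fetched with its own request, which is part of why "save this page" is fuzzier than it sounds. A minimal sketch using Python's standard `html.parser`; the tag/attribute list is a partial assumption, and anything a script loads after the page opens won't show up in a static scan like this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# (tag, attribute) pairs that commonly pull in extra files (not an exhaustive list)
RESOURCE_ATTRS = {
    ("img", "src"), ("script", "src"), ("link", "href"),
    ("iframe", "src"), ("source", "src"), ("video", "src"), ("audio", "src"),
}


class ResourceLister(HTMLParser):
    """Records the absolute URL of every referenced sub-resource."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value:
                # Resolve relative references against the page's own URL
                self.resources.append(urljoin(self.base_url, value))


def list_resources(html_text, base_url):
    """Return the extra files one HTML page asks the browser to fetch."""
    parser = ResourceLister(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.resources
```

A page with one stylesheet, one photo, and one script from a CDN yields three extra URLs, one of them on a different host entirely.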
brandy
Posts: 529
Joined: Thu Mar 15, 2018 9:45 pm

Re: How to download entire contents of a website / blog?

Post by brandy »

"This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?"

I haven't read all the posts. One website that I used almost daily for about 8 years (it originated about 1998, I think) "died" about 2 years ago. It was loaded with easily searchable information that would be hard to come by otherwise, even if one had the books at hand. When it closed, numerous threads and several sections/topics were lost. Postings have been closed for about a year, but some information is still available, and it is still somewhat searchable. That was an incredible resource that I still use at times.
The same thing happened to two others. One ran for about 25 years; on the other, the main man died and the others drifted away.
tuningfork
Posts: 885
Joined: Wed Oct 30, 2013 8:30 pm

Re: How to download entire contents of a website / blog?

Post by tuningfork »

brandy wrote: Thu May 26, 2022 2:53 pm "This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?"

I haven't read all the posts. One website that I used almost daily for about 8 years (it originated about 1998, I think) "died" about 2 years ago. It was loaded with easily searchable information that would be hard to come by otherwise, even if one had the books at hand. When it closed, numerous threads and several sections/topics were lost. Postings have been closed for about a year, but some information is still available, and it is still somewhat searchable. That was an incredible resource that I still use at times.
The same thing happened to two others. One ran for about 25 years; on the other, the main man died and the others drifted away.
Maybe I should check all my Geocities bookmarks!
JupiterJones
Posts: 3623
Joined: Tue Aug 24, 2010 3:25 pm
Location: Nashville, TN

Re: How to download entire contents of a website / blog?

Post by JupiterJones »

sycamore wrote: Thu May 26, 2022 2:43 pm
JupiterJones wrote: Thu May 26, 2022 12:12 pm ...The act of going to a website is fundamentally asking for your own copy of the underlying code of a webpage. That's literally how the HTML protocol works. And it doesn't prevent anyone else from asking for and getting their own copy either. Nothing is "borrowed" or consumed at any point.
Nitpick: instead of HTML, I think you meant HTTP.
Indeed I did! Fixed.
"Stay on target! Stay on target!"