How to download entire contents of a website / blog?
As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?
I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Re: How to download entire contents of a website / blog?
Look at the `wget` command. It is available for different platforms. https://www.gnu.org/software/wget/
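For reference, here is the shape of a typical wget invocation for mirroring a blog. This is a sketch, not a tested recipe: the URL is a placeholder, and you should check wget's manual (and the site's terms) before running anything. The command is written out as a Python list so each flag can be annotated; you would run the printed line in a terminal.

```python
# Sketch of a wget invocation for mirroring a blog (URL is a placeholder).
cmd = [
    "wget",
    "--mirror",            # recursive download with timestamping
    "--convert-links",     # rewrite links so the local copy works offline
    "--page-requisites",   # also fetch the images/CSS needed to render pages
    "--adjust-extension",  # save pages with .html extensions
    "--wait=1",            # pause between requests so you don't hammer the server
    "https://example.com/blog/",
]
print(" ".join(cmd))
```

Against a large site this can take a long time and a lot of disk; wget's `--level` and `--reject` options can narrow the crawl down.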
Re: How to download entire contents of a website / blog?
You should contact the owner of the site to get permission to do this.
If you're comfortable (learning about) programming, the Python library Beautiful Soup works well for web-scraping like this, and there are a lot of beginner-level tutorials on the web.
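For anyone curious what that looks like, here is a minimal Beautiful Soup sketch. The HTML and the `article`/`post` structure below are made up for illustration; a real script would first fetch each page (e.g. with the `requests` library) and use selectors matching the actual blog's markup.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up HTML standing in for a downloaded blog page.
html = """
<html><body>
  <article class="post"><h2>First post</h2><p>Hello, archive!</p></article>
  <article class="post"><h2>Second post</h2><p>More content.</p></article>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
for post in soup.find_all("article", class_="post"):
    title = post.h2.get_text(strip=True)
    body = post.p.get_text(strip=True)
    print(f"{title}: {body}")
```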
A useful razor: anyone asking about speculative strategies on Bogleheads.org has no business using them.
Re: How to download entire contents of a website / blog?
I am not tech-savvy at all, so programming is out of the question. Is there any downloadable program (happy to pay for it) that will do this for me in a relatively straightforward way?
The content is all in the public domain, and I'm not re-publishing it. It's purely for reading and research.
I forgot to mention that apart from a blog, I'd like to save the entire contents of a Facebook group too but that seems a bit more complicated? Basically people have been posting old photographs, and then various other posters will provide information about it (details about dates, memories, locations, etc.) It's a treasure trove of historical information that's been crowd-sourced over a decade or so. It'd be a pity if one day all of this just gets lost.
Re: How to download entire contents of a website / blog?
https://www.httrack.com/

Caduceus wrote: ↑Tue May 24, 2022 1:04 pm
As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?
I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Re: How to download entire contents of a website / blog?
Google "webwhacker software". There are quite a few such programs available.
- JupiterJones
Re: How to download entire contents of a website / blog?
Plus, a screenshot isn't really saving text. It's saving a picture of the text as it appears to you at that moment. You can't search the text for a keyword or resize the window and have the text reflow, for example, as you would with an actual text file.

Caduceus wrote: ↑Tue May 24, 2022 1:04 pm
As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?
I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Saving the page as a "web archive" or "webpage, complete" or whatever your browser of choice calls it (look under your File menu) would actually preserve a page you're viewing in a way that lets you load it back into your web browser at any future point, and it would look/act just like a webpage. But of course, you'd have to do that with every blog post page as you viewed it--this won't crawl the entire site and download everything you could possibly read on it.
This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?
"Stay on target! Stay on target!"
Re: How to download entire contents of a website / blog?
+1

fisher0815 wrote: ↑Tue May 24, 2022 1:55 pm
https://www.httrack.com/

Caduceus wrote: ↑Tue May 24, 2022 1:04 pm
As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?
I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
I use this program all the time to download translations that I do, in order to keep a personal archive in case the site ever goes dark.
Easy to use.
My money has no emotions. ~Moshe | I'm the world's greatest expert on my own opinion. ~Bruce Williams
- nisiprius
Re: How to download entire contents of a website / blog?
On MacOS, I use an inexpensive app named SiteSucker.
It works reasonably well on straightforward websites, where the entire content of the whole website is not too gigantic (not more than a few hundred megabytes, say). I use it primarily to capture my own websites, which are either homemade HTML-and-Javascript-based or Wordpress-based.
Annual income twenty pounds, annual expenditure nineteen nineteen and six, result happiness; Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery.
Re: How to download entire contents of a website / blog?
Adobe will convert a website to PDF, and it should also retain hyperlink functionality. In addition to a monthly subscription pricing model, they offer a free trial to check it out:
https://www.adobe.com/acrobat/how-to/co ... o-pdf.html
- ResearchMed
Re: How to download entire contents of a website / blog?
I'd love to save our own website, albeit mostly for nostalgic reasons.

fisher0815 wrote: ↑Tue May 24, 2022 1:55 pm
https://www.httrack.com/

Caduceus wrote: ↑Tue May 24, 2022 1:04 pm
As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?
I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Is this safe? That is, will it leave the current website entirely unchanged?
What is the form of what is saved, and how is the original displayed?
It captures all subsections, such as "Home", "Rates", "Photos", etc.?
And how does it differ from nisiprius' suggestion, SiteSucker (which seems not to be free)?
[How does one determine the size of a website before trying to copy it, if there is a maximum?]
Finally, IF one ever wanted to and the original site was closed, can the code be used easily to re-create the website so that "it's as if it never happened" (apologies to the disaster cleanup company!). This is unlikely in our case, but I like to try to learn the "just in case" possibilities when something could be anticipated, even if it's very unlikely.
Many thanks!
RM
This signature is a placebo. You are in the control group.
Re: How to download entire contents of a website / blog?
Never depend on stuff on the web staying available. In a stroke of luck, years ago I saved an entire website that someone had made of a very extensive family tree that included a lot of my family. At the time (but apparently no longer), whatever browser I was using, probably Firefox, had a command in a pull-down menu to do this. That website has since disappeared from the web.

JupiterJones wrote: ↑Tue May 24, 2022 2:18 pm
Plus a screenshot isn't really saving text. It's saving a picture of the text as it appears to you at that moment. You can't search the text for a keyword or resize the window and have the text reflow, for example, as you would an actual text file.

Caduceus wrote: ↑Tue May 24, 2022 1:04 pm
As part of research for a book I'm writing, I sometimes come across blogs by people with lots of information. Does anyone know if there is an easy way that I can download the entire contents of the blog for offline reading in the future (I can't read 10 years' worth of postings anytime soon)?
I've been using Firefox's "Screenshot" Feature but it's very cumbersome to save text this way.
Saving the page as a "web archive" or "webpage, complete" or whatever your browser of choice calls it (look under your File menu), would actually preserve a page you're viewing in a way such that you could load it back into your web browser at any future point, and it would look/act just like a webpage. But of course, you'd have to do that with every blog post page as you viewed them--this won't crawl the entire site and download every thing you could possibly read on it.
This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?
I used to participate in a yahoo group for a medical condition. It had a wealth of information. Yahoo wiped its groups off the web later.
-
- Posts: 1214
- Joined: Thu Apr 22, 2021 3:29 pm
Re: How to download entire contents of a website / blog?
Web servers don't like to get hammered and can impose limitations or deny access in the face of too many perceived "bot attacks". For tools like curl and wget, I believe you can slow down the download process to avoid looking like a bot. It appears from the HTTrack documentation that there are similar settings. Use them! Otherwise, download restrictions may be placed on these websites down the line.
From: https://www.httrack.com/html/abuse.html
Downloading a site can overload it, if you have a fast pipe, or if you capture too many simultaneous cgi (dynamically generated pages).
Do not download too large websites: use filters
Do not use too many simultaneous connections
Use bandwidth limits
Use connection limits
Use size limits
Use time limits
Only disable robots.txt rules with great care
Try not to download during working hours
Check your mirror transfer rate/size
For large mirrors, first ask the webmaster of the site
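The etiquette above can be sketched in a few lines of Python using the standard library's robots.txt parser. The robots.txt rules and URLs below are made up for illustration; a real crawler would download each site's actual /robots.txt and call time.sleep(delay) between fetches.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
])

pages = [
    "https://example.com/post1",
    "https://example.com/private/secret",
]
delay = robots.crawl_delay("*") or 1  # seconds to wait between requests

for url in pages:
    if not robots.can_fetch("*", url):
        print(f"skipping {url} (disallowed by robots.txt)")
        continue
    # A real crawler would fetch here (e.g. urllib.request.urlopen) and
    # then time.sleep(delay) before the next request.
    print(f"would fetch {url}, then wait {delay} seconds")
```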
Re: How to download entire contents of a website / blog?
I installed Adobe Acrobat 9 from an old copy. On Windows 10 I can capture a website down to as many levels as necessary.
There is also this solution, just tried: https://cloudconvert.com/save-website-pdf
I haven't looked at the options.
Re: How to download entire contents of a website / blog?
As an owner of a website, if I saw someone doing this (I have metering software on my site), I’d block their IP address from further accessing my site. I check for this hourly and block IP addresses several times a week, mostly from hacker bots trying to download my database.
But if someone asked me questions, I'd answer them. If someone wanted copies of entries, I'd probably provide them (depending on why, and with proper attribution). If someone wanted the entire contents of my website, I'm not sure how I'd respond, but I would be much nicer to someone who asked first.
By the way, is this a free blog or do you pay for access? Copying an entire website without asking first is, in my opinion, sneaky, unethical, and a violation of common decency. Are there ads that support the site? If so, downloading the whole site might even take money out of the owner’s pocket.
No matter how long the hill, if you keep pedaling you'll eventually get up to the top.
Re: How to download entire contents of a website / blog?
For the record, discussions of dishonest behavior or bypassing the law are totally unacceptable.
As long as the content is in the public domain and/or the use complies with copyright restrictions, as the OP has stated, this discussion can continue. A Google search for "copyright personal use" turns up situations where this is permitted. However, I am not a lawyer.
Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.
Caduceus - Please provide website links that you have downloaded content from.
(This thread was temporarily removed for moderator review.)
Re: How to download entire contents of a website / blog?
You can use wget fairly easily for blogs - it will follow links. Another way to save your research would be to use something like the Chrome Single File extension with Auto Save - Auto Save All Tabs. This is great if you need a backup of whatever you are researching: it automatically saves everything you view, so it may eat up gigabytes of space a day.
Re: How to download entire contents of a website / blog?
As someone who has had non-public discussion forum content scraped and sold for the benefit of the scraper (who was a moderator of that forum at the time), I agree with the above and strongly recommend contacting the website/blog owner first.

Raybo wrote: ↑Tue May 24, 2022 6:10 pm
As an owner of a website, if I saw someone doing this (I have metering software on my site), I'd block their IP address from further accessing my site. I check for this hourly and block IP addresses several times a week, mostly from hacker bots trying to download my database.
But, if someone asked me questions, I’d answer then. If someone wanted copies of entries, I’d probably provide them (depends on why and proper attribution). If someone wanted the entire contents of my website, I’m not sure how I’d respond, but it would much nicer to someone who asked first.
By the way, is this a free blog or do you pay for access? Copying an entire website without asking first is, in my opinion, sneaky, unethical, and a violation of common decency. Are there ads that support the site? If so, downloading the whole site might even take money out of the owner’s pocket.
Re: How to download entire contents of a website / blog?
I use Site Sucker
Old fart who does three index stock funds, baby.
- ResearchMed
Re: How to download entire contents of a website / blog?
We would very much like to know how to do this for the entirety of *our* website!

LadyGeek wrote: ↑Tue May 24, 2022 8:05 pm
For the record, discussions of dishonest behavior or bypassing the law is totally unacceptable.
As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.
Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.
Caduceus - Please provide website links that you have downloaded content from.
(This thread was temporarily removed for moderator review.)
(Not just the code, but the displays, "as is". Plus, our website creator is long gone...)
I hope this thread stays open.
RM
Re: How to download entire contents of a website / blog?
I've done this once or twice with Acrobat. You can select how deep in the links it will go.

LookinAround wrote: ↑Tue May 24, 2022 3:08 pm
Adobe will convert a website to pdf and it should also retain hyperlink functionality. In addition to offering a monthly subscription pricing model they also offer a free trial to check out
https://www.adobe.com/acrobat/how-to/co ... o-pdf.html
Be careful what you wish for, there might be a LOT of data.
Re: How to download entire contents of a website / blog?
I've used this thing with mixed success: https://www.httrack.com/
- ResearchMed
Re: How to download entire contents of a website / blog?
What worked, and what didn't?

Marseille07 wrote: ↑Tue May 24, 2022 10:31 pm
I've used this thing with mixed success: https://www.httrack.com/
We've got a website with about 6 different sub-pages. Each one is rather long if one scrolls all the way through, but not ridiculously long.
I'd love to be able to somehow re-create the "full monty" somehow.... being able to have a link to it (not public, but something we could use) so that we could, for example, send it to someone to show just what our website had been like.
... or look at it intact, perhaps with some nostalgia...
Or would that just copy each long page?
RM
Re: How to download entire contents of a website / blog?
Just try it.

ResearchMed wrote: ↑Tue May 24, 2022 10:36 pm
What worked, and what didn't?
We've got a website with about 6 different sub-pages. Each one is rather long if one scrolls all the way through, but not ridiculously long.
I'd love to be able to somehow re-create the "full monty" somehow.... being able to have a link to it (not public, but something we could use) so that we could, for example, send it to someone to show just what our website had been like.
... or look at it intact, perhaps with some nostalgia...
Or would that just copy each long page?
RM
The issue was that it was slow to fetch pages concurrently at the time, because they capped the number of parallel connections (if you concurrently fetch hundreds of resources, the web server can come under heavy load).
Otherwise it worked reasonably well.
Re: How to download entire contents of a website / blog?
IANAL but my understanding is downloading a website without permission technically violates copyright law. But it's very unlikely anyone will notice or care if you do. Still, if you want to be a stickler...
Re: How to download entire contents of a website / blog?
I believe most "web whackers" just create a local copy of the "content pages" for you, wherever you choose to put it and tell the webwhacker program. To access it, just point your browser to wherever it is stored, for example c:/mystoredwebpages/firstwebsite, instead of the normal URL for the website. Just for laughs, you can press Ctrl+U while viewing any webpage to see what the code looks like (usually an incredible amount of data that scrolls on and on). In the very early days of the web, all that code was created by a developer typing in HTML commands. Today it is generated by web development software.

ResearchMed wrote: ↑Tue May 24, 2022 10:36 pm
What worked, and what didn't?

Marseille07 wrote: ↑Tue May 24, 2022 10:31 pm
I've used this thing with mixed success: https://www.httrack.com/
We've got a website with about 6 different sub-pages. Each one is rather long if one scrolls all the way through, but not ridiculously long.
I'd love to be able to somehow re-create the "full monty" somehow.... being able to have a link to it (not public, but something we could use) so that we could, for example, send it to someone to show just what our website had been like.
... or look at it intact, perhaps with some nostalgia...
Or would that just copy each long page?
RM
- quantAndHold
Re: How to download entire contents of a website / blog?
Is there specific verbiage on the website saying that it’s in the public domain? Because otherwise, it isn’t. I’m going to go out on a limb and guess that this is a website that has historical documents or photos or something. Those might be in the public domain, but the website as a whole (the layout, or anything that the author wrote) is copyrighted.
My recommendation would be to contact the author of the site. If you have a legitimate use for the material, they would probably be willing to share.
Re: How to download entire contents of a website / blog?
The original site owners are very much around - which includes "our" website creator. Don't forget that this entire website (along with much of the internet) is archived at the Internet Archive's Wayback Machine. Enter a link in the Wayback Machine to find the version which existed on a particular date.

ResearchMed wrote: ↑Tue May 24, 2022 9:44 pm
We would very much like to know how to do this for the entire website for *our* website!

LadyGeek wrote: ↑Tue May 24, 2022 8:05 pm
For the record, discussions of dishonest behavior or bypassing the law is totally unacceptable.
As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.
Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.
Caduceus - Please provide website links that you have downloaded content from.
(This thread was temporarily removed for moderator review.)
(Not just to code, but the displays, "as is". Plus, our website creator is long gone...)
I hope this thread stays open.
RM
- ResearchMed
Re: How to download entire contents of a website / blog?
[bolded emphasis added]

LadyGeek wrote: ↑Wed May 25, 2022 6:02 pm
The original site owners are very much around - which includes "our" website creator. Don't forget that this entire website (along with much of the internet) is archived at the Internet Archive's Wayback Machine. Enter a link in the Wayback Machine to find the version which existed on a particular date.

ResearchMed wrote: ↑Tue May 24, 2022 9:44 pm
We would very much like to know how to do this for the entire website for *our* website!

LadyGeek wrote: ↑Tue May 24, 2022 8:05 pm
For the record, discussions of dishonest behavior or bypassing the law is totally unacceptable.
As long as the OP has stated that the content is in the public domain and / or complies with the copyright restrictions, this discussion can continue. A google search for "copyright personal use" provides situations where this is permitted. However, I am not a lawyer.
Also bear in mind that lack of a visible copyright notice does not mean that content is in the public domain.
Caduceus - Please provide website links that you have downloaded content from.
(This thread was temporarily removed for moderator review.)
(Not just to code, but the displays, "as is". Plus, our website creator is long gone...)
I hope this thread stays open.
RM
I'm not sure if there was some confusion, but my use of "OUR website" did indeed refer to the website that DH and I own for our vacation rental business. It was an active business until not too long ago.
(It has nothing to do with the BH website.)
The "owners" are still very much around: That would be "the two of us".
However, and unfortunately, our website creator is indeed long gone (or we'd have asked him to handle this for us).
And at some point, we'll take *our* website down and save the rather modest annual fee needed to keep it up. But I'd like to be able to re-create it "as it is now" in the future, as much as possible, if we wanted to do so (e.g., for nostalgic reasons or to show someone).
RM
Re: How to download entire contents of a website / blog?
^^^ No problem. Thanks for the clarification.
- JupiterJones
Re: How to download entire contents of a website / blog?
Really? IANALE, but that strikes me as a bit odd, since you technically "download" a web page every time you view one. Heck, unless you're emptying your cache constantly, you're also saving a copy of it on your hard drive. From the standpoint of the web server, there's no technical difference between viewing a web page and saving a copy of it.
Granted, if you saved the entirety of a large website, there could be a large server hit. But a well-behaved and courteous bot wouldn't hit the website much harder than a human viewer would. There's no law against a (very patient) human viewing the entirety of a large website as far as I know.
Now posting any of what you downloaded on another site, well that's where I'd imagine copyright issues would come strongly into play!
Re: How to download entire contents of a website / blog?
How is this different from checking a book out of a library and then photocopying every page "for home use only"? The book isn't in the public domain just because it is available for free from the library.

JupiterJones wrote: ↑Wed May 25, 2022 7:17 pm
Really? IANALE, but that strikes me as a bit odd, since you technically "download" a web page every time you view one. Heck, unless you're emptying your cache constantly, you're also saving a copy of it on your hard drive. From the standpoint of the web server, there's no technical difference between viewing a web page and saving a copy of it.
Granted, if you saved the entirety of a large website, there could be a large server hit. But a well-behaved and courteous bot wouldn't hit the website much harder than a human viewer would. There's no law against a (very patient) human viewing the entirety of a large website as far as I know.
Now posting any of what you downloaded on another site, well that's where I'd imagine copyright issues would come strongly into play!
Most websites have an explicit copyright notice on each page.
I still contend that doing this without asking is a sneaky thing to do and shouldn’t be done.
Re: How to download entire contents of a website / blog?
My confusion is why you haven't tried one of the suggestions to archive your site.

ResearchMed wrote: ↑Wed May 25, 2022 6:36 pm
I'm not sure if there was some confusion, but my use of "OUR website" did indeed refer to the website that DH and I own for our vacation rental business. It was an active business until not too long ago.
(It has nothing to do with the BH website.)
The "owners" are still very much around: That would be "the two of us".
However, and unfortunately, our website creator is indeed long gone (or we'd have asked him to handle this for us).
And at some point, we'll take *our* website down and save the rather modest annual fee needed keep it up. But I'd like to be able to re-create it "as it is now" in the future, as much as possible, if we wanted to do so (e.g., for nostalgic reasons or to show someone).
RM
Re: How to download entire contents of a website / blog?
Do you have the login credentials (username & password) for your site at the web-hosting provider? They should have a way to make a backup copy of all the files on your site and download it.

ResearchMed wrote: ↑Wed May 25, 2022 6:36 pm
However, and unfortunately, our website creator is indeed long gone (or we'd have asked him to handle this for us).
And at some point, we'll take *our* website down and save the rather modest annual fee needed keep it up. But I'd like to be able to re-create it "as it is now" in the future, as much as possible, if we wanted to do so (e.g., for nostalgic reasons or to show someone).
My personal hobby site is hosted by Namecheap. They use the "cPanel" interface for site owners (or their "web guru") to maintain their sites. I understand it's a very common interface. It can do backups and download them. If your provider uses that or something similar, you should be able to find someone who can use it with your authorization. You might even be able to do it yourself with help from your provider's tech support.
(I've never tried to use the backup feature myself. My site is all hand-coded HTML/CSS/PHP and image files, with no database. I keep a complete "mirror" of it on my local computer, with files and folders laid out exactly as on the web server. When I make changes to the site, I edit the "mirror" files locally, then upload them to the server using the "file manager" in the "cPanel" interface. So the "mirror" acts as my backup.)
Meet my pet, Peeve, who loves to convert non-acronyms into acronyms: FED, ROTH, CASH, IVY, ...
Re: How to download entire contents of a website / blog?
The difference is that in order to view a website you have to make a copy. (This type of thing has actually been litigated in the past; see https://en.wikipedia.org/wiki/MAI_Syste ... uter,_Inc. , where a computer repair company was successfully sued for turning on a client's computer because loading programs into RAM was considered copyright infringement. The law in that particular case has since changed.)

Raybo wrote: ↑Wed May 25, 2022 11:07 pm
How is this different than checking a book out of a library and then photocopying every page "for home use only?" The book isn't in the public domain just because it is available for free from the library.

JupiterJones wrote: ↑Wed May 25, 2022 7:17 pm
Really? IANALE, but that strikes me as a bit odd, since you technically "download" a web page every time you view one. Heck, unless you're emptying your cache constantly, you're also saving a copy of it on your hard drive. From the standpoint of the web server, there's no technical difference between viewing a web page and saving a copy of it.
Granted, if you saved the entirety of a large website, there could be a large server hit. But a well-behaved and courteous bot wouldn't hit the website much harder than a human viewer would. There's no law against a (very patient) human viewing the entirety of a large website as far as I know.
Now posting any of what you downloaded on another site, well that's where I'd imagine copyright issues would come strongly into play!
This is more akin to borrowing an e-book from the library and not deleting it after the lending period is over.
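The "well-behaved and courteous bot" idea above can be sketched in a few lines of Python (standard library only). This is purely illustrative: the function name, file layout, and five-second delay are my assumptions, not anything from this thread, and a dedicated tool such as wget (suggested earlier) handles link-following, retries, and politeness for you.

```python
import time
import urllib.request
from pathlib import Path

def save_pages(urls, out_dir="mirror", delay_seconds=5.0):
    """Download each URL to a local file, pausing between requests
    so the server sees roughly the load of one human reader."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls):
        with urllib.request.urlopen(url) as resp:
            data = resp.read()
        # Name files by position in the list; a real mirror would
        # derive names from the URL path instead.
        Path(out_dir, f"page_{i:04d}.html").write_bytes(data)
        time.sleep(delay_seconds)  # be gentle between requests
```

You would still need to collect the list of post URLs first (e.g., from the blog's archive page), which is exactly the part wget's recursive mode automates.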
- JupiterJones
- Posts: 3623
- Joined: Tue Aug 24, 2010 3:25 pm
- Location: Nashville, TN
Re: How to download entire contents of a website / blog?
AnEngineer wrote: ↑Thu May 26, 2022 7:39 am
The difference is that in order to view a website you have to make a copy. [...] This is more akin to borrowing an e-book from the library and not deleting it after the lending period is over.

Right. When you borrow a book, you take physical possession of it and prevent anyone else from also borrowing and reading it until you give it back. The act of going to a website is fundamentally asking for your own copy of the underlying code of a webpage. That's literally how the HTTP protocol works. And it doesn't prevent anyone else from asking for and getting their own copy either. Nothing is "borrowed" or consumed at any point.
In fact, I would say that it's not quite like not deleting a borrowed e-book after the lending period, because there is no lending period for a web page.
But again, IANAL, so I'm only speaking to the rationality/sensibility of the moral argument for saving the copy of a web page (that you must necessarily have in order to view it) for later reference, not the legality.
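The claim that a web request simply hands each viewer its own copy can be demonstrated end to end with Python's standard library. This is a toy illustration (the in-process server, page bytes, and handler name are all made up for the demo): two "viewers" fetch the same page, each receives an identical copy, and the server's original is never consumed or locked.

```python
import http.server
import threading
import urllib.request

PAGE = b"<html><body>Hello, blog post</body></html>"

class OnePageHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET is answered with a fresh copy of the same page.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), OnePageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

copy_one = urllib.request.urlopen(url).read()  # "viewing" the page
copy_two = urllib.request.urlopen(url).read()  # a second viewer
server.shutdown()
print(copy_one == copy_two == PAGE)  # True: each request got its own copy
```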
Last edited by JupiterJones on Thu May 26, 2022 4:11 pm, edited 1 time in total.
"Stay on target! Stay on target!"
Re: How to download entire contents of a website / blog?
It doesn't. It's unlikely fair use would allow using 100% of a copyrighted work, especially not for purposes of writing a book (a commercial endeavor). Also, different countries have different laws.
- AnEngineer
- Posts: 2414
- Joined: Sat Jun 27, 2020 4:05 pm
Re: How to download entire contents of a website / blog?
KyleAAA wrote: ↑Thu May 26, 2022 12:33 pm
It's unlikely fair use would allow using 100% of a copyrighted work.

I wasn't claiming that fair use definitely allows the copying here, merely that it needs to be considered.
But fair use does allow using 100% of a copyrighted work in some cases; here is one: https://www.copyright.gov/fair-use/summ ... ir2014.pdf.
Re: How to download entire contents of a website / blog?
I haven't been an Evernote user in quite a while, but they do have a Web Clipper feature. (I haven't used the Clipper myself; I usually just prefer to link/bookmark the live site.)
For the side point about how to preserve info from your own site...you'd have to set it up this way from the beginning, but using a CMS (content mgmt system) would be a good way to go. It's basically a database for your content, which lets you avoid having it all tangled up with HTML markup, etc.
Re: How to download entire contents of a website / blog?
JupiterJones wrote: ↑Thu May 26, 2022 12:12 pm
...The act of going to a website is fundamentally asking for your own copy of the underlying code of a webpage. That's literally how the HTML protocol works. And it doesn't prevent anyone else from asking for and getting their own copy either. Nothing is "borrowed" or consumed at any point.

Nitpick: instead of HTML, I think you meant HTTP.
And the concept of a "webpage" gets complicated rather quickly. Not that anyone other than a software geek really cares.
Re: How to download entire contents of a website / blog?
"This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?"
I haven't read all the posts. One website that I used almost daily for about 8 years--it originated around 1998, I think--"died" about 2 years ago. It was loaded with easily searchable information that would be hard to come by otherwise, even if one had the books at hand. When it closed, numerous threads and several sections/topics were lost. Postings have been closed for about a year, but some information is still available, and it is still somewhat searchable. That was an incredible resource that I still use at times.
Same thing happened to 2 others. One ran for about 25 years; with the other, the main man died and the rest drifted away.
- tuningfork
- Posts: 885
- Joined: Wed Oct 30, 2013 8:30 pm
Re: How to download entire contents of a website / blog?
brandy wrote: ↑Thu May 26, 2022 2:53 pm
"This may sound like a goofy question, but couldn't you just bookmark the site so you could come back to it when you wanted to read it? Is there a reason to believe that the blog won't be around by whatever future point you expect to be wanting to read it?"

I haven't read all the posts. One website that I used almost daily for about 8 years--it originated around 1998, I think--"died" about 2 years ago. It was loaded with easily searchable information that would be hard to come by otherwise, even if one had the books at hand. When it closed, numerous threads and several sections/topics were lost. Postings have been closed for about a year, but some information is still available, and it is still somewhat searchable. That was an incredible resource that I still use at times.
Same thing happened to 2 others. One ran for about 25 years; with the other, the main man died and the rest drifted away.

Maybe I should check all my Geocities bookmarks!
- JupiterJones
- Posts: 3623
- Joined: Tue Aug 24, 2010 3:25 pm
- Location: Nashville, TN
Re: How to download entire contents of a website / blog?
sycamore wrote: ↑Thu May 26, 2022 2:43 pm
JupiterJones wrote: ↑Thu May 26, 2022 12:12 pm
...The act of going to a website is fundamentally asking for your own copy of the underlying code of a webpage. That's literally how the HTML protocol works. And it doesn't prevent anyone else from asking for and getting their own copy either. Nothing is "borrowed" or consumed at any point.
Nitpick: instead of HTML, I think you meant HTTP.

Indeed I did! Fixed.
"Stay on target! Stay on target!"