Long(ish) term storage for computer files?

squirrel1963
Posts: 1253
Joined: Wed Jun 21, 2017 10:12 am
Location: Portland OR area

Re: Long(ish) term storage for computer files?

Post by squirrel1963 »

Northern Flicker wrote: Wed Aug 17, 2022 1:14 pm If files archived for 40 years are encrypted, how and where will you store the encryption key, and what media will it use for 40-year storage?
Physical media should never be trusted for 40 years; it can fail at any time for any reason. Keep at least two separate physical copies and "refresh" them every 5-10 years by copying them to new media. This also protects you from technological obsolescence: how many people still have a floppy drive or tape reader on their computer?
This is what I have done since my first PC in 1991, and I have never lost any data (although I can no longer read papers which I wrote in 1994 with now defunct word processors -- a good reason to also keep copies in plain text along with the original document if possible. Hopefully today's PDF documents will still be readable 40 years from now, but who knows).
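
A refresh is only useful if the new copy is verified against the old one. As a minimal sketch (the function names `build_manifest` and `verify_copy` are illustrative, not from any particular tool), a SHA-256 manifest taken from the old media can be checked against the new media after copying:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or differ in copy_root."""
    problems = []
    for relative, expected in manifest.items():
        candidate = copy_root / relative
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(relative)
    return problems
```

An empty list from `verify_copy` means every file in the manifest arrived intact; anything else names the files to copy again before the old media is retired.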
LMP | Liability Matching Portfolio | safe portfolio: TIPS ladder + I-bonds + Treasuries | risky portfolio: US stocks / US REIT / International stocks
Northern Flicker
Posts: 15363
Joined: Fri Apr 10, 2015 12:29 am

Re: Long(ish) term storage for computer files?

Post by Northern Flicker »

What is the 40-year storage plan for the password safe, and how is the master key for the password safe maintained?
bampf
Posts: 1104
Joined: Thu Aug 04, 2016 6:19 pm

Re: Long(ish) term storage for computer files?

Post by bampf »

I have 30 years in the storage industry. I manage millions of files and am in a heavily regulated industry. I have built (engineered, created etc) storage around disks, SSDs, CDs, DVDs, tapes and some esoteric solutions. I have built file, block and object estates. I have built archives and backups and file systems.

Do not overthink this.

1. Create an encrypted solution if you are worried (I am not).
2. Keep a copy locally.
3. Put one copy on your Google Drive.
4. Put one copy on your Amazon Prime storage space. Or AWS.
(Or Dropbox. Or OneDrive. Or any of a dozen "small storage solutions" in the cloud. Or one on your MacBook. Or one on your Linux box. Or one on the file system on your phone.) This is to indicate that 3 and 4 are fungible as long as they are off prem and preferably managed at the enterprise.

Tax returns aren't going to take a lot of space. Simple encryption is likely good enough. 3 copies. 1 local. 2 off site. 2 different off site solutions. Keep it under a TB and you can likely get that for free.
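
With three copies, a periodic check can tell which copy has drifted. The following is a rough sketch, assuming each copy has been downloaded or mounted to a local path; the labels ("local", "cloud-a", "cloud-b") and the function name `find_odd_copies` are made up for illustration:

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a (small) file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_odd_copies(copies: dict[str, Path]) -> list[str]:
    """Name the copies whose content differs from the majority.

    `copies` maps a label such as "local" or "cloud-a" to the path where
    that copy is currently mounted or downloaded.
    """
    digests = {label: file_digest(path) for label, path in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # No strict majority, so there is no way to tell which copy is right.
        raise ValueError("no majority: the copies disagree with each other")
    return sorted(label for label, d in digests.items() if d != majority_digest)
```

This is the practical payoff of keeping three copies rather than two: when one goes bad, the other two outvote it and it can be replaced from either of them.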

My opinion for what it is worth.

--Bampf
Nescio
Northern Flicker
Posts: 15363
Joined: Fri Apr 10, 2015 12:29 am

Re: Long(ish) term storage for computer files?

Post by Northern Flicker »

bampf wrote: Thu Aug 18, 2022 1:22 am I have 30 years in the storage industry.
How many of the files you managed 30 years ago are accessible today?
bampf
Posts: 1104
Joined: Thu Aug 04, 2016 6:19 pm

Re: Long(ish) term storage for computer files?

Post by bampf »

Northern Flicker wrote: Thu Aug 18, 2022 3:04 am
How many of the files you managed 30 years ago are accessible today?
That is a complex question and not something I could readily answer. I suppose that the ecosystem I built for the Library of Congress is still doing what it is supposed to be doing. Certainly the physical media in the form of CDs and tapes are theoretically accessible. I have pictures that I took over 30 years ago that have been digitized (since roughly '97) that are still accessible.

If you are asking "Is there a long term archive on line that has never gone through a technical change that is still up and running today" I would say likely not. We have come a long long way in 30 years. What are you actually asking about?
enad
Posts: 1581
Joined: Fri Aug 12, 2022 2:50 pm

Re: Long(ish) term storage for computer files?

Post by enad »

Northern Flicker wrote: Thu Aug 18, 2022 3:04 am
How many of the files you managed 30 years ago are accessible today?
For grins, I just fired up an old IDE hard drive from the 90's and was able to access many files and folders. The drive has been stored in a cool, dry safe for more than 24 years. I haven't tried booting Windows 3.1 but I might. I also fired up my 1980 CP/M system that uses 8" double-sided 1.2 MB floppy disks. Imagine: the entire OS fits on the disk and still leaves me space to store files. I was able to boot up a disk and play Space Invaders. I have a 5.25" floppy drive (in a box, last used 5 years ago) and both a 3.5" Teac 1.2 MB floppy drive (floppy interface on one PC) and the USB version that can be used on modern PCs. The data on floppies written in the mid to late 90's is still there, and reading it was not an issue, nor was viewing some photographs that I should probably put on our NAS. I have since lost the ability to read my 1982 mag tape, but the last time I read it I was able to dump the contents of all the files (so there is no loss; I just never got around to disposing of it properly). Now the DEC tapes are another story ...

No matter what media you choose there is always someone or a service that can read that media and transfer to another format. I really like dumping the stuff on flash drives, labeling said flash drives and rotating them every 3-6 months and in the meantime lock them up in a fireproof safe. I don't trust others to keep track of my sensitive financial files and that's what happens when you put it in the cloud or someone else's service. You may never hear of the thefts that take place; if you did they'd go out of business.
What Goes Up Must come down -- David Clayton-Thomas (1968), BST
squirrel1963
Posts: 1253
Joined: Wed Jun 21, 2017 10:12 am
Location: Portland OR area

Re: Long(ish) term storage for computer files?

Post by squirrel1963 »

enad wrote: Thu Aug 18, 2022 7:18 pm
For grins, I just fired up an old IDE hard drive from the 90's and was able to access many files and folders. The drive has been stored in a cool, dry safe for more than 24 years. I haven't tried booting Windows 3.1 but I might. I also fired up my 1980 CP/M system that uses 8" double-sided 1.2 MB floppy disks. Imagine: the entire OS fits on the disk and still leaves me space to store files. I was able to boot up a disk and play Space Invaders. I have a 5.25" floppy drive (in a box, last used 5 years ago) and both a 3.5" Teac 1.2 MB floppy drive (floppy interface on one PC) and the USB version that can be used on modern PCs. The data on floppies written in the mid to late 90's is still there, and reading it was not an issue, nor was viewing some photographs that I should probably put on our NAS. I have since lost the ability to read my 1982 mag tape, but the last time I read it I was able to dump the contents of all the files (so there is no loss; I just never got around to disposing of it properly). Now the DEC tapes are another story ...

No matter what media you choose there is always someone or a service that can read that media and transfer to another format. I really like dumping the stuff on flash drives, labeling said flash drives and rotating them every 3-6 months and in the meantime lock them up in a fireproof safe. I don't trust others to keep track of my sensitive financial files and that's what happens when you put it in the cloud or someone else's service. You may never hear of the thefts that take place; if you did they'd go out of business.
I started in '94 with IDE hard drives, copied them to parallel SCSI hard drives in '96, then to SATA HDDs in 2003, and last but not least to SSDs once they got cheap enough (still using the SATA protocol).
I do have data on a 1/4 inch tape in UNIX tar format, made in 1983 on a PDP-11/70, but I have no hardware to read it back and it has likely degraded.

In short, my strategy is to keep copying the data over to the current widely used format, so as to avoid this kind of technical obsolescence.

There is also the issue of data formats. I can no longer read the term papers I wrote in '95 with a now defunct word processor. I realized this 10 years ago, so now I always export word processor documents to ASCII text; it's as close as it gets to a universal data format.
PDF is so pervasive now in government documents, official docs, etc. that it's unlikely to become obsolete, but if a new format ever comes out (say NEW-PDF), I'll simply convert all PDF docs to NEW-PDF.

@enad it's really cool you managed to keep old hardware :-) unfortunately I ran out of space as I had all kinds of stuff at home due to my work as software/hardware eng, so it wasn't practical for me. Although if I had actually owned a 1980 CP/M or the IBM PC XT I would have kept both :-)

Remember QIC-24 tape drives? Those were fun :-)
Northern Flicker
Posts: 15363
Joined: Fri Apr 10, 2015 12:29 am

Re: Long(ish) term storage for computer files?

Post by Northern Flicker »

bampf wrote: Thu Aug 18, 2022 5:26 pm
That is a complex question and not something I could readily answer. I suppose that the ecosystem I built for the Library of Congress is still doing what it is supposed to be doing. Certainly the physical media in the form of CDs and tapes are theoretically accessible. I have pictures that I took over 30 years ago that have been digitized (since roughly '97) that are still accessible.

If you are asking "Is there a long term archive on line that has never gone through a technical change that is still up and running today" I would say likely not. We have come a long long way in 30 years. What are you actually asking about?
The question is not whether some version of a file is online today. The online file is just a container for one (or sometimes more) versions of the file. The question is whether you can retrieve the contents of the file as they were 30 years ago from backup or archival media.

I assume the Library of Congress has systems with archival requirements. But I assume they don't just tell developers not to overthink that either. Library archives also may not have confidentiality requirements.
bampf
Posts: 1104
Joined: Thu Aug 04, 2016 6:19 pm

Re: Long(ish) term storage for computer files?

Post by bampf »

Northern Flicker wrote: Thu Aug 18, 2022 11:00 pm
The question is not whether some version of a file is online today. The online file is just a container for one (or sometimes more) versions of the file. The question is whether you can retrieve the contents of the file as they were 30 years ago from backup or archival media.

I assume the Library of Congress has systems with archival requirements. But I assume they don't just tell developers not to overthink that either. Library archives also may not have confidentiality requirements.
Oh, I see. Well, I guess I was telling someone that it isn't all that hard to have a resilient, cheap and easy to use system for archival data sets using tools at hand. I apologize if my tone was off putting, I was trying to say that you can go to the very very deep ends in storage and maybe you don't need to. Wasn't trying to sound arrogant, but I can see how that may have come across that way.

For what it is worth, I have built storage and operating systems that do have confidentiality requirements. I am pretty familiar with FIPS 140 and it is perhaps ok to have a reasonably encrypted container for things like tax returns. You wouldn't get a lot of exciting information from a tax return if you were a bad actor.

So, setting aside my tone, my recommendation to do something that is simple, sustainable and cost effective stands.
Northern Flicker
Posts: 15363
Joined: Fri Apr 10, 2015 12:29 am

Re: Long(ish) term storage for computer files?

Post by Northern Flicker »

Not off-putting at all.

My solution for tax returns is hard copy in a locked filing cabinet.

I consider any solution involving encrypted data to have too much risk of not being able to decrypt when I need the data. One bad sector or flash memory cell in an encrypted file can cause loss of most of, or the entire file, and not just the data that was corrupted.
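
How much is lost depends on how the encrypted container is built. As a rough illustration only, the toy counter-mode cipher below (a keystream made from SHA-256, invented for this example and not suitable for protecting real data) shows that in stream or counter style modes a damaged ciphertext byte damages only the matching plaintext byte. Containers that compress before encrypting, or that authenticate the whole file as a single unit, behave differently and can reject everything after one bad byte.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256 (illustration only)."""
    blocks = []
    counter = 0
    while len(blocks) * 32 < length:
        block_input = key + nonce + counter.to_bytes(8, "big")
        blocks.append(hashlib.sha256(block_input).digest())
        counter += 1
    return b"".join(blocks)[:length]


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def damaged_positions(original: bytes, recovered: bytes) -> list[int]:
    """Byte offsets at which the recovered plaintext differs from the original."""
    return [i for i, (a, b) in enumerate(zip(original, recovered)) if a != b]
```

Flipping one byte of the ciphertext and decrypting again yields a plaintext that differs at that one offset only. Whether a real tool behaves this way depends on its mode and file format, so it is worth testing a deliberately damaged copy before relying on it for decades.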
squirrel1963
Posts: 1253
Joined: Wed Jun 21, 2017 10:12 am
Location: Portland OR area

Re: Long(ish) term storage for computer files?

Post by squirrel1963 »

Pretty cool to see other storage folks here.
I was on the initial team of about 20 engineers who built AWS EBS, where I did a lot of work in the kernel code; then at Twitter I wrote another virtual driver that uses NVMe to cache spinning media.
At Microsoft I was on the clustering team, working on the shared SCSI store.
squirrel1963
Posts: 1253
Joined: Wed Jun 21, 2017 10:12 am
Location: Portland OR area

Re: Long(ish) term storage for computer files?

Post by squirrel1963 »

bampf wrote: Thu Aug 18, 2022 11:50 pm
Oh, I see. Well, I guess I was telling someone that it isn't all that hard to have a resilient, cheap and easy to use system for archival data sets using tools at hand. I apologize if my tone was off putting, I was trying to say that you can go to the very very deep ends in storage and maybe you don't need to. Wasn't trying to sound arrogant, but I can see how that may have come across that way.

For what it is worth, I have built storage and operating systems that do have confidentiality requirements. I am pretty familiar with FIPS 140 and it is perhaps ok to have a reasonably encrypted container for things like tax returns. You wouldn't get a lot of exciting information from a tax return if you were a bad actor.

So, setting aside my tone, my recommendation to do something that is simple, sustainable and cost effective stands.
We used encryption in EBS volumes and S3 backups, and it's awesome because the key vault is done really well and it's not a pain to manage.
I don't encrypt the full backup volume, which is too painful because I'd then need to manage the AES keys; I just encrypt the files that hold sensitive data with a master passphrase, and the backup is in a safe box anyway, so it's good enough for me.
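
For readers wondering how a single master passphrase can stand in for managed AES keys, here is a minimal sketch using PBKDF2 from Python's standard library. The iteration count, salt size, and function names are assumptions chosen for illustration, not a description of any specific tool; the actual file encryption is best left to a vetted program.

```python
import hashlib
import os


def new_salt() -> bytes:
    """Random per-file salt; it is stored beside the file and is not secret."""
    return os.urandom(16)


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch a passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )
```

Because the key can be recomputed from the passphrase and the stored salt, the only thing that has to survive for decades is the passphrase itself, which is the key-storage question raised earlier in the thread.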
bampf
Posts: 1104
Joined: Thu Aug 04, 2016 6:19 pm

Re: Long(ish) term storage for computer files?

Post by bampf »

squirrel1963 wrote: Fri Aug 19, 2022 2:45 am
We used encryption in EBS volumes and S3 backups, and it's awesome because the key vault is done really well and it's not a pain to manage.
I don't encrypt the full backup volume, which is too painful because I'd then need to manage the AES keys; I just encrypt the files that hold sensitive data with a master passphrase, and the backup is in a safe box anyway, so it's good enough for me.
There are like 2000 of us in the world, we just rotate companies every once in a while... <exchange secret storage handshake>
Northern Flicker
Posts: 15363
Joined: Fri Apr 10, 2015 12:29 am

Re: Long(ish) term storage for computer files?

Post by Northern Flicker »

squirrel1963 wrote: We used encryption in EBS volumes and S3 backups, and it's awesome because the key vault is done really well and it's not a pain to manage.
I don't encrypt the full backup volume, which is too painful because I'd then need to manage the AES keys; I just encrypt the files that hold sensitive data with a master passphrase, and the backup is in a safe box anyway, so it's good enough for me.
Encryption of database backups is usually all or nothing, not per individual file. System administrators are rarely, if ever, asked to restore from a 30 or 40 year-old backup. Most solutions in place do not have (or meet) requirements like that, and so it is easy to gloss over the subtleties involved.

It of course depends on what the OP means by longish.
squirrel1963
Posts: 1253
Joined: Wed Jun 21, 2017 10:12 am
Location: Portland OR area

Re: Long(ish) term storage for computer files?

Post by squirrel1963 »

Northern Flicker wrote: Fri Aug 19, 2022 12:21 pm
squirrel1963 wrote: We used encryption in EBS volumes and S3 backups, and it's awesome because the key vault is done really well and it's not a pain to manage.
I don't encrypt the full backup volume, which is too painful because I'd then need to manage the AES keys; I just encrypt the files that hold sensitive data with a master passphrase, and the backup is in a safe box anyway, so it's good enough for me.
Encryption of database backups is usually all or nothing, not per individual file. System administrators are rarely, if ever, asked to restore from a 30 or 40 year-old backup. Most solutions in place do not have (or meet) requirements like that, and so it is easy to gloss over the subtleties involved.

It of course depends on what the OP means by longish.
Sure, that makes perfect sense: a database backup must be consistent at the whole-DB level. Both S3 and EBS volumes are really "large blob objects"; an S3 file can be anything from a 5 KB text document to a large tarball containing thousands of files to a whole disk volume backup. EBS actually stores volume backups in relatively small chunks (it used to be 4 MB long ago, not sure now) because that helps reduce the size of point-in-time snapshots.
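
To show why small chunks keep point-in-time snapshots small, here is a sketch of the general idea of chunk-level incremental snapshots. It is not how EBS is implemented; the names and the in-memory `bytes` volume are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, echoing the figure recalled above


def chunk_digests(volume: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """SHA-256 digest of every fixed-size chunk of the volume, in order."""
    return [
        hashlib.sha256(volume[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(volume), chunk_size)
    ]


def changed_chunks(previous: list[str], current: list[str]) -> list[int]:
    """Indexes of the chunks a new snapshot would have to store."""
    return [
        index
        for index, digest in enumerate(current)
        if index >= len(previous) or previous[index] != digest
    ]
```

Only the chunks returned by `changed_chunks` need to be written for the new snapshot; every other chunk can be referenced from the previous one. Smaller chunks mean a small write dirties less data, at the cost of more bookkeeping.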
squirrel1963
Posts: 1253
Joined: Wed Jun 21, 2017 10:12 am
Location: Portland OR area

Re: Long(ish) term storage for computer files?

Post by squirrel1963 »

bampf wrote: Fri Aug 19, 2022 8:03 am
There are like 2000 of us in the world, we just rotate companies every once in a while... <exchange secret storage handshake>
Oh yes, we are a small community indeed. I have jumped through a lot of companies and very often I kept working with the same people, we all refer each other as we change jobs. This is fairly common for highly specialized fields like kernel coding. My secret handshake is "SCSI page 0x83" :-)
bampf
Posts: 1104
Joined: Thu Aug 04, 2016 6:19 pm

Re: Long(ish) term storage for computer files?

Post by bampf »

squirrel1963 wrote: Fri Aug 19, 2022 5:08 pm
Oh yes, we are a small community indeed. I have jumped through a lot of companies and very often I kept working with the same people, we all refer each other as we change jobs. This is fairly common for highly specialized fields like kernel coding. My secret handshake is "SCSI page 0x83" :-)
Yup. All you have to do is update the VID/PID when you get a new gig. If you know, you know.

kill -9 and I am out.