Everything not saved will be lost - Archival Strategies for the Everyman, a discussion

  • Thread starter h00
  • This thread has been viewed 1588 times.

h00

message is what matters
Gold
Joined
Apr 15, 2022
Messages
612
Reaction score
2,858
Awards
216
Website
h00.neocities.org
I have been shitposting and trolling a bit too much recently, so to balance things out, here is an effort post.

The duality of the net: somehow "The Internet is Forever", yet it is also very ephemeral.
Our playlists look like this:
[two attached images]

Hosting sites frequently pull the plug on old content
[attached image]

Leading to vast linkrot
[two attached images]

The internet might remember your cringy Facebook or racist Twitter posts, but it will easily forget old stop-motion Bionicle videos and all your image uploads from the past decade.

If you like something, SAVE IT.
If you want to share something, HOST IT.

This post will examine and encourage (easy) ways to download, store, and share media you interact with.
This is not comprehensive, but it should serve as a good starting point for most travelers.

ARCHIVING
YouTube Videos:
Videos, playlists, and entire channels can easily be archived with yt-dlp. yt-dlp itself is a command-line tool, but GUI front ends for it exist if you prefer one.
If you ever need help using the CLI version, just ask god.
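For anyone who would rather not ask god, here is a minimal sketch of a repeatable archive run. It is written in Python only so the same snippet works on Windows and Linux, and it assumes yt-dlp is installed and on your PATH. The function names, the `downloaded.txt` file name, and the folder layout are made up for illustration. The `--download-archive` option makes yt-dlp remember what it has already fetched, so running the script again only grabs new uploads.

```python
import shutil
import subprocess
from pathlib import Path


def build_archive_command(url, dest):
    """Build a yt-dlp command line for archiving a video, playlist, or channel."""
    dest = Path(dest)
    return [
        "yt-dlp",
        # Record finished video IDs so later runs skip them.
        "--download-archive", str(dest / "downloaded.txt"),
        # Keep the metadata (title, description, upload date) next to the video.
        "--write-info-json",
        "--write-thumbnail",
        # One folder per uploader; the video ID in the name avoids collisions.
        "-o", str(dest / "%(uploader)s" / "%(upload_date)s %(title)s [%(id)s].%(ext)s"),
        url,
    ]


def archive(url, dest):
    """Run the archive command; raises if yt-dlp is missing or exits with an error."""
    if shutil.which("yt-dlp") is None:
        raise RuntimeError("yt-dlp was not found on PATH")
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_archive_command(url, dest), check=True)
```

Call `archive(url, "yt-archive")` with the playlist or channel URL you care about; the folder name here is just a placeholder.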

Entire Websites
You can use wget (CLI) or HTTrack (GUI) to download entire websites. You can set either one to also download external resources (such as Imgur links), or to grab only what is hosted directly on the site.
Code:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent "$URL_HERE"
Static sites, imageboards, and forums such as this one don't take up a terrible amount of space. IIRC Agora is ~200GB, Lainchan is just 13GB, and sites like textfiles.com are less than that.

If a community is ever shut down, having an archive of its content, even if it's a few months old, is incredibly useful for its revival.

Discords
I very much dislike Discord for its walled-garden, non-indexable bs, but that's where people live. If you want to look back at a Discord's ancient history, it's better to do it with an offline copy than to scroll and load and scroll and load, or to fight Discord's shitty search. Use this: https://github.com/Tyrrrz/DiscordChatExporter

STORING
Local Strategies:
Really, you can't beat just storing things on your hard drive.
Data stored on an HDD lasts longer in cold storage (when the disk is not plugged into anything for long periods of time) than data on flash storage (SSDs, SD cards, USB sticks).

To store more data on your HDD of choice, compress your shit!
Now, data that is already compressed (like .mp4s or .jpgs) won't get any smaller if you compress it again with WinRAR or whatever, but for websites and other uncompressed files (documents, HTML, bmp, wav) you can save a lot of space by compressing. Use WinRAR or 7-Zip with the default settings to achieve a decent ratio without burning a ton of CPU time, then store that away.
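To see why recompressing already-compressed media is pointless, here is a small sketch using Python's built-in `lzma` module (the same family of algorithm the .7z format uses). The repetitive HTML-ish string stands in for a mirrored website, and random bytes stand in for an .mp4 or .jpg, since well-compressed data looks close to random. Both stand-ins are assumptions for illustration, not measurements of real files.

```python
import lzma
import os


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(lzma.compress(data)) / len(data)


# Stand-in for a text-heavy site mirror: lots of repeated markup.
text_like = b"<p>the same forum boilerplate over and over</p>\n" * 20000
# Stand-in for media that is already compressed: no patterns left to exploit.
random_like = os.urandom(len(text_like))

print(f"repetitive text shrinks to {ratio(text_like):.1%} of its size")
print(f"random bytes end up at {ratio(random_like):.1%} of their size")
```

The first number should be tiny and the second should sit at or slightly above 100%, which is the whole argument for compressing site mirrors but leaving video and photo folders alone.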

Cloud Strategies:
So an archive isn't very good if it only exists in one place. Bit-rot, drive failure, water damage, accidental formatting, or any number of other things can ruin your data hoard.
Upload in multiple places!
If any of your archive is copyrighted material, pack it into an archive file and add a password so hosts can't peep at the data inside.
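One catch with password-protected archives: depending on the format and settings, the file names can stay readable even when the contents are encrypted. A minimal sketch of one way around that, assuming the 7-Zip command-line tool is installed and callable as `7z` (the function name and paths are hypothetical):

```python
import subprocess


def build_encrypt_command(archive_path, source):
    """Build a 7z command that encrypts both the contents and the file list."""
    return [
        "7z", "a",
        "-p",        # no password given here, so 7z prompts for it instead
        "-mhe=on",   # encrypt the archive headers too (only works for .7z)
        str(archive_path),
        str(source),
    ]


def encrypt_folder(archive_path, source):
    """Run the command; 7z asks for the password interactively."""
    subprocess.run(build_encrypt_command(archive_path, source), check=True)
```

Letting 7z prompt for the password keeps it out of your shell history and process list, which is why the sketch does not pass it as an argument.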

The big three:
MEGA.NZ - 15GB for free users, 5GB allowed to be uploaded per day
Google Drive - 15GB for free users
OneDrive - 5GB for free users
More listed here

Another strat is to upload data as a video for ∞ cloud storage:

View: https://www.youtube.com/watch?v=8I4fd_Sap-g

YouTube may eventually set up detection for this kind of thing, but the same trick should work on other video platforms (vidlii, bitchute, ok.ru, nicovideo.jp).
There are other projects on GitHub that do the same thing if the limitations of this one are annoying
(but you can split your archive into parts, upload the parts into a playlist, and then download the playlist back later, so the limitation is easily circumvented).
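The split-into-parts step can be sketched like this. It is a generic file splitter, not part of the linked project, and the `.part0000` naming scheme is an assumption made up for the example. The same approach also helps with per-file size caps on the cloud hosts above.

```python
from pathlib import Path

BUFFER = 1024 * 1024  # copy in 1 MiB pieces so huge archives never sit in RAM


def split_file(src, part_size, out_dir):
    """Cut src into numbered parts of at most part_size bytes; return their paths."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as reader:
        while True:
            chunk = reader.read(min(BUFFER, part_size))
            if not chunk:
                break  # nothing left to read
            part = out_dir / f"{src.name}.part{len(parts):04d}"
            with part.open("wb") as writer:
                written = 0
                while chunk:
                    writer.write(chunk)
                    written += len(chunk)
                    if written >= part_size:
                        break  # this part is full, start the next one
                    chunk = reader.read(min(BUFFER, part_size - written))
            parts.append(part)
    return parts


def join_files(parts, dest):
    """Concatenate the parts (sorted by name) back into one file."""
    with Path(dest).open("wb") as writer:
        for part in sorted(Path(p) for p in parts):
            with part.open("rb") as reader:
                while chunk := reader.read(BUFFER):
                    writer.write(chunk)
```

The zero-padded part numbers matter: they make a plain alphabetical sort put the pieces back in the right order when you rejoin them.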

One thing to keep in mind WHENEVER you use a cloud service is that your data and account can be purged at any time. Do not rely on just one service.
A good strat is to keep your data on an HDD that's plugged into your main desktop, on a large portable HDD that's only occasionally plugged into your desktop, and on a cloud service.
Check your backups regularly. Every other week is plenty to make sure your data hasn't been surprise purged.
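"Checking" a local or portable-drive backup can be more than eyeballing the folder. Below is a minimal sketch that records a SHA-256 hash for every file and later reports anything missing or silently changed (bit-rot included). The function names are made up, and saving the manifest somewhere (as JSON, for example) is left to you. For cloud copies you would have to download the files first, so treat this as a local check.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large archives don't need to fit in RAM."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (by relative path) to its SHA-256 hash."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_problems(root, manifest):
    """Return files from the manifest that are now missing or have different contents."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Build the manifest once when you make the backup, then run `find_problems` against each copy during your regular check; an empty list means every file still matches.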

SHARING
Any data worth saving is worth sharing. If you are using cloud services, it's easy enough to link to where it's being stored, but if you're keeping things local...
BitTorrent
Most torrent clients can create a torrent. Just add a public tracker such as udp://tracker.openbittorrent.com:6969/announce, then seed and share (this will ofc require your machine to be on for people to download).

IPFS
Similar to BitTorrent, you can share files over IPFS. This can be useful if the person you're trying to share with has P2P traffic blocked, as IPFS content can also be fetched through public web gateways. It's a neat protocol (it also requires your machine to be on for others to download).

Webserver
BitTorrent is not as efficient as a direct server-to-client protocol, but most ISPs don't allow port forwarding on residential networks. However, some VPN providers, like Proton, allow port forwarding (Mullvad used to, but dropped the feature in 2023). Forward a port through your VPN, download Apache, edit httpd.conf so that "Listen 80" becomes "Listen $Forwarded_Port", drop the files you want to share in the htdocs folder, and run httpd.exe. Now you can share your files directly via an IPv4 link.
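If installing Apache feels like too much for a quick share, Python's built-in `http.server` module can stand in for it. To be clear, this is a swapped-in alternative and not the Apache setup described above, and the Python docs themselves say it is not meant for production use. A minimal sketch, where `make_server` is a made-up helper name and the port is whatever your VPN forwarded:

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port, host="0.0.0.0"):
    """Create a server that lists and serves every file under `directory`."""
    # Bind the handler to one folder so nothing outside it is exposed.
    handler = partial(SimpleHTTPRequestHandler, directory=str(directory))
    return ThreadingHTTPServer((host, port), handler)
```

Something like `make_server("htdocs", 8080).serve_forever()` starts it, with `8080` and `htdocs` as placeholders for your forwarded port and share folder. There is no password or HTTPS here, so only put files in that folder that you are happy for anyone with the link to grab.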

Further Reading
I'll stop there as this is supposed to be for the everyman and hosting a Webserver, even without getting into router port-forwarding and DNS n shit, is still technical. But it's plenty cheap to get a dedicated device (Raspberry Pi or shitty craigslist PC) to be a NAS, seedbox, and webserver.
https://old.reddit.com/r/DataHoarder/wiki/index
The DataHoarder sub is a great resource for information and frequently leads community efforts to archive en masse (such as their Imgur archival effort).
This is a good resource for tools (ImageBoard thread archiving, instagram scraping, telegram ripping, etc.)



If anyone here considers themselves a data hoarder or internet archivist, share your set-ups, your tools, and what media you target! If you're interested in getting started, ask any questions you have :D
 
Virtual Cafe Awards

RisingThumb

Imaginary manifestation of fun
Joined
Sep 9, 2021
Messages
715
Reaction score
1,758
Awards
173
Website
risingthumb.xyz
Good thread :SataniaThumbsUp:

I think something worth noting is that not everything is worth saving. And not everything can be saved. And even what is saved will not always provide value to people. Let me invoke the 99% of ancient literature that is lost to history, and how little that affects many people. Of course, within that are whole histories and esoteric knowledge, and even then most of it is still inaccessible, as it is written in languages that aren't known... or knowable. Latin is difficult enough for most people (read Lingua Latina per se illustrata if you want to learn it), but you have texts like the Voynich manuscript that can't be understood.

Another point is the obsession with the notion of data hoarding. Consider the question of value-to-noise ratios. On most sites, the value-to-noise ratio steers heavily into noise, but for books and other works like that, it steers more heavily into value (of course it's only value if it's used by you).

For me personally, I'd like to hear where other people get game ROMs from. They're actually kinda worth archiving because copyright makes it desirable to take them down, but also the cost of hosting and the demand for them hammer the download speeds. I'm most interested in where I'd find collections for NES, SNES, GB, GBA games etc. The smaller ones where it's reasonable to have them all on an HDD.

Anyway some more links:
https://archive.org/
https://wiki.archiveteam.org/index.php/Main_Page
Jason Scott's YouTube channel, where he has podcasts discussing various stories around archival
 

Eden

Did You Get My Message?
Joined
Feb 26, 2023
Messages
341
Reaction score
1,061
Awards
120
Website
foreverliketh.is
I'm actually planning to write a blog post in this vein, but still not quite happy with it. Anyway, this is a conversation I've followed since I became interested in personal websites. I wrote something in the Yesterweb Forums on this topic, please allow me to share it here:

~
Preservation is a topic I also hold dear. It hurts to lose something one cares about. There's this page I came across a few months ago titled stability; it's very short, consider giving it a look. "Death is a part of life, not the end of it." I don't remember where I heard that, but please know I do not mean to invoke anything religious with it. Just perhaps that there's a certain... flavor? quality? magic? to life that exists because of loss.

If I feel like that about it, then why bother with preservation at all? It's so easy to go to extremes, isn't it? All or nothing... To preserve something, I think it's an act of love. I don't believe that something being preserved is dead, at least not entirely. I think there's a beauty in that, perhaps not the same beauty as when that thing was more alive, but a beauty nonetheless. Preservation gives a chance for that beauty to reach "just one more person".

A little weird, I guess, right? I see this a bit as a spectrum, not a binary one or the other. To me, preservation exists in the in-between, in the gray. It's important because it can be lost. The CAN is essential there though. To "outstay one's welcome", there's an ugliness to that, no? Life AT-ALL-COSTS, NO MATTER WHAT sounds a little scary to me. Death is not someone you leave at the front door when they come knocking. You let them in, offer some tea, maybe talk a bit. Much more graceful than having your place broken into in the middle of the night.
~

I'll second @RisingThumb with regard to hoarding. It can hold a darkness over a person. Sometimes the intention isn't something as pure as preservation, but an exertion of control. There's more I want to say here, but maybe at a later date.

I think it's important to be able to let go, yo.
 
Virtual Cafe Awards
Everyone is still huddling around the Internet Archive as some sort of magical solution and I cannot sufficiently emphasize how much they need to Not Do That.
 

h00

I'd like to hear where other people get game ROMs from. They're actually kinda worth archiving because copyright makes it desirable to take them down
This is a rather comprehensive overview of ROM sites, forums, and trackers:

I think piracy is actually a really strong force for preservation. We wouldn't be able to play games like GTA4 or Team Fortress 2 as they were without pirates. Other games, like Poker Night at the Inventory, you wouldn't be able to play FULL STOP without pirates. Old and indie movies are also preserved by pirates. There's so much media out there that you flat-out cannot purchase or otherwise legally view. Maybe you can get a VHS or DVD, but there's plenty of media that has only been distributed digitally and is no longer available for purchase.

 
Virtual Cafe Awards
I think it's important to be able to let go, yo.
After reading all that, seeing your mention of it, and then the link in your footer (banner): wanna join my "webring", with your approval of course? (Oh the misery, I need to learn how to embed RSS or whatever!)
The link is in my profile; from there, use the navigation at the top to reach the link to my blogspot [or here].
 

Orlando Smooth

Well-Known Traveler
Joined
Aug 12, 2019
Messages
455
Reaction score
1,707
Awards
143
I really appreciate this post. I can't contribute too much other than to complain about how damn expensive HDDs have gotten. I have a NAS in my home that I manage myself, and I used to be able to get 14 TB Western Digital red labels for like $250. I went to look for a couple more to increase my storage pool, and apparently the price has crept all the way up to like $435 or so.

Anyways, if you're into hoarding stuff, do yourself a favor and build a simple NAS. It's not prohibitively expensive, it can be built out as a RAID array so that it's self-redundant and (at the right RAID level) capable of surviving multi-drive failures, and you can access it from any device.
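How much redundancy costs depends on the RAID level. As a rough worked example using the 14 TB drives mentioned above (the four-drive pool is a hypothetical, and the helper only covers three common levels under their textbook definitions):

```python
# Minimum drive counts for a few common RAID levels.
MIN_DRIVES = {"raid1": 2, "raid5": 3, "raid6": 4}


def raid_summary(level, drives, drive_tb):
    """Return (usable TB, number of drive failures the array is guaranteed to survive)."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unknown RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "raid1":
        # Every drive holds a full copy of the same data.
        return drive_tb, drives - 1
    if level == "raid5":
        # One drive's worth of space goes to parity.
        return (drives - 1) * drive_tb, 1
    # raid6: two drives' worth of space goes to parity.
    return (drives - 2) * drive_tb, 2


for level in ("raid5", "raid6"):
    usable, failures = raid_summary(level, drives=4, drive_tb=14)
    print(f"{level}: {usable} TB usable, survives {failures} failed drive(s)")
```

So with four 14 TB drives, surviving two simultaneous failures (RAID 6) leaves 28 TB usable, while RAID 5 gives 42 TB but only tolerates one failure. Either way, RAID covers drive failure only and is not a substitute for a separate backup copy.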
 

mydadiscar

Webcomics! Banzai!
Joined
Jan 20, 2022
Messages
1,592
Reaction score
5,891
Awards
268
Bump, I fucking needed this, especially the YouTube downloaders. I have a playlist of old videos, and I heard YouTube is gonna start purging inactive channels soon, so I've been panicking. The playlist is much too big to even consider doing it manually.
Everyone is still huddling around the Internet Archive as some sort of magical solution and I cannot emphasize how much they need to Not Do That.
Right, it should be a motivator to archive things personally, rather than just a service for other people to rely on. It could go tits-up like any other site.
 
Virtual Cafe Awards
After reading all that, seeing your mention of it, and then the link in your footer (banner): wanna join my "webring", with your approval of course? (Oh the misery, I need to learn how to embed RSS or whatever!)
The link is in my profile; from there, use the navigation at the top to reach the link to my blogspot [or here].
pog, you know

N-69) XXIIVV Webring #,

@Eden <3


Add.:
Your button is in its place on myyolo1999, blogspot, /p/ja.html!
 

Eden

I really appreciate this post. I can't contribute too much other than to complain about how damn expensive HDDs have gotten. I have a NAS in my home that I manage myself, and I used to be able to get 14 TB Western Digital red labels for like $250. I went to look for a couple more to increase my storage pool, and apparently the price has crept all the way up to like $435 or so.

Anyways, if you're into hoarding stuff, do yourself a favor and build a simple NAS. It's not prohibitively expensive, it can be built out as a RAID array so that it's self-redundant and (at the right RAID level) capable of surviving multi-drive failures, and you can access it from any device.
Mentioning ShuckStop for anybody into this hobby.
After reading all that, seeing your mention of it, and then the link in your footer (banner): wanna join my "webring", with your approval of course? (Oh the misery, I need to learn how to embed RSS or whatever!)
The link is in my profile; from there, use the navigation at the top to reach the link to my blogspot [or here].
Thank you! I have added you to my button wall as well :) Let me know if I did it wrong

N-69) XXIIVV Webring #,

That webring is quite the artsy one. Also, it don't take kindly to JavaScript. But it's incredibly well maintained and goes above and beyond with its features. But they'll never take me :agcrybl:
 
Virtual Cafe Awards
thanks! it is one option, but i would rather use the rect11417.jpg one for that XD. (just changing src, less than a min - thanks nonetheless!)
planning to make a gif one, yet it takes time and inspiration XD.

also, seeing your place, i wonder if @I-330 or @Yabba would be ok with me adding their buttons to my own place too (not self-hosted tho...)

Btw, (i forgot) i think i used the (more) "wrong" button - instead of foreverliketh.is.png, i used the johnvertisement-20221227.jpg one XDDD.
Mentioning ShuckStop for anybody into this hobby.

Thank you! I have added you to my button wall as well :) Let me know if I did it wrong


That webring is quite the artsy one. Also, it don't take kindly to JavaScript. But it's incredibly well maintained and goes above and beyond with its features. But they'll never take me :agcrybl:
 

I-330

certified loon
Gold
Joined
Dec 20, 2022
Messages
122
Reaction score
697
Awards
77
Website
i330.dev

pronoundisrespecter

Raw Honey Defender
Joined
May 15, 2023
Messages
44
Reaction score
436
Awards
35
Yess I'm so glad you mentioned yt-dlp. I didn't even know that fork of youtube-dl existed, but I will be excited to download my favourite video creators' entire libraries and keep multiple backups. A lot of them haven't uploaded to YouTube in, in some cases, over a decade, and the whole thing about YouTube deleting inactive accounts was making me nervous about losing all the good information that is in those videos.
 

Andy Kaufman

i know
Joined
Feb 19, 2022
Messages
1,185
Reaction score
4,795
Awards
209
I love yt-dlp! I just installed it on Alpine Linux running on my Windows machine! (Alpine is a very, very lightweight Linux distro that I recommend to everyone who just wants to play around with Linux on the side on a Windows PC. It's easily accessible in the official Windows store. You will need to install a lot of the packages you need yourself, but at least then you know what's on there!)
I tested it out by downloading all the audio of a channel I like. This is the command I used:

yt-dlp --extract-audio --audio-format mp3 -o "/mnt/d/media/vids/%(title)s %(upload_date)s" https://www.youtube.com/XXXX

--extract-audio means that you only want the audio
--audio-format lets you specify a format, in this case "mp3"
-o lets you decide a location the audio should be saved to and also a pattern on how the files should be named. In this case they will get the video title and the upload date as a filename

finally you can just paste a YT-channel URL!

It's still downloading and converting, but I'm 50% done.
 

LostintheCycle

Formerly His Holelineß
Joined
Apr 4, 2022
Messages
1,003
Reaction score
3,982
Awards
248
If you ever need help using the CLI version just ask god
I prefer to ask a man
finally you can just paste a YT-channel URL!
Damn, I wish I had a decent SSD and more bandwidth, otherwise I'd have a setup to automatically archive channels I like. I particularly want to archive all of Louis Rossmann's stuff, which would probably be a metric ton of data, lots of valuable videos.
 

h00

Does anyone have recommendations for backing up a copy of my NAS? I was looking into Wasabi, but it's expensive at the scale I'd be using it. I was also considering Amazon Glacier.
How much data are you trying to back up?
Damn, I wish I had a decent SSD and more bandwidth, otherwise I'd have a setup to automatically archive channels I like. I particularly want to archive all of Louis Rossmann's stuff, which would probably be a metric ton of data, lots of valuable videos.
Your write speeds will usually be faster than your bandwidth, so an SSD is not necessary. If you can keep one of your hosts on 24/7, it could easily archive channels even at DSL speeds.
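Some back-of-the-envelope arithmetic for that claim. The link speeds below are hypothetical examples, not anyone's actual connection. For comparison, a spinning HDD typically writes on the order of 100 MB/s, which is roughly 800 Mbit/s, so the network is almost always the bottleneck.

```python
def gb_per_day(megabits_per_second):
    """Convert a sustained link speed into gigabytes transferred in 24 hours."""
    megabytes_per_second = megabits_per_second / 8  # 8 bits per byte
    return megabytes_per_second * 86_400 / 1000     # seconds per day, MB -> GB


for speed in (5, 10, 25):
    print(f"{speed} Mbit/s -> about {gb_per_day(speed):.0f} GB per day")
```

Even a 10 Mbit/s line left running around the clock moves about 108 GB a day, which is a lot of video for a box that just sits in the corner downloading.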
 