this post was submitted on 07 Oct 2025
20 points (100.0% liked)


Apologies if crossposting is against the rules; I'm not entirely sure where the lines are drawn here yet.

I posted this in lemmy.fosscad (fosscad@lemmy?), but realize that may not be the most active venue.

I grabbed archives of the fosscad subreddit and took a look at the contents of the .zst files. I think I could probably rebuild the contents of the subreddit in some manner or another; the question is scale and hosting. How would we make the posts easily searchable, where would they live, what endpoint can we upload hundreds of thousands of comments into in a reasonable time frame... all that fun stuff.

The archives don't contain pictures, but contain links to the pictures and the ones I've checked are currently still live (meaning the pics are still hosted on reddit). Dunno how long that will remain the case.
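For anyone who wants to poke at the same data, here is a rough sketch of pulling image links out of the decompressed JSON. It assumes one JSON object per line and assumes the field names `url`, `selftext`, and `body`; those are common in Reddit dumps but I have not confirmed them against this particular archive. Decompressing the .zst files is left out, and the helper names and extension list are made up for illustration.

```python
import json
import re

# Loose URL matcher; stops at whitespace and common closing punctuation.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")
# Hypothetical list of extensions we treat as "image links".
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def iter_records(lines):
    """Yield one dict per non-empty line, skipping lines that are not valid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def image_urls(record):
    """Return image-looking URLs from a record, in order, without duplicates."""
    candidates = []
    url = record.get("url")  # assumed field: link attached to a post
    if isinstance(url, str):
        candidates.append(url)
    for field in ("selftext", "body"):  # assumed fields: post text, comment text
        text = record.get(field)
        if isinstance(text, str):
            candidates.extend(URL_RE.findall(text))
    found = []
    for candidate in candidates:
        path = candidate.split("?", 1)[0].lower()  # ignore query string
        if path.endswith(IMAGE_EXTS) and candidate not in found:
            found.append(candidate)
    return found
```

Feeding it an open text file works, since a file object iterates line by line: `for rec in iter_records(open(path, encoding="utf-8")): ...`.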

I have no idea what the size of the archives would be with pics downloaded; gigs, a TB, no clue. I'm posting this to gauge public interest and I haven't done much preliminary work (oh, these are json. Yep, dictionaries work. Wingo.)

Is there any interest in making this more publicly available? I've run into an issue with a particular build and I'll be diving through the archives to fix it for myself. It seems like a shame that all this information would be inaccessible to everyone who isn't able or interested in trawling through their own local archives.

I'm not a programmer by trade, but work in an adjacent space. I can plink along on this if other people are interested (and if anyone is interested enough to help pitch in, even better).

[–] gsgmfg@fosscad.io 3 points 1 month ago* (last edited 1 month ago) (2 children)

I'm just starting on this.

I started with all i.reddit.com links. There are roughly 6k URLs. I wrote a script to bulk download them, saving each file under a name built from a hash of the URL plus the original extension.
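The naming scheme could look something like the sketch below. The comment does not say which hash was used, so SHA-256 is an assumption here, as are the function name `hashed_name` and the `.bin` fallback for URLs with no usable extension.

```python
import hashlib
import os
from urllib.parse import urlparse


def hashed_name(url, default_ext=".bin"):
    """Build a stable, filesystem-safe filename: sha256(url) + original extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Take the extension from the URL path only, so query strings are ignored.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if not ext or len(ext) > 6:
        ext = default_ext  # assumed fallback when the URL has no sane extension
    return digest + ext
```

Hashing the URL means the same link always maps to the same file, so reruns can skip anything already on disk, and the archive records can be matched back to their images later without keeping a separate lookup table.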

Got 4.5k already - a boatload of 404s.

How old is this archive? There could be a lot of deleted posts that we retained over the years.

-----
Total URLs seen: 6317
OK:              4531
Missing:         1786
Zero-bytes:      0
Not image:       0

6317 urls in the input. We got 71%
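A summary like the one above can be produced from a list of per-URL outcomes. This is only a sketch of one way to do it, not the actual script; the outcome keys (`ok`, `missing`, `zero_bytes`, `not_image`) and the `summarize` function are made-up names.

```python
from collections import Counter

# (internal key, label printed in the report); the keys are hypothetical names.
LABELS = [
    ("ok", "OK"),
    ("missing", "Missing"),
    ("zero_bytes", "Zero-bytes"),
    ("not_image", "Not image"),
]


def summarize(outcomes):
    """Turn a list of outcome keys into a small aligned text report."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    lines = [f"{'Total URLs seen:':<17}{total}"]
    for key, label in LABELS:
        lines.append(f"{label + ':':<17}{counts.get(key, 0)}")
    # Guard against an empty input so we never divide by zero.
    pct = 100.0 * counts.get("ok", 0) / total if total else 0.0
    lines.append(f"{'Success rate:':<17}{pct:.1f}%")
    return "\n".join(lines)
```

With the numbers from this run (4531 OK, 1786 missing) the success rate comes out to 71.7%.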


Imgur and others next :)

EDIT:

We got 100% of the Imgur links

-----
Total URLs seen: 629
OK:              629
Missing:         0
Zero-bytes:      0
Not image:       0

Going for the rest of the image links from all over :)

Going to spin up !ark@fosscad.io - and create a bot to reupload a lot of the posts.

EDIT 2:

!ark@fosscad.io now exists.

I finished off the post images (other domains than i.reddit.com and imgur)

Total URLs seen: 11628
OK:              9574
Missing:         2051
Zero-bytes:      0
Not image:       3

82.3% Pretty good.
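For the curious, the four buckets in these reports could be decided per download roughly like this. It is a guess at the logic, not the real script: treating every non-200 status as "Missing" is an assumption, and so is detecting images by their leading magic bytes rather than by the Content-Type header.

```python
# Leading byte signatures for common image formats.
SIGNATURES = (
    b"\xff\xd8\xff",        # JPEG
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"GIF87a",              # GIF (1987 spec)
    b"GIF89a",              # GIF (1989 spec)
)


def looks_like_image(data):
    """Check the first bytes of a payload against known image signatures."""
    if data.startswith(SIGNATURES):
        return True
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    return data[:4] == b"RIFF" and data[8:12] == b"WEBP"


def classify(status, data):
    """Map an HTTP status code and response body to one report bucket."""
    if status != 200:
        return "missing"      # assumption: any non-200 counts as missing
    if not data:
        return "zero_bytes"
    if not looks_like_image(data):
        return "not_image"    # e.g. an HTML error page served with status 200
    return "ok"
```

Sniffing the bytes catches hosts that answer 200 with an HTML placeholder page, which a status check alone would count as a successful download.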

Gonna finish up with comments and then work on a script to reupload under a bot account.

[–] TheShittinator 2 points 1 month ago

This is fucking sick. Once we have the images, we have the full archive, and it'll just be a matter of importing the data.

[–] gsgmfg@fosscad.io 2 points 1 month ago
Total URLs seen: 1103
OK:              970
Missing:         105
Zero-bytes:      0
Not image:       28

88% on the rest of the comment images. Now to wire up a bot.