mrq 064eec8de5 Update 'README.md' 2 months ago
sql Add default value to timestamp_expired 4 years ago
tests Update json model 4 years ago
Dockerfile Add dockerfile 4 years ago
README.md Update 'README.md' 2 months ago
board.js My ACTUAL Amazing Changes :) 2 months ago
boardwatcher.js Additional error handling 4 years ago
config.js My Amazing Changes :) 2 months ago
database.js My Amazing Changes :) 2 months ago
local.js My Amazing Changes :) 2 months ago
network.js Implement address rotation 4 years ago
objectboard.js object based storage class 4 years ago
package.json My Amazing Changes :) 2 months ago
post.js Drop unneeded post data from the heap after inserts 4 years ago
queue.js Queue and stat tracking module 4 years ago
server.js My ACTUAL Amazing Changes :) 2 months ago
thread.js Improve thread class 4 years ago
uam.js My Amazing Changes :) 2 months ago

README.md

4chan JSON Archiver

This is another 4chan JSON archiver, written in Node.js this time. This archiver requires considerably fewer resources than the previous Java-based archiver. Give it a database connection and a folder to save to, and it should fully archive the designated boards. This archiver is designed to be a drop-in replacement for the Asagi archiver, and maintains the same SQL schema.

Installation

This service should be compatible with most versions of node.

$ git clone <repository>
$ cd archiver
$ npm install
$ npm start (or node server.js)

Configuration

The configuration file is named config.js and located in the root of the project.

Modifications

This """fork""" was the result of the Java-based Asagi scraper failing a couple of years ago, and was shamelessly yoinked to fit my needs. Two core features were added:

  • CloudFlare bypassing that actually worked, as a kludge solution
  • Sniffs for PNGEmbed data

The former apparently isn't needed anymore, but the latter is another one of my ingenious proof-of-concepts; since the scraper is already combing through a bunch of files, why not add in some (rather loose) file sniffing, and supply a relatively simple API?

PEE API

Calling http://localhost:port/?hash=[md5 hash|base64 encoded hash] returns an array of embedded URLs in the image. If there's no embed data sniffed, then false is returned.
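
As a rough illustration of calling the endpoint above, here is a minimal client sketch. It is not part of this repo: the helper names `buildPeeUrl` and `lookupEmbeds`, the default port `8080`, and the assumption that the response body is JSON are all hypothetical, and the global `fetch` assumes Node 18 or newer. The one detail worth noting is that a base64 hash can contain `+`, `/` and `=`, so it should be percent-encoded before it goes into the query string.

```javascript
// Hypothetical helper: builds the lookup URL for the PEE API.
// The host and port defaults are placeholders, not values from this repo.
function buildPeeUrl(hash, port = 8080, host = "localhost") {
  const url = new URL(`http://${host}:${port}/`);
  // searchParams.set percent-encodes '+', '/' and '=' in base64 hashes
  url.searchParams.set("hash", hash);
  return url.toString();
}

// Hypothetical helper: resolves to an array of embedded URLs,
// or an empty array when the server answers `false` (nothing sniffed).
// Assumes the server replies with a JSON body.
async function lookupEmbeds(hash, port) {
  const res = await fetch(buildPeeUrl(hash, port));
  if (!res.ok) {
    throw new Error(`PEE lookup failed: HTTP ${res.status}`);
  }
  const body = await res.json();
  return Array.isArray(body) ? body : [];
}
```

Usage would look something like `lookupEmbeds("1B2M2Y8AsgTpgAmY7PhCfg==", 8080).then(console.log)`, with the port swapped for whatever the server is actually listening on.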

Potential Issues

I genuinely don't care about TypeScript, much less JavaScript, much less """""proper""""" node.js, so my PEE heuristics are flawed, but they work in my few tests. The variable names make it clear that I made an attempt to read the source. :&)

I only tested against PNGs with various amounts of embedded URLs, so any other meme features are untested. Additional support is left as an exercise for the reader.

Example Server

To be spun up™