Taavi Väänänen 00a37d0b48 import/mediawiki: use namespace IDs for filtering
Updates the mediawiki importer to use the namespace IDs for filtering
instead of matching for the beginning of the article title. This better
supports other language versions and non-Wikipedia wikis.

Signed-off-by: Taavi Väänänen <hi@taavi.wtf>
2022-07-13 10:14:30 +02:00
cmd sh-index: add -u flag to add URLs to schedule 2022-07-11 20:57:59 +02:00
config Fix searchut typo in the config file path 2022-07-11 13:17:16 +02:00
crawler crawler: fix log message 2022-07-11 21:31:20 +02:00
database database: add middleware 2022-07-09 13:52:55 +02:00
graph API: add index size to stats 2022-07-11 21:38:29 +02:00
import import/mediawiki: use namespace IDs for filtering 2022-07-13 10:14:30 +02:00
query web: add total pages indexed to home page 2022-07-11 20:40:53 +02:00
static Truncate page titles and URLs in search results 2022-07-11 17:48:50 +02:00
templates web: add total pages indexed to home page 2022-07-11 20:40:53 +02:00
.gitignore Add Makefile 2022-07-09 18:14:00 +02:00
config.example.ini sh-api: expand top-level server riggings 2022-07-09 15:39:04 +02:00
COPYING Initial commit 2022-07-08 19:46:11 +02:00
go.mod web: add total pages indexed to home page 2022-07-11 20:40:53 +02:00
go.sum web: add total pages indexed to home page 2022-07-11 20:40:53 +02:00
gqlgen.yml API: Implement search resolver 2022-07-09 15:48:03 +02:00
Makefile Add Makefile 2022-07-09 18:14:00 +02:00
README.md Add README.md 2022-07-08 20:55:55 +02:00
schema.sql schema: use rum index 2022-07-13 10:13:54 +02:00

WIP

Why is this crawling my site?

This crawler is still under development. It respects the Disallow and Crawl-delay directives in robots.txt. If it's bothering you anyway, email sir@cmpwn.com and I'll knock it off.