sources/searchhut

Taavi Väänänen 00a37d0b48 import/mediawiki: use namespace IDs for filtering Updates the mediawiki importer to use the namespace IDs for filtering instead of matching for the beginning of the article title. This better supports other language versions and non-Wikipedia wikis. Signed-off-by: Taavi Väänänen <hi@taavi.wtf>		2022-07-13 10:14:30 +02:00
cmd	sh-index: add -u flag to add URLs to schedule	2022-07-11 20:57:59 +02:00
config	Fix searchut typo in the config file path	2022-07-11 13:17:16 +02:00
crawler	crawler: fix log message	2022-07-11 21:31:20 +02:00
database	database: add middleware	2022-07-09 13:52:55 +02:00
graph	API: add index size to stats	2022-07-11 21:38:29 +02:00
import	import/mediawiki: use namespace IDs for filtering	2022-07-13 10:14:30 +02:00
query	web: add total pages indexed to home page	2022-07-11 20:40:53 +02:00
static	Truncate page titles and URLs in search results	2022-07-11 17:48:50 +02:00
templates	web: add total pages indexed to home page	2022-07-11 20:40:53 +02:00
.gitignore	Add Makefile	2022-07-09 18:14:00 +02:00
config.example.ini	sh-api: expand top-level server riggings	2022-07-09 15:39:04 +02:00
COPYING	Initial commit	2022-07-08 19:46:11 +02:00
go.mod	web: add total pages indexed to home page	2022-07-11 20:40:53 +02:00
go.sum	web: add total pages indexed to home page	2022-07-11 20:40:53 +02:00
gqlgen.yml	API: Implement search resolver	2022-07-09 15:48:03 +02:00
Makefile	Add Makefile	2022-07-09 18:14:00 +02:00
README.md	Add README.md	2022-07-08 20:55:55 +02:00
schema.sql	schema: use rum index	2022-07-13 10:13:54 +02:00

README.md

WIP

Why is this crawling my site?

This crawler is still under development. It respects the Disallow and Crawl-delay directives in robots.txt. If it's annoying you anyway, email sir@cmpwn.com and I'll knock it off.