sources/searchhut

Latest commit: Umar Getagazov	2971603710	2022-07-11 17:48:46 +02:00

Put domain labels minus eTLD into the text index

Before, only the hostname (say, harelang.org) was indexed, and no results appeared for a "harelang" query. Now, all domain labels (minus the eTLD) are indexed separately (for example, "docs" and "harelang" for "docs.harelang.org"), and such queries work. eTLD is removed using the data from Mozilla's Public Suffix List (https://publicsuffix.org).

cmd	Use the real crawler UA at /about	2022-07-11 13:13:05 +02:00
config	Fix searchut typo in the config file path	2022-07-11 13:17:16 +02:00
crawler	Put domain labels minus eTLD into the text index	2022-07-11 17:48:46 +02:00
database	database: add middleware	2022-07-09 13:52:55 +02:00
graph	API: Implement search resolver	2022-07-09 15:48:03 +02:00
import/mediawiki	mediawiki: don't parse until we know we want it	2022-07-11 14:35:22 +02:00
query	web: add search results page	2022-07-09 17:48:52 +02:00
static	Dark theme	2022-07-11 13:17:02 +02:00
templates	Highlight result title in bold	2022-07-11 13:16:47 +02:00
.gitignore	Add Makefile	2022-07-09 18:14:00 +02:00
config.example.ini	sh-api: expand top-level server riggings	2022-07-09 15:39:04 +02:00
COPYING	Initial commit	2022-07-08 19:46:11 +02:00
go.mod	web: add search results page	2022-07-09 17:48:52 +02:00
go.sum	web: add search results page	2022-07-09 17:48:52 +02:00
gqlgen.yml	API: Implement search resolver	2022-07-09 15:48:03 +02:00
Makefile	Add Makefile	2022-07-09 18:14:00 +02:00
README.md	Add README.md	2022-07-08 20:55:55 +02:00
schema.sql	schema.sql: set default exclusion list to {}	2022-07-11 17:48:36 +02:00

README.md

WIP

Why is this crawling my site?

This crawler is still under development. It respects robots.txt Disallow and Crawl-Delay directives, but if it's annoying you anyway, email sir@cmpwn.com and I'll knock it off.