Haelwenn (lanodan) Monnier
062e63437a
import/cve.org: New importer
2022-07-11 17:53:58 +02:00
Umar Getagazov
fde8b75efd
Drop crawl schedule-related fields
They were unused.
2022-07-11 17:50:44 +02:00
Umar Getagazov
a7e6fba60f
Rank authoritative websites and index pages higher
Implements: https://todo.sr.ht/~sircmpwn/searchhut/23
2022-07-11 17:49:19 +02:00
Umar Getagazov
72649f0f0e
Truncate page titles and URLs in search results
Implements: https://todo.sr.ht/~sircmpwn/searchhut/25
2022-07-11 17:48:50 +02:00
Umar Getagazov
2971603710
Put domain labels minus eTLD into the text index
Before, only the hostname (say, harelang.org) was indexed, and no
results appeared for a "harelang" query. Now, all domain labels (minus
the eTLD) are indexed separately (for example, "docs" and "harelang" for
"docs.harelang.org"), and such queries work. eTLD is removed using the
data from Mozilla's Public Suffix List (https://publicsuffix.org).
2022-07-11 17:48:46 +02:00
Drew DeVault
c6777e21a7
schema.sql: set default exclusion list to {}
2022-07-11 17:48:36 +02:00
Drew DeVault
5848adfea0
mediawiki: don't parse until we know we want it
2022-07-11 14:35:22 +02:00
Drew DeVault
4567044626
import/mediawiki: delete elements when done
To avoid blowing up memory usage
2022-07-11 14:27:21 +02:00
Umar Getagazov
5471687556
Add per-domain page exclusion mechanism
2022-07-11 13:20:31 +02:00
Umar Getagazov
ef32533b75
Fix searchut typo in the config file path
2022-07-11 13:17:16 +02:00
Umar Getagazov
3b056cc0b4
Dark theme
Colors taken from the dark theme of SourceHut services; some of them
tweaked for contrast.
Implements: https://todo.sr.ht/~sircmpwn/searchhut/24
2022-07-11 13:17:02 +02:00
Drew DeVault
50fd2562f5
Highlight result title in bold
2022-07-11 13:16:47 +02:00
Umar Getagazov
dda780c694
UI fixups for f449fe8
Mostly returning the look to the previous state, code formatting, and
adjusting the look of the search results label.
2022-07-11 13:13:09 +02:00
Umar Getagazov
67c60ef5c1
Use the real crawler UA at /about
2022-07-11 13:13:05 +02:00
Umar Getagazov
3bc5cd9689
Responsive UI
Implements: https://todo.sr.ht/~sircmpwn/searchhut/20
2022-07-11 13:08:37 +02:00
Rohan Kumar
f449fe8a32
Semantic/a11y markup improvements
- Make search results an <ol> with an ARIA label. If more elements are
ever present on the SERP (e.g. settings), the <ol> should be placed
inside a <section> and its label should move to that section too.
- Remove list-style and padding from the <ol> in the stylesheet
- Add the "search" ARIA role to the search form.
- Make search result titles headings. This is established convention
that assistive-technology users are already familiar with from other
engines.
- Add an indicator for "N search results found". This is where the list
label comes from.
- Exclude the brand name from machine translation.
2022-07-10 15:03:04 +02:00
Drew DeVault
76bc26d639
Adding missing /about bits
2022-07-10 15:02:55 +02:00
Umar Getagazov
7a67438e9c
Add favicon
2022-07-10 15:02:28 +02:00
Drew DeVault
c367bbddd3
Add about page
2022-07-10 13:07:00 +02:00
Drew DeVault
c8762965ac
import/mediawiki: initial commit
2022-07-10 11:11:18 +02:00
Drew DeVault
e44770b9b7
schema: add "source" column to page
2022-07-10 10:13:11 +02:00
Drew DeVault
d30cdbf52e
crawler: fix interval input
2022-07-10 09:55:30 +02:00
Drew DeVault
01b2b1349b
crawler: compute checksum and make unique
Fixes: https://todo.sr.ht/~sircmpwn/searchhut/30
2022-07-10 09:36:07 +02:00
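One reading of "compute checksum and make unique" is this: hash the fetched body, then refuse to store a second page with the same digest. Below is a minimal Go sketch of that idea. SHA-256 is an assumption, and so is the in-memory set, which stands in for whatever uniqueness constraint the crawler actually uses. The commit does not say which hash is used or where uniqueness is enforced.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum returns the hex-encoded SHA-256 digest of a page body.
// The choice of SHA-256 is an assumption for illustration.
func checksum(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// dedup is a hypothetical in-memory stand-in for a unique constraint
// on the checksum column.
type dedup map[string]bool

// seen reports whether a body with the same checksum was already recorded,
// recording it if not.
func (d dedup) seen(body []byte) bool {
	c := checksum(body)
	if d[c] {
		return true
	}
	d[c] = true
	return false
}

func main() {
	d := dedup{}
	fmt.Println(d.seen([]byte("<p>hello</p>")))
	fmt.Println(d.seen([]byte("<p>hello</p>")))
}
```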
Drew DeVault
9790813a55
Track pages with JavaScript and total crawl time
2022-07-10 09:12:07 +02:00
Drew DeVault
e15dffd86b
Handle Retry-After as timestamp
2022-07-09 19:16:48 +02:00
Drew DeVault
c15f968a28
crawler: re-schedule after HTTP 429
Fixes: https://todo.sr.ht/~sircmpwn/searchhut/5
2022-07-09 19:14:55 +02:00
Drew DeVault
6978b602f4
Handle canonical URLs
Fixes: https://todo.sr.ht/~sircmpwn/searchhut/11
2022-07-09 19:06:28 +02:00
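A page can name its preferred URL with `<link rel="canonical" href="...">`, and that href may be relative. Below is a minimal Go sketch that resolves it against the URL the page was fetched from. It assumes fragments should be dropped, since they do not identify a distinct page. `canonicalURL` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
)

// canonicalURL resolves the href of a <link rel="canonical"> element
// against the URL the page was fetched from, and drops any fragment.
func canonicalURL(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	u := base.ResolveReference(ref)
	// Fragments (#section) do not name a different page.
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

func main() {
	c, err := canonicalURL("https://example.org/docs/page?x=1", "/docs/intro#top")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(c)
}
```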
Drew DeVault
baf82f9bb8
crawler: perform HEAD before GET
Implements: https://todo.sr.ht/~sircmpwn/searchhut/8
2022-07-09 18:59:23 +02:00
Drew DeVault
759ad758af
crawler: improve index settings
2022-07-09 18:57:39 +02:00
Drew DeVault
35a4faa05b
sh-index: fetch user agent from config
2022-07-09 18:14:06 +02:00
Drew DeVault
2ec534d63a
Add Makefile
2022-07-09 18:14:00 +02:00
Drew DeVault
3535309004
web: add link to index from search page
2022-07-09 18:07:46 +02:00
Drew DeVault
b41abd9376
main.css: change URL color in results
2022-07-09 17:51:05 +02:00
Drew DeVault
7140d0e2e5
web: add search results page
2022-07-09 17:48:52 +02:00
Drew DeVault
6e5deed8f4
web: add .index to html tag
2022-07-09 17:14:00 +02:00
Drew DeVault
738a9430cb
web: autofocus search box
2022-07-09 17:12:23 +02:00
Drew DeVault
ad9dd2701e
web: move infolinks to bottom of page
2022-07-09 17:02:58 +02:00
Drew DeVault
a1f6b8c8de
sh-web: initial commit
2022-07-09 16:56:25 +02:00
Drew DeVault
8cf92fa220
API: Implement search resolver
2022-07-09 15:48:03 +02:00
Drew DeVault
c1f917efb4
sh-api: expand top-level server riggings
2022-07-09 15:39:04 +02:00
Drew DeVault
0d32cf49d7
Implement configuration loader
Implements: https://todo.sr.ht/~sircmpwn/searchhut/18
2022-07-09 15:31:16 +02:00
Drew DeVault
09f762ca53
Add config.example.ini
References: https://todo.sr.ht/~sircmpwn/searchhut/18
2022-07-09 13:53:02 +02:00
Drew DeVault
b5656c9a1e
database: add middleware
2022-07-09 13:52:55 +02:00
Drew DeVault
208f766963
Initial GraphQL API riggings
2022-07-09 13:25:27 +02:00
Drew DeVault
a8069bb73b
Increase default delay to 5 seconds
2022-07-08 20:56:00 +02:00
Drew DeVault
92ca0ecf22
Add README.md
2022-07-08 20:55:55 +02:00
Drew DeVault
d6bc032d24
crawler: respect robots.txt
2022-07-08 20:30:09 +02:00
Drew DeVault
eb6769c904
crawler: follow links regardless of readability
2022-07-08 20:13:32 +02:00
Drew DeVault
fbd0492ef1
cmd/sh-search: initial commit
2022-07-08 20:04:37 +02:00
Drew DeVault
050694c4f2
Initial commit
2022-07-08 19:46:11 +02:00