{{template "head.html" .}}
<header>
<h1>
<a href="/">
<span class="icon">
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path d="M256 8C119 8 8 119 8 256s111 248 248 248 248-111 248-248S393 8 256 8zm0 448c-110.5 0-200-89.5-200-200S145.5 56 256 56s200 89.5 200 200-89.5 200-200 200z"></path></svg>
</span>
<span>searchhut</span>
</a>
</h1>
</header>
<main>
<h2>About searchhut</h2>
<p>
SearchHut is a curated
<abbr title="'Free' as in freedom, in that we provide public access to our software source code.">free software</abbr>
search engine developed and operated by
<a href="https://sourcehut.org">SourceHut</a>.
</p>
<h3>About the search engine</h3>
<p>
The search engine itself is fairly basic at the moment. In the future, it
will be expanded to support narrowing your search terms with applicable
tags (e.g. #docs #python), filtering for sites with or without JavaScript,
searching specific sites (e.g. @wikipedia.org), and other features. The
service does not (and never will) have advertising; it is directly
subsidized by SourceHut.
</p>
<h3>About the index</h3>
<p>
SearchHut indexes from a <a href="/about/domains">curated set of domains</a>.
This improves the quality of results, but it means the index covers only a
small subset of the web. The index prioritizes authoritative, high-quality,
and informative sources; websites engaging in SEO spam are rejected from
the index. This instance is maintained by free software developers and
biases towards indexing websites that serve their needs and interests. If
you would like a website added to the index, fill out the
<a href="/request">indexing request form</a>.
</p>
<h3>About the crawler</h3>
<p>
The SearchHut crawler is very simple. It crawls websites by queuing
first-party links only, and stores data in a simple Postgres
full-text-search index. The crawler respects the robots.txt Allow,
Disallow, and Crawl-delay directives. For full details on how the crawler
works, and for information for web admins of indexed sites, see
<a href="/docs/crawler.html">the documentation</a>.
</p>
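<p>
As an illustration of the Postgres full-text-search approach, the sketch
below uses standard Postgres primitives. The <code>page</code> table and its
columns are hypothetical assumptions for this example, not SearchHut's
actual schema; <code>to_tsvector</code>, <code>websearch_to_tsquery</code>,
and <code>ts_rank</code> are standard Postgres functions.
</p>
<pre>-- Illustrative sketch only: "page" and its columns are hypothetical.
CREATE TABLE page (
    url   text PRIMARY KEY,
    title text,
    body  text,
    fts   tsvector GENERATED ALWAYS AS
          (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED
);
CREATE INDEX page_fts_idx ON page USING GIN (fts);

-- Rank matching pages by relevance for a user's search terms:
SELECT url, title, ts_rank(fts, query) AS rank
FROM page, websearch_to_tsquery('english', 'postgres full text search') AS query
WHERE fts @@ query
ORDER BY rank DESC
LIMIT 20;</pre>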
<p>
The crawler's User-Agent is:
</p>
<pre>SearchHut Bot 0.0 (GNU AGPL 3.0); https://sr.ht/~sircmpwn/searchhut &lt;sir@cmpwn.com&gt;</pre>
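<p>
Site owners can tune crawling with the robots.txt directives listed above.
A hypothetical entry might look like the following; the exact user-agent
token the crawler matches is an assumption here, so consult the crawler
documentation for the authoritative value.
</p>
<pre># Hypothetical robots.txt sketch; the "SearchHut" token is an assumption.
User-agent: SearchHut
Disallow: /private/
Allow: /private/public-notes/
Crawl-delay: 10</pre>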
<h3>About the API</h3>
<p>
The search engine provides a public GraphQL API for anonymous use, allowing
users to conduct web searches programmatically. For information about the
API, see
<a href="/docs/api.html">the documentation</a>.
</p>
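<p>
As a sketch only, a GraphQL search query might take the following shape.
The field and parameter names here are hypothetical illustrations of
GraphQL usage, not SearchHut's actual schema; see the API documentation for
the real schema.
</p>
<pre># Hypothetical query shape -- not SearchHut's actual schema.
query {
  search(terms: "postgres full text search") {
    results {
      url
      title
    }
  }
}</pre>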
<h3>About the software</h3>
<p>
SearchHut is an AGPL 3.0-licensed free software project hosted
<a href="https://sr.ht/~sircmpwn/searchhut">on SourceHut</a>, which provides
git repositories, a bug tracker, and mailing lists for development &amp;
discussion. Patches are welcome, and users are encouraged to set up their
own search engines crawling whatever subset of the web they like; it could
easily be repurposed to create an academic-focused search engine, for
instance. For information about deploying your own instance, see
<a href="/docs/deploy.html">the documentation</a>.
</p>
</main>
{{template "footer.html" .}}