Taavi Väänänen
00a37d0b48
import/mediawiki: use namespace IDs for filtering
Updates the MediaWiki importer to filter pages by namespace ID
instead of matching the beginning of the article title. This better
supports other language versions and non-Wikipedia wikis.
Signed-off-by: Taavi Väänänen <hi@taavi.wtf>
2022-07-13 10:14:30 +02:00
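A minimal sketch of the idea behind this change, together with the two earlier memory-related commits; it is not the importer's actual code. It assumes a standard MediaWiki XML export in which each <page> carries a <title> and a numeric <ns> element. The names iter_pages, local, wanted_ns and SAMPLE are hypothetical, and the Finnish titles are illustrative only. Because namespace prefixes are localized ("File:" on one wiki, "Tiedosto:" on another), the filter compares the numeric ID instead of the title prefix.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source, wanted_ns=frozenset({0})):
    """Yield titles of pages whose numeric namespace ID is in wanted_ns.

    Hypothetical helper: streams the dump with iterparse and clears each
    <page> element once handled, so memory use stays roughly flat.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue  # child elements are kept until their <page> ends
        fields = {local(child.tag): (child.text or "") for child in elem}
        ns_text = fields.get("ns") or "-1"  # treat a missing <ns> as unwanted
        if int(ns_text) in wanted_ns:
            yield fields.get("title", "")
        elem.clear()  # drop the page's children now that we are done


# Illustrative dump fragment; namespace 0 is articles, 1 is talk, 6 is files.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Tietokone</title><ns>0</ns></page>
  <page><title>Tiedosto:Esimerkki.png</title><ns>6</ns></page>
  <page><title>Keskustelu:Tietokone</title><ns>1</ns></page>
</mediawiki>"""

titles = list(iter_pages(io.BytesIO(SAMPLE)))
print(titles)
```

A prefix check such as title.startswith("File:") would have let the second and third pages through on this wiki, while the namespace ID check does not depend on the wiki's language.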
Drew DeVault
13d5f95eab
import/mediawiki: drop File: pages
2022-07-11 20:22:35 +02:00
Drew DeVault
74b26cecfa
import/mediawiki: more improvements
2022-07-11 19:30:57 +02:00
Haelwenn (lanodan) Monnier
5689b79e13
import/cve.org: truncate content for excerpt
2022-07-11 19:11:37 +02:00
Haelwenn (lanodan) Monnier
062e63437a
import/cve.org: New importer
2022-07-11 17:53:58 +02:00
Umar Getagazov
fde8b75efd
Drop crawl schedule-related fields
They were unused.
2022-07-11 17:50:44 +02:00
Drew DeVault
5848adfea0
mediawiki: don't parse until we know we want it
2022-07-11 14:35:22 +02:00
Drew DeVault
4567044626
import/mediawiki: delete elements when done
To avoid blowing up memory usage
2022-07-11 14:27:21 +02:00
Drew DeVault
c8762965ac
import/mediawiki: initial commit
2022-07-10 11:11:18 +02:00