Commit graph

9 commits

Author SHA1 Message Date
Taavi Väänänen
00a37d0b48 import/mediawiki: use namespace IDs for filtering
Updates the mediawiki importer to use the namespace IDs for filtering
instead of matching for the beginning of the article title. This better
supports other language versions and non-Wikipedia wikis.

Signed-off-by: Taavi Väänänen <hi@taavi.wtf>
2022-07-13 10:14:30 +02:00
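
The commit above describes the change but the diff itself is not shown here. As a rough illustration only, and not the importer's actual code, namespace-ID filtering might look like the following Python sketch. The sample XML, the `wanted_titles` helper, and the `WANTED_NAMESPACES` set are hypothetical; real MediaWiki dumps also carry an XML namespace on every tag, which is omitted to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified dump. "Datei:" is the German-language prefix for
# the File namespace, so a check for a leading "File:" would not catch it.
SAMPLE = """<mediawiki>
  <page><title>Git</title><ns>0</ns></page>
  <page><title>Datei:Logo.png</title><ns>6</ns></page>
  <page><title>Talk:Git</title><ns>1</ns></page>
</mediawiki>"""

# Namespace 0 is the main (article) namespace in MediaWiki.
WANTED_NAMESPACES = {0}


def wanted_titles(xml_text, wanted=WANTED_NAMESPACES):
    """Return titles of pages whose numeric <ns> value is in `wanted`."""
    root = ET.fromstring(xml_text)
    titles = []
    for page in root.iter("page"):
        # Pages without an <ns> element get -1 and are therefore skipped.
        ns = int(page.findtext("ns", default="-1"))
        if ns in wanted:
            titles.append(page.findtext("title"))
    return titles
```

Because the numeric ID is the same across language editions, the localized `Datei:` page is excluded without the filter needing to know any translated prefix.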
Drew DeVault
13d5f95eab import/mediawiki: drop File: pages 2022-07-11 20:22:35 +02:00
Drew DeVault
74b26cecfa import/mediawiki: more improvements 2022-07-11 19:30:57 +02:00
Haelwenn (lanodan) Monnier
5689b79e13 import/cve.org: truncate content for excerpt 2022-07-11 19:11:37 +02:00
Haelwenn (lanodan) Monnier
062e63437a import/cve.org: New importer 2022-07-11 17:53:58 +02:00
Umar Getagazov
fde8b75efd Drop crawl schedule-related fields
They were unused.
2022-07-11 17:50:44 +02:00
Drew DeVault
5848adfea0 mediawiki: don't parse until we know we want it 2022-07-11 14:35:22 +02:00
Drew DeVault
4567044626 import/mediawiki: delete elements when done
To avoid blowing up memory usage
2022-07-11 14:27:21 +02:00
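
The idea in the commit above, releasing each element once it has been handled so a large dump does not accumulate in memory, can be sketched in Python with a streaming parser. This is an assumption-laden illustration rather than the commit's code: `stream_titles` is a hypothetical helper, and the input is assumed to be a simplified dump without XML namespaces.

```python
import io
import xml.etree.ElementTree as ET


def stream_titles(source):
    """Collect page titles from a file-like XML source, one page at a time."""
    titles = []
    # "end" events fire once an element and all of its children are complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "page":
            titles.append(elem.findtext("title"))
            # Drop the page's children, text, and attributes so they can be
            # garbage-collected. The emptied <page> element itself remains
            # attached to the root in this simple version.
            elem.clear()
    return titles
```

A usage example with an in-memory source:

```python
dump = io.BytesIO(
    b"<mediawiki><page><title>A</title></page>"
    b"<page><title>B</title></page></mediawiki>"
)
print(stream_titles(dump))
```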
Drew DeVault
c8762965ac import/mediawiki: initial commit 2022-07-10 11:11:18 +02:00