Build yourself a Xapian index of package info
Run the Debian indexer on your distro
The Debian Xapian indexer is called update-apt-xapian-index and normally it reads data from the Apt database. Luckily it also has an option (--pkgfile=file) for reading data from a plain file, which is used to build server-side indices and to build a test environment for its test suite. If you can generate a suitable input file, update-apt-xapian-index will build an index for you.
The input file has the same format as the Debian Packages file, which is similar to email or HTTP headers:
    Package: 2vcard
    Priority: optional
    Section: utils
    Installed-Size: 108
    Maintainer: Martin Albisetti <email@example.com>
    Architecture: all
    Version: 0.5-3
    Filename: pool/main/2/2vcard/2vcard_0.5-3_all.deb
    Size: 14300
    MD5sum: d831fd82a8605e9258b2314a7d703abe
    SHA1: e903a05f168a825ff84c87326898a182635f8175
    SHA256: 2be9a86f0ec99b1299880c6bf0f4da8257c74a61341c14c103b70c9ec04b10ec
    Description: perl script to convert an addressbook to VCARD file format
     2vcard is a little perl script that you can use to convert the popular
     vcard file format. Currently 2vcard can only convert addressbooks and
     alias files from the following formats: abook,eudora,juno,ldif,mutt,
     mh and pine.
     .
     The VCARD format is used by gnomecard, for example, which is used by
     the balsa email client.
    Tag: implemented-in::perl, role::program, use::converting

    Package: 3dchess
    [...]
Records are separated with an empty line, and long fields like 'Description' use continuation lines that start with spaces. The first line of the description is the short description, the rest is the long description; an empty line in the Description is represented with a dot.
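To make the format rules concrete, here is a minimal Python sketch of a reader for such a file. It is an illustration only, not the parser that update-apt-xapian-index uses (that one relies on python-debian). The function names and the second sample record are made up for the example, and the sketch assumes records are separated by exactly one empty line.

```python
def parse_records(text):
    """Split Packages-style text into a list of {field: value} dicts."""
    records = []
    for chunk in text.strip().split("\n\n"):
        record, field = {}, None
        for line in chunk.splitlines():
            if line[:1] in (" ", "\t") and field is not None:
                # Continuation line: it belongs to the previous field
                record[field] += "\n" + line[1:]
            else:
                field, _, value = line.partition(":")
                record[field] = value.strip()
        if record:
            records.append(record)
    return records


def split_description(description):
    """Return (short, long); a line holding only "." means an empty line."""
    short, _, rest = description.partition("\n")
    long_lines = []
    if rest:
        for line in rest.split("\n"):
            long_lines.append("" if line.strip() == "." else line)
    return short, "\n".join(long_lines)


# Shortened sample; the second record is invented for illustration.
SAMPLE = """\
Package: 2vcard
Version: 0.5-3
Description: perl script to convert an addressbook to VCARD file format
 2vcard is a little perl script.
 .
 The VCARD format is used by gnomecard.

Package: hello-example
Version: 1.0
Description: made-up second record
"""
```

Calling parse_records(SAMPLE) gives two dictionaries, and split_description() on the first record's Description separates the short description from the two-paragraph long description.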
For update-apt-xapian-index you only need the fields Package, Version, Description, Tag, Section, Installed-Size and Size. Tag, Section, Installed-Size and Size are all optional, although you probably want Tag for Debtags categories.
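If your package data lives in some other form, generating the input file could look roughly like the sketch below. It writes only the fields listed above; format_record and format_pkgfile are hypothetical helper names, and the package dictionaries are whatever you extract from your own distribution's database.

```python
def format_record(name, version, description, tags=(), section=None,
                  installed_size=None, size=None):
    """Format one package as a Packages-style record (ends with a newline)."""
    short, _, long_text = description.partition("\n")
    lines = ["Package: %s" % name, "Version: %s" % version]
    # Optional fields are written only when a value is available
    if section:
        lines.append("Section: %s" % section)
    if installed_size is not None:
        lines.append("Installed-Size: %s" % installed_size)
    if size is not None:
        lines.append("Size: %s" % size)
    lines.append("Description: " + short)
    for line in long_text.splitlines():
        # Continuation lines start with a space; empty lines become a dot
        lines.append(" " + (line if line.strip() else "."))
    if tags:
        lines.append("Tag: " + ", ".join(tags))
    return "\n".join(lines) + "\n"


def format_pkgfile(packages):
    """Join records with an empty line between them."""
    return "\n".join(format_record(**pkg) for pkg in packages)


# Usage: write the result to the file later passed to --pkgfile, e.g.
#   with open("inputfile", "w", encoding="utf-8") as out:
#       out.write(format_pkgfile(list_of_package_dicts))
```

Each dictionary passed to format_pkgfile needs at least the keys name, version and description; the description holds the short description on its first line and the long description on the following lines.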
If you want to start playing with the indexer without building your own input file, you can run apt-cache dumpavail on any Debian or Ubuntu system to extract the whole system dataset. Alternatively, you can use any Packages file from a Debian mirror.
Dependencies:
- python-xapian (Python bindings for Xapian)
- python-debian (used to read some Debian-style files; the source is straightforward to build)
- python-chardet (a dependency of python-debian; available under the same name in Fedora, Mandriva/Mageia and SUSE)

Building the index:
    git clone git://git.debian.org/git/collab-maint/apt-xapian-index.git
    cd apt-xapian-index
    # Testrun is just a simple wrapper that exports the variables needed
    # to run the indexer in the current directory
    ./testrun --pkgfile=inputfile --force --verbose
    # Creates an index in testdb/
    # Try querying it with Xapian's low-level "delve" tool, to see if it worked:
    delve -1 -d -t edit testdb/index
The Xapian index itself is in testdb/index; testdb/ will contain other information about the index, including an autogenerated README file documenting its contents, especially the term prefixes used by the index.
Congratulations: you can now try querying the index. The Xapian website has documentation and examples for C++ and for the Python, Perl, PHP, Ruby, C#, Java and other bindings.
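As a starting point, a query from Python could look like the sketch below. It uses only the basic Xapian calls (Database, QueryParser, Enquire) and makes two assumptions you should check against the autogenerated README in testdb/: that plain unprefixed terms are indexed (as the delve example for the term "edit" suggests) and that each document's data holds the package name. The function names are made up for the example.

```python
def search(index_path, text, limit=10):
    """Return up to `limit` (percent, document data) pairs for a text query."""
    # Imported here so the rest of the module loads without python-xapian
    import xapian

    db = xapian.Database(index_path)
    parser = xapian.QueryParser()
    parser.set_database(db)
    query = parser.parse_query(text)

    enquire = xapian.Enquire(db)
    enquire.set_query(query)
    results = []
    for match in enquire.get_mset(0, limit):
        data = match.document.get_data()
        if isinstance(data, bytes):
            # Python 3 bindings return bytes
            data = data.decode("utf-8", "replace")
        results.append((match.percent, data))
    return results


def format_results(results):
    """Render (percent, name) pairs as aligned text lines."""
    return ["%3d%% %s" % (percent, name) for percent, name in results]


# Usage, once the index exists:
#   for line in format_results(search("testdb/index", "edit")):
#       print(line)
```

No stemmer is configured here, so only the exact words typed are matched; adding stemming or prefixed searches depends on how the index was built.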
Patches welcome for alternative input file formats and extra plugins to index extra info you may need. Please update this page with your experience if you try it.
Possible things to try:
- Change DEBTAGSDB in plugins/debtags.py to make it read Debtags information from one of the distromatch exports so you don't need to add them as Tag: fields
- Get pkgshelf to work (it should only need export AXI_DB_PATH=testdb and changing /var/lib/debtags/package-tags to the location of your distromatch Debtags export).
- If you need to index some extra information, take a look at plugins/template.py for a plugin template: you only need to redefine the method indexDeb822.
- Build an indexer that reads the native package database for your own distribution, then get in touch with Enrico to see if it can all fit in the same codebase.
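For the plugin suggestion above, the sketch below shows roughly what redefining indexDeb822 might look like. Treat it as an assumption-laden illustration: it assumes the method receives a Xapian document and a dict-like package record, the XARCH term prefix is invented (check the README in testdb/ for prefixes already in use), and the other methods a real plugin needs are left out, so start from plugins/template.py rather than from this class. The small recording class stands in for a Xapian document so the logic can be tried without an index.

```python
class ArchitecturePlugin:
    """Hypothetical plugin body that indexes the Architecture field."""

    PREFIX = "XARCH"  # invented prefix, not one used by apt-xapian-index

    def indexDeb822(self, document, pkg):
        # pkg is assumed to behave like a dict of Packages-file fields
        arch = pkg.get("Architecture")
        if arch:
            document.add_term(self.PREFIX + arch.lower())


class RecordingDocument:
    """Stand-in for xapian.Document that only remembers added terms."""

    def __init__(self):
        self.terms = []

    def add_term(self, term):
        self.terms.append(term)


doc = RecordingDocument()
ArchitecturePlugin().indexDeb822(doc, {"Package": "2vcard", "Architecture": "all"})
print(doc.terms)
```

Records without an Architecture field are simply skipped, which keeps the plugin harmless for input files that only carry the minimal set of fields.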