description: Collation algorithms for Python
owner: Joe Wreschnig
last change: Mon, 24 Aug 2015 20:31:49 +0000 (22:31 +0200)
readme

Collation algorithms for Python


pycollate is an interface to various collation algorithms for Python.

Supported backends:
- icu - Based on the IBM ICU toolkit and Jim Fulton's zope.ucol.
- syslocale - Native OS collation routines.
- codepoint - Raw Unicode codepoint comparison.

Use the ICU backend if it is available; otherwise, syslocale should work on most Python installations. You can request a specific backend, or let pycollate choose a "best" backend by default.
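
The "best backend" idea can be sketched as a simple fallback chain. This is not pycollate's own selection code: the names `best_backend`, `BACKENDS`, and the three factory functions are made up for illustration, and the ICU branch assumes the separate PyICU package rather than the binding pycollate builds.

```python
import locale

def icu_backend():
    # Assumes PyICU is installed; raises ImportError otherwise.
    import icu
    collator = icu.Collator.createInstance(icu.Locale.getDefault())
    return collator.getSortKey

def syslocale_backend():
    # Adopt the user's locale for collation; may raise locale.Error.
    locale.setlocale(locale.LC_COLLATE, "")
    return locale.strxfrm

def codepoint_backend():
    # Raw code point order: the string is its own sort key.
    return lambda s: s

# Hypothetical preference order, mirroring the list of backends above.
BACKENDS = [
    ("icu", icu_backend),
    ("syslocale", syslocale_backend),
    ("codepoint", codepoint_backend),
]

def best_backend(backends=BACKENDS):
    """Return (name, key_function) for the first backend that loads."""
    for name, factory in backends:
        try:
            return name, factory()
        except (ImportError, locale.Error):
            continue
    raise LookupError("no collation backend available")
```

Calling `name, key = best_backend()` and then `strings.sort(key=key)` sorts with whichever backend loaded first. The codepoint fallback never fails, so the chain always ends somewhere.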

pycollate also provides tools to perform word-wise and numeric sorts.
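
One common word-wise adjustment is to ignore a leading article when sorting titles. The sketch below is an assumption about what such a tool might look like, not pycollate's implementation; `lstrip_words` and its default word list are hypothetical.

```python
def lstrip_words(text, words=("the", "a", "an")):
    """Drop one leading word from `words` (case-insensitive), if present."""
    head, sep, tail = text.partition(" ")
    # Only strip when something is left over after the first word.
    if sep and tail and head.lower() in words:
        return tail
    return text

titles = ["The Wall", "Abbey Road", "A Night at the Opera"]
titles.sort(key=lambda t: lstrip_words(t).lower())
# titles is now ["Abbey Road", "A Night at the Opera", "The Wall"]
```

In real use the stripped string would be passed on to a collation key rather than to `lower()`, and the word list would depend on the language.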

Like all Unicode collation tools, pycollate is a work in progress.

Installing

$ sudo apt-get install python-pyrex libicu-dev
$ ./setup.py build
$ sudo ./setup.py install

Example

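import io
import collate
# Read the file as UTF-8 text so the sort keys are built from Unicode strings.
with io.open("contents.txt", encoding="utf-8") as f:
    strings = f.read().splitlines()
strings.sort(key=collate.key)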

FAQ

What's collation?

Collation is the process of sorting information in a useful way. In particular, this module sorts strings in a way that humans might expect to read them.

What's so hard about that?

Nothing, if your strings are all in one language and you speak English yourself.

On the other hand, if that's not the case, you need to make sure "ss" and "ß" sort similarly, "å" sorts like "A" (unless you're Swedish), and "21 Monkeys" comes after "3 Monkeys".
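
The "Monkeys" case is easy to show with the standard library alone. This is a minimal sketch of a numeric sort key, not pycollate's code; `numeric_key` is a hypothetical name, and it uses plain lowercasing where a real collation key would go.

```python
import re

_DIGITS = re.compile(r"(\d+)")

def numeric_key(text):
    """Split into text and digit runs; digit runs compare as integers."""
    parts = _DIGITS.split(text)
    # With a capturing group, re.split puts text at even indices and
    # digit runs at odd indices, so the types always line up.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

titles = ["21 Monkeys", "3 Monkeys", "12 Monkeys"]

print(sorted(titles))
# ['12 Monkeys', '21 Monkeys', '3 Monkeys']
print(sorted(titles, key=numeric_key))
# ['3 Monkeys', '12 Monkeys', '21 Monkeys']
```

The plain sort compares character by character, so "21" lands before "3"; the keyed sort compares 3, 12, and 21 as numbers.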

How fast is the library?

Slow enough that you will probably want to cache sort keys. On a mid-range system at the time of writing, it takes about half a second to sort 10000 song titles.
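
A minimal sketch of key caching, assuming the key function is pure (same string in, same key out). `make_cached_key` is a hypothetical helper, and `slow_key` is only a stand-in for an expensive key function such as `collate.key`.

```python
import functools

def make_cached_key(key_func):
    """Wrap key_func so each distinct string is transformed only once."""
    return functools.lru_cache(maxsize=None)(key_func)

calls = []

def slow_key(s):
    # Stand-in for an expensive collation key; records each real call.
    calls.append(s)
    return s.casefold()

key = make_cached_key(slow_key)

titles = ["b", "A", "c"]
first = sorted(titles, key=key)           # computes three keys
second = sorted(titles + ["A"], key=key)  # all four lookups hit the cache
print(len(calls))
# 3
```

A single `sort(key=...)` call already computes each key only once per element, so the cache pays off when the same strings are sorted repeatedly, for example each time a list view is refreshed.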

License

icu/_icu.pyx

Copyright (c) 2004 Zope Corporation and Contributors. All Rights Reserved.

This software is subject to the provisions of the Zope Public License, Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.

All else

Copyright 2010 Joe Wreschnig

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

shortlog
2015-08-24  Joe Wreschnig  Updates for new hosting. (master)
2010-03-02  Joe Wreschnig  Added tag 0.2 Release for changeset 0a75c795453a
2010-03-02  Joe Wreschnig  0.2 release.
2010-03-02  Joe Wreschnig  MANIFEST.in: Fix typo, include test py files.
2010-03-02  Joe Wreschnig  Better README.
2010-02-26  Joe Wreschnig  Collator.lstripwords: Strip words off the start and...
2010-02-26  Joe Wreschnig  French reverse accent sort test.
2010-02-25  Joe Wreschnig  strings: Microoptimizations, saves about 10% of runtime.
2010-02-25  Joe Wreschnig  strings: Include deroman in import list.
2010-02-25  Joe Wreschnig  Roman numeral parsing. More test cases. (Fixes issue #3)
2010-02-24  Joe Wreschnig  strings.sortemes: Use a line break to separate letters...
2010-02-23  Joe Wreschnig  More release preparation. Docstrings and consistency...
2010-02-22  Joe Wreschnig  InvalidLocaleError is more a LookupError than a ValueError.
2010-02-22  Joe Wreschnig  Cleanup in preparation for release. Add docstrings...
2010-02-22  Joe Wreschnig  Fix some pychecker errors.
2010-02-22  Joe Wreschnig  Fix typo, remove unneeded check.
...