description | Collation algorithms for Python |
owner | Joe Wreschnig |
last change | Mon, 24 Aug 2015 20:31:49 +0000 (22:31 +0200) |
URL | https://git.korewanetadesu.com/python-collate.git |
pycollate is an interface to various collation algorithms for Python.
Supported backends:
- icu
- Based on the IBM ICU toolkit and Jim Fulton's zope.ucol.
- syslocale
- Native OS collation routines.
- codepoint
- Raw Unicode codepoint comparison.
If available, you'll probably want to use the ICU backend. If it's not available, syslocale should work on most Python installations. You can ask for a specific backend; otherwise pycollate picks the "best" available one by default.
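To see why codepoint comparison is the backend of last resort, here is a small sketch that uses only the Python standard library, not pycollate's own API. It compares raw codepoint order with the OS collation routines that a syslocale-style backend would presumably rely on. The word list and the `by_system_locale` helper are made up for illustration, and the locale-aware result depends on which locales your system has installed.

```python
import locale

words = ["zebra", "Apple", "éclair", "apple"]

# Raw codepoint comparison: uppercase sorts before lowercase,
# and accented letters land after "z".
by_codepoint = sorted(words)


def by_system_locale(items):
    """Sort with the OS collation routines for the user's default locale."""
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        # No usable locale is configured; fall back to codepoint order.
        return sorted(items)
    return sorted(items, key=locale.strxfrm)


print(by_codepoint)             # ['Apple', 'apple', 'zebra', 'éclair']
print(by_system_locale(words))  # order depends on the active locale
```

In a plain "C" locale the two results are the same, which is one reason ICU is the nicer choice when you can get it.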
pycollate also provides tools to perform word-wise and numeric sorts.
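As a rough idea of what a numeric sort does, here is a minimal sketch in plain Python. It is not pycollate's implementation; `numeric_key` is a hypothetical helper that splits a string into digit and non-digit runs and compares the digit runs as integers.

```python
import re

_DIGITS = re.compile(r"(\d+)")


def numeric_key(text):
    """Build a sort key in which runs of digits compare as integers."""
    parts = _DIGITS.split(text)
    # re.split with a capturing group puts the digit runs at odd indices,
    # so every key alternates str, int, str, ... and stays comparable.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


titles = ["21 Monkeys", "3 Monkeys", "10 Monkeys"]
print(sorted(titles))                   # ['10 Monkeys', '21 Monkeys', '3 Monkeys']
print(sorted(titles, key=numeric_key))  # ['3 Monkeys', '10 Monkeys', '21 Monkeys']
```

A real collation backend would also apply language-aware rules to the text runs; this sketch only lowercases them.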
Like all Unicode collation tools, pycollate is a work in progress.
$ sudo apt-get install python-pyrex libicu-dev
$ ./setup.py build
$ sudo ./setup.py install
import io
import collate
# io.open decodes the file as UTF-8 on both Python 2 and Python 3.
with io.open("contents.txt", encoding="utf-8") as f:
    strings = f.read().splitlines()
strings.sort(key=collate.key)
Collation is the process of sorting information in a useful way. In particular, this module sorts strings in a way that humans might expect to read them.
If your strings are all in one language and you speak English yourself, a plain sort may be all you need.
Otherwise, you need to make sure "ss" and "ß" sort similarly, "å" sorts like "A" (unless you're Swedish), and "21 Monkeys" comes after "3 Monkeys".
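The letter-folding part of this can be approximated with the standard library. The sketch below is a crude, locale-blind stand-in for a real collation key, not what pycollate or ICU actually do. `rough_key` is a hypothetical helper and needs Python 3 for `str.casefold`.

```python
import unicodedata


def rough_key(text):
    """Crude key: fold case (so "ß" becomes "ss"), then drop combining marks."""
    folded = text.casefold()
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


names = ["Zoo", "Straße", "år", "Strasse", "apple"]
print(sorted(names, key=rough_key))
# ['apple', 'år', 'Straße', 'Strasse', 'Zoo']
```

Because it ignores locale, this puts "å" next to "a" for everyone, which is exactly what a Swedish reader does not want. Handling that kind of per-language rule is the job of a backend like ICU.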
pycollate is slow enough that you will probably want to cache sort keys. On a mid-range system at the time of writing, it takes about half a second to sort 10000 song titles.
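One way to cache sort keys is to wrap the key function in a small memoizing object, sketched below. `KeyCache` is a hypothetical helper, not part of pycollate, and `locale.strxfrm` stands in for `collate.key` so the example runs without the module installed.

```python
import locale


class KeyCache(object):
    """Memoize an expensive key function so each string is transformed once."""

    def __init__(self, key_func):
        self._key_func = key_func
        self._cache = {}
        self.misses = 0  # number of times the real key function ran

    def __call__(self, text):
        try:
            return self._cache[text]
        except KeyError:
            self.misses += 1
            key = self._cache[text] = self._key_func(text)
            return key


cached_key = KeyCache(locale.strxfrm)  # stand-in; collate.key would go here
titles = ["Yesterday", "Help!", "Yesterday", "Help!"]
titles.sort(key=cached_key)
titles.sort(key=cached_key)  # second sort is served entirely from the cache
print(cached_key.misses)     # 2
```

A single `sort(key=...)` call already computes each element's key only once, so the cache pays off when the same strings are sorted repeatedly or appear many times. The dictionary grows without bound, so a long-running program would want to clear or cap it.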
Copyright (c) 2004 Zope Corporation and Contributors. All Rights Reserved.
This software is subject to the provisions of the Zope Public License, Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution. THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
Copyright 2010 Joe Wreschnig
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.