# Collation algorithms for Python
-------------------------------------------

pycollate is an interface to various collation algorithms for Python.

Supported backends:
- `icu` - Based on the IBM ICU toolkit and Jim Fulton's zope.ucol.
- `syslocale` - Native OS collation routines.
- `codepoint` - Raw Unicode codepoint comparison.

If available, you'll probably want to use the ICU backend. If it's not
available, syslocale should work on most Python installations. You can
request a specific backend; otherwise a "best" backend is chosen by
default.
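
To illustrate why raw codepoint comparison is the fallback rather than the preferred choice, here is a small sketch using only Python's built-in string ordering, which compares by code point. It does not call pycollate; the word list is made up for illustration.

```python
# Illustration only: plain sorted() compares strings code point by code point.
words = ["zebra", "Apple", "ähnlich", "apple", "Zürich"]

by_codepoint = sorted(words)

# Uppercase ASCII sorts before lowercase ASCII, and "ä" (U+00E4) sorts
# after "z", so the result is not what a human reader would expect.
print(by_codepoint)  # ['Apple', 'Zürich', 'apple', 'zebra', 'ähnlich']
```

A locale-aware backend would normally place "ähnlich" next to the other words starting with "a".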

pycollate also provides tools to perform word-wise and numeric sorts.
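
As a rough idea of what a numeric sort involves, the sketch below builds a sort key that compares digit runs by value instead of character by character. `numeric_key` is a hypothetical helper written for this illustration; it is not pycollate's API, and the library's own implementation may differ.

```python
import re

_DIGITS = re.compile(r"(\d+)")

def numeric_key(text):
    """Hypothetical helper: split text into alternating text/number parts."""
    parts = _DIGITS.split(text)
    # With a capturing group, even indexes hold text and odd indexes hold
    # digit runs, so ints are only ever compared with ints.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

titles = ["21 Monkeys", "3 Monkeys", "12 Monkeys"]
print(sorted(titles))                   # ['12 Monkeys', '21 Monkeys', '3 Monkeys']
print(sorted(titles, key=numeric_key))  # ['3 Monkeys', '12 Monkeys', '21 Monkeys']
```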

pycollate, as with all Unicode collation tools, is a work in progress.

## Installing

    $ sudo apt-get install python-pyrex libicu-dev
    $ ./setup.py build
    $ sudo ./setup.py install
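
## Example

    import io
    import collate
    # io.open decodes the file to Unicode text on both Python 2 and 3.
    with io.open("contents.txt", encoding="utf-8") as f:
        strings = f.read().splitlines()
    strings.sort(key=collate.key)  # sort by collation key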

## FAQ

### What's collation?

Collation is the process of sorting information in a useful way. In
particular, this module sorts strings in a way that humans might
expect to read them.

### What's so hard about that?

Nothing, if your strings are all in one language and you speak English
yourself.

On the other hand, if that's not the case you need to make sure "ss"
and "ß" sort similarly, "å" sorts like "A" (unless you're Swedish),
and "21 Monkeys" comes after "3 Monkeys".
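
The first two cases can be roughly approximated with the standard library alone. The sketch below is a crude, language-neutral folding, not what the ICU backend does; `rough_fold` is a hypothetical name used only here.

```python
import unicodedata

def rough_fold(text):
    """Hypothetical helper: fold case and drop combining marks."""
    # casefold() lowercases and expands "ß" to "ss".
    folded = text.casefold()
    # NFD splits "å" into "a" plus a combining ring, which is then dropped.
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(rough_fold("Straße") == rough_fold("strasse"))      # True
print(rough_fold("Ångström"))                             # angstrom
print(sorted(["Zebra", "Åre", "apple"], key=rough_fold))  # ['apple', 'Åre', 'Zebra']
```

This folding is exactly wrong for Swedish, where "å" is a separate letter that sorts after "z", which is why a locale-aware backend is preferable to a fixed rule like this one.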

### How fast is the library?

Slow enough that you will probably want to cache sort keys. On a
mid-range system at the time of writing, it takes about half a
second to sort 10,000 song titles.
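
One way to cache sort keys is to memoize the key function, sketched below with `functools.lru_cache`. `slow_key` is a stand-in for an expensive key function such as `collate.key`, and `make_cached_key` is a hypothetical helper; neither is part of pycollate.

```python
from functools import lru_cache

def make_cached_key(key_func, maxsize=None):
    """Hypothetical helper: wrap key_func so each distinct string is keyed once."""
    return lru_cache(maxsize=maxsize)(key_func)

calls = []

def slow_key(text):
    # Stand-in for an expensive collation key; records each real computation.
    calls.append(text)
    return text.casefold()

cached_key = make_cached_key(slow_key)

titles = ["Yesterday", "Help!", "yesterday", "Help!"]
titles.sort(key=cached_key)
titles.sort(key=cached_key, reverse=True)  # second sort reuses cached keys

print(len(calls))  # 3, one computation per distinct title
```

A single `sort(key=...)` call already computes the key only once per element, so the cache pays off mainly when the same strings are sorted repeatedly or appear many times.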

## License

### icu/_icu.pyx

Copyright (c) 2004 Zope Corporation and Contributors.
All Rights Reserved.

This software is subject to the provisions of the Zope Public License,
Version 2.1 (ZPL). A copy of the ZPL should accompany this distribution.
THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
FOR A PARTICULAR PURPOSE.

### All else

Copyright 2010 Joe Wreschnig

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.