A sorteme, by analogy with grapheme/morpheme/etc., is an atom of
sort information. This is larger than a word boundary but smaller
than a sentence boundary; roughly, a sorteme boundary occurs between
- letters and numbers, between numbers and numbrs if 'too much'
+ letters and numbers, between numbers and numbers if 'too much'
punctuation exists in between, between lines.
There is no formal specification for sortemes; the goal of this
string = unicode(string)
categories = map(unicodedata.category, string)
previous = UNKNOWN
- types = []
def stripends(word):
    while word and unicodedata.category(word[0])[0] in "PS":
return [(i, key(w) if w else u'') for i, w in words]
-def numeric(orig, invalid=float('inf')):
+def numeric(orig, invalid=INFINITY):
    if not orig:
        return invalid
    string = normalize_punc(string)
-    # Early out if possible.
-    try:
-        return float(string) * mult
-    except ValueError:
-        pass
-
    # Otherwise we need to do this the hard way.
    def _numeric(string):
        total = 0