06.05.2019       Выпуск 281 (06.05.2019 - 12.05.2019)       Статьи

### Экспериментальная функция:

Ниже вы видите текст статьи по ссылке. По нему можно быстро понять ссылка достойна прочтения или нет

Просим обратить внимание, что текст по ссылке и здесь может не совпадать.

All programmers will have to write code to sort items or data at some point. Sorting can be critical to the user experience in your application, whether it’s ordering a user’s most recent activity by timestamp, or putting a list of email recipients in alphabetical order by last name. Python sorting functionality offers robust features to do basic sorting or customize ordering at a granular level.

In this guide, you’ll learn how to sort various types of data in different data structures, customize the order, and work with two different methods of sorting in Python.

By the end of this tutorial, you’ll know how to:

• Implement basic Python sorting and ordering on data structures
• Differentiate between `sorted()` and `.sort()`
• Customize a complex sort order in your code based on unique requirements

For this tutorial, you’ll need a basic understanding of lists and tuples as well as sets. Those data structures will be used in this tutorial, and some basic operations will be performed on them. Also, this tutorial uses Python 3, so example output in this tutorial might vary slightly if you’re using Python 2.

## Ordering Values With `sorted()`

To get started with Python sorting, you’re first going to see how to sort both numeric data and string data.

### Sorting Numbers

You can use Python to sort a list by using `sorted()`. In this example, a list of integers is defined, and then `sorted()` is called with the `numbers` variable as the argument:

>>>
```>>> numbers = [6, 9, 3, 1]
>>> sorted(numbers)
[1, 3, 6, 9]
>>> numbers
[6, 9, 3, 1]
```

The output from this code is a new, sorted list. When the original variable is printed, the initial values are unchanged.

This example shows four important characteristics of `sorted()`:

1. The function `sorted()` did not have to be defined. It’s a built-in function that is available in a standard installation of Python.
2. `sorted()`, with no additional arguments or parameters, is ordering the values in `numbers` in an ascending order, meaning smallest to largest.
3. The original `numbers` variable is unchanged because `sorted()` provides sorted output and does not change the original value in place.
4. When `sorted()` is called, it provides an ordered list as a return value.

This last point means that `sorted()` can be used on a list, and the output can immediately be assigned to a variable:

>>>
```>>> numbers = [6, 9, 3, 1]
>>> numbers_sorted = sorted(numbers)
>>> numbers_sorted
[1, 3, 6, 9]
>>> numbers
[6, 9, 3, 1]
```

In this example, there is now a new variable `numbers_sorted` that stored the output of `sorted()`.

You can confirm all of these observations by calling `help()` on `sorted()`. The optional arguments `key` and `reverse` will be covered later in the tutorial:

>>>
```>>> # Python 3
>>> help(sorted)
Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
Return a new list containing all items from the iterable in ascending order.

A custom key function can be supplied to customize the sort order, and the
reverse flag can be set to request the result in descending order.
```

Technical Detail: If you’re transitioning from Python 2 and are familiar with its function of the same name, you should be aware of a couple important changes in Python 3:

1. Python 3’s `sorted()` does not have a `cmp` parameter. Instead, only `key` is used to introduce custom sorting logic.
2. `key` and `reverse` must be passed as keyword arguments, unlike in Python 2, where they could be passed as positional arguments.

If you need to convert a Python 2 `cmp` function to a `key` function, then check out `functools.cmp_to_key()`. This tutorial will not cover any examples using Python 2.

`sorted()` can be used on tuples and sets very similarly:

>>>
```>>> numbers_tuple = (6, 9, 3, 1)
>>> numbers_set = {5, 5, 10, 1, 0}
>>> numbers_tuple_sorted = sorted(numbers_tuple)
>>> numbers_set_sorted = sorted(numbers_set)
>>> numbers_tuple_sorted
[1, 3, 6, 9]
>>> numbers_set_sorted
[0, 1, 5, 10]
```

Notice how even though the input was a set and a tuple, the output is a list because `sorted()` returns a new list by definition. The returned object can be cast to a new type if it needs to match the input type. Be careful if attempting to cast the resulting list back to a set, as a set by definition is unordered:

>>>
```>>> numbers_tuple = (6, 9, 3, 1)
>>> numbers_set = {5, 5, 10, 1, 0}
>>> numbers_tuple_sorted = sorted(numbers_tuple)
>>> numbers_set_sorted = sorted(numbers_set)
>>> numbers_tuple_sorted
[1, 3, 6, 9]
>>> numbers_set_sorted
[0, 1, 5, 10]
>>> tuple(numbers_tuple_sorted)
(1, 3, 6, 9)
>>> set(numbers_set_sorted)
{0, 1, 10, 5}
```

The `numbers_set_sorted` value when cast to a `set` is not ordered, as expected. The other variable, `numbers_tuple_sorted`, retained the sorted order.

### Sorting Strings

`str` types sort similarly to other iterables, like list and tuple. The example below shows how `sorted()` iterates through each character in the value passed to it and orders them in the output:

>>>
```>>> string_number_value = '34521'
>>> string_value = 'I like to sort'
>>> sorted_string_number = sorted(string_number_value)
>>> sorted_string = sorted(string_value)
>>> sorted_string_number
['1', '2', '3', '4', '5']
>>> sorted_string
[' ', ' ', ' ', 'I', 'e', 'i', 'k', 'l', 'o', 'o', 'r', 's', 't', 't']
```

`sorted()` will treat a `str` like a list and iterate through each element. In a `str`, each element means each character in the `str`. `sorted()` will not treat a sentence differently, and it will sort each character, including spaces.

`.split()` can change this behavior and clean up the output, and `.join()` can put it all back together. We will cover the specific order of the output and why it is that way shortly:

>>>
```>>> string_value = 'I like to sort'
>>> sorted_string = sorted(string_value.split())
>>> sorted_string
['I', 'like', 'sort', 'to']
>>> ' '.join(sorted_string)
'I like sort to'
```

The original sentence in this example is converted into a list of words instead of leaving it as a `str`. That list is then sorted and combined to form a `str` again instead of a list.

## Limitations and Gotchas With Python Sorting

It is worth noting some limitations and odd behavior that can arise when you’re using Python to sort values besides integers.

### Lists With Non-Comparable Data Types Can’t Be `sorted()`

There are data types that can’t be compared to each other using just `sorted()` because they are too different. Python will return an error if you attempt to use `sorted()` on a list containing non-comparable data. In this example, a `None` and an `int` in the same list can’t be sorted because of their incompatibility:

>>>
```>>> mixed_types = [None, 0]
>>> sorted(mixed_types)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'NoneType'
```

This error shows why Python can’t sort the values given to it. It’s trying to put the values in order by using the less than operator (`<`) to determine which value is lower in sorting order. You can replicate this error by manually comparing the two values:

>>>
```>>> None < 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'int'
```

The same `TypeError` is thrown when you try to compare two non-comparable values without using `sorted()`.

If the values within the list can be compared and will not throw a `TypeError`, then the list can be sorted. This prevents sorting iterables with intrinsically unorderable values and producing output that may not make sense.

For example, should the number `1` come before the word `apple`? However, if a iterable contains a combination of integers and strings that are all numbers, they can be cast to comparable data types by using a list comprehension:

>>>
```>>> mixed_numbers = [5, "1", 100, "34"]
>>> sorted(mixed_numbers)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'int'
>>> # List comprehension to convert all values to integers
>>> [int(x) for x in mixed_numbers]
[5, 1, 100, 34]
>>> sorted([int(x) for x in mixed_numbers])
[1, 5, 34, 100]
```

Each element in `mixed_numbers` has `int()` called on it to convert any `str` values to `int` values. `sorted()` is then called and can successfully compare each element and provide a sorted output.

Python can also implicitly convert a value to another type. In the example below, the evaluation of `1 <= 0` is a false statement, so the output of the evaluation will be `False`. The number `1` can be converted to `True` as a `bool` type, while `0` converts to `False`.

Even though the elements in the list look different, they can all be converted to Booleans (`True` or `False`) and compared to each other using `sorted()`:

>>>
```>>> similar_values = [False, 0, 1, 'A' == 'B', 1 <= 0]
>>> sorted(similar_values)
[False, 0, False, False, 1]
```

`'A' == 'B'` and `1 <= 0` are converted to `False` and returned in the ordered output.

This example illustrates an important aspect of sorting: sort stability. In Python, when you sort equal values, they will retain their original order in the output. Even though the `1` moved, all the other values are equal so they retain their original order relative to each other. In the example below, all the values are considered equal and will retain their original positions:

>>>
```>>> false_values = [False, 0, 0, 1 == 2, 0, False, False]
>>> sorted(false_values)
[False, 0, 0, False, 0, False, False]
```

If you inspect the original order and the sorted output, you will see that `1 == 2` is converted to `False`, and all sorted output is in the original order.

### When You’re Sorting Strings, Case Matters

`sorted()` can be used on a list of strings to sort the values in ascending order, which appears to be alphabetically by default:

>>>
```>>> names = ['Harry', 'Suzy', 'Al', 'Mark']
>>> sorted(names)
['Al', 'Harry', 'Mark', 'Suzy']
```

However, Python is using the Unicode Code Point of the first letter in each string to determine ascending sort order. This means that `sorted()` will not treat the names `Al` and `al` the same. This example uses `ord()` to return the Unicode Code Point of the first letter in each string:

>>>
```>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted(names_with_case)
['Mark', 'Suzy', 'al', 'harry']
>>> # List comprehension for Unicode Code Point of first letter in each word
>>> [(ord(name[0]), name[0]) for name in sorted(names_with_case)]
[(77, 'M'), (83, 'S'), (97, 'a'), (104, 'h')]
```

`name[0]` is returning the first character in each element of `sorted(names_with_case)`, and `ord()` is providing the Unicode Code Point. Even though `a` comes before `M` in the alphabet, the code point for `M` comes before `a`, so the sorted output has `M` first.

If the first letter is the same, then `sorted()` will use the second character to determine order, and the third character if that is the same, and so on, all the way to the end of the string:

>>>
```>>> very_similar_strs = ['hhhhhd', 'hhhhha', 'hhhhhc','hhhhhb']
>>> sorted(very_similar_strs)
['hhhhha', 'hhhhhb', 'hhhhhc', 'hhhhhd']
```

Each value of `very_similar_strs` is identical except for the last character. `sorted()` will compare the strings, and because the first five characters are the same, the output will be based on the sixth character.

Strings that contain identical values will end up sorted shortest to longest due to the shorter strings not having elements to compare to with the longer strings:

>>>
```>>> different_lengths = ['hhhh', 'hh', 'hhhhh','h']
>>> sorted(different_lengths)
['h', 'hh', 'hhhh', 'hhhhh']
```

The shortest string, `h`, is ordered first with the longest, `hhhhh`, ordered last.

## Using `sorted()` With a `reverse` Argument

As shown in the `help()` documentation for `sorted()`, there is an optional keyword argument called `reverse`, which will change the sorting behavior based on the Boolean assigned to it. If `reverse` is assigned `True`, then the sorting will be in descending order:

>>>
```>>> names = ['Harry', 'Suzy', 'Al', 'Mark']
>>> sorted(names)
['Al', 'Harry', 'Mark', 'Suzy']
>>> sorted(names, reverse=True)
['Suzy', 'Mark', 'Harry', 'Al']
```

The sorting logic remains the same, meaning that the names are still being sorted by their first letter. But the output has been reversed with the `reverse` keyword set to `True`.

When `False` is assigned, the ordering will remain ascending. Any of the previous examples can be used to see the behavior of reverse using both `True` or `False`:

>>>
```>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted(names_with_case, reverse=True)
['harry', 'al', 'Suzy', 'Mark']
>>> similar_values = [False, 1, 'A' == 'B', 1 <= 0]
>>> sorted(similar_values, reverse=True)
[1, False, False, False]
>>> numbers = [6, 9, 3, 1]
>>> sorted(numbers, reverse=False)
[1, 3, 6, 9]
```

## `sorted()` With a `key` Argument

One of the most powerful components of `sorted()` is the keyword argument called `key`. This argument expects a function to be passed to it, and that function will be used on each value in the list being sorted to determine the resulting order.

To demonstrate a basic example, let’s assume the requirement for ordering a specific list is the length of the strings in the list, shortest to longest. The function to return the length of a string, `len()`, will be used with the `key` argument:

>>>
```>>> word = 'paper'
>>> len(word)
5
>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=len)
['pie', 'book', 'banana', 'Washington']
```

The resulting order is a list with a string order of shortest to longest. The length of each element in the list is determined by `len()` and then returned in ascending order.

Let’s return to the earlier example of sorting by first letter when the case is different. `key` can be used to solve that problem by converting the entire string to lowercase:

>>>
```>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted(names_with_case)
['Mark', 'Suzy', 'al', 'harry']
>>> sorted(names_with_case, key=str.lower)
['al', 'harry', 'Mark', 'Suzy']
```

The output values have not been converted to lowercase because `key` does not manipulate the data in the original list. During sorting, the function passed to `key` is being called on each element to determine sort order, but the original values will be in the output.

There are two main limitations when you’re using functions with the `key` argument.

First, the number of required arguments in the function passed to `key` must be one.

The example below shows the definition of an addition function that takes two arguments. When that function is used in `key` on a list of numbers, it fails because it is missing a second argument. Each time `add()` is called during the sort, it is only receiving one element from the list at a time:

>>>
```>>> def add(x, y):
...     return x + y
...
>>> values_to_add = [1, 2, 3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: add() missing 1 required positional argument: 'y'
```

The second limitation is that the function used with `key` must be able to handle all the values in the iterable. For example, you have a list of numbers represented as strings to be used in `sorted()`, and `key` is going to attempt to convert them to numbers using `int`. If a value in the iterable can’t be cast to an integer, then the function will fail:

>>>
```>>> values_to_cast = ['1', '2', '3', 'four']
>>> sorted(values_to_cast, key=int)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'four'
```

Each numeric value as a `str` can be converted to `int`, but `four` can’t. This causes a `ValueError` to be raised and explain that `four` can’t be converted to `int` because it is invalid.

The `key` functionality is extremely powerful because almost any function, built-in or user-defined, can be used to manipulate the output order.

If the ordering requirement is to order an iterable by the last letter in each string (and if the letter is the same, then to use the next letter), then a function can be defined and then used in the sorting. The example below defines a function that reverses the string passed to it, and then that function is used as the argument for `key`:

>>>
```>>> def reverse_word(word):
...     return word[::-1]
...
>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=reverse_word)
['banana', 'pie', 'book', 'Washington']
```

The `word[::-1]` slice syntax is used to reverse a string. Each element will have `reverse_word()` applied to it, and the sorting order will be based on the characters in the backwards word.

Instead of writing a standalone function, you can use a `lambda` function defined in the `key` argument.

A `lambda` is an anonymous function that:

1. Must be defined inline
2. Doesn’t have a name
3. Can’t contain statements
4. Will execute just like a function

In the example below, the `key` is defined as a `lambda` with no name, the argument taken by the `lambda` is `x`, and `x[::-1]` is the operation that will be performed on the argument:

>>>
```>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=lambda x: x[::-1])
['banana', 'pie', 'book', 'Washington']
```

`x[::-1]` is called on each element and reverses the word. That reversed output is then used for sorting, but the original words are still returned.

If the requirement changes, and the order should be reversed as well, then the `reverse` keyword can be used alongside the `key` argument:

>>>
```>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=lambda x: x[::-1], reverse=True)
['Washington', 'book', 'pie', 'banana']
```

`lambda` functions are also useful when you need to sort `class` objects based on a property. If you have a group of students and need to sort them by their final grade, highest to lowest, then a `lambda` can be used to get the `grade` property from the `class`:

>>>
```>>> from collections import namedtuple

>>> StudentFinal = namedtuple('StudentFinal', 'name grade')
>>> bill = StudentFinal('Bill', 90)
>>> patty = StudentFinal('Patty', 94)
>>> bart = StudentFinal('Bart', 89)
>>> students = [bill, patty, bart]
>>> sorted(students, key=lambda x: getattr(x, 'grade'), reverse=True)
```

This example uses `namedtuple` to produce classes with `name` and `grade` attributes. The `lambda` calls `getattr()` on each element and returns the value for `grade`.

`reverse` is set to `True` to make the ascending output flipped to be descending so that the highest grades are ordered first.

The possibilities are endless for how ordering can be done when you leverage both the `key` and `reverse` keyword arguments on `sorted()`. Code can be kept clean and short when you use a basic `lambda` for a small function, or you can write a whole new function, import it, and use it in the key argument.

## Ordering Values With `.sort()`

The very similarly named `.sort()` differs quite a bit from the `sorted()` built-in. They accomplish more or less the same thing, but the `help()` documentation for `list.sort()` highlights two of the most critical differences between `.sort()` and `sorted()`:

>>>
```>>> # Python2
Help on method_descriptor:

sort(...)
L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1

>>> # Python3
>>> help(list.sort)
Help on method_descriptor:

sort(...)
L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
```

First, sort is a method of the `list` class and can only be used with lists. It is not a built-in with an iterable passed to it.

Second, `.sort()` returns `None` and modifies the values in place. Let’s take a look at the impacts of both of these differences in code:

>>>
```>>> values_to_sort = [5, 2, 6, 1]
>>> # Try to call .sort() like sorted()
>>> sort(values_to_sort)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sort' is not defined

>>> # Try to use .sort() on a tuple
>>> tuple_val = (5, 1, 3, 5)
>>> tuple_val.sort()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'sort'

>>> # Sort the list and assign to new variable
>>> sorted_values = values_to_sort.sort()
>>> print(sorted_values)
None

>>> # Print original variable
>>> print(values_to_sort)
[1, 2, 5, 6]
```

There are some pretty dramatic differences in how `.sort()` operates compared to `sorted()` in this code example:

1. There is no ordered output of `.sort()`, so the assignment to a new variable only passes a `None` type.
2. The `values_to_sort` list has been changed in place, and the original order is not maintained in any way.

These differences in behavior make `.sort()` and `sorted()` absolutely not interchangeable in code, and they can produce wildly unexpected outcomes if one is used in the wrong way.

`.sort()` has the same `key` and `reverse` optional keyword arguments that produce the same robust functionality as `sorted()`. Here, you can sort a list of phrases by the second letter of the third word and return the list in reverse:

>>>
```>>> phrases = ['when in rome',
...     'what goes around comes around',
...     'all is fair in love and war'
...     ]
>>> phrases.sort(key=lambda x: x.split()[2][1], reverse=True)
>>> phrases
['what goes around comes around', 'when in rome', 'all is fair in love and war']
```

In this sample, a `lambda` is used to do the following:

1. Split each phrase into a list of words
2. Find the third element, or word in this case
3. Find the second letter in that word

## When to Use `sorted()` and When to Use `.sort()`

You’ve seen the differences between `sorted()` and `.sort()`, but when do you use which?

Let’s say there is a 5k race coming up: The First Annual Python 5k. The data from the race needs to be captured and sorted. The data that needs to be captured is the runner’s bib number and the number of seconds it took to finish the race:

>>>
```>>> from collections import namedtuple

>>> Runner = namedtuple('Runner', 'bibnumber duration')
```

As the runners cross the finish line, each `Runner` will be added to a list called `runners`. In 5k races, not all runners cross the starting line at the same time, so the first person to cross the finish line might not actually be the fastest person:

>>>
```>>> runners = []
>>> runners.append(Runner('2528567', 1500))
>>> runners.append(Runner('7575234', 1420))
>>> runners.append(Runner('2666234', 1600))
>>> runners.append(Runner('2425234', 1490))
>>> runners.append(Runner('1235234', 1620))
>>> # Thousands and Thousands of entries later...
>>> runners.append(Runner('2526674', 1906))
```

Each time a runner crosses the finish line, their bib number and their total duration in seconds is added to `runners`.

Now, the dutiful programmer in charge of handling the outcome data sees this list, knows that the top 5 fastest participants are the winners that get prizes, and the remaining runners are going to be sorted by fastest time.

There are no requirements for multiple types of sorting by various attributes. The list is a reasonable size. There is no mention of storing the list somewhere. Just sort by duration and grab the five participants with the lowest duration:

>>>
```>>> runners.sort(key=lambda x: getattr(x, 'duration'))
>>> top_five_runners = runners[:5]
```

The programmer chooses to use a `lambda` in the `key` argument to get the `duration` attribute from each runner and sort `runners` in place using `.sort()`. After `runners` is sorted, the first 5 elements are stored in `top_five_runners`.

Mission accomplished! The race director comes over and informs the programmer that since the current release of Python is 3.7, they have decided that every thirty-seventh person that crossed the finish line is going to get a free gym bag.

At this point, the programmer starts to sweat because the list of runners has been irreversibly changed. There is no way to recover the original list of runners in the order they finished and find every thirty-seventh person.

If you’re working with important data, and there is even a remote possibility that the original data will need to be recovered, then `.sort()` is not the best option. If the data is a copy, of if it is unimportant working data, of if no one will mind losing it because it can be retrieved, then `.sort()` can be a fine option.

Alternatively, the runners could have been sorted using `sorted()` and using the same `lambda`:

>>>
```>>> runners_by_duration = sorted(runners, key=lambda x: getattr(x, 'duration'))
>>> top_five_runners = runners_by_duration[:5]
```

In this scenario with `sorted()`, the original list of runners is still intact and has not been overwritten. The impromptu requirement of finding every thirty-seventh person to cross the finish line can be accomplished by interacting with the original values:

>>>
```>>> every_thirtyseventh_runners = runners[::37]
```

`every_thirtyseventh_runners` is created by using a stride in the list slice syntax on `runners`, which still contains the original order in which the runners crossed the finish line.

## How to Sort in Python: Conclusion

`.sort()` and `sorted()` can provide exactly the sort order you need if you use them properly with both the `reverse` and `key` optional keyword arguments.

Both have very different characteristics when it comes to output and in-place modifications, so make sure you think through any application functionality or program that will be using `.sort()` as it can irrevocably overwrite data.

For the avid Pythonistas looking for a challenge with sorting, try using more complex data types in sorting: nested iterables. Also, feel free to dive into the open source Python code implementations for the built-ins and read about the sort algorithm used in Python called Timsort.