06.05.2019       Выпуск 281 (06.05.2019 - 12.05.2019)       Статьи

Как использовать sorted() и sort() в Python


Экспериментальная функция:

Ниже вы видите текст статьи по ссылке. По нему можно быстро понять ссылка достойна прочтения или нет

Просим обратить внимание, что текст по ссылке и здесь может не совпадать.

All programmers will have to write code to sort items or data at some point. Sorting can be critical to the user experience in your application, whether it’s ordering a user’s most recent activity by timestamp, or putting a list of email recipients in alphabetical order by last name. Python sorting functionality offers robust features to do basic sorting or customize ordering at a granular level.

In this guide, you’ll learn how to sort various types of data in different data structures, customize the order, and work with two different methods of sorting in Python.

By the end of this tutorial, you’ll know how to:

  • Implement basic Python sorting and ordering on data structures
  • Differentiate between sorted() and .sort()
  • Customize a complex sort order in your code based on unique requirements

For this tutorial, you’ll need a basic understanding of lists and tuples as well as sets. Those data structures will be used in this tutorial, and some basic operations will be performed on them. Also, this tutorial uses Python 3, so example output in this tutorial might vary slightly if you’re using Python 2.

Ordering Values With sorted()

To get started with Python sorting, you’re first going to see how to sort both numeric data and string data.

Sorting Numbers

You can use Python to sort a list by using sorted(). In this example, a list of integers is defined, and then sorted() is called with the numbers variable as the argument:

>>> numbers = [6, 9, 3, 1]
>>> sorted(numbers)
[1, 3, 6, 9]
>>> numbers
[6, 9, 3, 1]

The output from this code is a new, sorted list. When the original variable is printed, the initial values are unchanged.

This example shows four important characteristics of sorted():

  1. The function sorted() did not have to be defined. It’s a built-in function that is available in a standard installation of Python.
  2. sorted(), with no additional arguments or parameters, is ordering the values in numbers in an ascending order, meaning smallest to largest.
  3. The original numbers variable is unchanged because sorted() provides sorted output and does not change the original value in place.
  4. When sorted() is called, it provides an ordered list as a return value.

This last point means that sorted() can be used on a list, and the output can immediately be assigned to a variable:

>>> numbers = [6, 9, 3, 1]
>>> numbers_sorted = sorted(numbers)
>>> numbers_sorted
[1, 3, 6, 9]
>>> numbers
[6, 9, 3, 1]

In this example, there is now a new variable numbers_sorted that stored the output of sorted().

You can confirm all of these observations by calling help() on sorted(). The optional arguments key and reverse will be covered later in the tutorial:

>>> # Python 3
>>> help(sorted)
Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.

    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.

Technical Detail: If you’re transitioning from Python 2 and are familiar with its function of the same name, you should be aware of a couple important changes in Python 3:

  1. Python 3’s sorted() does not have a cmp parameter. Instead, only key is used to introduce custom sorting logic.
  2. key and reverse must be passed as keyword arguments, unlike in Python 2, where they could be passed as positional arguments.

If you need to convert a Python 2 cmp function to a key function, then check out functools.cmp_to_key(). This tutorial will not cover any examples using Python 2.

sorted() can be used on tuples and sets very similarly:

>>> numbers_tuple = (6, 9, 3, 1)
>>> numbers_set = {5, 5, 10, 1, 0}
>>> numbers_tuple_sorted = sorted(numbers_tuple)
>>> numbers_set_sorted = sorted(numbers_set)
>>> numbers_tuple_sorted
[1, 3, 6, 9]
>>> numbers_set_sorted
[0, 1, 5, 10]

Notice how even though the input was a set and a tuple, the output is a list because sorted() returns a new list by definition. The returned object can be cast to a new type if it needs to match the input type. Be careful if attempting to cast the resulting list back to a set, as a set by definition is unordered:

>>> numbers_tuple = (6, 9, 3, 1)
>>> numbers_set = {5, 5, 10, 1, 0}
>>> numbers_tuple_sorted = sorted(numbers_tuple)
>>> numbers_set_sorted = sorted(numbers_set)
>>> numbers_tuple_sorted
[1, 3, 6, 9]
>>> numbers_set_sorted
[0, 1, 5, 10]
>>> tuple(numbers_tuple_sorted)
(1, 3, 6, 9)
>>> set(numbers_set_sorted)
{0, 1, 10, 5}

The numbers_set_sorted value when cast to a set is not ordered, as expected. The other variable, numbers_tuple_sorted, retained the sorted order.

Sorting Strings

str types sort similarly to other iterables, like list and tuple. The example below shows how sorted() iterates through each character in the value passed to it and orders them in the output:

>>> string_number_value = '34521'
>>> string_value = 'I like to sort'
>>> sorted_string_number = sorted(string_number_value)
>>> sorted_string = sorted(string_value)
>>> sorted_string_number
['1', '2', '3', '4', '5']
>>> sorted_string
[' ', ' ', ' ', 'I', 'e', 'i', 'k', 'l', 'o', 'o', 'r', 's', 't', 't']

sorted() will treat a str like a list and iterate through each element. In a str, each element means each character in the str. sorted() will not treat a sentence differently, and it will sort each character, including spaces.

.split() can change this behavior and clean up the output, and .join() can put it all back together. We will cover the specific order of the output and why it is that way shortly:

>>> string_value = 'I like to sort'
>>> sorted_string = sorted(string_value.split())
>>> sorted_string
['I', 'like', 'sort', 'to']
>>> ' '.join(sorted_string)
'I like sort to'

The original sentence in this example is converted into a list of words instead of leaving it as a str. That list is then sorted and combined to form a str again instead of a list.

Limitations and Gotchas With Python Sorting

It is worth noting some limitations and odd behavior that can arise when you’re using Python to sort values besides integers.

Lists With Non-Comparable Data Types Can’t Be sorted()

There are data types that can’t be compared to each other using just sorted() because they are too different. Python will return an error if you attempt to use sorted() on a list containing non-comparable data. In this example, a None and an int in the same list can’t be sorted because of their incompatibility:

>>> mixed_types = [None, 0]
>>> sorted(mixed_types)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'int' and 'NoneType'

This error shows why Python can’t sort the values given to it. It’s trying to put the values in order by using the less than operator (<) to determine which value is lower in sorting order. You can replicate this error by manually comparing the two values:

>>> None < 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'NoneType' and 'int'

The same TypeError is thrown when you try to compare two non-comparable values without using sorted().

If the values within the list can be compared and will not throw a TypeError, then the list can be sorted. This prevents sorting iterables with intrinsically unorderable values and producing output that may not make sense.

For example, should the number 1 come before the word apple? However, if a iterable contains a combination of integers and strings that are all numbers, they can be cast to comparable data types by using a list comprehension:

>>> mixed_numbers = [5, "1", 100, "34"]
>>> sorted(mixed_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'int'
>>> # List comprehension to convert all values to integers
>>> [int(x) for x in mixed_numbers]
[5, 1, 100, 34]
>>> sorted([int(x) for x in mixed_numbers])
[1, 5, 34, 100]

Each element in mixed_numbers has int() called on it to convert any str values to int values. sorted() is then called and can successfully compare each element and provide a sorted output.

Python can also implicitly convert a value to another type. In the example below, the evaluation of 1 <= 0 is a false statement, so the output of the evaluation will be False. The number 1 can be converted to True as a bool type, while 0 converts to False.

Even though the elements in the list look different, they can all be converted to Booleans (True or False) and compared to each other using sorted():

>>> similar_values = [False, 0, 1, 'A' == 'B', 1 <= 0]
>>> sorted(similar_values)
[False, 0, False, False, 1]

'A' == 'B' and 1 <= 0 are converted to False and returned in the ordered output.

This example illustrates an important aspect of sorting: sort stability. In Python, when you sort equal values, they will retain their original order in the output. Even though the 1 moved, all the other values are equal so they retain their original order relative to each other. In the example below, all the values are considered equal and will retain their original positions:

>>> false_values = [False, 0, 0, 1 == 2, 0, False, False]
>>> sorted(false_values)
[False, 0, 0, False, 0, False, False]

If you inspect the original order and the sorted output, you will see that 1 == 2 is converted to False, and all sorted output is in the original order.

When You’re Sorting Strings, Case Matters

sorted() can be used on a list of strings to sort the values in ascending order, which appears to be alphabetically by default:

>>> names = ['Harry', 'Suzy', 'Al', 'Mark']
>>> sorted(names)
['Al', 'Harry', 'Mark', 'Suzy']

However, Python is using the Unicode Code Point of the first letter in each string to determine ascending sort order. This means that sorted() will not treat the names Al and al the same. This example uses ord() to return the Unicode Code Point of the first letter in each string:

>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted(names_with_case)
['Mark', 'Suzy', 'al', 'harry']
>>> # List comprehension for Unicode Code Point of first letter in each word
>>> [(ord(name[0]), name[0]) for name in sorted(names_with_case)]
[(77, 'M'), (83, 'S'), (97, 'a'), (104, 'h')]

name[0] is returning the first character in each element of sorted(names_with_case), and ord() is providing the Unicode Code Point. Even though a comes before M in the alphabet, the code point for M comes before a, so the sorted output has M first.

If the first letter is the same, then sorted() will use the second character to determine order, and the third character if that is the same, and so on, all the way to the end of the string:

>>> very_similar_strs = ['hhhhhd', 'hhhhha', 'hhhhhc','hhhhhb']
>>> sorted(very_similar_strs)
['hhhhha', 'hhhhhb', 'hhhhhc', 'hhhhhd']

Each value of very_similar_strs is identical except for the last character. sorted() will compare the strings, and because the first five characters are the same, the output will be based on the sixth character.

Strings that contain identical values will end up sorted shortest to longest due to the shorter strings not having elements to compare to with the longer strings:

>>> different_lengths = ['hhhh', 'hh', 'hhhhh','h']
>>> sorted(different_lengths)
['h', 'hh', 'hhhh', 'hhhhh']

The shortest string, h, is ordered first with the longest, hhhhh, ordered last.

Using sorted() With a reverse Argument

As shown in the help() documentation for sorted(), there is an optional keyword argument called reverse, which will change the sorting behavior based on the Boolean assigned to it. If reverse is assigned True, then the sorting will be in descending order:

>>> names = ['Harry', 'Suzy', 'Al', 'Mark']
>>> sorted(names)
['Al', 'Harry', 'Mark', 'Suzy']
>>> sorted(names, reverse=True)
['Suzy', 'Mark', 'Harry', 'Al']

The sorting logic remains the same, meaning that the names are still being sorted by their first letter. But the output has been reversed with the reverse keyword set to True.

When False is assigned, the ordering will remain ascending. Any of the previous examples can be used to see the behavior of reverse using both True or False:

>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted(names_with_case, reverse=True)
['harry', 'al', 'Suzy', 'Mark']
>>> similar_values = [False, 1, 'A' == 'B', 1 <= 0]
>>> sorted(similar_values, reverse=True)
[1, False, False, False]
>>> numbers = [6, 9, 3, 1]
>>> sorted(numbers, reverse=False)
[1, 3, 6, 9]

sorted() With a key Argument

One of the most powerful components of sorted() is the keyword argument called key. This argument expects a function to be passed to it, and that function will be used on each value in the list being sorted to determine the resulting order.

To demonstrate a basic example, let’s assume the requirement for ordering a specific list is the length of the strings in the list, shortest to longest. The function to return the length of a string, len(), will be used with the key argument:

>>> word = 'paper'
>>> len(word)
>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=len)
['pie', 'book', 'banana', 'Washington']

The resulting order is a list with a string order of shortest to longest. The length of each element in the list is determined by len() and then returned in ascending order.

Let’s return to the earlier example of sorting by first letter when the case is different. key can be used to solve that problem by converting the entire string to lowercase:

>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted(names_with_case)
['Mark', 'Suzy', 'al', 'harry']
>>> sorted(names_with_case, key=str.lower)
['al', 'harry', 'Mark', 'Suzy']

The output values have not been converted to lowercase because key does not manipulate the data in the original list. During sorting, the function passed to key is being called on each element to determine sort order, but the original values will be in the output.

There are two main limitations when you’re using functions with the key argument.

First, the number of required arguments in the function passed to key must be one.

The example below shows the definition of an addition function that takes two arguments. When that function is used in key on a list of numbers, it fails because it is missing a second argument. Each time add() is called during the sort, it is only receiving one element from the list at a time:

>>> def add(x, y):
...     return x + y
>>> values_to_add = [1, 2, 3]
>>> sorted(values_to_add, key=add)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: add() missing 1 required positional argument: 'y'

The second limitation is that the function used with key must be able to handle all the values in the iterable. For example, you have a list of numbers represented as strings to be used in sorted(), and key is going to attempt to convert them to numbers using int. If a value in the iterable can’t be cast to an integer, then the function will fail:

>>> values_to_cast = ['1', '2', '3', 'four']
>>> sorted(values_to_cast, key=int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'four'

Each numeric value as a str can be converted to int, but four can’t. This causes a ValueError to be raised and explain that four can’t be converted to int because it is invalid.

The key functionality is extremely powerful because almost any function, built-in or user-defined, can be used to manipulate the output order.

If the ordering requirement is to order an iterable by the last letter in each string (and if the letter is the same, then to use the next letter), then a function can be defined and then used in the sorting. The example below defines a function that reverses the string passed to it, and then that function is used as the argument for key:

>>> def reverse_word(word):
...     return word[::-1]
>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=reverse_word)
['banana', 'pie', 'book', 'Washington']

The word[::-1] slice syntax is used to reverse a string. Each element will have reverse_word() applied to it, and the sorting order will be based on the characters in the backwards word.

Instead of writing a standalone function, you can use a lambda function defined in the key argument.

A lambda is an anonymous function that:

  1. Must be defined inline
  2. Doesn’t have a name
  3. Can’t contain statements
  4. Will execute just like a function

In the example below, the key is defined as a lambda with no name, the argument taken by the lambda is x, and x[::-1] is the operation that will be performed on the argument:

>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=lambda x: x[::-1])
['banana', 'pie', 'book', 'Washington']

x[::-1] is called on each element and reverses the word. That reversed output is then used for sorting, but the original words are still returned.

If the requirement changes, and the order should be reversed as well, then the reverse keyword can be used alongside the key argument:

>>> words = ['banana', 'pie', 'Washington', 'book']
>>> sorted(words, key=lambda x: x[::-1], reverse=True)
['Washington', 'book', 'pie', 'banana']

lambda functions are also useful when you need to sort class objects based on a property. If you have a group of students and need to sort them by their final grade, highest to lowest, then a lambda can be used to get the grade property from the class:

>>> from collections import namedtuple

>>> StudentFinal = namedtuple('StudentFinal', 'name grade')
>>> bill = StudentFinal('Bill', 90)
>>> patty = StudentFinal('Patty', 94)
>>> bart = StudentFinal('Bart', 89)
>>> students = [bill, patty, bart]
>>> sorted(students, key=lambda x: getattr(x, 'grade'), reverse=True)
[StudentFinal(name='Patty', grade=94), StudentFinal(name='Bill', grade=90), StudentFinal(name='Bart', grade=89)]

This example uses namedtuple to produce classes with name and grade attributes. The lambda calls getattr() on each element and returns the value for grade.

reverse is set to True to make the ascending output flipped to be descending so that the highest grades are ordered first.

The possibilities are endless for how ordering can be done when you leverage both the key and reverse keyword arguments on sorted(). Code can be kept clean and short when you use a basic lambda for a small function, or you can write a whole new function, import it, and use it in the key argument.

Ordering Values With .sort()

The very similarly named .sort() differs quite a bit from the sorted() built-in. They accomplish more or less the same thing, but the help() documentation for list.sort() highlights two of the most critical differences between .sort() and sorted():

>>> # Python2
Help on method_descriptor:

    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
    cmp(x, y) -> -1, 0, 1

>>> # Python3
>>> help(list.sort)
Help on method_descriptor:

    L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*

First, sort is a method of the list class and can only be used with lists. It is not a built-in with an iterable passed to it.

Second, .sort() returns None and modifies the values in place. Let’s take a look at the impacts of both of these differences in code:

>>> values_to_sort = [5, 2, 6, 1]
>>> # Try to call .sort() like sorted()
>>> sort(values_to_sort)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sort' is not defined

>>> # Try to use .sort() on a tuple
>>> tuple_val = (5, 1, 3, 5)
>>> tuple_val.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'sort'

>>> # Sort the list and assign to new variable
>>> sorted_values = values_to_sort.sort()
>>> print(sorted_values)

>>> # Print original variable
>>> print(values_to_sort)
[1, 2, 5, 6]

There are some pretty dramatic differences in how .sort() operates compared to sorted() in this code example:

  1. There is no ordered output of .sort(), so the assignment to a new variable only passes a None type.
  2. The values_to_sort list has been changed in place, and the original order is not maintained in any way.

These differences in behavior make .sort() and sorted() absolutely not interchangeable in code, and they can produce wildly unexpected outcomes if one is used in the wrong way.

.sort() has the same key and reverse optional keyword arguments that produce the same robust functionality as sorted(). Here, you can sort a list of phrases by the second letter of the third word and return the list in reverse:

>>> phrases = ['when in rome', 
...     'what goes around comes around', 
...     'all is fair in love and war'
...     ]
>>> phrases.sort(key=lambda x: x.split()[2][1], reverse=True)
>>> phrases
['what goes around comes around', 'when in rome', 'all is fair in love and war']

In this sample, a lambda is used to do the following:

  1. Split each phrase into a list of words
  2. Find the third element, or word in this case
  3. Find the second letter in that word

When to Use sorted() and When to Use .sort()

You’ve seen the differences between sorted() and .sort(), but when do you use which?

Let’s say there is a 5k race coming up: The First Annual Python 5k. The data from the race needs to be captured and sorted. The data that needs to be captured is the runner’s bib number and the number of seconds it took to finish the race:

>>> from collections import namedtuple

>>> Runner = namedtuple('Runner', 'bibnumber duration')

As the runners cross the finish line, each Runner will be added to a list called runners. In 5k races, not all runners cross the starting line at the same time, so the first person to cross the finish line might not actually be the fastest person:

>>> runners = []
>>> runners.append(Runner('2528567', 1500))
>>> runners.append(Runner('7575234', 1420))
>>> runners.append(Runner('2666234', 1600))
>>> runners.append(Runner('2425234', 1490))
>>> runners.append(Runner('1235234', 1620))
>>> # Thousands and Thousands of entries later...
>>> runners.append(Runner('2526674', 1906))

Each time a runner crosses the finish line, their bib number and their total duration in seconds is added to runners.

Now, the dutiful programmer in charge of handling the outcome data sees this list, knows that the top 5 fastest participants are the winners that get prizes, and the remaining runners are going to be sorted by fastest time.

There are no requirements for multiple types of sorting by various attributes. The list is a reasonable size. There is no mention of storing the list somewhere. Just sort by duration and grab the five participants with the lowest duration:

>>> runners.sort(key=lambda x: getattr(x, 'duration'))
>>> top_five_runners = runners[:5]

The programmer chooses to use a lambda in the key argument to get the duration attribute from each runner and sort runners in place using .sort(). After runners is sorted, the first 5 elements are stored in top_five_runners.

Mission accomplished! The race director comes over and informs the programmer that since the current release of Python is 3.7, they have decided that every thirty-seventh person that crossed the finish line is going to get a free gym bag.

At this point, the programmer starts to sweat because the list of runners has been irreversibly changed. There is no way to recover the original list of runners in the order they finished and find every thirty-seventh person.

If you’re working with important data, and there is even a remote possibility that the original data will need to be recovered, then .sort() is not the best option. If the data is a copy, of if it is unimportant working data, of if no one will mind losing it because it can be retrieved, then .sort() can be a fine option.

Alternatively, the runners could have been sorted using sorted() and using the same lambda:

>>> runners_by_duration = sorted(runners, key=lambda x: getattr(x, 'duration'))
>>> top_five_runners = runners_by_duration[:5]

In this scenario with sorted(), the original list of runners is still intact and has not been overwritten. The impromptu requirement of finding every thirty-seventh person to cross the finish line can be accomplished by interacting with the original values:

>>> every_thirtyseventh_runners = runners[::37]

every_thirtyseventh_runners is created by using a stride in the list slice syntax on runners, which still contains the original order in which the runners crossed the finish line.

How to Sort in Python: Conclusion

.sort() and sorted() can provide exactly the sort order you need if you use them properly with both the reverse and key optional keyword arguments.

Both have very different characteristics when it comes to output and in-place modifications, so make sure you think through any application functionality or program that will be using .sort() as it can irrevocably overwrite data.

For the avid Pythonistas looking for a challenge with sorting, try using more complex data types in sorting: nested iterables. Also, feel free to dive into the open source Python code implementations for the built-ins and read about the sort algorithm used in Python called Timsort.

Разместим вашу рекламу

Пиши: mail@pythondigest.ru

Нашли опечатку?

Выделите фрагмент и отправьте нажатием Ctrl+Enter.

Система Orphus