Data Structure Introduction

This notebook is to help you learn the basics of Python data structures, particularly lists and dictionaries. You have hopefully seen a little bit of these two data structures in 2308, as list or vector and hash_map. Python and JavaScript both make these structures very easy to work with, and web development depends heavily on them.

Warmup: Basic Python Types

Python provides a variety of basic data types that support common operations.

You can write numbers and strings:

In [1]:
a_num = 5
a_str = 'puppy'

You can do math with these:

In [2]:
a_num * 3
Out[2]:
15

And manipulate strings:

In [3]:
a_str + ' dog'
Out[3]:
'puppy dog'

Python has another really cool feature for strings: string formatting. You can write a string that has some placeholders ({}), and then fill those in with values:

In [4]:
'the {} says {} {} times'.format('cat', 'meow', 5)
Out[4]:
'the cat says meow 5 times'

This is most useful when you have variables that you want to put in to a string:

In [5]:
'the {} says {}'.format(a_str, 'woof')
Out[5]:
'the puppy says woof'

To find out how long a string is, use the len function:

In [6]:
len(a_str)
Out[6]:
5

Lists

Lists do what they say on the tin: they store a list of things. Think like a grocery list, or a to-do list: it is a sequence of items.

For example, the following code creates a new list and assigns it to a variable called beatles:

In [7]:
beatles = ['John', 'Paul', 'George', 'Ringo']

We can see that the list has 4 items, again with the len function:

In [8]:
len(beatles)
Out[8]:
4

The len function applies to lots of different kinds of Python objects. If you want to find out how ‘big’ something is, try len.

Lists function a lot like arrays in C++. You can get a specific item (indexes start from 0, just like C++):

In [9]:
beatles[0]
Out[9]:
'John'

You can also reassign a particular value:

In [10]:
beatles[2] = 'Albert'

One of the most common things to do with a list is to iterate over it. Python's for loop does this: it loops over the list, running the code inside it once for each element in the list. The loop also defines a variable that, during each time through the loop, stores that particular value. For example, if we want to print the Beatles:

In [11]:
for beatle in beatles:
    print(beatle)
John
Paul
Albert
Ringo

We can see the different members of the Beatles. Well, with one change — the member we swapped out back in line 10.

What if we want to know which beatle number we're at? In C++, when you loop over an array using an index variable, e.g.

for (int i = 0; i < n_items; i++) {
}

you have the i variable that tells you where you are. In Python, to do that, we use the enumerate function and loop over its results:

In [12]:
for i, beatle in enumerate(beatles):
    print('Beatle number {} is {}'.format(i+1, beatle))
Beatle number 1 is John
Beatle number 2 is Paul
Beatle number 3 is Albert
Beatle number 4 is Ringo

What enumerate does is associate a position with each element in the list, and then we loop over those pairs.

We can also loop with an index i, but it is not idiomatic Python:

In [13]:
for i in range(len(beatles)):
    print('Beatle number {} is {}'.format(i+1, beatles[i]))
Beatle number 1 is John
Beatle number 2 is Paul
Beatle number 3 is Albert
Beatle number 4 is Ringo

OK, so we can make lists, and just write them out — that's easier than C++! — and we can iterate over them (with for), and find out how long they are (with len). What else can we do with them?

One really useful thing is to append to them. This is where they get much more interesting than C++ arrays. Let's start by storing an empty list in a variable:

In [14]:
results = []

Now results references an empty list — there's nothing in it:

In [15]:
len(results)
Out[15]:
0

Let's add something:

In [16]:
results.append('wumpus')

Each call to results.append added a new element to the end of results. We can see that there's an item now:

In [17]:
len(results)
Out[17]:
1

And what it is:

In [18]:
results[0]
Out[18]:
'wumpus'

We can also delete it:

In [19]:
del results[0]
len(results)
Out[19]:
0

Let's put several things in it. Maybe, things about the Beatles:

In [20]:
# Even though results is empty again, it's good practice to create our
# empty accumulator right before the loop:
results = []
for i, beatle in enumerate(beatles):
    results.append('Beatle number {} is {}'.format(i+1, beatle))

Now how many results do we have?

In [21]:
len(results)
Out[21]:
4

And what, praytell, are these results?

In [22]:
results
Out[22]:
['Beatle number 1 is John',
 'Beatle number 2 is Paul',
 'Beatle number 3 is Albert',
 'Beatle number 4 is Ringo']

(The Python notebook dumps out a list in Python syntax if we just ask it to display a list. Pretty nifty.)

There are a bunch more things that we can do with lists, but those are the basics. The big things to remember are:

  1. A list is a lot like an array — it stores multiple things, and they are accessible with (0-based) indexes.
  2. Lists are unlike arrays in that items can be added (with append) and removed (with del).
  3. Lists keep things in a specific order.

Dictionaries

Lists let us store a bunch of things in some order, and we can access them by position (or index) in the list.

But what if we have things that have names, and maybe we don't care about order?

A dictionary allows us to store a bunch of things and access them by name (called a key) rather than by position. A name, not a number.

Let's make a simple dictionary that describes a British monarch:

In [23]:
liz_the_first = {
    'name': 'Elizabeth I',
    'region': 'England',
    'birth_date': '1533-09-07',
    'death_date': '1603-03-24',
    'coronation_date': '1559-01-15'
}

The squiggly braces denote a dictionary, and the keys and values are separated by colons (:). We say that the dictionary maps its keys to values.

Again, len tells us how many things are in it:

In [24]:
len(liz_the_first)
Out[24]:
5

We can get individual items:

In [25]:
liz_the_first['name']
Out[25]:
'Elizabeth I'

We see here that items are identified by their keys, or names, instead of numbers. So we can think of a dictionary as being sort-of like a list, with two differences:

  • The order of items is not guaranteed or preserved (items can be in any order).
  • We can use any kind of key, rather than just numbers $i$ in the range $0 \le n \lt n$.

Very useful for many kinds of things.

While we can theoretically use any kind of value for the key, we will generally use strings or numbers.

We can also assign new values:

In [26]:
liz_the_first['mother'] = 'Anne Boleyn'
liz_the_first['family'] = 'Tudor'

Now we have two new values:

In [27]:
len(liz_the_first)
Out[27]:
7

We can get the list of keys (or names) in the dictionary:

In [28]:
liz_the_first.keys()
Out[28]:
dict_keys(['death_date', 'mother', 'birth_date', 'family', 'coronation_date', 'name', 'region'])

It says dict_keys at the beginning, but we can treat it just like a list:

In [29]:
for k in liz_the_first.keys():
    print('field: {}'.format(k))
field: death_date
field: mother
field: birth_date
field: family
field: coronation_date
field: name
field: region

Notice that the keys are in a seemingly random order. This is because they are arranged for the internal convenience of the dictionary implementation; we get no choice in the order. However, since we care about names, not numbers, that is (generally) not a problem.

We can also delete keys, like we did with list items, and we can add any kind of value, not just strings. For example, Liz was the queen of multiple regions, so let's be more precise:

In [30]:
del liz_the_first['region']
liz_the_first['regions'] = ['England', 'Ireland']
liz_the_first
Out[30]:
{'birth_date': '1533-09-07',
 'coronation_date': '1559-01-15',
 'death_date': '1603-03-24',
 'family': 'Tudor',
 'mother': 'Anne Boleyn',
 'name': 'Elizabeth I',
 'regions': ['England', 'Ireland']}

The notebook has helpfully printed out the dictionary, so we can see that there is no longer a ‘region’ key, but ‘regions’ is now mapped to a list of the countries of which Elizabeth I was the queen.

Now, we can also iterate over the items (key-value pairs) in a dictionary, in a manner very similar to the enumerate thing we did to the list:

In [31]:
for key, value in liz_the_first.items():
    print('{}: {}'.format(key, value))
regions: ['England', 'Ireland']
death_date: 1603-03-24
mother: Anne Boleyn
birth_date: 1533-09-07
family: Tudor
coronation_date: 1559-01-15
name: Elizabeth I

We can also get the values of the dictionary, again as a list:

In [32]:
liz_the_first.values()
Out[32]:
dict_values([['England', 'Ireland'], '1603-03-24', 'Anne Boleyn', '1533-09-07', 'Tudor', '1559-01-15', 'Elizabeth I'])

Summary

We're going to be using lists and dictionaries a lot in both Python and JavaScript (the concepts are very similar between two languages, and I will be providing a JavaScript version of this notebook when the time comes).

In [ ]: