# Accessing Rows

The second dimension of our data consists of its rows or *individuals*.

## The index

As you'll recall from [Creating a DataFrame](../1/Creating_DataFrame.html#the-index) – and just as it did for our columns – pandas has constructed an index for our rows.

By default, the values of this index are the familiar `0, 1, 2, …`, represented by the `RangeIndex` type.

```python
import pandas as pd

planets_dict = {
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
}

planets = pd.DataFrame(planets_dict)
```

```python
planets.index
```

```
RangeIndex(start=0, stop=8, step=1)
```

But we can always set alternative indices.

```python
planets_natural = planets.set_index(pd.RangeIndex(1, 9, name='number'))

planets_natural.index
```

```
RangeIndex(start=1, stop=9, step=1, name='number')
```

```python
planets_named = planets.set_index('name')

planets_named.index
```

```
Index(['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus',
       'Neptune'],
      dtype='object', name='name')
```
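`set_index` returns a new `DataFrame` rather than modifying `planets` in place. The sketch below (rebuilding the planets table so it runs on its own) checks this, and shows that, by default, the `name` column is moved out of the columns and into the index:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

planets_named = planets.set_index('name')

# The original is untouched: it keeps its RangeIndex and its 'name' column
print(type(planets.index).__name__)   # RangeIndex
print('name' in planets.columns)      # True

# In the new DataFrame, 'name' has moved out of the columns and into the index
print(planets_named.index.name)       # name
print('name' in planets_named.columns)  # False
```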

## Counting

As with the `list`, we can use the built-in function `len` to see that there are eight planets.

```python
len(planets)
```

```
8
```
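Since `len` counts rows, it agrees with the length of the row index and with the first entry of the `DataFrame`'s `shape`. A quick sketch, rebuilding the planets table:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

# len() counts rows, the same as the length of the row index...
n_rows = len(planets)
print(n_rows, len(planets.index), planets.shape[0])  # 8 8 8

# ...whereas the column count is len(planets.columns), i.e. shape[1]
print(len(planets.columns), planets.shape[1])        # 5 5
```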

## Slicing

We can also *slice* the `DataFrame`, for example to extract only its first three rows.

```python
planets[:3]
```

|   | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 2 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |


Above, our slice has constructed a new `DataFrame`, consisting of only the data for the first three planets.

This works the same with our alternative indices – generic row slices step through the index **regardless of the values within the index**.

```python
planets_natural[:3]
```

| number | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 1 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 2 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 3 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |

```python
planets_named[:3]
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
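To make the point concrete, the sketch below gives the planets a hypothetical index, `reversed`, whose integer labels run backwards from 8 to 1. A slice still takes the first three rows by position, not the rows labelled 1 to 3:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

# Hypothetical index whose integer labels count down: 8, 7, ..., 1
planets_reversed = planets.set_index(pd.RangeIndex(8, 0, -1, name='reversed'))

first_three = planets_reversed[:3]

# The slice is positional: the first three planets, carrying labels 8, 7, 6
print(first_three['name'].tolist())  # ['Mercury', 'Venus', 'Earth']
print(first_three.index.tolist())    # [8, 7, 6]
```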


Of course, [slices may be more sophisticated](../../04/1/Lists.html#slices) than the above.

```python
planets_named[2:5]
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |

```python
planets_named[::2]
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |
| Uranus | 2872.5 | 86.8 | 1271.0 | 8.7 |
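For instance, negative bounds and negative steps behave just as they do for a `list`. A short sketch comparing `DataFrame` slices against the same slices of a plain list of names:

```python
import pandas as pd

names = ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune']

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': names,
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_named = planets.set_index('name')

# A negative start counts from the end: the last three planets
last_three = planets_named[-3:]
print(last_three.index.tolist())         # ['Saturn', 'Uranus', 'Neptune']

# A negative step walks backwards: from Neptune inward to Mercury
outside_in = planets_named[::-1]
print(outside_in.index.tolist()[:3])     # ['Neptune', 'Uranus', 'Saturn']

# Both agree with the same slices of the plain list
print(last_three.index.tolist() == names[-3:])   # True
print(outside_in.index.tolist() == names[::-1])  # True
```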


### Head & Tail

But as a shortcut for inspecting just the first few – or last few – rows of a `DataFrame`, there are the methods `head` and `tail`.

```python
planets_named.head()
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |

```python
planets_named.tail()
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |
| Saturn | 1433.5 | 568.0 | 687.0 | 9.0 |
| Uranus | 2872.5 | 86.8 | 1271.0 | 8.7 |
| Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |


By default, these methods return the first or last five rows. This number may also be specified.

```python
planets_named.head(2)
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| Venus | 108.2 | 4.87 | 5243.0 | 8.9 |

```python
planets_named.tail(2)
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Uranus | 2872.5 | 86.8 | 1271.0 | 8.7 |
| Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |
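In other words, for a positive `n`, `head(n)` and `tail(n)` select the same rows as the slices `[:n]` and `[-n:]`. The sketch below checks this equivalence for a few values of `n`:

```python
import pandas as pd

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_named = planets.set_index('name')

for n in (1, 2, 5):
    # head(n) matches the first-n slice; tail(n) matches the last-n slice
    assert planets_named.head(n).equals(planets_named[:n])
    assert planets_named.tail(n).equals(planets_named[-n:])

# With no argument, both default to five rows
print(len(planets_named.head()), len(planets_named.tail()))  # 5 5
```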


As with slices, even negative integers are supported: `head(-n)` returns _all but_ the last `n` rows, and `tail(-n)` returns _all but_ the first `n` rows.

```python
planets_named.head(-2)
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |
| Saturn | 1433.5 | 568.0 | 687.0 | 9.0 |

```python
planets_named.tail(-2)
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| Mars | 227.9 | 0.642 | 3933.0 | 3.7 |
| Jupiter | 778.6 | 1898.0 | 1326.0 | 23.1 |
| Saturn | 1433.5 | 568.0 | 687.0 | 9.0 |
| Uranus | 2872.5 | 86.8 | 1271.0 | 8.7 |
| Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |
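Put another way, `head(-n)` matches the slice `[:-n]`, and `tail(-n)` matches `[n:]`, as this sketch checks:

```python
import pandas as pd

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_named = planets.set_index('name')

for n in (1, 2, 3):
    # head(-n) drops the last n rows; tail(-n) drops the first n rows
    assert planets_named.head(-n).equals(planets_named[:-n])
    assert planets_named.tail(-n).equals(planets_named[n:])

print(planets_named.head(-2).index.tolist()[-1])  # Saturn
print(planets_named.tail(-2).index.tolist()[0])   # Earth
```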


## Retrieving

:::{attention}
Above, we treated our `DataFrame` like Python's `list` to count and slice its rows.

However, you *can't* access *individual* rows in the same manner as with a `list`.

The expression below raises an exception!
:::

```python
planets[2]  # WRONG 😦
```

```
KeyError: 2
```

After all, the `DataFrame` is a more complex structure than the `list` – above, `2` was treated not as a row position but as a [reference to a column](../2/Accessing_Columns.html#extracting-features), and `planets` has no column labelled `2`!
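One way to see this: the `in` operator on a `DataFrame` also checks column labels. In the sketch below, `with_column_2` is a hypothetical copy of the table given an integer column label `2`, for which `[2]` succeeds:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

# `in` on a DataFrame checks column labels, just as [] looks up columns
print('name' in planets)  # True
print(2 in planets)       # False: no column is labelled 2

# So planets[2] looks for a column labelled 2, and raises KeyError
try:
    planets[2]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Hypothetical: a copy with a column actually labelled 2 (double the gravity)
with_column_2 = planets.copy()
with_column_2[2] = with_column_2['gravity_m_s2'] * 2
print(with_column_2[2].name)  # 2
```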

Instead, `DataFrame` offers the properties `iloc` and `loc`, which are themselves queried with square brackets, using a syntax modelled on that for retrieving elements from a `list`.

### By position

#### Individuals

`iloc` is intended for *integer-location* based look-up of elements by their position in the index.

And, to start, we can now extract the third individual in our `DataFrame`, at the index offset or position `2`, with `iloc`.

```python
planets.iloc[2]
```

```
name                    Earth
solar_distance_km_6     149.6
mass_kg_24               5.97
density_kg_m3          5514.0
gravity_m_s2              9.8
Name: 2, dtype: object
```
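As with a `list`, `iloc` also accepts negative positions, counting back from the end. A sketch:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

# Position -1 is the last row
last = planets.iloc[-1]
print(last['name'])  # Neptune

# In an eight-row DataFrame, position -6 is the same row as position 2
print(planets.iloc[-6]['name'])  # Earth
```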

Much better.

And, hey – it's Earth: the third planet from the sun.

Since `iloc` deals in integer offsets – rather than _values_ within the index – we can do _the same_ with our alternative indices.

```python
planets_natural.iloc[2]
```

```
name                    Earth
solar_distance_km_6     149.6
mass_kg_24               5.97
density_kg_m3          5514.0
gravity_m_s2              9.8
Name: 3, dtype: object
```

```python
earth = planets_named.iloc[2]

earth
```

```
solar_distance_km_6     149.60
mass_kg_24                5.97
density_kg_m3          5514.00
gravity_m_s2              9.80
Name: Earth, dtype: float64
```

The above presentation of data might look familiar – because it is, again, a `Series`.

```python
type(earth)
```

```
pandas.core.series.Series
```

And we can still do all the things we did before with the `Series`.

But careful! What does it mean to take the median … of Earth?

```python
earth.median()
```

```
79.7
```

Answer: nothing!

pandas is happy to apply formulas to series of data. But we'll have to be a bit more clever than that to come up with a _meaningful_ statistic based on these diverse features.
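To contrast, the sketch below computes the median of Earth's row, which mixes distance, mass, density and gravity, alongside the median of each column, which summarizes a single feature across all eight planets:

```python
import pandas as pd

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_named = planets.set_index('name')

# Across Earth's row: a number mixing unrelated units (not meaningful)
earth_median = planets_named.iloc[2].median()
print(earth_median)

# Down each column: the median of one feature across all eight planets
feature_medians = planets_named.median()
print(feature_medians['gravity_m_s2'])

# median(axis=1) computes the row-wise median for every planet at once
row_medians = planets_named.median(axis=1)
print(row_medians['Earth'])
```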

We _can_ meaningfully access features from the `Series` of our individual's data.

```python
earth.solar_distance_km_6
```

```
149.6
```

Above, we've extracted just the Earth's distance from the Sun.
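Attribute access like this works when the label is a valid Python identifier that doesn't clash with a `Series` method; bracket access works for any label, and also accepts a list of labels. A sketch:

```python
import pandas as pd

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
earth = planets.set_index('name').iloc[2]

# Attribute access and bracket access retrieve the same value
print(earth.solar_distance_km_6 == earth['solar_distance_km_6'])  # True

# A list of labels returns a smaller Series with just those features
subset = earth[['mass_kg_24', 'gravity_m_s2']]
print(subset.tolist())  # [5.97, 9.8]
```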

And we can see again how spread out the planets are.

The Earth is the third of eight planets, yet its distance from the sun is less than 12% of the planets' average distance.

```python
earth.solar_distance_km_6 / planets.solar_distance_km_6.mean()
```

```
0.11822231880908397
```
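Spelled out by hand (in the table's own units, which the `_km_6` suffix suggests are millions of kilometres), the calculation is:

$$
\bar{d} = \frac{57.9 + 108.2 + 149.6 + 227.9 + 778.6 + 1433.5 + 2872.5 + 4495.1}{8} = \frac{10123.3}{8} = 1265.4125
$$

$$
\frac{d_{\text{Earth}}}{\bar{d}} = \frac{149.6}{1265.4125} \approx 0.1182
$$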

#### Slices & selection

We can also reproduce our slice, more explicitly now, using `iloc`.

```python
planets.iloc[:3]
```

|   | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| 2 | Earth | 149.6 | 5.97 | 5514.0 | 9.8 |


We can also do something new – construct a new `DataFrame` consisting of only the individuals at the specified offsets.

```python
planets.iloc[[0, 7]]
```

|   | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 7 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |


Above, we've passed `iloc` a `list` – `[0, 7]` – specifying the offsets of the rows we want to select.

Again, we can perform this operation on our alternative indices, regardless of the different values within them.

```python
planets_natural.iloc[[0, 7]]
```

| number | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 1 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 8 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |

```python
planets_named.iloc[[0, 7]]
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |
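The list of offsets needn't be sorted, may contain repeats, and may include negative positions. A sketch:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

# Rows come back in the order the offsets are listed
print(planets.iloc[[7, 0]]['name'].tolist())  # ['Neptune', 'Mercury']

# Repeating an offset repeats the row (and its index label)
print(planets.iloc[[2, 2]].index.tolist())    # [2, 2]

# -1 refers to the last row, so this matches iloc[[0, 7]]
print(planets.iloc[[0, -1]].equals(planets.iloc[[0, 7]]))  # True
```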


Note that in the new `DataFrame`, the planets' row index values have been preserved. This is highly useful – indeed, Neptune is still the same planet as it was before. But `iloc` is **strictly** intended for offsets, like in a `list`.

If we were to repeat our selection of `[0, 7]` on the above, it would fail with an `IndexError`: the new `DataFrame` has only two rows, so the offsets of these two planets are now `[0, 1]`. According to `iloc`, Neptune is now available at offset `1`.

```python
bookends = planets.iloc[[0, 7]]

bookends.iloc[1]
```

```
name                   Neptune
solar_distance_km_6     4495.1
mass_kg_24               102.0
density_kg_m3           1638.0
gravity_m_s2              11.0
Name: 7, dtype: object
```
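The sketch below confirms both points: `bookends` keeps the index labels `0` and `7`, but offset `7` no longer exists in it:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})

bookends = planets.iloc[[0, 7]]
print(bookends.index.tolist())  # [0, 7]: the original labels are kept

# Offsets are positions within bookends, which has only two rows
try:
    bookends.iloc[[0, 7]]
    raised = False
except IndexError:
    raised = True
print(raised)  # True

print(bookends.iloc[[0, 1]]['name'].tolist())  # ['Mercury', 'Neptune']
```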

#### Resetting the index

If we wanted to correct the inconsistency produced above between our rows' positions in the index and the values of our index – that is, to create a `DataFrame` reflecting a solar system consisting of _only_ Mercury and Neptune – of course we could.

The `reset_index` method returns a new `DataFrame` whose index is regenerated, as a default `0, 1, 2, …` index, for the rows it contains.

By default, `reset_index` preserves the data of our old index – just in case – as an extra column.

```python
bookends.reset_index()
```

|   | index | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|---|
| 0 | 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | 7 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |


But we can tell pandas not to worry about that old index.

```python
bookends.reset_index(drop=True)
```

|   | name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|---|
| 0 | Mercury | 57.9 | 0.33 | 5427.0 | 3.7 |
| 1 | Neptune | 4495.1 | 102.0 | 1638.0 | 11.0 |
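A sketch of the difference, which also shows that `bookends` itself is left unchanged, and that a *named* index, such as that of `planets_named`, comes back as a column under its own name:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
bookends = planets.iloc[[0, 7]]

kept = bookends.reset_index()              # old labels move into an 'index' column
dropped = bookends.reset_index(drop=True)  # old labels are discarded

print(list(kept.columns[:2]))    # ['index', 'name']
print(kept['index'].tolist())    # [0, 7]
print(dropped.index.tolist())    # [0, 1]
print(bookends.index.tolist())   # [0, 7]: bookends itself is unchanged

# An index named 'name' is restored as a column called 'name'
planets_named = planets.set_index('name')
print(planets_named.reset_index().columns[0])  # name
```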


### By label

The `loc` property, on the other hand, allows us to look up rows (and more!) according to their index _value_ or "label."

We may repeat the above operations we performed with `iloc`, but the arguments we supply to `loc` must be values from the `DataFrame`'s index.

We can still find Earth at the index value `2` in our simple `DataFrame`, because its default index values coincide with the rows' integer positions.

```python
planets.loc[2]
```

```
name                    Earth
solar_distance_km_6     149.6
mass_kg_24               5.97
density_kg_m3          5514.0
gravity_m_s2              9.8
Name: 2, dtype: object
```

In our "natural" numeric index, however, Earth is listed under the value `3`.

```python
planets_natural.loc[3]
```

```
name                    Earth
solar_distance_km_6     149.6
mass_kg_24               5.97
density_kg_m3          5514.0
gravity_m_s2              9.8
Name: 3, dtype: object
```
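On `planets_natural`, then, `loc[3]` and `iloc[3]` retrieve different planets. A sketch:

```python
import pandas as pd

# Rebuild the planets table from the start of this section
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_natural = planets.set_index(pd.RangeIndex(1, 9, name='number'))

by_label = planets_natural.loc[3]      # the row whose index *value* is 3
by_position = planets_natural.iloc[3]  # the row at *offset* 3 (the fourth row)

print(by_label['name'], by_position['name'])  # Earth Mars
```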

And in our name-based index, we _can't_ supply integer values to `loc` at all!

```python
planets_named.loc[3]  # WRONG 😦
```

```
KeyError: 3
```

Instead, we supply `loc` with the string label under which the individual is stored in that index.

```python
planets_named.loc['Earth']
```

```
solar_distance_km_6     149.60
mass_kg_24                5.97
density_kg_m3          5514.00
gravity_m_s2              9.80
Name: Earth, dtype: float64
```
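Like `iloc`, `loc` also accepts a list, here of labels, and a label missing from the index raises a `KeyError`. In the sketch below, `'Pluto'` is simply a hypothetical label absent from the index:

```python
import pandas as pd

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_named = planets.set_index('name')

# The label-based counterpart of planets_named.iloc[[0, 7]]
bookends = planets_named.loc[['Mercury', 'Neptune']]
print(bookends.equals(planets_named.iloc[[0, 7]]))  # True

# Check membership in the index before looking a label up
print('Pluto' in planets_named.index)  # False
try:
    planets_named.loc['Pluto']
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```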

This works for slices, too!

```python
planets_named.loc['Venus':'Mars']
```

| name | solar_distance_km_6 | mass_kg_24 | density_kg_m3 | gravity_m_s2 |
|---|---|---|---|---|
| Venus | 108.2 | 4.87 | 5243.0 | 8.9 |
| Earth | 149.6 | 5.97 | 5514.0 | 9.8 |
| Mars | 227.9 | 0.642 | 3933.0 | 3.7 |


:::{attention}
And, note: `loc` handled the above slice differently than `iloc` would!

The upper bound – as well as the lower bound – of the range was **included** in the result.
:::
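A sketch comparing the two, and recovering the same three rows by position, where the stop must be one past Mars's offset:

```python
import pandas as pd

# Rebuild the planets table from the start of this section, indexed by name
planets = pd.DataFrame({
    'name': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
    'solar_distance_km_6': [57.9, 108.2, 149.6, 227.9, 778.6, 1433.5, 2872.5, 4495.1],
    'mass_kg_24': [0.33, 4.87, 5.97, 0.642, 1898.0, 568.0, 86.8, 102.0],
    'density_kg_m3': [5427.0, 5243.0, 5514.0, 3933.0, 1326.0, 687.0, 1271.0, 1638.0],
    'gravity_m_s2': [3.7, 8.9, 9.8, 3.7, 23.1, 9.0, 8.7, 11.0],
})
planets_named = planets.set_index('name')

by_label = planets_named.loc['Venus':'Mars']  # both endpoints included
by_position = planets_named.iloc[1:3]         # stop offset 3 (Mars) excluded

print(by_label.index.tolist())     # ['Venus', 'Earth', 'Mars']
print(by_position.index.tolist())  # ['Venus', 'Earth']

# get_loc maps a label to its offset; add 1 to make the stop inclusive
stop = planets_named.index.get_loc('Mars') + 1
print(by_label.equals(planets_named.iloc[1:stop]))  # True
```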

In the following sections, we'll continue to explore slicing and selecting elements by index position and value, via `loc` and `iloc`.