Geovisualization with Open Data

By Dr. Juan Camilo Orduz, Mathematician & Data Scientist

In this post I want to show how to use publicly available (open) data to create geo visualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks to plot maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).

Reference: A great introduction to matplotlib is the chapter on Visualization with Matplotlib from the Python Data Science Handbook by Jake VanderPlas.

Remark: After I finished writing this notebook, I discovered a similar analysis done in R here. Please check it out!

 

Prepare Notebook

 

import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'

%matplotlib inline

 

Get Germany Data

 
The main data source for this post is www.suche-postleitzahl.org/downloads. Here we download three data sets:

  • plz-gebiete.shp: shapefile with the polygons of Germany's postal codes.
  • zuordnung_plz_ort.csv: mapping from postal code to city (ort) and federal state (bundesland).
  • plz_einwohner.csv: number of inhabitants assigned to each postal code area.

 

Germany Maps

 
We start by generating a map of Germany with its main cities.

# Make sure you read postal codes as strings, otherwise 
# the postal code 01110 will be parsed as the number 1110. 
plz_shape_df = gpd.read_file('../Data/plz-gebiete.shp', dtype={'plz': str})

plz_shape_df.head()
plz note geometry
0 52538 52538 Gangelt, Selfkant POLYGON ((5.86632 51.05110, 5.86692 51.05124, …
1 47559 47559 Kranenburg POLYGON ((5.94504 51.82354, 5.94580 51.82409, …
2 52525 52525 Waldfeucht, Heinsberg POLYGON ((5.96811 51.05556, 5.96951 51.05660, …
3 52074 52074 Aachen POLYGON ((5.97486 50.79804, 5.97495 50.79809, …
4 52531 52531 Übach-Palenberg POLYGON ((6.01507 50.94788, 6.03854 50.93561, …
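The comment about leading zeros is easy to reproduce. The sketch below uses pandas' `read_csv` on an in-memory CSV built from two rows of plz_einwohner.csv shown later in this post; the same concern applies to every file we load. The `zfill(5)` line is one possible repair if a column has already been parsed as numbers, assuming all German postal codes have five digits.

```python
import io

import pandas as pd

# In-memory CSV with two rows from plz_einwohner.csv (both codes start with 0).
csv_text = "plz,einwohner\n01067,11957\n01069,25491\n"

# Default parsing infers an integer column and silently drops the leading zero.
as_numbers = pd.read_csv(io.StringIO(csv_text))

# Declaring the column as str keeps the postal code intact.
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={'plz': str})

# Possible repair after the fact: left-pad back to five characters.
repaired = as_numbers['plz'].astype(str).str.zfill(5)

print(as_numbers['plz'].tolist())  # [1067, 1069]
print(as_strings['plz'].tolist())  # ['01067', '01069']
```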

 

The geometry column contains the polygons which define each postal code's shape.

We can use geopandas' mapping tools to generate the map with the plot method.

plt.rcParams['figure.figsize'] = [16, 11]

# Get lng and lat of Germany's main cities (dict: city name -> (longitude, latitude)).
top_cities = 

fig, ax = plt.subplots()

plz_shape_df.plot(ax=ax, color='orange', alpha=0.8)

# Plot cities. 
for c in top_cities.keys():
    # Plot city name.
    ax.text(
        x=top_cities[c][0], 
        # Add small shift to avoid overlap with point.
        y=top_cities[c][1] + 0.08, 
        s=c, 
        fontsize=12,
        ha='center', 
    )
    # Plot city location centroid.
    ax.plot(
        top_cities[c][0], 
        top_cities[c][1], 
        marker='o',
        c='black', 
        alpha=0.5
    )

ax.set(
    title='Germany', 
    aspect=1.3, 
    facecolor='lightblue'
);
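The label-and-marker loop is repeated for every map in this post. One possible refactoring is sketched below with a hypothetical helper `annotate_points` (not part of the original notebook). It assumes the same `{name: (longitude, latitude)}` layout that `top_cities` is indexed with, and `dy` is the vertical label offset (0.08 degrees for Germany, 0.005 for the Berlin maps).

```python
import matplotlib.pyplot as plt


def annotate_points(ax, points, dy=0.08, fontsize=12):
    """Write each label slightly above its point and mark the point itself.

    `points` maps a label to an (x, y) pair, i.e. (longitude, latitude).
    """
    for name, (x, y) in points.items():
        # Label shifted up by dy to avoid overlapping the marker.
        ax.text(x=x, y=y + dy, s=name, fontsize=fontsize, ha='center')
        # Semi-transparent black marker at the location.
        ax.plot(x, y, marker='o', c='black', alpha=0.5)
    return ax


# Usage with two hypothetical points.
fig, ax = plt.subplots()
annotate_points(ax, {'A': (0.0, 0.0), 'B': (1.0, 1.0)})
```

With such a helper, each loop in this post reduces to a single call like `annotate_points(ax, top_cities)`.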


 

First-Digit-Postalcodes Areas

 
Next, let us plot the regions corresponding to the first digit of each postal code.
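The first-digit feature is a plain string operation, which is one more reason to keep `plz` as text: region 0 (for example 01067) would otherwise vanish. A minimal sketch on postal codes taken from the outputs in this post shows that `.str.slice(start=0, stop=1)` and the shorter `.str[0]` give the same result.

```python
import pandas as pd

# Postal codes taken from the head() outputs shown in this post.
plz = pd.Series(['52538', '47559', '01067'])

# First character of each postal code string.
first_dig = plz.str.slice(start=0, stop=1)

# .str[0] is an equivalent, shorter spelling.
assert first_dig.equals(plz.str[0])
print(first_dig.tolist())  # ['5', '4', '0']
```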

# Create feature.
plz_shape_df = plz_shape_df \
    .assign(first_dig_plz=lambda x: x['plz'].str.slice(start=0, stop=1))

fig, ax = plt.subplots()

plz_shape_df.plot(
    ax=ax, 
    column='first_dig_plz', 
    categorical=True, 
    legend=True, 
    legend_kwds=,
    cmap='tab20',
    alpha=0.9
)

for c in top_cities.keys():

    ax.text(
        x=top_cities[c][0], 
        y=top_cities[c][1] + 0.08, 
        s=c, 
        fontsize=12,
        ha='center', 
    )

    ax.plot(
        top_cities[c][0], 
        top_cities[c][1], 
        marker='o',
        c='black', 
        alpha=0.5
    )

ax.set(
    title='Germany First-Digit-Postal Codes Areas', 
    aspect=1.3,
    facecolor='white'
);


 

Bundesland Map

 
Let us now map each postal code to its corresponding region (city and Bundesland):

plz_region_df = pd.read_csv(
    '../Data/zuordnung_plz_ort.csv', 
    sep=',', 
    dtype={'plz': str}
)

plz_region_df.drop('osm_id', axis=1, inplace=True)

plz_region_df.head()
ort plz bundesland
0 Aach 78267 Baden-Württemberg
1 Aach 54298 Rheinland-Pfalz
2 Aachen 52062 Nordrhein-Westfalen
3 Aachen 52064 Nordrhein-Westfalen
4 Aachen 52066 Nordrhein-Westfalen

 

# Merge data.
germany_df = pd.merge(
    left=plz_shape_df, 
    right=plz_region_df, 
    on='plz',
    how='inner'
)

germany_df.drop(['note'], axis=1, inplace=True)

germany_df.head()
plz geometry first_dig_plz ort bundesland
0 52538 POLYGON ((5.86632 51.05110, 5.86692 51.05124, … 5 Gangelt Nordrhein-Westfalen
1 52538 POLYGON ((5.86632 51.05110, 5.86692 51.05124, … 5 Selfkant Nordrhein-Westfalen
2 47559 POLYGON ((5.94504 51.82354, 5.94580 51.82409, … 4 Kranenburg Nordrhein-Westfalen
3 52525 POLYGON ((5.96811 51.05556, 5.96951 51.05660, … 5 Heinsberg Nordrhein-Westfalen
4 52525 POLYGON ((5.96811 51.05556, 5.96951 51.05660, … 5 Waldfeucht Nordrhein-Westfalen
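Note that postal code 52538 now appears twice, once for Gangelt and once for Selfkant: the inner merge is one-to-many, so a polygon is repeated for every town that shares its postal code, and postal codes without a match are dropped. The toy frames below mirror the two tables (the strings 'poly_a' and 'poly_b' are placeholders for real polygons). The `validate='one_to_many'` argument of `pd.merge` is an optional safeguard that raises if the left keys are not unique.

```python
import pandas as pd

# One row per postal code polygon (placeholder strings instead of geometries) ...
shapes = pd.DataFrame({'plz': ['52538', '47559'], 'geometry': ['poly_a', 'poly_b']})
# ... and one row per (postal code, town) pair, as in zuordnung_plz_ort.csv.
towns = pd.DataFrame({
    'plz': ['52538', '52538', '47559'],
    'ort': ['Gangelt', 'Selfkant', 'Kranenburg'],
})

# Raises MergeError if a postal code appeared twice in `shapes`.
merged = pd.merge(shapes, towns, on='plz', how='inner', validate='one_to_many')

print(len(shapes), '->', len(merged))    # 2 -> 3
print(merged['plz'].duplicated().sum())  # 1
```

Repeated polygons do not change the colouring as long as all towns of a postal code share one Bundesland, but they matter for any aggregation over rows.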

 

Generate Bundesland map:

fig, ax = plt.subplots()

germany_df.plot(
    ax=ax, 
    column='bundesland', 
    categorical=True, 
    legend=True, 
    legend_kwds=,
    cmap='tab20',
    alpha=0.9
)

for c in top_cities.keys():

    ax.text(
        x=top_cities[c][0], 
        y=top_cities[c][1] + 0.08, 
        s=c, 
        fontsize=12,
        ha='center', 
    )

    ax.plot(
        top_cities[c][0], 
        top_cities[c][1], 
        marker='o',
        c='black', 
        alpha=0.5
    )

ax.set(
    title='Germany - Bundesländer', 
    aspect=1.3, 
    facecolor='white'
);


 

Number of Inhabitants

 
Now we include the number of inhabitants per postal code:

plz_einwohner_df = pd.read_csv(
    '../Data/plz_einwohner.csv', 
    sep=',', 
    dtype={'plz': str, 'einwohner': int}
)

plz_einwohner_df.head()
plz einwohner
0 01067 11957
1 01069 25491
2 01097 14811
3 01099 28021
4 01108 5876

 

# Merge data.
germany_df = pd.merge(
    left=germany_df, 
    right=plz_einwohner_df, 
    on='plz',
    how='left'
)

germany_df.head()
plz geometry first_dig_plz ort bundesland einwohner
0 52538 POLYGON ((5.86632 51.05110, 5.86692 51.05124, … 5 Gangelt Nordrhein-Westfalen 21390
1 52538 POLYGON ((5.86632 51.05110, 5.86692 51.05124, … 5 Selfkant Nordrhein-Westfalen 21390
2 47559 POLYGON ((5.94504 51.82354, 5.94580 51.82409, … 4 Kranenburg Nordrhein-Westfalen 10220
3 52525 POLYGON ((5.96811 51.05556, 5.96951 51.05660, … 5 Heinsberg Nordrhein-Westfalen 49737
4 52525 POLYGON ((5.96811 51.05556, 5.96951 51.05660, … 5 Waldfeucht Nordrhein-Westfalen 49737
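Two consequences of this merge are worth keeping in mind. Because postal codes are already duplicated (one row per town), einwohner is repeated too, so summing the column directly double counts; deduplicating by plz first avoids that. And a left merge keeps postal codes with no population entry as NaN; `indicator=True` makes such rows visible. The sketch uses values from the output above, plus a hypothetical unmatched code '00000'.

```python
import pandas as pd

# Toy version of germany_df after both merges (values from the output above).
df = pd.DataFrame({
    'plz': ['52538', '52538', '47559'],
    'ort': ['Gangelt', 'Selfkant', 'Kranenburg'],
    'einwohner': [21390, 21390, 10220],
})

naive_total = df['einwohner'].sum()                        # counts 52538 twice
dedup_total = df.drop_duplicates('plz')['einwohner'].sum()  # one row per postal code
print(naive_total, dedup_total)  # 53000 31610

# '00000' is a hypothetical postal code with no population entry.
codes = pd.DataFrame({'plz': ['52538', '00000']})
check = codes.merge(
    df.drop_duplicates('plz')[['plz', 'einwohner']],
    on='plz', how='left', indicator=True,
)
print(check['_merge'].tolist())  # ['both', 'left_only']
```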

 

Generate map:

fig, ax = plt.subplots()

germany_df.plot(
    ax=ax, 
    column='einwohner', 
    categorical=False, 
    legend=True, 
    cmap='autumn_r',
    alpha=0.8
)

for c in top_cities.keys():

    ax.text(
        x=top_cities[c][0], 
        y=top_cities[c][1] + 0.08, 
        s=c, 
        fontsize=12,
        ha='center', 
    )

    ax.plot(
        top_cities[c][0], 
        top_cities[c][1], 
        marker='o',
        c='black', 
        alpha=0.5
    )
    
ax.set(
    title='Germany: Number of Inhabitants per Postal Code', 
    aspect=1.3, 
    facecolor='lightblue'
);


 

City Maps

 
We can now filter for cities using the ort feature.
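As an aside on the query syntax, here is a small sketch on a toy frame whose ort column mirrors germany_df (the rows are stand-ins). It shows how to pass a Python variable into `query` with `@` and how to select several cities at once, which is equivalent to a boolean mask built with `Series.isin`.

```python
import pandas as pd

# Toy frame standing in for germany_df.
df = pd.DataFrame({
    'ort': ['München', 'Berlin', 'Aachen'],
    'plz': ['80331', '10115', '52062'],
})

city = 'München'
# Local variables are referenced inside query strings with '@'.
munich = df.query('ort == @city')

# Several cities at once.
two_cities = df.query('ort in ["München", "Berlin"]')
assert two_cities.equals(df[df['ort'].isin(['München', 'Berlin'])])
```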

munich_df = germany_df.query('ort == "München"')

fig, ax = plt.subplots()

munich_df.plot(
    ax=ax, 
    column='einwohner', 
    categorical=False, 
    legend=True, 
    cmap='autumn_r',
)

ax.set(
    title='Munich: Number of Inhabitants per Postal Code', 
    aspect=1.3, 
    facecolor='lightblue'
);


berlin_df = germany_df.query('ort == "Berlin"')

fig, ax = plt.subplots()

berlin_df.plot(
    ax=ax, 
    column='einwohner', 
    categorical=False, 
    legend=True, 
    cmap='autumn_r',
)

ax.set(
    title='Berlin: Number of Inhabitants per Postal Code', 
    aspect=1.3,
    facecolor='lightblue'
);
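All maps here use aspect=1.3. Assuming the polygons are in unprojected longitude/latitude degrees (which the POLYGON coordinates above suggest), one degree of longitude at latitude φ is only cos(φ) times as long as one degree of latitude. An undistorted height/width ratio is therefore 1/cos(φ). The sketch computes it; the value 1.3 used in the post appears to be a visual choice rather than this ratio. Recent geopandas versions apply a similar correction by default for geographic CRSs, and an explicit `ax.set(aspect=...)` overrides it.

```python
import math


def lonlat_aspect(lat_deg):
    """Height/width ratio that makes a degree of latitude and a degree of
    longitude equally long on screen near latitude `lat_deg`."""
    return 1 / math.cos(math.radians(lat_deg))


print(round(lonlat_aspect(52.52), 2))  # Berlin, about 52.52 degrees north -> 1.64
```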

 

 

Berlin

 
We can use the portal https://www.statistik-berlin-brandenburg.de to get the official postal code to area mapping for Berlin here. After some formatting (the raw data is not structured):

berlin_plz_area_df = pd.read_excel(
    '../Data/ZuordnungderBezirkezuPostleitzahlen.xls', 
    sheet_name='plz_bez_tidy',
    dtype={'plz': str}
)

berlin_plz_area_df.head()
plz area
0 10115 Mitte
1 10117 Mitte
2 10119 Mitte
3 10178 Mitte
4 10179 Mitte

 

Note, however, that this mapping is not one-to-one, i.e. a postal code can correspond to several areas:

berlin_plz_area_df \
    [berlin_plz_area_df['plz'].duplicated(keep=False)] \
    .sort_values('plz')
plz area
2 10119 Mitte
41 10119 Pankow
4 10179 Mitte
26 10179 Friedrichshain-Kreuzberg
42 10247 Pankow
133 14197 Steglitz-Zehlendorf
95 14197 Charlottenburg-Wilmersdorf
165 14197 Tempelhof-Schöneberg
134 14199 Steglitz-Zehlendorf
96 14199 Charlottenburg-Wilmersdorf

 

99 rows × 2 columns
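The `keep=False` argument matters here: by default `duplicated` marks only the second and later occurrences, whereas `keep=False` flags every row whose postal code repeats, which is what we want to display. Counting distinct areas per postal code makes the many-to-many structure explicit. The toy frame uses rows from the tables above.

```python
import pandas as pd

# Rows from the Berlin postal code -> area tables above.
plz_area = pd.DataFrame({
    'plz': ['10115', '10119', '10119', '14197', '14197', '14197'],
    'area': ['Mitte', 'Mitte', 'Pankow',
             'Steglitz-Zehlendorf', 'Charlottenburg-Wilmersdorf', 'Tempelhof-Schöneberg'],
})

print(plz_area['plz'].duplicated().sum())            # 3 (later occurrences only)
print(plz_area['plz'].duplicated(keep=False).sum())  # 5 (every repeated row)

# Number of distinct areas each postal code touches.
areas_per_plz = plz_area.groupby('plz')['area'].nunique()
print(areas_per_plz.to_dict())  # {'10115': 1, '10119': 2, '14197': 3}
```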

Hence, we need a grouping variable other than the postal code.

 

Berlin Neighbourhoods

 
Fortunately, the website http://insideairbnb.com/get-the-data.html, which contains Airbnb data for many cities (definitely worth investigating!), has a convenient data set, neighbourhoods.geojson, which maps Berlin's area to neighbourhoods:

berlin_neighbourhoods_df = gpd.read_file('../Data/neighbourhoods.geojson')

berlin_neighbourhoods_df = berlin_neighbourhoods_df \
    [~ berlin_neighbourhoods_df['neighbourhood_group'].isnull()]

berlin_neighbourhoods_df.head()
neighbourhood neighbourhood_group geometry
0 Blankenfelde/Niederschönhausen Pankow MULTIPOLYGON (((13.41191 52.61487, 13.41183 52…
1 Helmholtzplatz Pankow MULTIPOLYGON (((13.41405 52.54929, 13.41422 52…
2 Wiesbadener Straße Charlottenburg-Wilm. MULTIPOLYGON (((13.30748 52.46788, 13.30743 52…
3 Schmöckwitz/Karolinenhof/Rauchfangswerder Treptow – Köpenick MULTIPOLYGON (((13.70973 52.39630, 13.70926 52…
4 Müggelheim Treptow – Köpenick MULTIPOLYGON (((13.73762 52.40850, 13.73773 52…

 

fig, ax = plt.subplots()

berlin_df.plot(
    ax=ax, 
    alpha=0.2
)

berlin_neighbourhoods_df.plot(
    ax=ax, 
    column='neighbourhood_group',
    categorical=True, 
    legend=True, 
    legend_kwds=,
    cmap='tab20', 
    edgecolor='black'
)

ax.set(
    title='Berlin Neighbourhoods', 
    aspect=1.3
);

Here the divisions correspond to Neighbourhood ⊂ Neighbourhood Group.
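The nesting Neighbourhood ⊂ Neighbourhood Group can be checked on the data itself: it holds if every neighbourhood maps to exactly one group. A sketch on rows from the head() output above; on the full data the same two lines would run on berlin_neighbourhoods_df.

```python
import pandas as pd

# Rows from the neighbourhoods head() output above.
nb = pd.DataFrame({
    'neighbourhood': ['Blankenfelde/Niederschönhausen', 'Helmholtzplatz',
                      'Wiesbadener Straße', 'Müggelheim'],
    'neighbourhood_group': ['Pankow', 'Pankow',
                            'Charlottenburg-Wilm.', 'Treptow – Köpenick'],
})

# Nesting holds if each neighbourhood belongs to exactly one group.
groups_per_nb = nb.groupby('neighbourhood')['neighbourhood_group'].nunique()
is_nested = bool((groups_per_nb == 1).all())
print(is_nested)  # True

# Number of neighbourhoods inside each group.
per_group = nb.groupby('neighbourhood_group')['neighbourhood'].nunique()
```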

 

Selected Locations in Berlin

 
Sometimes it is useful to include well-known locations on the maps so that the reader can identify them and understand distances and scales. One way to do this is to manually enter the latitude and longitude of each point (as above). This, of course, can be time-consuming and error-prone. As expected, there is a library which can fetch this kind of data automatically, namely geopy.

Right here is an easy instance:

from geopy import Nominatim

locator = Nominatim(user_agent='myGeocoder')

location = locator.geocode('Humboldt Universität zu Berlin')

print(location)

Humboldt-Universität zu Berlin, Dorotheenstraße, Spandauer Vorstadt, Mitte, Berlin, 10117, Deutschland

Let us write a function to get the latitude and longitude coordinates:

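def lat_lng_from_string_loc(x):
    """Geocode the string x; return (longitude, latitude), or None if not found."""
    locator = Nominatim(user_agent='myGeocoder')

    location = locator.geocode(x)
    
    if location is None:
        return None
    else:
        # Longitude first, matching the (x, y) order used when plotting.
        return location.longitude, location.latitude

# Define some well-known Berlin locations.
berlin_locations = [
    'Alexander Platz', 
    'Zoo Berlin', 
    'Berlin Tegel', 
    'Berlin Schönefeld',
    'Berlin Adlershof',
    'Olympia Stadium Berlin',
    'Berlin Südkreuz', 
    'Frei Universität Berlin',
    'Mauerpark', 
    'Treptower Park',
]

# Get geodata.
berlin_locations_geo = {loc: lat_lng_from_string_loc(loc) for loc in berlin_locations}

# Remove None.
berlin_locations_geo = {k: v for k, v in berlin_locations_geo.items() if v is not None}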
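Nominatim's usage policy allows at most one request per second, and repeated queries are wasteful. A stdlib-only sketch of a throttling and caching wrapper follows; `throttled_cached` and `fake_geocode` are hypothetical names introduced here, and the fake geocoder keeps the example offline. In the notebook one would wrap `lat_lng_from_string_loc` instead. geopy also ships `geopy.extra.rate_limiter.RateLimiter` for the throttling part.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Coords = Tuple[float, float]


def throttled_cached(geocode: Callable[[str], Optional[Coords]],
                     min_delay: float = 1.0) -> Callable[[str], Optional[Coords]]:
    """Wrap `geocode` so repeated queries are served from a cache and new
    queries are spaced at least `min_delay` seconds apart."""
    cache: Dict[str, Optional[Coords]] = {}
    last_call = float('-inf')

    def wrapper(query: str) -> Optional[Coords]:
        nonlocal last_call
        if query in cache:
            return cache[query]
        # Sleep only for the part of min_delay not yet elapsed.
        wait = min_delay - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)
        last_call = time.monotonic()
        cache[query] = geocode(query)
        return cache[query]

    return wrapper


# Hypothetical offline stand-in: returns (lng, lat) or None, and logs each call.
calls = []

def fake_geocode(query: str) -> Optional[Coords]:
    calls.append(query)
    return None if query == 'nowhere' else (13.4, 52.5)


geocode = throttled_cached(fake_geocode, min_delay=0.0)
results = {q: geocode(q) for q in ['Mauerpark', 'Mauerpark', 'nowhere']}
results = {k: v for k, v in results.items() if v is not None}
```

The second 'Mauerpark' lookup is answered from the cache, so `fake_geocode` runs only twice.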

Let us see the resulting map:

berlin_df = germany_df.query('ort == "Berlin"')

fig, ax = plt.subplots()

berlin_df.plot(
    ax=ax, 
    color='orange', 
    alpha=0.8
)

for c in berlin_locations_geo.keys():

    ax.text(
        x=berlin_locations_geo[c][0], 
        y=berlin_locations_geo[c][1] + 0.005, 
        s=c, 
        fontsize=12,
        ha='center', 
    )

    ax.plot(
        berlin_locations_geo[c][0], 
        berlin_locations_geo[c][1], 
        marker='o',
        c='black', 
        alpha=0.5
    )

ax.set(
    title='Berlin - Some Relevant Locations', 
    aspect=1.3,
    facecolor='lightblue'
);


 

Christmas Markets

 
Let us enrich the maps by including other types of data. There is a great resource for publicly available data for Berlin: Berlin Open Data. Among many interesting datasets, I found one on the Christmas markets around the city (which are really fun to visit!) here. You can access the data through a public API. Let us use the requests module to do this:

import requests

# GET request.
response = requests.get(
    'https://www.berlin.de/sen/web/service/maerkte-feste/weihnachtsmaerkte/index.php/index/all.json?q='
)

response_json = response.json()

Convert to a pandas DataFrame:

berlin_maerkte_raw_df = pd.DataFrame(response_json['index'])

We do not have a postal code feature, but we can create one by extracting it from the plz_ort column.

berlin_maerkte_df = berlin_maerkte_raw_df[['name', 'bezirk', 'plz_ort', 'lat', 'lng']]

berlin_maerkte_df = berlin_maerkte_df \
    .query('lat != ""') \
    .assign(plz=lambda x: x['plz_ort'].str.split(' ').apply(lambda x: x[0]).astype(str)) \
    .drop('plz_ort', axis=1)

# Convert to float (the source uses a decimal comma).
berlin_maerkte_df['lat'] = berlin_maerkte_df['lat'].str.replace(',', '.').astype(float)
berlin_maerkte_df['lng'] = berlin_maerkte_df['lng'].str.replace(',', '.').astype(float)

berlin_maerkte_df.head()
name bezirk lat lng plz
0 Weihnachtsmarkt vor dem Schloss Charlottenburg Charlottenburg-Wilmersdorf 52.519951 13.295946 14059
1 Weihnachtsmarkt an der Gedächtniskirche Charlottenburg-Wilmersdorf 52.504886 13.335511 10789
2 Weihnachtsmarkt in der Fußgängerzone Wilmersdo… Charlottenburg-Wilmersdorf 52.509313 13.305994 10627
3 Weihnachten in Westend Charlottenburg-Wilmersdorf 52.512538 13.259213 14052
4 Weihnachtsmarkt Berlin-Grunewald des Johannisc… Charlottenburg-Wilmersdorf 52.488350 13.277250 14193
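A slightly more defensive variant of this cleaning step is sketched below. It assumes, as the code above implies, that response_json has an 'index' key holding a list of records, that lat/lng are strings with a decimal comma (empty when missing), and that plz_ort starts with the five-digit postal code. The payload is hypothetical apart from the coordinates and postal code of the first market shown above. A regex extraction yields NaN instead of garbage when the pattern is absent, and `pd.to_numeric(errors='coerce')` turns empty strings into NaN rather than raising.

```python
import pandas as pd

# Hypothetical payload shaped like response_json.
payload = {'index': [
    {'name': 'Markt A', 'plz_ort': '14059 Berlin', 'lat': '52,519951', 'lng': '13,295946'},
    {'name': 'Markt B', 'plz_ort': '10789 Berlin', 'lat': '', 'lng': ''},
]}

raw = pd.DataFrame(payload['index'])

clean = (
    raw
    # Postal code = leading five digits of plz_ort (NaN if absent).
    .assign(plz=lambda d: d['plz_ort'].str.extract(r'^(\d{5})', expand=False))
    # Decimal comma -> dot; unparseable or empty strings become NaN.
    .assign(
        lat=lambda d: pd.to_numeric(d['lat'].str.replace(',', '.', regex=False), errors='coerce'),
        lng=lambda d: pd.to_numeric(d['lng'].str.replace(',', '.', regex=False), errors='coerce'),
    )
    # Markets without coordinates cannot be plotted.
    .dropna(subset=['lat', 'lng'])
    .drop(columns='plz_ort')
)
print(clean)
```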

 

Let us plot the Christmas market locations:

fig, ax = plt.subplots()

berlin_df.plot(ax=ax, color='green', alpha=0.4)

for c in berlin_locations_geo.keys():

    ax.text(
        x=berlin_locations_geo[c][0], 
        y=berlin_locations_geo[c][1] + 0.005, 
        s=c, 
        fontsize=12,
        ha='center', 
    )

    ax.plot(
        berlin_locations_geo[c][0], 
        berlin_locations_geo[c][1], 
        marker='o',
        c='black', 
        alpha=0.5
    )

berlin_maerkte_df.plot(
    kind='scatter', 
    x='lng', 
    y='lat', 
    c='r', 
    marker='*',
    s=50,
    label='Christmas Market',  
    ax=ax
)

ax.set(
    title='Berlin Christmas Markets (2019)', 
    aspect=1.3, 
    facecolor='white'
);


 

Interactive Maps

 
We can use the mplleaflet library, which converts a matplotlib plot into a webpage containing a pannable, zoomable Leaflet map.

import mplleaflet

fig, ax = plt.subplots()

berlin_df.plot(
    ax=ax, 
    alpha=0.2
)

berlin_neighbourhoods_df.plot(
    ax=ax, 
    column='neighbourhood_group',
    categorical=True, 
    cmap='tab20',
)

# display() embeds the Leaflet map in the notebook; mplleaflet.show() opens it in a browser instead.
mplleaflet.display(fig=fig)

fig, ax = plt.subplots()

berlin_df.plot(ax=ax, color='green', alpha=0.4)

berlin_maerkte_df.plot(
    kind='scatter', 
    x='lng', 
    y='lat', 
    c='r', 
    marker='*',
    s=30,
    ax=ax
)

mplleaflet.display(fig=fig)

I hope these data sources and code snippets serve as a starting point to explore geodata analysis and visualization with Python.

 
Bio: Dr. Juan Camilo Orduz (@juanitorduz) is a Berlin-based mathematician & data scientist.

Original. Reposted with permission.
