Data Snack — Use Voronoi to Analyze Service Areas of Transit Stations in Tokyo

Everything you need to know about Voronoi Diagrams: Analyze Service Areas of Transit Stations in Tokyo

Discover Data Science techniques while acquiring slightly interesting statistical insights

Data Science and Mass Transit: A dream team. (Image by author, illustrations by Takashi Mifune under free use)

With the world becoming increasingly urbanized (1), public transport has become an omnipresent part of urban life. Arguably the most urban place in the world is Tokyo (2), a bustling megacity of unmatched size where most people rely primarily on public transport (3) in their everyday lives.

This article introduces you to the concept of the Voronoi diagram in an urban planning context and uses it to delineate the service areas of train stations in Tokyo. We will use these service areas to obtain various, maybe slightly interesting, statistics about the stations’ surroundings.

Introduction

A Voronoi diagram (Image by author)

Voronoi diagrams and Delaunay triangulations find wide application in many branches of science. (4) Voronoi diagrams, also known as Thiessen polygons, divide a flat surface into distinct areas, one for each of a set of given points, so that every location belongs to the area of its closest point.

This problem arises frequently and in many variations. (5)

Some examples include:

  • The Government of Melbourne (2024), which assigns students to their nearest school (6)
  • John Snow (1813–1858), who related the outbreak of cholera in London to the location of water pumps (4)
  • René Descartes (1596–1650), who investigated the distribution of matter relative to fixed stars (4)

Today, Voronoi diagrams are used in many areas, including computer science, geography, and especially urban planning. Urban planning is the field I would like to introduce in more detail: we will determine the service areas of mass transit stations in the world’s largest metropolis, Tokyo.

The Components of Voronoi

The components of a Voronoi Diagram (Image by author)

The Voronoi diagram consists of several components, each with its own name and purpose:

  • 🔵 Voronoi Site is the reference location the Voronoi Region is calculated for.
  • 🟣 Voronoi Region contains all the points on the surface that are closer to the related Voronoi Site than to any other site.
  • 🟢 Voronoi Arc is the straight line segment that forms the boundary between two Voronoi Regions. (4)
  • 🟠 Voronoi Vertex is a point where Voronoi Arcs intersect.
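
To make the relationship between sites, arcs, and vertices more tangible, here is a minimal Python sketch. It relies on a standard geometric fact: every point on a Voronoi Arc is equidistant from two sites, so a Voronoi Vertex is equidistant from three of them, which makes it the circumcenter of those three sites. The function name `circumcenter` and the toy coordinates are my own illustration, not part of the workflow used later in this article. With more than three sites, a circumcenter is only a Voronoi Vertex if no other site lies inside the circle through the three sites.

```python
import math


def circumcenter(a, b, c):
    """Return the point that is equidistant from three non-collinear sites."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("sites are collinear, so no Voronoi Vertex exists")
    a2 = ax**2 + ay**2
    b2 = bx**2 + by**2
    c2 = cx**2 + cy**2
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy)


# Three toy sites (made-up coordinates)
sites = [(0, 0), (4, 0), (0, 4)]
vertex = circumcenter(*sites)

# The vertex has the same distance to all three sites
distances = [math.dist(vertex, site) for site in sites]
print(vertex, distances)
```

For exactly three sites, the three Voronoi Arcs all meet in this single vertex.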

The Distance Functions for Voronoi

Comparing various distance functions (Image by author)

In a city planning context, the shape of a Voronoi diagram depends on how distance is measured. The distance can be a literal distance or a travel time, and there are multiple methods for calculating it:

Overview of Euclidean Distance (Image by author)

Euclidean distance. The straight-line distance between two points in a coordinate system. It assumes open space between the two points with nothing blocking the way. It is the most basic way to calculate distance.

Overview of Manhattan Distance (Image by author)

Manhattan distance. Instead of the straight-line distance, it adds up the horizontal and vertical offsets between two points. This approximates travel along a rectangular street grid, just like in Manhattan. (7) It is better suited for certain city environments.

Overview of Time-based Distance (Image by author)

Time-based distance. The most accurate measure, but also the most complicated one to acquire.

Additionally, time-based distances may cause anomalies: travel time is not necessarily proportional to distance, so the Voronoi Regions can split up unevenly (10), just like in the example below.

Fig. 5. Voronoi diagrams of selected areas for geographical (a), road (b), and travel time distances (c). The wide blue line is the Warta river, while the magenta blocks are locations of bridges (11)
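
The choice of distance function can change which site a location belongs to. The following Python sketch compares Euclidean and Manhattan distance for one location and two sites. All names (`euclidean`, `manhattan`, `nearest_site`) and coordinates are made up for illustration. Time-based distance is left out because it needs real travel time data.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of the horizontal and vertical offsets between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest_site(point, sites, distance):
    """Return the name of the site closest to point under the given distance."""
    return min(sites, key=lambda name: distance(point, sites[name]))


# Two toy sites and one location (made-up coordinates)
sites = {"A": (0, 0), "B": (8, 4)}
point = (1, 10)

by_euclidean = nearest_site(point, sites, euclidean)
by_manhattan = nearest_site(point, sites, manhattan)
print(by_euclidean, by_manhattan)
```

In this toy setup the location falls into the Voronoi Region of site B under Euclidean distance, but into the region of site A under Manhattan distance. The Voronoi Arc between both sites therefore runs differently depending on the distance function.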

Now we have all the basics we need to start with our real-life use case: Tokyo’s public transport. Let’s get started!

Assessing the Tokyo Transit System

Mass transit networks are one of the most important aspects of urban planning, especially in a place like Tokyo. Tokyo is a place that many people long to visit, public transport enthusiasts in particular.

Hence, Tokyo provides us with the perfect example for applying Voronoi diagrams to a real-life use case.

Concept

Imagine a city map dotted with public transit stations. Each station serves its surrounding area, called a service zone. But how can we determine these zones?

To better understand how to achieve our goal of acquiring service areas of public transport stations, I put the Voronoi diagram into our mass transit perspective:

A concept on how to use Voronoi diagrams for acquiring service zones (Image by author)

  • 🔵 Voronoi site is now a train station in Tokyo
  • 🟢 Voronoi Arc is now the border dividing service areas of train stations
  • 🟣 Voronoi Region is now the area serviced by one specific train station based on our distance function

That’s what Voronoi Diagrams can offer us. By dividing the map into regions based on the distance to the nearest station, we get a very simple approach to defining the service areas of train stations.

Now that the concept is clear, let’s get started with the actual implementation.

Identifying Tokyo

We need to define the area we would like to create our Voronoi Diagram for. That process is usually referred to as defining the Bounding Box.

Difference between Tokyo City and Tokyo Prefecture (Image by author)

When someone says “Tokyo”, it can mean different things to different people.

It may refer to:

1) Tokyo City, a bustling city with towering skyscrapers and vibrant street culture

2) Tokyo Prefecture, which also includes scenic landscapes and natural beauty

3) The Greater Tokyo Area, the most populous metropolitan area in the world (8), which consists of multiple standalone cities (e.g. Tokyo, Saitama, Yokohama)

To avoid confusion, we need to define what we mean by “Tokyo”.

Using the prefecture definition allows us to differentiate Tokyo from its neighboring cities like Saitama, Chiba, and Yokohama while including most locations which people generally refer to as “in Tokyo”.

Tokyo Prefecture will be the basis of the bounding box for all subsequent Voronoi calculations.

For acquiring geospatial information about Tokyo prefecture, we resort to official government sources, as referenced below:

Dataset: National Land Numerical Information | Administrative Boundaries Data (mlit.go.jp), Licensed under the Open Data Policy, allowing commercial use. 利用規約 (mlit.go.jp)

Locating Train Stations

Visually checking our dataset. I located Takadanobaba in its correct position. (Image by author)

The Japanese Government provides a reliable source for information about train stations. Their website offers a detailed dataset containing all train stations across the country, along with other related metadata.

It is worth noting that the dataset focuses solely on train stations as per the Japanese government’s definition. It may exclude modes of mass transit like metro and monorail, while including modes of transit people generally wouldn’t refer to as trains, such as specific cable cars.

There may be opportunities to improve the data by combining it with additional data sources. (9) In this article, I will continue with the governmental dataset without further enhancements.

Dataset: National Land Numerical Information | Railway Data (mlit.go.jp), Licensed under the Open Data Policy, allowing commercial use. 利用規約 (mlit.go.jp)

Obtaining Service Zones

Now that we have established the borders of Tokyo and a list of relevant train stations, we can proceed with calculating our service zones. We will be utilizing KNIME, a powerful tool for scientific computations that minimizes the need for in-depth math knowledge or coding, by abstracting much of the complexity away.


1. Extract data

We start with connecting our data sources to the tool. Fortunately, KNIME provides a set of tools for Geospatial operations, which we can use out of the box.

We proceed by creating two nodes for importing our data. The GeoFile Reader node can handle both Shapefile and GeoJSON files.

Data import in KNIME (Image by author)

To use the two datasets, some preparation is necessary.

  1. Each dataset contains more fields than we need, so we remove the surplus ones with the Column Filter node.
  2. To enhance readability, we rename certain columns within the datasets with the Column Renamer node.
  3. To avoid confusion later on, columns that share the same name in both datasets are given unique names with the Column Renamer node.

Data extraction in KNIME (Image by author)

Having completed the data extraction and preparation, we can now move forward with our computations.

2. Work with data

Our next objective is to obtain the Voronoi polygons for each station, thereby enabling us to derive their respective service zones.

  1. We create a bounding box for our Voronoi diagram, using our Tokyo Prefecture dataset, with the Bounding Box node.
  2. In order to perform our calculations, we need points instead of polygon representations for the stations. The Geometry to Point node is used to convert them.
  3. We perform the Voronoi calculation using the Voronoi (Thiessen) Polygon node, which produces polygons and associated IDs. However, since we also require station metadata, we must join the Voronoi polygons with this information again using the Spatial Join node.

The entire workspace in KNIME (Image by author)
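
For readers who prefer code over nodes, the idea behind these steps can be sketched in plain Python. This is not what KNIME does internally: instead of constructing exact Voronoi polygons, the sketch approximates them by labeling the cells of a grid with their nearest station, and it uses a toy square instead of the Tokyo Prefecture boundary. The function names, station names, and coordinates are assumptions made for illustration.

```python
import math


def bounding_box(polygon):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def approximate_zone_areas(stations, bbox, cells_per_axis=100):
    """Label each grid cell center with its nearest station and sum the cell areas."""
    min_x, min_y, max_x, max_y = bbox
    dx = (max_x - min_x) / cells_per_axis
    dy = (max_y - min_y) / cells_per_axis
    counts = {name: 0 for name in stations}
    for i in range(cells_per_axis):
        for j in range(cells_per_axis):
            center = (min_x + (i + 0.5) * dx, min_y + (j + 0.5) * dy)
            nearest = min(stations, key=lambda name: math.dist(center, stations[name]))
            counts[nearest] += 1
    return {name: count * dx * dy for name, count in counts.items()}


# Toy boundary and station points (made-up, not real Tokyo data)
boundary = [(0, 0), (2, 0), (2, 2), (0, 2)]
stations = {"West": (0.5, 1.0), "East": (1.5, 1.0)}

areas = approximate_zone_areas(stations, bounding_box(boundary))
print(areas)
```

Because the result is keyed by station name, each approximated zone stays connected to its station, which is the role the Spatial Join node plays in the KNIME workflow. A finer grid gives a better approximation at the cost of more computation.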

That’s everything we need to obtain station service zones in KNIME. Let’s take a look at the results.

Service Zones (V1)

Now we can see that the Voronoi diagram has divided our map, assigning a unique area to each station.

One station has multiple service areas — something went wrong. Visualized with QGIS (Image by author)

We should keep in mind that certain stations, like Takadanobaba, look like a single station in person but are recorded as multiple stations in the dataset. Because of this, we need to do some extra work to make sure our calculations reflect reality accurately.

Cleaning Data and Service Zones (V2)

The station dataset contains an additional ID that groups stations by their public name and/or real-life appearance. By utilizing this ID and the Group By node, we can join the individual stations into a single one.

The entire workspace in KNIME (Image by author)
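
The grouping idea can be illustrated with a short Python sketch. It assumes hypothetical field names (`group_id`, `name`, `x`, `y`) and made-up coordinates, and it merges the members of a group by averaging their coordinates. That is only one possible way to merge them and not necessarily what the Group By node in the workflow does.

```python
from collections import defaultdict


def merge_station_groups(stations):
    """Collapse all records that share a group_id into one point per group."""
    grouped = defaultdict(list)
    for station in stations:
        grouped[station["group_id"]].append(station)

    merged = {}
    for group_id, members in grouped.items():
        merged[group_id] = {
            "name": members[0]["name"],
            # Assumption: represent the group by the mean of its member coordinates
            "x": sum(m["x"] for m in members) / len(members),
            "y": sum(m["y"] for m in members) / len(members),
        }
    return merged


# Toy records with hypothetical IDs and coordinates
records = [
    {"name": "Takadanobaba", "group_id": "G1", "x": 0.0, "y": 0.0},
    {"name": "Takadanobaba", "group_id": "G1", "x": 2.0, "y": 1.0},
    {"name": "Mejiro", "group_id": "G2", "x": 5.0, "y": 5.0},
]

merged = merge_station_groups(records)
print(merged)
```

After merging, each group contributes exactly one Voronoi Site, so each station ends up with exactly one service zone.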

After consolidating the individual stations, we were able to create a more accurate dataset of service zones that better reflects how people see stations in the real world. Look at Takadanobaba — it’s located in a single Voronoi region now.

Cleaned up dataset: Now we have one service area per station, visualized with QGIS (Image by author)

Result

We have completed our calculations and obtained service zones that can provide useful insights and statistics.

To explore the results yourself, you can find everything in this GitHub repository:

GitHub – martinjurran/KNIME-Tokyo-StationServiceAreas: KNIME workflow for calculating service areas of Tokyo train stations and restaurant density analysis.

Next, we will obtain some real-life statistics that we can derive from our service zones.

Statistics Example — Restaurant Density

Finding restaurants isn’t hard in Japan — but where is their highest density? (Illustrations by Takashi Mifune under free use)

When it comes to planning a vacation, one of the biggest hassles is figuring out where to stay. I mean, you want to pick a location that is close to all the restaurants, shops and other cool stuff, right?

But with so many options out there, it can be confusing to find the perfect spot. That’s where our newly acquired Transit Station Service Areas can help us:

Goal: Identify the station with the most POI in its surroundings. To keep things simple, we will focus on restaurants.

Acquiring Data

Data valuable for business purposes seems to frequently be protected and hard to retrieve. In the case of restaurants, there is no official source available.

The most accurate sources, such as Business Registrations or Google Maps, either come with a steep price tag or do not permit large-scale processing.

The Overpass API, which serves OpenStreetMap data, is one of the few sources that offer the data we need. With a simple query in Overpass Turbo, we can acquire all restaurants within 60 km of the map center, which covers Tokyo.

// restaurants (nodes, ways, relations) within 60 km of the map center
nwr[amenity=restaurant](around:60000,{{center}});
out center;

The data is displayed in Overpass Turbo right away and can be exported to a file format of our choice:

Overpass Turbo UI (Image by author)
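
The `{{center}}` placeholder only works inside Overpass Turbo, where it is replaced by the coordinates of the current map center. If you would rather script the request, the query needs explicit coordinates. The following Python sketch only builds the query text and the request URL, without sending anything. The endpoint is a public Overpass instance that I am assuming here, and the coordinates are a rough placeholder for central Tokyo, not the map center used in this article.

```python
from urllib.parse import urlencode

# Assumption: a public Overpass API instance, not named in this article
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"


def build_restaurant_query(lat, lon, radius_m=60000):
    """Build an Overpass QL query for restaurants around an explicit center."""
    return (
        "[out:json];"
        f"nwr[amenity=restaurant](around:{radius_m},{lat},{lon});"
        "out center;"
    )


def build_request_url(lat, lon, radius_m=60000):
    """Build a GET URL that passes the query in the data parameter."""
    query = build_restaurant_query(lat, lon, radius_m)
    return OVERPASS_ENDPOINT + "?" + urlencode({"data": query})


# Placeholder coordinates, roughly central Tokyo (assumption)
query = build_restaurant_query(35.68, 139.77)
url = build_request_url(35.68, 139.77)
print(query)
print(url)
```

A query with a 60 km radius returns a large result, so please be considerate of the shared public servers when running it.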

We now have a dataset of restaurants in Tokyo. It comes with limitations: the data is crowd-sourced and not validated, and it may be skewed toward the most popular spots in the city, since that is where people tend to contribute. As it is the best data available in our case, we continue using it.

Dataset: overpass turbo (overpass-turbo.eu), Data licensed under Open Database License (ODbL)

Matching Points of Interest (POI) to their service zone

To calculate the number of restaurants in each station’s service zone, we need to match the POI to their respective service zones.

I have imported the station service zones as a layer in QGIS. QGIS can count the points within each polygon out of the box.

Counting points inside polygons in QGIS (Image by author)
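
Under the hood, counting points in polygons comes down to a point-in-polygon test for every POI. The following Python sketch shows one common technique, ray casting: a point is inside a polygon if a ray starting at the point crosses the polygon boundary an odd number of times. This is an illustration of the principle with made-up zones and points, not the implementation used by QGIS.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how often a ray to the right crosses the polygon edges."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_zone(points, zones):
    """Count the points that fall inside each named zone polygon."""
    return {
        name: sum(point_in_polygon(p, polygon) for p in points)
        for name, polygon in zones.items()
    }


# Toy service zones and restaurant locations (made-up coordinates)
zones = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
restaurants = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (3.0, 3.0)]

counts = count_points_per_zone(restaurants, zones)
print(counts)
```

The last restaurant lies outside both zones and is therefore not counted anywhere.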

The raw number of restaurants in each service zone is not enough for our purposes, as some large zones have many restaurants with long distances to cover between them. Therefore, we need a different metric to address this scenario. The easiest approach is to determine the density of restaurants.

Formula for POI density

For a simple ranking, restaurants/km² per service zone might be a good representation. That way, we can find the service zone with the highest density of restaurants.

In some cases, service zones may be small yet have a large number of restaurants, which could inflate their score. However, in our situation, this is not a concern. Small service zones might indicate the presence of another station and more restaurants in close proximity.

The density is the number of POI divided by the area of the service zone:

R = n / A

Formula for calculating POI density

where:

R = POI density factor in n/km²
A = area of polygon in km²
n = number of POI

We import the dataset with the POI count per service area into KNIME and run our formula for each and every service zone.

Calculating POI density in KNIME (Image by author)
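
The same calculation can be sketched in a few lines of Python. The sketch assumes that the zone polygons are given in planar coordinates measured in kilometers, so real longitude and latitude values would have to be projected first. It computes the area A with the shoelace formula and then the density R as the POI count n divided by A. The function names and numbers are made up for illustration.

```python
def polygon_area(polygon):
    """Shoelace formula: area of a simple polygon from its vertex coordinates."""
    n = len(polygon)
    twice_area = sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def poi_density(poi_count, area_km2):
    """POI density R = n / A in POI per square kilometer."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return poi_count / area_km2


# Toy zones in kilometer coordinates with made-up restaurant counts
small_zone = [(0, 0), (2, 0), (2, 0.5), (0, 0.5)]  # 1 km²
large_zone = [(0, 0), (5, 0), (5, 4), (0, 4)]  # 20 km²

small_density = poi_density(40, polygon_area(small_zone))
large_density = poi_density(100, polygon_area(large_zone))
print(small_density, large_density)
```

The large zone contains more restaurants in total, but the small zone has the far higher density, which is exactly the effect the density metric is meant to capture.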

Finally, we have obtained the zones with the highest density of restaurants. Let’s take a look at the results.

Results

The top 20 station service areas with the highest density of restaurants are:

The top 25 stations sorted by restaurant density, visualized with Tableau Public (Image by author)

We can also view our results on a map to get more insights:

The top 20 stations sorted by restaurant density, visualized with Tableau Public (Image by author)

We can see that the areas with the highest density appear in clusters. I did some more research and found out that Tokyo is made up of individual cities (e.g. Taito City, Shibuya City, Chiyoda). In some way, these clusters reflect the individual cities that make up Tokyo. Interesting!

The high restaurant-density clusters identified (Image by author)

It’s important to note that our dataset is crowd-sourced and may not be entirely representative or complete, as it could be biased towards areas that have been particularly well-surveyed.

However, based on the data we have, Ueno-Okachimachi Station is the clear winner.

If you’re interested in exploring the data further, you can check out the Tableau Public page, where you can interact with the visualization and delve deeper into the results:

https://public.tableau.com/app/profile/martin.jurran/viz/Tokyo-RestaurantDensity/Map#1

Conclusion

The station service zone with the highest density. Ameya Yokocho is a part of the Ueno-Okachimachi station service zone.

Voronoi diagrams are more versatile and useful than we often realize. They enable us to uncover insights, such as identifying Ueno-Okachimachi station as having the highest restaurant density in Tokyo Prefecture.

Even major companies like Uber likely use Voronoi diagrams to efficiently assign drivers to pick-up locations. Their wide range of applications makes them valuable across various industries, especially since they can be computed with minimal resources.

I encourage you to explore the capabilities of Voronoi diagrams and see how they can benefit you. By adding them to your toolset, you can enhance your data analysis skills and gain access to more insightful statistics.

Sources

(1) United Nations (2018, September 13), Urbanization, https://www.un.org/development/desa/pd/content/urbanization-0

(2) Demographia (2023, January 24), World Urban Areas 19th Annual, http://www.demographia.com/db-worldua.pdf

(3) Demographia (2003, January 1), Where Rail Transit Works, and Why, http://demographia.com/db-htld-rail.htm

(4) Vera Galishnikova, Peter Jan Pahl (2018, Mar 15), Constrained Construction of Planar Delaunay Triangulations without Flipping, https://www.researchgate.net/publication/325582898_Constrained_Construction_of_Planar_Delaunay_Triangulations_without_Flipping

(5) Liebling T.M., Pournin L. (2010), Voronoi Diagrams and Delaunay Triangulations: Ubiquitous Siamese Twins. Documenta Mathematica. Mathematics Subject Classification: 01A65, 49- 03, 52C99, 68R99, 70–08, 92–08

(6) Government of Melbourne (2024), School Catchment Map, https://www.findmyschool.vic.gov.au/

(7) Wikipedia (2024), Taxicab geometry, https://en.wikipedia.org/wiki/Taxicab_geometry

(8) Wikipedia (2024), Greater Tokyo Area, https://en.wikipedia.org/wiki/Greater_Tokyo_Area

(9) Public Transportation Open Data Center (2024), Dataset — 公共交通オープンデータセンター データカタログサイト, https://www.odpt.org/

(10) D.T. Lee, Chung-Shou Liao, Wei-Bung Wang (N/A), Time-based Voronoi Diagram, http://alumni.cs.ucr.edu/~weiw/paper/VD_highways.pdf

(11) Solutions for Planning Smart Hybrid Public Transportation System — Poznan Agglomeration as a Case Study of Satellite Towns’ Connections — Scientific Figure on ResearchGate. https://www.researchgate.net/figure/Voronoi-diagrams-of-selected-areas-for-geographical-a-road-b-and-travel-time_fig5_336071639

Pictograms by かわいいフリー素材集 いらすとや (irasutoya.com), © Takashi Mifune

Data Snack — Use Voronoi to Analyze Service Areas of Transit Stations in Tokyo was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.