This tutorial provides a step-by-step description of how to compute curvature radius on each segment of a geographical path, in this case: the path of a metro line. We will use the most common Python data analysis libraries, namely:
- Pandas for data processing,
- Numpy for scientific calculations,
- Branca and Folium for map visualisation.
Extracting the geographical coordinates
Open Street Map
Location data represent the geographical positions of an object. These positions, also called geographical coordinates, can be expressed as a couple (latitude, longitude) or a triplet (latitude, longitude and altitude). The acquisition of these data can be performed by direct measurements or by using proprietary (e.g. Google Maps) or free solutions. In this tutorial we will use a free solution: Open Street Map (OSM). OSM is a collaborative project to create an open database of global geographical data, fed by volunteers. These data are published under an open data license (Open Data Commons Open Database Licence, ODbL), and are therefore both free and in open access, unlike most proprietary solutions.
API Overpass
We will work on geographical data of Lyon's D Line metro. We can retrieve its path via the Overpass API, which allows querying OSM servers directly, with a dedicated query language. For the purpose of this tutorial, we will simply download the metro line path directly thanks to its OSM id. The Lyon metro's D Line, in the « Vaise Station ⇒ Vénissieux Station » direction, has the unique id 113903 in OSM. We will let the reader create a more elaborate Overpass query to find this id. A Python Overpass library, overpass, is available in the PyPI repositories. After having installed it (pip install overpass
), one obtains the path of the Lyon metro's D Line in GeoJSON format using code snippet:
import overpass
overpass_api = overpass.API()
osm_data = overpass_api.get(
'rel(113903); way(r);',
responseformat = "geojson",
verbosity = "geom"
)
The result is a dictionary with two main keys:
- type, which represents the type of the features collection (FeatureCollection) ;
- features, which contains the list of sections of the metro line, as a JSON object. In this tutorial, we are only interested in the points defining the line path, not the inter-stations sections. Therefore we retrieve the list of coordinates of each section and flatten them using a "list comprehension", a specific Python syntax, from which we create a Pandas DataFrame in order to simplify their treatment:
import pandas as pd
df_coordinates = pd.DataFrame(
[
point
for feature in osm_data["features"]
for point in feature["geometry"]["coordinates"]
],
columns = ["longitude", "latitude"]
)
Note that although this treatment is sufficient (as of March 2022) in the case of the Lyon metro’s D Line, for other cases some additional filtering and sections reordering may be necessary. A visual check of the map (with connected consecutive points) is strongly advised. We delete duplicates in order to prevent null distances later on, which would generate computing issues for the calculation of the curvature radius.
df_coordinates = df_coordinates.drop_duplicates().reset_index(drop=True)
We now have a list of geographical coordinates describing the Lyon metro's D Line:
df_coordinates
longitude | latitude | |
---|---|---|
0 | 4.807661 | 45.784029 |
1 | 4.804539 | 45.780564 |
2 | 4.803526 | 45.779504 |
3 | 4.803317 | 45.779159 |
4 | 4.803182 | 45.778785 |
... | ... | |
160 | 4.887721 | 45.707365 |
161 | 4.887954 | 45.706774 |
162 | 4.888014 | 45.706293 |
163 | 4.888020 | 45.705464 |
164 | 4.888006 | 45.703602 |
165 rows × 2 columns
Computation of the curvature radius
We want to estimate the curvature radius on any point of the metro path. Since we only have a series of points on the path, we will interpolate the curvature radius on each segment from the curvature radius on the segment bounds. This will allow us to take into account curvature inversions. Finding the curvature radius of a discrete trajectory is far from a trivial problem. The curvature radius on each point of a parameterized trajectory is defined as the radius of the osculating circle, but this definition cannot be used in the case of a discrete path. If our trajectory points were evenly spaced, we could take as definition of the osculating circle the tangent circle at the middle of consecutive segments, which may be one of the most accurate approaches. However this is not the case here: the distance between points varies to a great degree. An alternative method is to approximate the curvature radius by the radius of the circumcircle linking each triplet of points. The circumcircle of the triangle formed by these three points indeed tends towards the osculating circle when the distance between points is small. This method however suffers from a noticeable issue: the validity of the approximation decreases with the angle between the segments. Other approaches exist, but this one seems sufficient at the first order for our use case, since a metro line shouldn't have sharp turns. We will anyway add a warning to our curvature radius calculation to ensure our approximation stays reasonable. Note that this tutorial being didactic by nature, we prefer to slice it into distinct steps, even if it results in creating superfluous columns in our DataFrame. The algorithm proceeds in six steps:
- reconstruction of the line in sequences of three consecutive points,
- computation of the distance between the three points,
- computation of the curvature radius on each point,
- computation of the angle and direction of the curvature,
- management of the path boundaries,
- computation of the curvature radius on each segment.
Reconstruction of the line in sequences of three consecutive points
Starting from the dataframe of our metro line geolocation data, which is a sequence of points defined as latitude and longitude pairs: Each sequence of three consecutive points forms a triangle, of which two sides are segments of the metro line. When the angle between these two segments is large enough, the circumcircle of this triangle is a reasonable approximation of the osculating circle, and therefore provides an approximation of the curvature radius at the vertex between these two segments. We reconstruct our metro line data in order to include the previous and next points on each row of our dataframe. That is, having a sequence of points named after the alphabet:
- the first row will contain the geolocation data of A, B & C ;
- the second row will contain the geolocation data for B, C & D ;
- the third row will contain the geolocation data for C, D & E ;
- etc.
For large enough angles, circumcircles of triangles formed by consecutive points of a trajectory are a reasonable approximation of the curvature radius. Here, the blue circle is circumscribed to the CDE triangle.
We can use the shift()
Pandas method to add the coordinates of the previous and next points to the current row:
df_coordinates[["long_prev","lat_prev"]] = df_coordinates[["longitude","latitude"]].shift()
df_coordinates[["long_next","lat_next"]] = df_coordinates[["longitude","latitude"]].shift(-1)
We obtain the following dataset:
df_coordinates
longitude | latitude | long_prev | lat_prev | long_next | lat_next | |
---|---|---|---|---|---|---|
0 | 4.807661 | 45.784029 | NaN | NaN | 4.804539 | 45.780564 |
1 | 4.804539 | 45.780564 | 4.807661 | 45.784029 | 4.803526 | 45.779504 |
2 | 4.803526 | 45.779504 | 4.804539 | 45.780564 | 4.803317 | 45.779159 |
3 | 4.803317 | 45.779159 | 4.803526 | 45.779504 | 4.803182 | 45.778785 |
4 | 4.803182 | 45.778785 | 4.803317 | 45.779159 | 4.803136 | 45.778385 |
... | ... | ... | ... | ... | ... | |
160 | 4.887721 | 45.707365 | 4.887301 | 45.708187 | 4.887954 | 45.706774 |
161 | 4.887954 | 45.706774 | 4.887721 | 45.707365 | 4.888014 | 45.706293 |
162 | 4.888014 | 45.706293 | 4.887954 | 45.706774 | 4.888020 | 45.705464 |
163 | 4.888020 | 45.705464 | 4.888014 | 45.706293 | 4.888006 | 45.703602 |
164 | 4.888006 | 45.703602 | 4.888020 | 45.705464 | NaN | NaN |
165 rows × 6 columns
By design, it is not possible to get a full triplet for the first and last points of our path. We will see later how we can treat these particular cases.
Computation of the distance between the three points
Since we know the latitude and longitude of each point, we can easily compute the length of each triangle's sides. At the scale of a metro line, the earth's own curvature has less impact than the local altitude variations, so we could just compute the euclidean distance between each points. However, OSM's geographical coordinates being represented using the WGS 84 system, it is both more practical and more accurate to use the haversine formula to estimate the great circle distance between two points. In principle, this formula is only valid for a spherical surface, while the WGS 84 system uses as a reference an ellipsoidal surface, but in our case the differences are negligible. Below is an implementation of the haversine formula:
import numpy as np
def haversine_dist(lat1, lng1, lat2, lng2):
EARTH_RADIUS = 6371e3 # in meters
lat1, lng1, lat2, lng2 = map(np.radians, [lat1, lng1, lat2, lng2]) # degrees to radians
return 2 * EARTH_RADIUS * np.arcsin(
np.sqrt( np.sin((lat2-lat1)/2) * 2
+ np.cos(lat1) * np.cos(lat2) * np.sin((lng2-lng1)/2) * 2
)
)
Let's apply this formula to our set of geographical coordinates:
df_coordinates["dist1"] = haversine_dist(
df_coordinates["long_prev"],
df_coordinates["lat_prev"],
df_coordinates["longitude"],
df_coordinates["latitude"]
)
df_coordinates["dist2"] = haversine_dist(
df_coordinates["longitude"],
df_coordinates["latitude"],
df_coordinates["long_next"],
df_coordinates["lat_next"]
)
df_coordinates["dist3"] = haversine_dist(
df_coordinates["long_next"],
df_coordinates["lat_next"],
df_coordinates["long_prev"],
df_coordinates["lat_prev"]
)
Computation of the curvature radius on each point
We define the curvature radius on any point of the path as the circumcircle of the triangle formed with the current point and the previous and next ones.
We take as the curvature radius on D the radius of the circumcircle of the CDE triangle.
The circumcircle radius of any triangle can be computed with:
def circum_circle_radius(a, b, c):
# better safe than sorry
for length in a, b, c:
if length <= 0:
return np.nan
try:
return (a * b * c) / (np.sqrt((a + b + c) * (b + c - a) * (c + a - b) * (a + b - c)))
except:
# what if a = b + c ? Division by zero…
return np.nan
Applying this formula on our dataset:
df_coordinates["circumradius"] = df_coordinates[["dist1", "dist2", "dist3"]].apply(
lambda x: circum_circle_radius(x["dist1"], x["dist2"], x["dist3"]),
axis = 1
)
We now have an estimate of the curvature radius on each known point of the path (except at bounds). We could consider it sufficient, but we'd prefer to know the curvature radius of each section.
Computation of the angle and direction of the curvature
We will later interpolate the curvature radius of a section from the curvature radius at its endpoints. However, on sections where the curvature direction changes (e.g. left turn followed by a right turn), the section is straight overall: the two curvature radius should compensate, not cumulate! Therefore, we need to determine the curvature direction. We calculate the signed angle between a section and the next one, which by the way allows us to set a warning for angles for which the circumcircle approximation of the osculating circle becomes unreasonable:
def compute_angle(r):
if np.isnan(r["long_prev"]) or np.isnan(r["long_next"]):
return np.nan
# large angle threshold, in degrees
WARN_THRESHOLD = 60
# three consecutive points => two vectors
v0 = [ r["longitude"] - r["long_prev"], r["latitude"] - r["lat_prev"] ]
v1 = [ r["long_next"] - r["longitude"], r["lat_next"] - r["latitude"] ]
angle = np.degrees(np.math.atan2(np.linalg.det([v0,v1]), np.dot(v0,v1)))
if np.abs(angle) > WARN_THRESHOLD:
print(f"WARNING: Large angle ({angle}°), curvature radius may be off at {r['longitude'],r['latitude']}.")
return angle if angle else 1e-6 # we need to multiply by the sign of the angle so it must not be null !
We apply this function on our coordinates:
df_coordinates["angle"] = df_coordinates[[ "longitude", "latitude", "long_prev", "lat_prev", "long_next", "lat_next" ]].apply(
compute_angle,
axis=1
)
Management of the path boundaries
We now have an estimate of the curvature radius on each point of our path, except on endpoints. It seems natural to consider a null curvature (and therefore an infinite curvature radius) on these points, since the metro line was presumably designed in order for the train to follow a straight line at departure and arrival. However, it will be useful for us to have a finite curvature radius so that we can compute a curvature radius on the first and last sections of the path. We arbitrarily choose a value of 100 km, which is both large enough to approximate a straight line at the scale of the metro line, and small enought to not completely cancel the contribution of the previous or next point:
df_coordinates.loc[df_coordinates.index[[0,-1]], "circumradius"] = 100000
As for the curvature direction, there is no reason to assume an inversion on the first or last section since the curvature at the endpoints is supposed to be null. Therefore we use for these points the same curvature direction as the one calculated for the adjacent point:
df_coordinates.iloc[0,df_coordinates.columns.get_loc("angle")] = df_coordinates.iloc[1]["angle"]
df_coordinates.iloc[-1,df_coordinates.columns.get_loc("angle")] = df_coordinates.iloc[-2]["angle"]
Computation of the curvature radius of each segment
It is common in civil engineering to consider the curvature rather than the curvature radius, the former being defined as the inverse of the latter. Using curvature allows a rather intuitive approach: a sharp turn has a large curvature, a straight line is characterised by a null curvature, and the average curvature simply corresponds to the arithmetic mean of the curvatures. As a consequence, the average curvature radius is given by the harmonic mean of the curvature radius. In order to take into account the curvature direction inversions, we multiply the curvature radius value on each point by the sign of the curvature direction. To keep things simple, instead of creating a second dataframe containing the list of segments with their curvature radius, we decide as a convention that the radius_curve
column will apply to the segment between the point of this line and the next one. Therefore, by design, we won't have a section curvature value for the last row of our DataFrame.
df_coordinates["curve_radius"] = np.abs(
2 * df_coordinates["circumradius"] * df_coordinates["circumradius"].shift(-1) / (
np.sign(df_coordinates["angle"]) * df_coordinates["circumradius"]
+ np.sign(df_coordinates["angle"]).shift(-1) * df_coordinates["circumradius"].shift(-1)
)
)
There we go, we have estimated the average curvature radius on each section of our metro line !
Visualisation
It is always interesting, when possible, to visually check the consistency of our analysis. The most appropriate visualisation here is obviously a geographical map. We first generate a color scale with the Branca library (red for sharp turns, green for straight lines):
import branca.colormap as cm
colormap = cm.LinearColormap(colors=['red','yellow','green'], vmin=0, vmax=1000)
Of course, the chosen scale depends on what we want to show: here we chose a scale highlighting the curves of the trajectory for the Lyon metro's D Line on a map, which is not the same as what we would use in order to identify critical points on the line, a use case for which we would have to define a criticality threshold. Finally, we create the map using the Folium library, showing the curvature radius on each known point of the trajectory and the curvature radius for each section
import folium
# basic map
metro_map = folium.Map( location = [45.75, 4.85], tiles = "Cartodb dark_matter", zoom_start = 12.5 )
# let's add circumradius on known points
_ = df_coordinates[["latitude","longitude","circumradius"]].dropna() \
.sort_values(by="circumradius", ascending=False) \
.apply(
lambda r:
folium.Circle(
location = [r["latitude"], r["longitude"]],
radius = 5,
fill = True,
color = colormap(np.abs(r["circumradius"]))
).add_child(
folium.Popup(f"Rayon de courbure: {np.abs(r['circumradius']):.1f} m")
).add_to(metro_map),
axis = 1
)
# then curvature radius on each segment
_ = df_coordinates[["latitude","longitude","lat_next","long_next","curve_radius"]].dropna() \
.sort_values(by="curve_radius", ascending=False) \
.apply(
lambda r:
folium.PolyLine(
locations = [ (r["latitude"], r["longitude"]), (r["lat_next"], r["long_next"]) ],
color = colormap(np.abs(r["curve_radius"]))
).add_child(
folium.Popup(f"Rayon de courbure: {np.abs(r['curve_radius']):.1f} m")
).add_to(metro_map),
axis = 1
)
Plotting the segments also helps to identify undesirable objects or misplaced points due to a wrong ordering.
We now have a nice map of the curvature along the Lyon metro line D. We can clean our DataFrame in order to remove the temporary columns:
df_coordinates = df_coordinates[["longitude","latitude","circumradius","curve_radius"]]
df_coordinates
longitude | latitude | circumradius | curve_radius | |
---|---|---|---|---|
0 | 4.807661 | 45.784029 | 100000.000000 | 20768.069019 |
1 | 4.804539 | 45.780564 | 11587.259539 | 995.315759 |
2 | 4.803526 | 45.779504 | 477.164288 | 304.742972 |
3 | 4.803317 | 45.779159 | 223.854196 | 206.157699 |
4 | 4.803182 | 45.778785 | 191.054164 | 137.202010 |
... | ... | ... | ... | |
160 | 4.887721 | 45.707365 | 890.339787 | 386.393866 |
161 | 4.887954 | 45.706774 | 246.736971 | 353.149906 |
162 | 4.888014 | 45.706293 | 620.956614 | 1169.760747 |
163 | 4.888020 | 45.705464 | 10067.161527 | 18292.761233 |
164 | 4.888006 | 45.703602 | 45.703602 | NaN |
165 rows × 4 columns
Possible improvements
The curvature radius approximation used in this tutorial was used for the prototyping of an internal solution. Many improvements are possible:
- we assigned a curvature radius to each segment connecting two consecutive points of the path, using the average of the curvature radius on its endpoints. In order to get a better resolution, one could split these segments and weigh this average so that the weight of the curvature radius of one bound increases as one gets closer to it ;
- the curvature radius on the endpoints were chosen arbitrarily, it should be refined ;
- we approximated the osculating circle by the circumcircle, there are probably more accurate approaches ;
- in particular, this approximation may not be valid for sharper curvatures and/or less precise trajectories ;
- OSM does not currently provide the elevation of the points on the Lyon metro's D Line — taking it into account should not change much for this line since it is entirely underground, but this could be different for other use cases, and with the elevation the curvature radius calculation becomes a 3-dimensional problem (although the circumcircle is in the plane of the three consecutive points, so the approximation should stand).
Discover our metro dataset and all our connected mobility datasets.