This notebook is divided into two parts, each demonstrating the functionality of the VEDA EOAPI:

1. Reading and visualizing one of the datasets from the VEDA data catalog
2. Using the EOAPI to generate a time series of OMI (Ozone Monitoring Instrument) NO₂ and SO₂ datasets
Author: Slesa Adhikari
1. Reading and visualizing one of the datasets from the VEDA data catalog
Import all the necessary libraries.
Make sure you install these first using:
pip install pystac_client folium seaborn pandas

# imports
import requests
from pystac_client import Client
import folium
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

Define the API endpoints
The EOAPI is a combination of two components:

Data catalog - the SpatioTemporal Asset Catalog (STAC) specification is used to catalog the available datasets
Dynamic tile server - TiTiler is used to dynamically serve Cloud Optimized GeoTIFF (COG) raster files
STAC_API_URL = "https://staging-stac.delta-backend.com/"
RASTER_API_URL = "https://staging-raster.delta-backend.com"

Use the pystac_client library to interact with the STAC data catalog
catalog = Client.open(STAC_API_URL)

List all the datasets (collections) in the catalog
for collection in catalog.get_collections():
    print(f"{collection.id} - {collection.title}")

Choose a collection to work with
Search all the items in the collection
collection_id = "no2-monthly"
search = catalog.search(collections=[collection_id])
items = list(search.items())
items

Load and inspect one of the items
s3_uri = items[0].assets["cog_default"].href

stats = requests.get(
    f"{RASTER_API_URL}/cog/statistics",
    params={"url": s3_uri}
).json()
stats

Display the COG in a map
Get the tiles endpoint for the file
rescale = f"{stats['b1']['min']},{stats['b1']['max']}"

tiles = requests.get(
    f"{RASTER_API_URL}/stac/tilejson.json?collection={collection_id}&item={items[0].id}&assets=cog_default&colormap_name=rdbu_r&rescale={rescale}"
).json()

tiles

Use the tiles to visualize the file in a map
m = folium.Map(
    zoom_start=6,
    scroll_wheel_zoom=True,
    tiles=tiles["tiles"][0],
    attr="VEDA",
    min_zoom=0,
    max_zoom=18,
)

m

Make this map slightly more insightful
Using the raw minimum and maximum values for the color scale may not be the most useful. Dynamic tiling to the rescue! TiTiler comes with a large number of options for styling the map. Here we’ll do a quick and dirty adjustment of the max value.
rescale_limited_max = f"{stats['b1']['min']},{stats['b1']['max']/2}"

tiles_limited_max = requests.get(
    f"{RASTER_API_URL}/stac/tilejson.json?collection={collection_id}&item={items[0].id}&assets=cog_default&colormap_name=rdbu_r&rescale={rescale_limited_max}"
).json()

tiles_limited_max

m_limited_max = folium.Map(
    zoom_start=6,
    scroll_wheel_zoom=True,
    tiles=tiles_limited_max["tiles"][0],
    attr="VEDA",
    min_zoom=0,
    max_zoom=18,
)
m_limited_max

2. Using the EOAPI to generate a time series of OMI (Ozone Monitoring Instrument) NO₂ and SO₂ datasets
There’s a story published in the EODashboard with the following title:

“Air pollution in India, China and the U.S. have changed significantly over the past two decades”

Link: https://

The story discusses the trend of air pollution in India, China, and the U.S. using NO₂ and SO₂ readings from the OMI instrument.
Here, we’ll recreate that analysis using the EOAPI.
# Here, we find the relevant collection ID for the dataset
collections = {
    "no2": "OMI_trno2-COG",
    "so2": "OMSO2PCA-COG",
}

Define a roughly similar Area of Interest (AOI) for each country, as seen in the story
aois = {
    "india": {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "coordinates": [[
                [84.44227371801799, 25.276852788244952],
                [81.73331688510166, 25.379576397063317],
                [81.40290450746915, 20.640781701865322],
                [84.09079123546121, 20.59296261766137],
                [84.44227371801799, 25.276852788244952]
            ]],
            "type": "Polygon"
        }
    },
    "china": {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "coordinates": [[
                [118.14487188674968, 40.38237805885373],
                [112.59679754686567, 40.39197699341523],
                [112.78712023622006, 32.015052150835814],
                [117.937454307721, 32.102440507249895],
                [118.14487188674968, 40.38237805885373]
            ]],
            "type": "Polygon"
        }
    },
    "usa": {
        "type": "Feature",
        "properties": {},
        "geometry": {
            "coordinates": [[
                [-80.16702521343733, 41.73420113945659],
                [-83.56446680395005, 38.599369254919566],
                [-82.00280661075571, 37.54658260550103],
                [-78.28140359718638, 40.450899619800595],
                [-80.16702521343733, 41.73420113945659]
            ]],
            "type": "Polygon"
        }
    },
}

Define a function that takes the following params:
item: a STAC item
geojson: the GeoJSON of the AOI
Using the /cog/statistics endpoint of the raster API, we get back the statistics of the item (which corresponds to one COG file) within the given GeoJSON AOI.
The statistics include min, max, mean, std, etc.
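The response is a GeoJSON Feature echoing the input geometry, with the per-band statistics nested under `properties.statistics`. The sketch below is a trimmed, illustrative response with made-up numbers; only the nesting mirrors what the function relies on:

```python
# Trimmed, illustrative /cog/statistics response for one AOI Feature.
# All values are placeholders; the structure is what matters here.
result = {
    "type": "Feature",
    "geometry": {"type": "Polygon", "coordinates": []},  # echoes the input AOI
    "properties": {
        "statistics": {
            "b1": {  # one entry per band
                "min": 0.0,
                "max": 4.1e15,
                "mean": 1.2e15,
                "std": 0.6e15,
            }
        }
    },
}

# The function described above keeps only result["properties"], so the
# band statistics end up under "statistics.b1.*" keys after flattening.
band_stats = result["properties"]["statistics"]["b1"]
print(band_stats["mean"])
```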
# the bounding box should be passed to the geojson param as a geojson Feature or FeatureCollection
def generate_stats(item, geojson):
    result = requests.post(
        f"{RASTER_API_URL}/cog/statistics",
        params={
            "url": item.assets["cog_default"].href
        },
        json=geojson
    ).json()
    return {
        **result["properties"],
        "start_datetime": str(item.properties.get("datetime", item.properties.get("start_datetime")))[:4],
        "collection": item.collection_id
    }

Let’s start out with the US 🇺🇸!
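One detail worth noting in generate_stats above: the year label comes from slicing the first four characters of the item's ISO datetime string rather than parsing it. A minimal illustration, with a hypothetical timestamp:

```python
# Hypothetical ISO datetime as carried in a STAC item's properties
sample_datetime = "2005-01-01T00:00:00Z"
# Keeping only the first four characters yields the year label
year_label = str(sample_datetime)[:4]
print(year_label)  # → 2005
```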
We’ll get all the items in the NO2 and SO2 collections and generate the statistics from them for the Ohio River Valley region of the United States.
usa_aoi = aois["usa"]
items = list(catalog.search(collections=[collections["no2"], collections["so2"]]).items())
usa_stats = [
    generate_stats(item, usa_aoi)
    for item in items
]

Create a function that takes the statistics (which is JSON) and converts it to a pandas DataFrame in a format that’ll make it easy to read and visualize.
We’re only concerned with the mean statistic for this example, specifically its percentage change from the year 2005. We’ll use pandas to calculate this percentage change and assign it to the change column.
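Here is how that pipeline behaves on a tiny made-up input (values and dates are placeholders): json_normalize flattens the nested dicts into dotted column names, the `statistics.b1.` prefix is stripped, and each collection's mean is divided by its own first (2005) value to get a percent change. This mirrors the clean_stats function defined below.

```python
import pandas as pd

# Made-up inputs in the shape generate_stats returns: per-band statistics
# nested under "statistics", plus the year label and collection ID.
stats_json = [
    {"statistics": {"b1": {"mean": 4.0}}, "start_datetime": "2005", "collection": "no2"},
    {"statistics": {"b1": {"mean": 5.0}}, "start_datetime": "2006", "collection": "no2"},
    {"statistics": {"b1": {"mean": 10.0}}, "start_datetime": "2005", "collection": "so2"},
    {"statistics": {"b1": {"mean": 15.0}}, "start_datetime": "2006", "collection": "so2"},
]

# Flatten the nesting into columns like "statistics.b1.mean"
df = pd.json_normalize(stats_json)
# Strip the prefix so the column is simply "mean"
df.columns = [col.replace("statistics.b1.", "") for col in df.columns]

df["date"] = pd.to_datetime(df["start_datetime"], format="%Y")
df = df.sort_values(by=["date"])

# Each collection's mean divided by its own first (2005) value,
# expressed as a percent change
df["change"] = df.groupby("collection", group_keys=False)["mean"].apply(
    lambda x: x.div(x.iloc[0]).sub(1).mul(100)
)
print(df[["collection", "start_datetime", "change"]])
# no2: 0%, +25%   so2: 0%, +50%
```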
def clean_stats(stats_json) -> pd.DataFrame:
    # convert the stats_json as-is to a pandas dataframe
    df = pd.json_normalize(stats_json)
    # simple renaming for readability
    df.columns = [col.replace("statistics.b1.", "") for col in df.columns]
    # create a date column from the start_datetime column
    df["date"] = pd.to_datetime(df["start_datetime"], format="%Y")
    # sort the dataframe by the date column
    df = df.sort_values(by=["date"])
    # create a change column that calculates the change of the mean values from the value in 2005
    df["change"] = df.groupby("collection", group_keys=False)["mean"].apply(
        lambda x: x.div(x.iloc[0]).subtract(1).mul(100)
    )
    return df

cleaned_stats_df = clean_stats(usa_stats)

We’ll now create a time series of the change in mean values for NO₂ and SO₂ for the area in the US.
plt.xticks([i for i in range(0, 21, 2)])
sns.set_style("darkgrid")
ax = sns.lineplot(
    x="start_datetime",
    y="change",
    hue="collection",
    data=cleaned_stats_df,
    palette=["#2196f3", "#ff5722"],
    style="collection",
    markers=["*", "d"]
)
ax.set_title("US - Ohio River Valley")
ax.set_xlabel("Years")
ax.set_ylabel("Change from 2005 (%)")
plt.legend(frameon=False, ncol=3)
plt.show()

Now, let’s create a function that creates this trend graph, given the country.
def create_chart(country):
    stats = [
        generate_stats(item, aois[country])
        for item in catalog.search(collections=[collections["no2"], collections["so2"]]).items()
    ]
    df = clean_stats(stats)
    # Create a chart using Seaborn
    plt.xticks([i for i in range(0, 21, 2)])
    sns.set_style("darkgrid")
    ax = sns.lineplot(
        x="start_datetime",
        y="change",
        hue="collection",
        data=df,
        palette=["#2196f3", "#ff5722"],
        style="collection",
        markers=["*", "d"]
    )
    ax.set_title(country.title())
    ax.set_xlabel("Years")
    ax.set_ylabel("Change from 2005 (%)")
    plt.legend(frameon=False, ncol=3)
    plt.show()

We can use this function to create charts for the rest of the AOIs.
India 🇮🇳
create_chart("india")

China 🇨🇳
create_chart("china")