Validation
This notebook runs the validation script that checks whether the extract and process functions retrieve the correct data.
In [1]:
import xarray as xr
import numpy as np
import PolarGeosAI as pga
Extracting and processing for a date and time
In [4]:
# Import the scatterometer data
polar_data = xr.open_dataset(
    '../../../GOES_ML_DOLDRUM/Branch-updatefilesystem/Github-GOES-ML/data/Scatterometers/cmems_obs-wind_glo_phy_nrt_l3-hy2c-hscat-asc-0.25deg_P1D-i-2020_2024.nc'
)
# Set the start and end datetimes and the lat / lon ranges for the extraction
start_datetime = "2022-01-01 00:00:00"
end_datetime = "2022-01-01 12:00:00"
lat_range = [-90, 90]
lon_range = [-180, 180]
observation_times, observation_lats, observation_lons, observation_windspeeds = pga.extract_scatter(
    polar_data=polar_data,
    start_datetime=start_datetime,
    end_datetime=end_datetime,
    lat_range=lat_range,
    lon_range=lon_range,
    main_variable="wind_speed",
)
# Channels to extract from GOES (a list, which may hold several channels, e.g. ["C01", "C02"]).
# Known bug: C02 is not working for now. All other channels are working.
channels = ["C01"]
# Extract the GOES images that correspond to the scatterometer observations
images = pga.extract_goes(
    observation_times,
    observation_lats,
    observation_lons,
    channels,
    polar_data,
)
# Check that all the arrays have the same number of observations (first dimension)
print(np.shape(images))
print(np.shape(observation_times))
print(np.shape(observation_lats))
print(np.shape(observation_lons))
Extracting scatter data: 0%| | 0/20423 [00:00<?, ?it/s]
Extracting scatter data: 100%|██████████| 20423/20423 [00:00<00:00, 149131.30it/s]
Scatterometer data extracted
INFO :No file found for C01 on day 2022/001/15 for minute 57, skipping file
INFO :No file found for C01 on day 2022/001/15 for minute 58, skipping file
Retrieving and processing GOES data: 100%|██████████| 11/11 [04:02<00:00, 22.06s/it]
(20423, 1, 30, 30)
(20423,)
(20423,)
(20423,)
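The shape check above relies on reading the printed tuples by eye. A minimal sketch of the same check as an explicit test follows. The helper `check_aligned` and the `*_demo` arrays are hypothetical stand-ins, not part of PolarGeosAI; they only mimic the layout printed above (one image stack plus one-dimensional arrays of the same length).

```python
import numpy as np

def check_aligned(images, *columns):
    """Return the number of observations, or raise if any array disagrees.

    Hypothetical helper: assumes `images` has observations on axis 0 and
    every other array is one-dimensional with one entry per observation.
    """
    n = np.shape(images)[0]
    for i, col in enumerate(columns):
        if np.shape(col) != (n,):
            raise ValueError(
                f"array {i} has shape {np.shape(col)}, expected ({n},)"
            )
    return n

# Synthetic stand-ins with the same layout as the extraction output
images_demo = np.zeros((5, 1, 30, 30))
times_demo = np.zeros(5)
lats_demo = np.zeros(5)
lons_demo = np.zeros(5)

n_obs = check_aligned(images_demo, times_demo, lats_demo, lons_demo)
print(n_obs)
```

With the real arrays, the call would presumably be `check_aligned(images, observation_times, observation_lats, observation_lons)`.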
In [5]:
# Package the data to be used in the model
images_packaged, numerical_data_packaged = pga.package_data(
    images,
    observation_lats,
    observation_lons,
    observation_times,
    observation_windspeeds,
    filter=True,
    solar_conversion=False,
)
# Check if the data is correctly packaged
print(np.shape(images_packaged), np.shape(numerical_data_packaged))
Filtered invalid images
Filled nans
returning images, numerical_data
(18148, 1, 30, 30) (4, 18148)
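The packaged output keeps 18148 of the 20423 observations, and the numerical data has one row per variable and one column per kept observation. The sketch below illustrates one way such a filter-and-fill step could work. It is an assumption for illustration only: the function `filter_and_fill`, the rule that an image is invalid when every pixel is NaN, and the fill value of 0 are all hypothetical and may differ from what `pga.package_data` actually does.

```python
import numpy as np

def filter_and_fill(images, numerical_data, fill_value=0.0):
    """Drop fully-NaN images and fill the remaining NaN pixels.

    Hypothetical stand-in for the filtering step: `images` has shape
    (n, channels, height, width), `numerical_data` has shape (4, n).
    """
    # Assumed validity rule: keep an image if at least one pixel is finite
    valid = ~np.all(np.isnan(images), axis=(1, 2, 3))
    kept_images = np.nan_to_num(images[valid], nan=fill_value)
    # Apply the same mask to the columns so both outputs stay aligned
    kept_numerical = numerical_data[:, valid]
    return kept_images, kept_numerical

rng = np.random.default_rng(0)
images_demo = rng.random((4, 1, 30, 30))
images_demo[1] = np.nan            # a fully invalid image
images_demo[2, 0, 0, 0] = np.nan   # a single missing pixel
# Rows in the notebook's order: lat, lon, time, wind speed
numerical_demo = np.arange(16, dtype=float).reshape(4, 4)

kept_images, kept_numerical = filter_and_fill(images_demo, numerical_demo)
print(kept_images.shape, kept_numerical.shape)
```

Masking the images and the numerical columns with the same boolean array is what keeps image `i` paired with column `i` after filtering.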
In [6]:
lat = numerical_data_packaged[0]
lon = numerical_data_packaged[1]
time = numerical_data_packaged[2]
wind_speed = numerical_data_packaged[3]
parallel_index = np.load('./satellite_indices/HSCAT-L3-25km_1km at nadir_index.npy', allow_pickle=True)
lat_grd = polar_data.latitude
lon_grd = polar_data.longitude
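The `time` row holds large floating-point values (the test output further down prints values such as `1.641046528e+18`). Assuming these are nanoseconds since the Unix epoch, which their magnitude suggests but the notebook does not state, they can be turned into readable timestamps as sketched below. The helper `to_datetime64` is hypothetical.

```python
import numpy as np

def to_datetime64(time_ns):
    """Convert nanoseconds since 1970-01-01 (assumed unit) to datetime64."""
    return np.asarray(time_ns).astype("int64").astype("datetime64[ns]")

# The two time values printed by the image test in this notebook
time_demo = np.array([1.641046528e18, 1.641059328e18])
stamps = to_datetime64(time_demo)
print(stamps)
```

Applied to the real data, this would presumably be `to_datetime64(time)`.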
The following test function recalculates the degrees on the GOES grid from the given lat / lon and retrieves a single image from the AWS GOES server. The degrees of this image are compared with those of the automatic extraction to see whether they match. The output images of the automatic extraction and the single-image extraction are also compared and can be inspected visually for a match.
In [8]:
# How many random samples we want to check
samples = 2
channel = ["C01"]
pga.test_images(
images_packaged, parallel_index, lat_grd, lon_grd, time, lat, lon, samples, channel
)
testing images :
-------------------
idx= 4947 time= 1.641046528e+18 lat= -0.625 lon= -2.875
0.149786 0.149842 -0.00203 -0.0013860017 vs 0.149786 0.14987001 -0.0020579994 -0.0013860017
reverse check of latitude and longitude :
-------------------
check passed, lat and lon are correct ✓ 🦕
idx= 15469 time= 1.641059328e+18 lat= 18.375 lon= -50.125
0.068894 0.069566 0.054109998 0.054782003 vs 0.068894 0.069593996 0.054082 0.054782003
reverse check of latitude and longitude :
-------------------
check passed, lat and lon are correct ✓ 🦕
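In the output above, the four values on each side of `vs` agree closely but not exactly. A minimal sketch of a tolerance-based comparison of such values is given below. The helper `coords_match` and the tolerance of `5e-5` are assumptions chosen for illustration; the notebook does not show how `pga.test_images` decides that a check has passed.

```python
import numpy as np

def coords_match(left, right, atol=5e-5):
    """Return True if every pair of values differs by at most `atol`.

    Hypothetical check: uses an absolute tolerance only (rtol=0), since
    values near zero would make a relative tolerance meaningless.
    """
    return bool(np.allclose(left, right, rtol=0.0, atol=atol))

# Values copied from the first sample (idx= 4947) in the output above
left_demo = np.array([0.149786, 0.149842, -0.00203, -0.0013860017])
right_demo = np.array([0.149786, 0.14987001, -0.0020579994, -0.0013860017])

max_diff = float(np.max(np.abs(left_demo - right_demo)))
print(coords_match(left_demo, right_demo), max_diff)
```

Printing the largest difference alongside the pass/fail result makes it easier to judge whether a chosen tolerance is reasonable for the grid resolution in use.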