MetopDataset
MetopDatasets.MetopDataset
— Type
MetopDataset(file_path::AbstractString; auto_convert::Bool = true, high_precision::Bool=false, maskingvalue = missing)
MetopDataset(file_pointer::IO; auto_convert::Bool = true, high_precision::Bool=false, maskingvalue = missing)
MetopDataset(f::Function, file_path::AbstractString; auto_convert::Bool = true, high_precision::Bool=false, maskingvalue = missing)
Load a MetopDataset from a Metop Native binary file or from an IO to a Native binary file. Only the metadata is loaded upon creation, and all variables are lazily loaded. The variables correspond to the different fields of the data records in the file. The attributes hold all the information from the main product header in the file.
auto_convert=true will automatically convert MetopDatasets-specific types such as VInteger to common netCDF-compliant types such as Float64. It will also automatically scale variables whose scaling can't be expressed through a single scale factor, e.g. the IASI spectrum, where different bands of the spectrum have different scaling factors.
Selected fields are converted to Float32 to save memory. Normally Float32 is more than sufficient to represent the instrument accuracy. Setting high_precision=true will in some cases convert these variables to Float64 instead.
maskingvalue = NaN will replace missing values with NaN. This normally works well for floats but can create issues for integers. See the documentation page for more information.
Example
julia> file_path = "test/testData/ASCA_SZR_1B_M03_20230329063300Z_20230329063558Z_N_C_20230329081417Z"
julia> ds = MetopDataset(file_path);
julia>
julia> # display metadata of a variable
julia> ds["latitude"]
latitude (82 × 96)
Datatype: Union{Missing, Float64} (Int32)
Dimensions: xtrack × atrack
Attributes:
description = Latitude (-90 to 90 deg)
missing_value = Int32[-2147483648]
scale_factor = 1.0e-6
julia>
julia> # load a subset of a variable
julia> lat_subset = ds["latitude"][1:2,1:3] # load a small subset of latitudes.
2×3 Matrix{Union{Missing,Float64}}:
-33.7308 -33.8399 -33.949
-33.7139 -33.823 -33.9322
julia>
julia> # load entire variable
julia> lat = ds["latitude"][:,:]
julia>
julia> # close data set
julia> close(ds);
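The latitude metadata above lists an Int32 storage type together with a missing_value and a scale_factor. The following is a conceptual sketch of how such attributes can map raw integers to the values returned on indexing. The helper `decode_cf` is hypothetical and written only for illustration (it is not the package's implementation), and the raw values are assumed so that the first one reproduces the first latitude printed above.

```julia
# Conceptual sketch only: `decode_cf` is a hypothetical helper, not part of MetopDatasets.
function decode_cf(raw::AbstractArray{<:Integer}, scale_factor::Real, missing_value::Integer)
    # Fill values become `missing`; everything else is scaled to physical units.
    return [r == missing_value ? missing : r * scale_factor for r in raw]
end

# Assumed raw values: one valid latitude and one fill value.
raw_latitude = Int32[-33730800, typemin(Int32)]
decoded = decode_cf(raw_latitude, 1.0e-6, typemin(Int32))
@show decoded
```

The element type of the result is Union{Missing, Float64}, which matches the Datatype line in the variable summary.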
Keys, attributes and dimensions.
These methods can help to explore the dataset without printing out everything.
Use keys to list the names of all variables without their metadata
@show keys(ds)
# loop over all variables
for (varname,var) in ds
# all variables
@show (varname,size(var))
end
Access the attributes via .attrib
@show ds.attrib
# attributes of a variable
example_var_name = keys(ds)[end]
example_var = ds[example_var_name]
@show example_var.attrib
Access the dimensions via .dim and dimnames
@show ds.dim
# dimensions of a variable
example_var_name = keys(ds)[end]
example_var = ds[example_var_name]
@show dimnames(example_var)
Note that MetopDataset does not implement any groups. Hence isempty(ds.group) is always true.
Auto conversion and native types
The Metop native binary formats use some custom data types. These are converted to standard netCDF-compatible types by default. This conversion can be disabled with the keyword argument auto_convert=false. Here is an example:
using MetopDatasets
# path to the IASI L1C file used in the examples below
iasi_file = "IASI_xxx_1C_M01_20240925202059Z_20240925220258Z_N_O_20240925211316Z.nat"
function show_example(ds, var_name)
val = ds[var_name][1]
@show var_name
@show typeof(val)
@show val
println()
end
MetopDataset(iasi_file, auto_convert=false) do ds
println("With auto_convert=false")
println()
show_example(ds,"record_start_time");
show_example(ds,"gepsiasimode");
show_example(ds,"gepslociasiavhrr_iasi");
end
Output
With auto_convert=false
var_name = "record_start_time"
typeof(val) = MetopDatasets.ShortCdsTime
val = MetopDatasets.ShortCdsTime(0x234a, 0x045dd976)
var_name = "gepsiasimode"
typeof(val) = MetopDatasets.BitString{4}
val = 00000000000000000000000010100001
var_name = "gepslociasiavhrr_iasi"
typeof(val) = MetopDatasets.VInteger{Int32}
val = MetopDatasets.VInteger{Int32}(6, -1965000000)
Running the same example with auto_convert=true gives standard types instead.
MetopDataset(iasi_file, auto_convert=true) do ds
println("With auto_convert=true")
println()
show_example(ds,"record_start_time");
show_example(ds,"gepsiasimode");
show_example(ds,"gepslociasiavhrr_iasi");
end
Output
With auto_convert=true
var_name = "record_start_time"
typeof(val) = Dates.DateTime
val = Dates.DateTime("2024-09-25T20:20:59.382")
var_name = "gepsiasimode"
typeof(val) = UInt32
val = 0x000000a1
var_name = "gepslociasiavhrr_iasi"
typeof(val) = Float64
val = -1965.0
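The two outputs can be related by hand. The sketch below assumes that the two ShortCdsTime fields are days since 2000-01-01 and milliseconds of the day, and that the two VInteger fields are a base-10 exponent and an integer mantissa. Both assumptions are inferred from the printed values rather than taken from the package source, and the variable names are illustrative.

```julia
using Dates

# Assumption: ShortCdsTime holds (days since 2000-01-01, milliseconds of the day).
day, millisecond = 0x234a, 0x045dd976
start_time = DateTime(2000, 1, 1) + Day(Int(day)) + Millisecond(Int(millisecond))

# Assumption: VInteger holds (base-10 exponent, integer mantissa).
exponent, mantissa = 6, Int32(-1965000000)
value = mantissa / 10.0^exponent

@show start_time
@show value
```

Under these assumptions the results agree with the auto-converted values of "record_start_time" and "gepslociasiavhrr_iasi" shown above.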
Note that the auto_convert argument also controls whether the IASI L1 spectrum "gs1cspect" is automatically scaled. Multiple scale factors are needed to scale the spectrum, so its scaling is handled differently from that of other variables. The spectrum is automatically scaled to Float32 to save memory. Use the high_precision=true argument to change this to Float64.
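To give a feel for the trade-off between the two element types, here is a small sketch in plain Julia. The array dimensions are hypothetical and chosen only to compare memory footprints; no product file is read.

```julia
# Hypothetical array size, used only to compare memory footprints.
n_channels, n_samples = 8700, 120
spectrum32 = zeros(Float32, n_channels, n_samples)
spectrum64 = zeros(Float64, n_channels, n_samples)

@show sizeof(spectrum32)   # bytes used by the Float32 array
@show sizeof(spectrum64)   # twice as many bytes for Float64
@show eps(Float32)         # relative spacing of Float32 values near 1
@show eps(Float64)         # much finer spacing for Float64
```

Float32 halves the memory use at the cost of roughly seven significant decimal digits instead of about sixteen.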
Missing values
Note that the datasets can contain missing values. This is especially true for product formats with flexible dimensions like the IASI L2 products. Here is an example.
using MetopDatasets
ds = MetopDataset("IASI_SND_02_M01_20241215173256Z_20241215173552Z_N_C_20241215182326Z");
ds["atmospheric_temperature"][:,:,6]
Output
101×120 Matrix{Union{Missing, Float64}}:
190.85 189.27 … missing missing
195.62 193.93 missing missing
204.47 202.59 missing missing
212.82 210.87 missing missing
⋮ ⋱
missing missing missing missing
missing missing missing missing
missing missing … missing missing
Here the element type of the output is Union{Missing, Float64}, which can be difficult to work with. Sometimes it is an advantage to replace the missing values with NaN values. This can be done at the variable level.
var_no_missing = cfvariable(ds, "atmospheric_temperature", maskingvalue = NaN)
var_no_missing[:,:,6]
Output
101×120 Matrix{Float64}:
190.85 189.27 189.0 … NaN NaN NaN NaN
195.62 193.93 193.69 NaN NaN NaN NaN
204.47 202.59 202.49 NaN NaN NaN NaN
212.82 210.87 210.99 NaN NaN NaN NaN
⋮ ⋱
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN … NaN NaN NaN NaN
Note that this is not recommended for integer fields, since it results in an automatic conversion to float. This is especially an issue in cases where the integer value is a representation of an underlying bit string.
var_temp_error = cfvariable(ds, "temperature_error", maskingvalue = NaN)
val_as_scalar = var_temp_error[1,1,1]
val_as_array = var_temp_error[1:1,1,1]
@show val_as_scalar, bitstring(val_as_scalar);
@show val_as_array, bitstring.(val_as_array); #wrong bitstring due to conversion
Output
(val_as_scalar, bitstring(val_as_scalar)) = (0x4277d0a4, "01000010011101111101000010100100")
(val_as_array, bitstring.(val_as_array)) = ([1.115148452e9], ["0100000111010000100111011111010000101001000000000000000000000000"])
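The effect can be reproduced in plain Julia without a product file. The sketch below takes the integer value from the output above and converts it to Float64 explicitly, which is assumed here to be equivalent to what happened to the array element; the variable names are illustrative.

```julia
raw = 0x4277d0a4                 # UInt32 value from the output above
as_float = Float64(raw)          # same number, stored as a 64-bit float

@show bitstring(raw)             # 32 bits: the original bit pattern
@show bitstring(as_float)        # 64 bits: IEEE 754 encoding of the number

# The numeric value survives, so converting back recovers the bit pattern.
recovered = UInt32(as_float)
@show bitstring(recovered) == bitstring(raw)
```

Converting back to the integer type before calling bitstring is therefore one way to recover the flags, as long as the element is not NaN.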
It is also possible to set the maskingvalue for an entire dataset. This is convenient but can lead to issues with integers, as illustrated above. Here is an example:
ds_no_missing = MetopDataset("IASI_SND_02_M01_20241215173256Z_20241215173552Z_N_C_20241215182326Z", maskingvalue = NaN);
ds_no_missing["atmospheric_temperature"][:,:,6]
Output
101×120 Matrix{Float64}:
190.85 189.27 189.0 … NaN NaN NaN NaN
195.62 193.93 193.69 NaN NaN NaN NaN
204.47 202.59 202.49 NaN NaN NaN NaN
212.82 210.87 210.99 NaN NaN NaN NaN
⋮ ⋱
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN … NaN NaN NaN NaN
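As an alternative to setting maskingvalue, arrays that contain missing can also be handled after loading with standard Julia tools. The sketch below uses a small made-up matrix rather than data read from a file, and the variable names are illustrative.

```julia
using Statistics

# Made-up values for illustration; not read from a product file.
temps = Union{Missing, Float64}[190.85 189.27 missing; 195.62 193.93 missing]

n_missing = count(ismissing, temps)        # number of missing entries
mean_valid = mean(skipmissing(temps))      # statistics over valid entries only
filled = coalesce.(temps, NaN)             # explicit replacement after loading

@show n_missing
@show mean_valid
@show eltype(filled)
```

This keeps integer variables untouched, since the replacement is applied only to the arrays where NaN is a sensible fill value.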