MetopDataset

MetopDatasets.MetopDatasetType
MetopDataset(file_path::AbstractString; auto_convert::Bool = true, high_precision::Bool=false, maskingvalue = missing)
MetopDataset(file_pointer::IO; auto_convert::Bool = true, high_precision::Bool=false, maskingvalue = missing)
MetopDataset(f::Function, file_path::AbstractString; auto_convert::Bool = true, high_precision::Bool=false, maskingvalue = missing)

Load a MetopDataset from a Metop Native binary file or from a IO to a Native binary file. Only the meta data is loaded upon creation and all variables are lazy loaded. The variables corresponds to the different fields of the data records in the file. The attributes have all the information from the main product header in the file.

auto_convert=true will automatically convert MetopDatasets specific types such as VInteger to common netCDF complaint types such as Float64. This will also automatically scale variable where the scaling can't be expressed through a simple scale factor e.g. the IASI spectrum where different bands of the spectrum have different scaling factors.

Selected fields are converted to Float32 to save memory. Normally Float32 is more than sufficient to represent the instrument accuracy. Setting high_precision=true will in some case convert these variables to Float64.

maskingvalue = NaN will replace missing values with NaN. This normally floats but can create issues for integers. See documentation page for more information.

Example

julia> file_path = "test/testData/ASCA_SZR_1B_M03_20230329063300Z_20230329063558Z_N_C_20230329081417Z"
julia> ds = MetopDataset(file_path);
julia>
julia> # display metadata of a variable
julia> ds["latitude"]
latitude (82 × 96)
  Datatype:    Union{Missing, Float64} (Int32)
  Dimensions:  xtrack × atrack
  Attributes:
   description          = Latitude (-90 to 90 deg)
   missing_value        = Int32[-2147483648]
   scale_factor         = 1.0e-6
julia>
julia> # load a subset of a variable  
julia> lat_subset = ds["latitude"][1:2,1:3] # load a small subset of latitudes.
2×3 Matrix{Union{Missing,Float64}}:
    -33.7308  -33.8399  -33.949
    -33.7139  -33.823   -33.9322
julia>
julia> # load entire variable  
julia> lat = ds["latitude"][:,:]
julia>
julia> # close data set
julia> close(ds);
source

Keys, attributes and dimensions.

These methods can help to explore the dataset without printing out everything.

Use keys list the names of all variables without meta data

@show keys(ds)

# loop over all variables
for (varname,var) in ds
    # all variables
    @show (varname,size(var))
end

Access the attributes via the .attrib

@show ds.attrib

# attributes of a variable
example_var_name = keys(ds)[end]
example_var = ds[example_var_name]
@show example_var.attrib

Access the dimensions via the .dim and dimnames

@show ds.dim

# attributes of a variable
example_var_name = keys(ds)[end]
example_var = ds[example_var_name]
@show dimnames(example_var)

Note that MetopDataset is not implement any groups. Hence isempty(ds.group) is always true.

Auto conversion and native types

The Metop native binary formats uses some custom data types. Theres are converted to standard netCDF compatible types by default. This conversion can be disable with the keyword argument auto_convert=false. Here is an example

ds = MetopDataset("IASI_xxx_1C_M01_20240925202059Z_20240925220258Z_N_O_20240925211316Z.nat")

function show_example(ds, var_name)
    val = ds[var_name][1]
    @show var_name
    @show typeof(val)
    @show val
    println()
end


MetopDataset(iasi_file, auto_convert=false) do ds
    println("With auto_convert=false")
    println()
    show_example(ds,"record_start_time");
    show_example(ds,"gepsiasimode");
    show_example(ds,"gepslociasiavhrr_iasi");
end

Output

With auto_convert=false

var_name = "record_start_time"
typeof(val) = MetopDatasets.ShortCdsTime
val = MetopDatasets.ShortCdsTime(0x234a, 0x045dd976)

var_name = "gepsiasimode"
typeof(val) = MetopDatasets.BitString{4}
val = 00000000000000000000000010100001


var_name = "gepslociasiavhrr_iasi"
typeof(val) = MetopDatasets.VInteger{Int32}
val = MetopDatasets.VInteger{Int32}(6, -1965000000)

If we run the same example with auto convert on.

MetopDataset(iasi_file, auto_convert=true) do ds
    println("With auto_convert=true")
    println()
    show_example(ds,"record_start_time");
    show_example(ds,"gepsiasimode");
    show_example(ds,"gepslociasiavhrr_iasi");
end

Output

With auto_convert=true

var_name = "record_start_time"
typeof(val) = Dates.DateTime
val = Dates.DateTime("2024-09-25T20:20:59.382")

var_name = "gepsiasimode"
typeof(val) = UInt32
val = 0x000000a1

var_name = "gepslociasiavhrr_iasi"
typeof(val) = Float64
val = -1965.0

Note that the auto_convert argument also controls if the IASI L1 spectrum "gs1cspect" is automatically scaled. Multiple scale factors are needed to scale the spectrum and therefore the scaling of the spectrum is handled different from other variables. The spectrum is automatically scaled to Float32 to save memory. Use the high_precision=true argument to change this to Float64.

Missing values

Note that the datasets can contain missing values. This is especially true for product formats with flexible dimensions like the IASI L2 products. Here is an example.

using MetopDatasets
ds = MetopDataset("IASI_SND_02_M01_20241215173256Z_20241215173552Z_N_C_20241215182326Z");

ds["atmospheric_temperature"][:,:,6]

Output

101×120 Matrix{Union{Missing, Float64}}:    
 190.85      189.27      …  missing  missing
 195.62      193.93         missing  missing     
 204.47      202.59         missing  missing     
 212.82      210.87         missing  missing     
   ⋮                     ⋱
    missing     missing     missing  missing     
    missing     missing     missing  missing     
    missing     missing  …  missing  missing  

Here the output variable is Union{Missing, Float64} which can be difficult to work with. Sometimes it can be and advantage to replace the missing values with NaN values. This can be done on the variable level.

var_no_missing = cfvariable(ds, "atmospheric_temperature", maskingvalue = NaN)
var_no_missing[:,:,6]

Output

101×120 Matrix{Float64}:
 190.85  189.27  189.0   …  NaN  NaN  NaN  NaN   
 195.62  193.93  193.69     NaN  NaN  NaN  NaN   
 204.47  202.59  202.49     NaN  NaN  NaN  NaN   
 212.82  210.87  210.99     NaN  NaN  NaN  NaN   
   ⋮                     ⋱
 NaN     NaN     NaN        NaN  NaN  NaN  NaN   
 NaN     NaN     NaN        NaN  NaN  NaN  NaN   
 NaN     NaN     NaN     …  NaN  NaN  NaN  NaN

Note that this is not recommend for integer fields since it results in an automatic conversion to float. This is especially and issue in the cases where the integer value is a representation of an underlying bit string.

var_temp_error = cfvariable(ds, "temperature_error", maskingvalue = NaN)

val_as_scalar = var_temp_error[1,1,1]
val_as_array = var_temp_error[1:1,1,1]

@show val_as_scalar, bitstring(val_as_scalar);
@show val_as_array, bitstring.(val_as_array); #wrong bitstring due to conversion

Output

(val_as_scalar, bitstring(val_as_scalar)) = (0x4277d0a4, "01000010011101111101000010100100")
(val_as_array, bitstring.(val_as_array)) = ([1.115148452e9], ["0100000111010000100111011111010000101001000000000000000000000000"])

It is also possible to set the maskingvalue for an entire dataset. This is convenient but can lead to issues regarding integers as illustrated above. Here is an example:

ds_no_missing = MetopDataset("IASI_SND_02_M01_20241215173256Z_20241215173552Z_N_C_20241215182326Z", maskingvalue = NaN);
ds_no_missing["atmospheric_temperature"][:,:,6]

Output

101×120 Matrix{Float64}:
 190.85  189.27  189.0   …  NaN  NaN  NaN  NaN   
 195.62  193.93  193.69     NaN  NaN  NaN  NaN   
 204.47  202.59  202.49     NaN  NaN  NaN  NaN   
 212.82  210.87  210.99     NaN  NaN  NaN  NaN   
   ⋮                     ⋱
 NaN     NaN     NaN        NaN  NaN  NaN  NaN   
 NaN     NaN     NaN        NaN  NaN  NaN  NaN   
 NaN     NaN     NaN     …  NaN  NaN  NaN  NaN