Preprocessing

climate_resilience.preprocess.calculate_Nth_percentile(sites: pandas.core.frame.DataFrame, scenarios: List[str], variables: List[str], datadir: str, N: int = 99, models: Optional[List[str]] = None, mean_thresh: Optional[float] = None) → None

Calculates the Nth percentile.

Parameters

sites (pd.DataFrame) – Data Frame containing all the site information.
scenarios (List[str]) – Scenarios of interest.
variables (List[str]) – Variables of interest.
datadir (str) – Parent directory containing all the data files. The generated output file is also stored here.
N (int) – Percentile to calculate; must lie in the range [0, 100]. Defaults to 99.
models (List[str], optional) – Models of interest. Defaults to None.
mean_thresh (float, optional) – Threshold used to filter mean values before the percentile is computed. Defaults to None.

Returns

The output DataFrame that is written to a csv file is also returned.

Return type

pd.DataFrame

Raises

ValueError – If the integer value of N is outside the range [0, 100].
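
The range check on N and the percentile itself can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name nth_percentile and the choice of linear interpolation between neighbouring ranks are assumptions made for illustration only.

```python
from typing import Sequence


def nth_percentile(values: Sequence[float], n: int = 99) -> float:
    """Return the n-th percentile of values (hypothetical helper)."""
    # Mirror the documented ValueError for N outside [0, 100].
    if not 0 <= n <= 100:
        raise ValueError(f"N must be in the range [0, 100], got {n}")
    if not values:
        raise ValueError("values must not be empty")
    data = sorted(values)
    # Fractional rank of the requested percentile in the sorted data.
    rank = (len(data) - 1) * n / 100
    lower = int(rank)
    upper = min(lower + 1, len(data) - 1)
    # Assumption: interpolate linearly between the two neighbouring values.
    return data[lower] + (data[upper] - data[lower]) * (rank - lower)
```

For example, nth_percentile([1, 2, 3, 4, 5], 50) gives the median of the five values.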

climate_resilience.preprocess.calculate_pr_count_amount(sites: pandas.core.frame.DataFrame, scenarios: List[str], variables: List[str], datadir: str, df_pr_csv_path: str, models: Optional[List[str]] = None) → None

Calculates precipitation count and amount.

Parameters

sites (pd.DataFrame) – Data Frame containing all the site information.
scenarios (List[str]) – Scenarios of interest.
variables (List[str]) – Variables of interest.
datadir (str) – Parent directory containing all the data files. The generated output file is also stored here.
df_pr_csv_path (str) – Path to the csv file generated by calculate_Nth_percentile().
models (List[str], optional) – Models of interest. Defaults to None.

Returns

The output DataFrame that is written to a csv file is also returned.

Return type

pd.DataFrame

Raises

KeyError – If the expected historical column does not exist in the data frame read from df_pr_csv_path.
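
This page does not define "count" and "amount". The sketch below assumes, purely for illustration, that the count is the number of precipitation values exceeding a threshold (such as a historical percentile) and the amount is the sum of those values. The helper name count_and_amount is hypothetical and is not part of the package.

```python
from typing import Sequence, Tuple


def count_and_amount(daily_pr: Sequence[float], threshold: float) -> Tuple[int, float]:
    """Count values above threshold and sum them (hypothetical helper)."""
    # Assumption: an "event" is any value strictly greater than the threshold.
    exceeding = [value for value in daily_pr if value > threshold]
    return len(exceeding), float(sum(exceeding))
```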

climate_resilience.preprocess.calculate_temporal_mean(sites: pandas.core.frame.DataFrame, scenarios: List[str], variables: List[str], datadir: str, start_date: str, end_date: str) → None

Calculates mean precipitation for the ‘historical’ scenario or between the start_date and the end_date.

Parameters

sites (pd.DataFrame) – Data Frame containing all the site information.
scenarios (List[str]) – Scenarios of interest.
variables (List[str]) – Variables of interest.
datadir (str) – Parent directory containing all the data files. The generated output file is also stored here.
start_date (str) – Must be in the format ‘YYYY-MM’ or ‘YYYY-MM-DD’.
end_date (str) – Must be in the format ‘YYYY-MM’ or ‘YYYY-MM-DD’.

Returns

The output DataFrame that is written to a csv file is also returned.

Return type

pd.DataFrame
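
The two accepted date formats and the mean over a date window can be sketched with the standard library alone. The helpers parse_date and temporal_mean are hypothetical names, and the sketch assumes that a 'YYYY-MM' string refers to the first day of that month; the package may resolve such dates differently.

```python
from datetime import date, datetime
from typing import Iterable, Tuple


def parse_date(text: str) -> date:
    """Parse 'YYYY-MM-DD' or 'YYYY-MM' (hypothetical helper)."""
    for fmt in ("%Y-%m-%d", "%Y-%m"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"expected 'YYYY-MM' or 'YYYY-MM-DD', got {text!r}")


def temporal_mean(series: Iterable[Tuple[date, float]], start_date: str, end_date: str) -> float:
    """Mean of the values whose date lies in [start_date, end_date]."""
    start, end = parse_date(start_date), parse_date(end_date)
    selected = [value for day, value in series if start <= day <= end]
    if not selected:
        raise ValueError("no values fall inside the requested date range")
    return sum(selected) / len(selected)
```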

climate_resilience.preprocess.get_climate_ensemble(sites: pandas.core.frame.DataFrame, scenarios: List[str], variables: List[str], datadir: str) → None

Calculates the mean and std of data for each site.

Parameters

sites (pd.DataFrame) – Data Frame containing all the site information.
scenarios (List[str]) – Scenarios of interest.
variables (List[str]) – Variables of interest.
datadir (str) – Parent directory containing all the data files. The generated output file is also stored here.
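
As a rough illustration of per-site mean and standard deviation, the sketch below works on a plain dictionary instead of the package's data files. The name ensemble_stats is hypothetical, and using the sample standard deviation is an assumption; this page does not say which variant the package computes.

```python
from statistics import mean, stdev
from typing import Dict, Sequence, Tuple


def ensemble_stats(values_by_site: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each site to (mean, sample std) of its values (hypothetical helper)."""
    # Assumption: every site has at least two values, as stdev requires.
    return {site: (mean(values), stdev(values)) for site, values in values_by_site.items()}
```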

climate_resilience.preprocess.get_per_year_stats(sites: pandas.core.frame.DataFrame, scenarios: List[str], variables: List[str], datadir: str) → None

Calculates the year-wise max, mean, and std of data for each site.

Parameters

sites (pd.DataFrame) – Data Frame containing all the site information.
scenarios (List[str]) – Scenarios of interest.
variables (List[str]) – Variables of interest.
datadir (str) – Parent directory containing all the data files. The generated output file is also stored here.
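
Year-wise statistics amount to grouping dated values by calendar year before aggregating. The following is a minimal sketch for one site, assuming the data is a sequence of (date, value) pairs; per_year_stats is a hypothetical name, and reporting a standard deviation of 0.0 for a year with a single value is a choice made only for this example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def per_year_stats(series: Iterable[Tuple[date, float]]) -> Dict[int, Dict[str, float]]:
    """Group values by year and return max, mean, and std per year."""
    by_year = defaultdict(list)
    for day, value in series:
        by_year[day.year].append(value)
    return {
        year: {
            "max": max(values),
            "mean": mean(values),
            # Sample std needs two values; fall back to 0.0 otherwise.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
        for year, values in sorted(by_year.items())
    }
```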

climate_resilience.preprocess.get_sub_period_stats(sites: pandas.core.frame.DataFrame, scenarios: List[str], variables: List[str], datadir: str, date_ranges: List[Tuple[str]], comp_function: str = 'gt', get_stats: bool = True, agg_function: Optional[Callable] = None, **kwargs: object) → None

Calculates some stats within a specified date range.

Parameters

sites (pd.DataFrame) – Data Frame containing all the site information.
scenarios (List[str]) – Scenarios of interest.
variables (List[str]) – Variables of interest.
datadir (str) – Parent directory containing all the data files. The generated output file is also stored here.
date_ranges (List[Tuple[str]]) – Each tuple contains a start date and an end date as string in the format ‘YYYY-MM’ or ‘YYYY-MM-DD’.
comp_function (str, optional) – Comparison function applied between the aggregation function output and the values in the date range. This is used to get stats. Options: ‘eq’ (equal) | ‘gt’ (greater than) | ‘lt’ (less than). Defaults to ‘gt’. Passing a callable is not currently supported, but it can be implemented if needed.
get_stats (bool, optional) – Count and Amount values are calculated only if this flag is set to True. Otherwise only the aggregation of values between the dates is performed. Defaults to True.
agg_function (Callable, optional) – Function used to aggregate the data within each date range. Defaults to None, in which case the 99th percentile is calculated. All arguments other than the input array can be passed as kwargs.
kwargs (object, optional) – Additional keyword arguments forwarded to agg_function. Example: agg_function(data, **kwargs)

Raises

ValueError – If comp_function is anything other than the specified options.
ValueError – If the input format or type of the dates in date_ranges is incorrect.
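
The interplay of comp_function, get_stats, agg_function, and kwargs can be sketched as follows for the values of a single date range. This is an illustration under stated assumptions, not the package's code: sub_period_stats_sketch and COMPARATORS are hypothetical names, each value is assumed to be compared against the aggregation output (value on the left), and the default 99th percentile is computed here with statistics.quantiles, which needs at least two values.

```python
import operator
from statistics import quantiles
from typing import Callable, Dict, Optional, Sequence

# The three documented options mapped onto standard comparison operators.
COMPARATORS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def sub_period_stats_sketch(
    values: Sequence[float],
    comp_function: str = "gt",
    get_stats: bool = True,
    agg_function: Optional[Callable[..., float]] = None,
    **kwargs: object,
) -> Dict[str, float]:
    """Aggregate values, then optionally count/sum those passing the comparison."""
    if comp_function not in COMPARATORS:
        raise ValueError(f"comp_function must be one of {sorted(COMPARATORS)}")
    if agg_function is None:
        # Default: 99th percentile (the last of the 99 inclusive cut points).
        reference = quantiles(values, n=100, method="inclusive")[98]
    else:
        # Everything except the input data is forwarded as keyword arguments.
        reference = agg_function(values, **kwargs)
    result = {"agg": reference}
    if get_stats:
        compare = COMPARATORS[comp_function]
        selected = [value for value in values if compare(value, reference)]
        result["count"] = len(selected)
        result["amount"] = sum(selected)
    return result
```

A custom aggregation such as max can be passed as agg_function=max; any extra parameters it needs are supplied as keyword arguments after it.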