
How to use the Statistics DataFrame

Introduction

The statistical data of your post-processed load cases are saved in the HDF format. You can use Pandas to retrieve and organize that data. Pandas organizes the data in a DataFrame. The library is powerful and comprehensive, but it takes some learning. There are extensive resources available that will help you get started:

  • A list of good tutorials can be found in the Pandas documentation.
  • A short and simple tutorial as used for the Python 4 Wind Energy course.

The data is organized in a simple 2-dimensional table. However, since the statistics of each channel are included for multiple simulations, the data set is actually 3-dimensional (simulation, channel and statistical parameter). As an example, this is how a table could look:

   [case_id]  [channel name]  [mean]  [std]    [windspeed]
       sim_1          pitch        0      1              8
       sim_1            rpm        1      7              8
       sim_2          pitch        2      9              9
       sim_2            rpm        3      2              9
       sim_3          pitch        0      1              7
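The example table can be rebuilt as a DataFrame to show how the hidden third dimension can be unfolded. This is a minimal sketch: the column names `case_id`, `channel`, `mean`, `std` and `windspeed` are assumptions that mirror the example, and a real statistics file may name its columns differently (for example with bracketed tag names).

```python
import pandas as pd

# Hypothetical statistics table mirroring the example above; real column
# names in a post-processed statistics file may differ.
df = pd.DataFrame({
    "case_id": ["sim_1", "sim_1", "sim_2", "sim_2", "sim_3"],
    "channel": ["pitch", "rpm", "pitch", "rpm", "pitch"],
    "mean": [0, 1, 2, 3, 0],
    "std": [1, 7, 9, 2, 1],
    "windspeed": [8, 8, 9, 9, 7],
})

# Index by (simulation, channel) so a single statistic can be looked up directly
stats = df.set_index(["case_id", "channel"])
rpm_std_sim2 = stats.loc[("sim_2", "rpm"), "std"]

# Unfold the flat table: one row per simulation, one column per channel,
# with the mean of that channel in each cell
means = df.pivot(index="case_id", columns="channel", values="mean")
print(means)
```

Since sim_3 has no rpm row in the example, the corresponding cell in `means` is NaN.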

Each row holds the statistics of one channel from one simulation, and the columns represent the following:

  • the tags from the master file and their values for the given simulation
  • the channel name, description, units and unique identifier
  • the statistical parameters of the given channel
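Because every row combines tags, channel information and statistics, a typical query selects one channel and then filters on a tag. The sketch below uses boolean masks on a small hypothetical table; the column names and values are assumptions taken from the example table, not the actual names in a statistics file.

```python
import pandas as pd

# Hypothetical statistics table with the same layout as the example table
df = pd.DataFrame({
    "case_id": ["sim_1", "sim_1", "sim_2", "sim_2", "sim_3"],
    "channel": ["pitch", "rpm", "pitch", "rpm", "pitch"],
    "mean": [0, 1, 2, 3, 0],
    "std": [1, 7, 9, 2, 1],
    "windspeed": [8, 8, 9, 9, 7],
})

# Select one channel across all simulations
pitch = df[df["channel"] == "pitch"]

# Combine it with a condition on a master-file tag, here the wind speed
pitch_above_7 = pitch[pitch["windspeed"] > 7]

# Keep only the tag and statistical columns of interest
print(pitch_above_7[["case_id", "windspeed", "mean", "std"]])
```

The same selection can also be written as `df.query("channel == 'pitch' and windspeed > 7")`.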

Load the statistics as a pandas DataFrame

Pandas has some very powerful functions that will help you analyse large and complex DataFrames. The documentation is extensive and is supplemented with various tutorials. You can use "10 Minutes to pandas" as a first introduction.
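A minimal sketch of loading the statistics and summarizing one channel per wind speed follows. Two assumptions are involved. First, the HDF key `"table"` is a guess, so check the keys actually stored in your file, for example with `pandas.HDFStore(path).keys()`. Second, the helper names `load_statistics` and `summarize_channel` and the column names are hypothetical. Reading HDF files with pandas requires the PyTables package.

```python
import pandas as pd


def load_statistics(path, key="table"):
    """Read a statistics HDF file into a DataFrame.

    The default key "table" is an assumption; use the key that is actually
    stored in your file.
    """
    return pd.read_hdf(path, key=key)


def summarize_channel(df, channel):
    """Min, average and max of a channel's mean value over cases, per wind speed.

    Column names "channel", "windspeed" and "mean" are hypothetical.
    """
    sel = df[df["channel"] == channel]
    return sel.groupby("windspeed")["mean"].agg(["min", "mean", "max"])


# Stand-in for load_statistics("statistics.h5"), built from the example table
df = pd.DataFrame({
    "case_id": ["sim_1", "sim_1", "sim_2", "sim_2", "sim_3"],
    "channel": ["pitch", "rpm", "pitch", "rpm", "pitch"],
    "mean": [0, 1, 2, 3, 0],
    "std": [1, 7, 9, 2, 1],
    "windspeed": [8, 8, 9, 9, 7],
})
print(summarize_channel(df, "pitch"))
```

With several seeds per wind speed in a real data set, the min/max columns show the spread of the channel mean between simulations.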