STEM Essays: Classical Mechanics: The Art of Simulation【Gadgets】8《Big Data》1

Since bqplot builds on pandas, let us first introduce that long-famous kung fu panda:

Python Data Analysis Library

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

pandas is a NumFOCUS sponsored project. This will help ensure the success of development of pandas as a world-class open-source project, and makes it possible to donate to the project.


……

Package overview

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

pandas consists of the following elements:

  • A set of labeled array data structures, the primary of which are Series and DataFrame.
  • Index objects enabling both simple axis indexing and multi-level / hierarchical axis indexing.
  • An integrated group by engine for aggregating and transforming data sets.
  • Date range generation (date_range) and custom date offsets enabling the implementation of customized frequencies.
  • Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.
  • Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value).
  • Moving window statistics (rolling mean, rolling standard deviation, etc.).
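Several items in the list above can be seen working together in a few lines. The following is a minimal sketch, assuming a small made-up daily data set (the column names `group` and `value` are hypothetical): it builds a date-indexed DataFrame with `date_range`, aggregates with the group-by engine, computes a rolling mean, and round-trips the frame through CSV text in memory.

```python
from io import StringIO

import pandas as pd

# Date range generation: six consecutive calendar days.
idx = pd.date_range("2024-01-01", periods=6, freq="D")

# Hypothetical example data, indexed by those dates.
df = pd.DataFrame(
    {"group": ["a", "b", "a", "b", "a", "b"],
     "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=idx,
)

# Group-by engine: sum "value" within each group (a -> 9.0, b -> 12.0).
group_sums = df.groupby("group")["value"].sum()

# Moving window statistics: 3-day rolling mean; the first two entries are NaN
# because the window is not yet full.
rolling_mean = df["value"].rolling(window=3).mean()

# Input/Output tools: write to CSV text and read it back, parsing the index as dates.
buf = StringIO()
df.to_csv(buf)
buf.seek(0)
back = pd.read_csv(buf, index_col=0, parse_dates=True)

print(group_sums)
print(rolling_mean)
print(back.head(2))
```

A real workflow would read from a file path rather than a `StringIO` buffer; the buffer only keeps the sketch self-contained.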

Data Structures

Dimensions   Name        Description
1            Series      1D labeled homogeneously-typed array
2            DataFrame   General 2D labeled, size-mutable tabular structure with potentially heterogeneously-typed columns
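The two rows of the table can be checked directly on small objects. This is a minimal sketch with made-up values: the Series holds one dtype across all its labeled values, while each DataFrame column keeps its own dtype, and the frame can grow after construction.

```python
import pandas as pd

# Series: one dimension, a single dtype for all values, plus index labels.
s = pd.Series([1.5, 2.5, 3.5], index=["x", "y", "z"])

# DataFrame: two dimensions; each column may carry a different dtype.
df = pd.DataFrame({
    "name": ["alpha", "beta", "gamma"],  # string column
    "count": [3, 1, 2],                  # integer column
    "ratio": [0.5, 0.25, 0.75],          # float column
})

print(s.ndim, s.dtype)  # 1 float64
print(df.ndim)          # 2
print(df.dtypes)

# "Size-mutable": a boolean column is added after construction.
df["flag"] = df["count"] > 1
print(df.shape)         # (3, 4)
```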

 

Why more than one data structure?

The best way to think about the pandas data structures is as flexible containers for lower dimensional data. For example, DataFrame is a container for Series, and Series is a container for scalars. We would like to be able to insert and remove objects from these containers in a dictionary-like fashion.
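The dictionary-like behavior described above can be sketched as follows, with hypothetical columns `a`, `b`, and `c`: indexing a DataFrame by a column label yields a Series, assignment inserts a column, and `del` or `pop` removes one, much like keys in a `dict`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Each column of the DataFrame is a Series.
col_b = df["b"]
print(type(col_b).__name__)   # Series

# Insert a new column, like adding a key to a dict.
df["c"] = df["a"] + df["b"]   # values 5, 7, 9

# Remove columns, like deleting keys from a dict.
del df["a"]
popped = df.pop("b")          # pop returns the removed Series

# Membership tests look at column labels, as `in` does for dict keys.
print("c" in df, "a" in df)   # True False

# A Series in turn contains scalars: label-based lookup returns one value.
first_c = df["c"][0]
print(list(df.columns), first_c)   # ['c'] 5
```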

Also, we would like sensible default behaviors for the common API functions which take into account the typical orientation of time series and cross-sectional data sets. When using ndarrays to store 2- and 3-dimensional data, a burden is placed on the user to consider the orientation of the data set when writing functions; axes are considered more or less equivalent (except when C- or Fortran-contiguousness matters for performance). In pandas, the axes are intended to lend more semantic meaning to the data; i.e., for a particular data set there is likely to be a “right” way to orient the data. The goal, then, is to reduce the amount of mental effort required to code up data transformations in downstream functions.
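One concrete form of these "sensible default behaviors" is that reductions run down the rows by default and that the axes can be named instead of numbered. The sketch below uses hypothetical data with labeled rows `t0`…`t2`; `axis="index"` is equivalent to `axis=0`, and `axis="columns"` to `axis=1`.

```python
import pandas as pd

# Hypothetical data: rows are observations, columns are variables.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]},
    index=["t0", "t1", "t2"],
)

# Default reduction runs along the index (down the rows): one result per column.
col_means = df.mean()                 # x -> 2.0, y -> 20.0
same = df.mean(axis="index")          # identical, with the axis spelled out

# Reducing across the columns instead gives one result per row.
row_sums = df.sum(axis="columns")     # t0 -> 11.0, t1 -> 22.0, t2 -> 33.0

print(col_means)
print(row_sums)
```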

For example, with tabular data (DataFrame) it is more semantically helpful to think of the index (the rows) and the columns rather than axis 0 and axis 1. Iterating through the columns of the DataFrame thus results in more readable code:

for col in df.columns:
    series = df[col]
    # do something with series
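The loop above assumes an existing `df`. A self-contained version, with a hypothetical two-column frame and "do something" filled in as taking each column's maximum, might look like this; `DataFrame.items()` yields the same `(label, Series)` pairs without the separate lookup.

```python
import pandas as pd

# Hypothetical data with one float column and one integer column.
df = pd.DataFrame({"price": [9.5, 10.0, 10.5], "volume": [100, 250, 175]})

# Iterate by column label; each df[col] is a Series.
summary = {}
for col in df.columns:
    series = df[col]
    summary[col] = series.max()   # the "something" done with each series

# Equivalent iteration over (label, Series) pairs.
for col, series in df.items():
    print(col, series.dtype, series.max())
```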

 

Since the examples,

Tutorials

This is a guide to many pandas tutorials, geared mainly for new users.

Internal Guides

pandas’ own 10 Minutes to pandas.

More complex recipes are in the Cookbook.

A handy pandas cheat sheet.

 

the quick read,

10 Minutes to pandas

This is a short introduction to pandas, geared mainly for new users. You can see more complex recipes in the Cookbook.

Customarily, we import as follows:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: import matplotlib.pyplot as plt
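With those imports in place, the guide's first step is object creation. The sketch below follows that pattern: a Series containing a missing value and a date-indexed DataFrame. Using `np.random.default_rng` with a fixed seed is my own choice for reproducibility, not necessarily what the guide itself uses.

```python
import numpy as np
import pandas as pd

# A Series with a missing value; pandas stores it as NaN, so the dtype is float64.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# A DataFrame indexed by six consecutive dates, filled with seeded random numbers.
dates = pd.date_range("20130101", periods=6)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(s)
print(df.head(3))       # first three rows
print(df.describe())    # count, mean, std, min, quartiles, max per column
```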

 

the recipes,

Cookbook

This is a repository for short and sweet examples and links for useful pandas recipes. We encourage users to add to this documentation.

Adding interesting links and/or inline examples to this section is a great First Pull Request.

Simplified, condensed, new-user friendly, in-line examples have been inserted where possible to augment the Stack-Overflow and GitHub links. Many of the links contain expanded information, above what the in-line examples offer.

pandas (pd) and NumPy (np) are the only two abbreviated imported modules. The rest are kept explicitly imported for newer users.

These examples are written for Python 3. Minor tweaks might be necessary for earlier Python versions.
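To give a flavor of such a recipe, here is a minimal sketch modeled on the "if-then" idioms that open the Cookbook; the column names `AAA`, `BBB`, `CCC` and the values are illustrative only. It shows a conditional assignment with `.loc` and a vectorized if-then-else with `np.where`, following the Cookbook convention of abbreviating only pandas and NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
)

# if-then: wherever AAA >= 5, overwrite BBB with -1.
df.loc[df["AAA"] >= 5, "BBB"] = -1

# if-then-else: label each row by a vectorized condition.
df["logic"] = np.where(df["AAA"] > 5, "high", "low")

print(df)
```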

 

and the cheat sheet

cheat sheet

 

are all there, there is no need for me to say more; I will leave them for readers to explore on their own!