Sharing some programming knowledge.

Pandas: Create and Access Series and DataFrame

Series

Create Series

import numpy as np
import pandas as pd

# create Series from ndarray
arr = np.array([1,2,3])
ser = pd.Series(arr, index=['a','b','c'])

# create Series from dict
dic = {'a':1, 'b':2, 'c':3}
ser = pd.Series(dic)

# create Series from list
lis = [1,2,3]
ser = pd.Series(lis, index=['a','b','c'])
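The three constructors above treat the index a little differently, which is easy to miss. The sketch below is only an illustration (the variable names are made up for this example): a dict supplies its own labels, an explicit index passed alongside a dict selects by label and fills missing labels with NaN, and a list without an index gets the default 0..n-1 labels.

```python
import numpy as np
import pandas as pd

dic = {'a': 1, 'b': 2, 'c': 3}

# A dict provides both labels and values.
ser_from_dict = pd.Series(dic)

# With a dict, `index` selects and reorders by label;
# labels missing from the dict ('z' here) become NaN.
ser_reindexed = pd.Series(dic, index=['c', 'a', 'z'])

# A list with no index gets the default integer labels 0, 1, 2.
ser_default = pd.Series([1, 2, 3])

print(ser_from_dict)
print(ser_reindexed)
print(ser_default)
```

Because NaN is a float, a Series with missing labels will typically be stored as float64 rather than int64.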

Access Data in Series

# using position number (use .iloc; plain s[0] on a label index is deprecated in recent pandas)
s = pd.Series([1,2,3], index=['a','b','c'])
s.iloc[0]
s.iloc[:3]

# using series index
s['a']
s[['a','b','c']]
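One detail worth seeing in action is how slices behave. As a minimal sketch (variable names are only for illustration): position slices with .iloc exclude the end position, while label slices with .loc include the end label. A boolean mask is a third way to pick values.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

first_by_position = s.iloc[0]   # value at position 0
first_by_label = s.loc['a']     # value at label 'a'

# Position slice: end is excluded, so this keeps positions 0 and 1.
pos_slice = s.iloc[0:2]

# Label slice: end is included, so this keeps 'a', 'b' and 'c'.
label_slice = s.loc['a':'c']

# Boolean mask: keep only values greater than 1.
greater_than_one = s[s > 1]

print(pos_slice)
print(label_slice)
print(greater_than_one)
```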

DataFrame

Create DataFrames

# create dataframes from dict (of lists)
data = {'Name':['Tom', 'Jack', 'Steve'],'Age':[28,34,29]}
df = pd.DataFrame(data)

# create dataframes from list (of dicts)
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

# create dataframes from dict (of series)
data = {'one':pd.Series([1,2,3], index=['a','b','c']), 'two':pd.Series([1,2,3,4], index=['a','b','c','d'])}
df = pd.DataFrame(data)

# create dataframes from files
df = pd.read_csv('path', sep=',')
df = pd.read_json('path')
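To see what these constructors actually produce, here is a small runnable sketch. It reuses the sample data from above; the CSV text is an in-memory stand-in invented for this example so nothing has to exist on disk. The point to notice is that missing keys or index labels are filled with NaN.

```python
import io
import pandas as pd

# List of dicts: each dict is a row; a key missing from a row becomes NaN.
rows = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df_rows = pd.DataFrame(rows)

# Dict of Series: columns are aligned on the union of the indexes,
# so 'one' has no value for label 'd' and gets NaN there.
cols = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
}
df_cols = pd.DataFrame(cols)

# read_csv also accepts a file-like object, which is handy for quick tests.
csv_text = "Name,Age\nTom,28\nJack,34\n"  # stand-in for a real file
df_csv = pd.read_csv(io.StringIO(csv_text), sep=',')

print(df_rows)
print(df_cols)
print(df_csv)
```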

Access (select) Data in DataFrames

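# access by row (a single label or position returns a Series; a slice or list returns a DataFrame)
df.loc['first_index']
df.iloc[0]
df.loc['first_index':'third_index'] # label slices include both ends: 'first_index', 'second_index', 'third_index'
df.iloc[1:3] # position slices exclude the end: only positions 1 and 2
df[0:3] # a bare slice selects rows at positions 0, 1, 2 (df[0] would look up a column labeled 0, not the first row)

# access by column (a single name returns a Series; a list of names returns a DataFrame)
df['column_1']
df[['column_1','column_2']]
df.column_1 # works only if the name is a valid Python identifier and does not clash with a DataFrame attribute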

# access both by row and by column
df.loc[['first_row','second_row'], ['first_column', 'second_column']]
df.iloc[0:2, 0:2]

# access by conditional expression
movies_df[(movies_df['director'] == 'Christopher Nolan') | (movies_df['director'] == 'Ridley Scott')]
movies_df[movies_df['director'].isin(['Christopher Nolan', 'Ridley Scott'])]
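The selections above can be tried end to end on a tiny table. The movies_df below is a hand-made three-row sample assumed for this sketch (the original movies_df is not shown), with the movie title as the index and a hypothetical 'year' column added for the last example.

```python
import pandas as pd

# Hand-made sample data, only for illustration.
movies_df = pd.DataFrame(
    {
        'director': ['Christopher Nolan', 'Ridley Scott', 'Steven Spielberg'],
        'year': [2010, 2000, 1975],
    },
    index=['Inception', 'Gladiator', 'Jaws'],
)

one_row = movies_df.loc['Inception']          # single label: a Series
two_rows = movies_df.iloc[0:2]                # position slice: a DataFrame
one_cell = movies_df.loc['Gladiator', 'year'] # row label + column label: a scalar

# Two equivalent ways to keep rows whose director is in a set of names.
with_or = movies_df[
    (movies_df['director'] == 'Christopher Nolan')
    | (movies_df['director'] == 'Ridley Scott')
]
with_isin = movies_df[movies_df['director'].isin(['Christopher Nolan', 'Ridley Scott'])]

# A condition and a column selection can be combined in one .loc call.
since_2000 = movies_df.loc[movies_df['year'] >= 2000, ['year']]

print(with_or)
print(since_2000)
```

The parentheses around each comparison matter because `|` binds more tightly than `==` in Python. When matching against more than two values, isin usually reads better than a chain of `|` conditions.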