Identify and count alternating parts of a column in a (timeseries) dataframe

by Kjetil Dalaker   Last Updated September 11, 2019 13:26 PM

I am analyzing trades done in a futures contract, based on a csv file with a list of trades (columns are Side, Qty, Price, Date).

awful screen shot of a table

I have imported the file and sorted the trades chronologically by time. The column "Side" (BUY/SELL) is now:

B
S
S
B
B
S
S
B
B
B
B

I want to give each row of B's and each row of S's a unique number, in order for me to group each individual parts of B's and S's for further analysis. I want for example to find out what the average price of each row of Bs and each row of Ss are.

In the example above there are 5 rows/parts in total, 3 B's and 2 S's. The first row of B's should be 1. The second row of B's should be 3 and the last row of B's should be 5. Basically I want to add a column with this output:

1
2
2
3
3
4
4
5
5
5
5

Now I should be able to find the average price of the four B's in row number 5 using groupby with the new column as argument and mean().

But how can I make the counter needed for this new column? I am able to identify each change using somehing like np.where(), diff(), abs() + cumsum() and 1 and -1, but I dont see how I can add +1 to each alternation.



Answers 1


Use Series.shift with compare not equal and cumulative sum by Series.cumsum:

df['new'] = df['Side'].ne(df['Side'].shift()).cumsum()

How it working:

df = df.assign(shifted = df['Side'].shift(),
               mask = df['Side'].ne(df['Side'].shift()),
               new = df['Side'].ne(df['Side'].shift()).cumsum())
print (df)
   Side shifted   mask  new
0     B     NaN   True    1
1     S       B   True    2
2     S       S  False    2
3     B       S   True    3
4     B       B  False    3
5     S       B   True    4
6     S       S  False    4
7     B       S   True    5
8     B       B  False    5
9     B       B  False    5
10    B       B  False    5
jezrael
jezrael
September 11, 2019 13:08 PM

Related Questions


pandas groupby apply is really slow

Updated November 05, 2017 15:26 PM

Do not map item to any output using apply()

Updated July 30, 2018 21:26 PM