I am analyzing trades done in a futures contract, based on a csv file with a list of trades (columns are Side, Qty, Price, Date).
I have imported the file and sorted the trades chronologically by time. The column "Side" (BUY/SELL) is now:
B S S B B S S B B B B
I want to give each row of B's and each row of S's a unique number, in order for me to group each individual parts of B's and S's for further analysis. I want for example to find out what the average price of each row of Bs and each row of Ss are.
In the example above there are 5 rows/parts in total, 3 B's and 2 S's. The first row of B's should be 1. The second row of B's should be 3 and the last row of B's should be 5. Basically I want to add a column with this output:
1 2 2 3 3 4 4 5 5 5 5
Now I should be able to find the average price of the four B's in row number 5 using groupby with the new column as argument and mean().
But how can I make the counter needed for this new column? I am able to identify each change using somehing like
abs() + cumsum() and 1 and -1, but I dont see how I can add +1 to each alternation.
df['new'] = df['Side'].ne(df['Side'].shift()).cumsum()
How it working:
df = df.assign(shifted = df['Side'].shift(), mask = df['Side'].ne(df['Side'].shift()), new = df['Side'].ne(df['Side'].shift()).cumsum()) print (df) Side shifted mask new 0 B NaN True 1 1 S B True 2 2 S S False 2 3 B S True 3 4 B B False 3 5 S B True 4 6 S S False 4 7 B S True 5 8 B B False 5 9 B B False 5 10 B B False 5