# Identify and count alternating parts of a column in a (timeseries) dataframe

by Kjetil Dalaker   Last Updated September 11, 2019 13:26 PM

I am analyzing trades done in a futures contract, based on a csv file with a list of trades (columns are Side, Qty, Price, Date).

I have imported the file and sorted the trades chronologically by time. The column "Side" (BUY/SELL) is now:

``````B
S
S
B
B
S
S
B
B
B
B
``````

I want to give each row of B's and each row of S's a unique number, in order for me to group each individual parts of B's and S's for further analysis. I want for example to find out what the average price of each row of Bs and each row of Ss are.

In the example above there are 5 rows/parts in total, 3 B's and 2 S's. The first row of B's should be 1. The second row of B's should be 3 and the last row of B's should be 5. Basically I want to add a column with this output:

``````1
2
2
3
3
4
4
5
5
5
5
``````

Now I should be able to find the average price of the four B's in row number 5 using groupby with the new column as argument and mean().

But how can I make the counter needed for this new column? I am able to identify each change using somehing like `np.where()`, `diff()`, `abs() + cumsum()` and 1 and -1, but I dont see how I can add +1 to each alternation.

Tags :

Use `Series.shift` with compare not equal and cumulative sum by `Series.cumsum`:

``````df['new'] = df['Side'].ne(df['Side'].shift()).cumsum()
``````

How it working:

``````df = df.assign(shifted = df['Side'].shift(),
new = df['Side'].ne(df['Side'].shift()).cumsum())
print (df)
Side shifted   mask  new
0     B     NaN   True    1
1     S       B   True    2
2     S       S  False    2
3     B       S   True    3
4     B       B  False    3
5     S       B   True    4
6     S       S  False    4
7     B       S   True    5
8     B       B  False    5
9     B       B  False    5
10    B       B  False    5
``````
jezrael
September 11, 2019 13:08 PM