Tuesday, February 27, 2018

Notes | Entry-Level Stock Price Analysis | Stats On Stock Price Percentage Change N Days After

Quoted from this page

1) [Import Libraries]
We need at least 2 Python libraries (pandas and numpy) to read stock data (step 3) and calculation.

2) [Define Data Source]
In case you have the stock data in CSV or Excel format, you can simply
store it in variable (e.g. data_source) assigning the file path (e.g. r'your_path', see pic)

If you want to get data feed online, you might:
 (i) write code on another Python file + store the result in local drive + assign variable to that path
(see this video https://www.youtube.com/watch?v=BKezm2_hLNw)
or
(ii) wrtie code on the SAME Python file + store the result in local drive + assign variable to that path

I preferred method (i) because of the quicker load time and cleaner code to maintain.
I ran into problem with method (ii).  I could fix that by creating multiple variables but still, it's not
very convenient for me.

3) [Store Data In Tabular Format]
Simply use the 'pandas' package to read your source file in tabular format.
The 'dropna()' simply clear whatever row that has non-numeric/datetime value inside.

4) [Define Target Column(s)]
This is the real start of our solution.
We first set the target.  In this case, my question is as follows:

"I want to know when something happens (triggering event), what do I get (target_event)."

From pic_1 and pic_3, I want to know [the percentage change N days after [trigger_event] happens.

That's why I need to define [the percentage change] by using the '.pct_change()' method after
the column (e.g. Close).

5) [Define Triggering Event(s)]
I define the 'triggering event' in the variable 'criteria' which can further be divided into
many sub-sections (e.g. 'tcriteria' means target criteria; 'dcriteria' means day period...)
The variable names are up to you (e.g. you can call 'trigger_event' instead of 'criteria')

In Pic_1, (see 5a), I filter out those with [today's stock price change %] is UP or DOWN more than 1%
(e.g. abs(df['day_chg'] ) > 1)
You might change the 'tcriteria' to something like

my_cat_s_birthday = '2018-02-08'
Replace 'tcriteria' with 'my_cat_s_birthday'

It's totally up to you.

The reason I split 'criteria' into (5a - tcriteria) and (5b - dcritiera) is
the greater flexibility. 
Usually, I want to know whatever triggering event (5a) happens in whatever period (5b).
Through adding a separate period criteria (5b - dcritiera), it's easier to maintain the code.

6) [Define LookUp Period]
Usually, we want to know 'what will [the stock price] be [N days after today].
This is the part to define.

In Pic_1, we want to know 'the percentage change' (df['day_chg']) 1 day after today, so
our code is

df['Ndaychg'] = df['day_chg'].shift(-1)

Since we will be building more interactive app for data analysis,
I suggest generalizing the above formula like this:

lookforwardday = 1   <- you can change this to input(whatever number here, see Pic_3)

df['Ndaychg'] = df['day_chg'].shift(lookforwardday)

7) [Print Result]

Note that you can print 'whatever you want' from your dataset.

But in my case, I want to know the 'mean', 'standard deviation', and 'median' of 'columns that I want to view'.

Given that, I split my print action into (7a) and (7b)
(7a) is simply printing ALL information (e.g. include Open High Low CLose ...) that meets the criteria
stated in step (5)

In Pic_1, you notice the latest day (the bottom row, 2018-02-28 has [NdayChg] = NaN, not a number.
because, on Feb 28, 2018, we don't know what will happen the day after, Mar 1, 2018.

In Pic_3, we define the 'lookforwardday' to be 3.  That's why, we won't know what will 3 (trade days)
after will be, on Feb 26, and Feb 28, 2018.


(7b) Based on the info (see 7a), I add the 'columns that I want to view the stats'
(e.g. .loc[:, ['Vol','Ndaychg']]) 

Put it simple,

df[critieria]   add  .loc[all rows, ['column1', 'column2','column_whatever']]  then add
.agg(['mean','std','median'])  





No comments:

Post a Comment