Algorithmic Trading Support Vector Machines

In this post we will use Support Vector Machines (SVMs) to develop our algorithmic trading strategy. Did you read my previous post on how to use indicators in expert advisors? We develop two Support Vector Machines: one predicts the volatility and the second predicts the market direction. We will use a trick to transform the data and make the predictions. You may like the trick that I use to predict the price after 10, 20 and 30 hours. If you are interested in the trick, continue reading the post.

Support Vector Trading

More and more traders are getting interested in algorithmic trading. If you want to become an algorithmic trader, you should learn Python. Python is a modern object-oriented language that can be used for many things, but in recent years it has become the most popular programming language for artificial intelligence, machine learning, deep learning and reinforcement learning. You can take a look at the courses that I have developed for teaching Python to traders.

The first course is Python for Traders. It teaches you how to start programming your algorithmic trading strategies in Python; no previous experience or knowledge of Python programming is required. The second course is Python Machine Learning for Traders. Once you master machine learning with Python, you can take the next course, Python Deep Learning for Traders. In the last course, Algorithmic Trading with Python, I teach you how to connect Python with MT4 so that you can use Python's powerful machine learning and deep learning libraries in developing your EA.

I want to introduce Python to you in this post. We will start with the basics and then do some advanced stuff using Pandas. Pandas is a very powerful Python library that gets used on a daily basis on Wall Street and at quant funds.

>>> import datetime
>>> import math
>>> import time
>>> import numpy as np
>>> import pandas as pd
>>>
>>> t=time.time()
>>> df = pd.read_csv(\
... 'D:/Shared/MarketData/EURUSD60.csv', header=None)
>>> df.columns=['Date', 'Time', 'Open', 'High', 'Low',\
... 'Close', 'Volume']
>>> df.tail()
            Date   Time     Open     High      Low    Close  Volume
7753  2018.11.09  17:00  1.13240  1.13374  1.13168  1.13317    2800
7754  2018.11.09  18:00  1.13312  1.13385  1.13265  1.13307    1718
7755  2018.11.09  19:00  1.13305  1.13388  1.13296  1.13352    1588
7756  2018.11.09  20:00  1.13353  1.13374  1.13294  1.13369    1274
7757  2018.11.09  21:00  1.13359  1.13392  1.13324  1.13350     660
>>> #reverse the dataframe
... #df2=df[::-1]
...
>>> #create an empty dataframe
... df1=pd.DataFrame()
>>> df1.tail()
Empty DataFrame
Columns: []
Index: []
>>> #define the candle timeframe
... n =40
>>> k=len(df)-2-math.floor((len(df)-2)/n)*n
>>> df1["Date"]=df.Date[k:len(df)-2:n].values
>>> df1["Time"]=df.Time[k:len(df)-2:n].values
>>> df1["Open"]=df.Open[k:len(df)-2:n].values
>>> df1["High"] =\
...  df.High.rolling(n).max()[k+n:len(df)-1:n].values
>>> df1["Low"] =\
...  df.Low.rolling(n).min()[k+n:len(df)-1:n].values
>>> df1["Close"]=df.Close[k+n:len(df)-1:n].values
>>> df1["Volume"] =\
...  df.Volume.rolling(n).sum()[k+n:len(df)-1:n].values
>>> df1.tail()
           Date   Time     Open     High      Low    Close   Volume
188  2018.10.30  11:00  1.13570  1.13810  1.13016  1.13458  66344.0
189  2018.11.01  03:00  1.13404  1.14552  1.13371  1.13946  78165.0
190  2018.11.02  19:00  1.13848  1.14250  1.13528  1.14176  58387.0
191  2018.11.06  12:00  1.14065  1.14991  1.13942  1.14243  81837.0
192  2018.11.08  04:00  1.14284  1.14459  1.13168  1.13369  86844.0
>>> df1=df1.append(df.iloc[len(df)-1], ignore_index=True)
>>> df1.tail()
           Date   Time     Open     High      Low    Close   Volume
189  2018.11.01  03:00  1.13404  1.14552  1.13371  1.13946  78165.0
190  2018.11.02  19:00  1.13848  1.14250  1.13528  1.14176  58387.0
191  2018.11.06  12:00  1.14065  1.14991  1.13942  1.14243  81837.0
192  2018.11.08  04:00  1.14284  1.14459  1.13168  1.13369  86844.0
193  2018.11.09  21:00  1.13359  1.13392  1.13324  1.13350    660.0
>>> time.time()-t
0.12879610061645508

As you can see, Pandas is pretty fast. I timed the code execution and Pandas did the whole job blazingly fast in about 0.13 seconds. Pandas is a highly optimized Python library. It uses NumPy underneath, which is the Python library for scientific computations. NumPy is written in C, a very fast language, and provides the framework on which Pandas is built. So the whole conversion of the dataframe took little more than a tenth of a second, and sometimes I get as low as 0.07 seconds. Now let's convert the above code into a function that we can use whenever we want.
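To get a feel for why the vectorized calls are fast, here is a small illustration of my own (not part of the original session): the same rolling maximum computed once with a Pandas call that runs in compiled NumPy/C code and once with a plain Python loop. The names highFast and highSlow are mine, and the exact timings will depend on your machine.

t = time.time()
highFast = df.High.rolling(40).max()       # vectorized, runs in compiled code
print("vectorized rolling max:", time.time() - t)

t = time.time()
highSlow = [df.High.iloc[i-39:i+1].max()   # plain Python loop over the bars
            for i in range(39, len(df))]
print("python loop rolling max:", time.time() - t)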

>>> def newOHLCV(df, n):
...     '''
...     This function creates a new OHLCV dataframe with
...     a different time interval for the purpose of
...     drawing new candlesticks.
...     Params:
...     df is the original dataframe
...     n is the new time interval
...     '''
...     #create an empty dataframe
...     df1=pd.DataFrame()
...     #define the candle timeframe
...     k=len(df)-2-math.floor((len(df)-2)/n)*n
...     df1["Date"]=df.Date[k:len(df)-2:n].values
...     df1["Time"]=df.Time[k:len(df)-2:n].values
...     df1["Open"]=df.Open[k:len(df)-2:n].values
...     df1["High"] =\
...     df.High.rolling(n).max()[k+n:len(df)-1:n].values
...     df1["Low"] =\
...     df.Low.rolling(n).min()[k+n:len(df)-1:n].values
...     df1["Close"]=df.Close[k+n:len(df)-1:n].values
...     df1["Volume"] =\
...     df.Volume.rolling(n).sum()[k+n:len(df)-1:n].values
...     df1=df1.append(df.iloc[len(df)-1], ignore_index=True)
...     return df1
...
>>> df2=newOHLCV(df,40)
>>> df2.tail()
           Date   Time     Open     High      Low    Close   Volume
189  2018.11.01  03:00  1.13404  1.14552  1.13371  1.13946  78165.0
190  2018.11.02  19:00  1.13848  1.14250  1.13528  1.14176  58387.0
191  2018.11.06  12:00  1.14065  1.14991  1.13942  1.14243  81837.0
192  2018.11.08  04:00  1.14284  1.14459  1.13168  1.13369  86844.0
193  2018.11.09  21:00  1.13359  1.13392  1.13324  1.13350    660.0

I have been developing algorithmic trading strategies and I find the above code highly useful. It allows me to change the timeframe of the candlestick chart at any point in time. As you can see, I didn't use a for loop: for loops are inherently slow and take much more time to execute than the vectorized operations above. You should have the ability to look at the candlestick charts with a different zoom lens, and the above code provides you with that zoom lens. Take a look at this Neural Network Forex Trading System.
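For comparison, the same resampling can also be written with a Pandas groupby aggregation. The sketch below (newOHLCV_groupby is my own name, not from the original code) groups the bars from the first row onwards, so the last, possibly incomplete, candle may differ slightly from what newOHLCV returns.

def newOHLCV_groupby(df, n):
    #group every n consecutive hourly bars into one candle
    g = df.groupby(np.arange(len(df)) // n)
    out = pd.DataFrame({
        "Date":   g["Date"].first(),    #date of the first bar in the group
        "Time":   g["Time"].first(),    #time of the first bar in the group
        "Open":   g["Open"].first(),
        "High":   g["High"].max(),
        "Low":    g["Low"].min(),
        "Close":  g["Close"].last(),
        "Volume": g["Volume"].sum(),
    })
    return out.reset_index(drop=True)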

Now let's start building the trading strategy. First we import the libraries:

import datetime
import time
import math
import numpy as np
import pandas as pd
import talib as ta
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import normalize
import xgboost as xgb

After importing the libraries we define the two functions that are crucial for our algorithmic trading strategy. The first function, newOHLCV, we have already defined above; the second function we define below:

#define the features
def TA1(df):
    #calculate RSI 14
    df["rsi"]=ta.RSI(df.Close, 14)
    #calculate MACD
    df["macd"], df["macdSignal"],\
    df["macdHist"] = ta.MACD(df.Close,\
    fastperiod=12, slowperiod=26, signalperiod=9)
    #calculate Williams %R
    df["williamsR"] = ta.WILLR(df.High, df.Low,\
    df.Close, timeperiod=14)
    #calculate Wilder's DMI better known as ADX
    df["adx"] = ta.ADX(df.High, df.Low, df.Close, timeperiod=14)
    # calculate the difference from MA
    df["priceMAdiff"] = df.Close-ta.MA(df.Close, timeperiod=25)
    #df['EMA'] = ta.EMA(df.Close, timeperiod=21)
    df["Pips"]=10000*(df.Close.shift(-1)-df.Close)
    #label a candle 1 if the next close moves more than 60 pips, else 0
    df.loc[df.Pips.abs() > 60, "Target"]=1
    df.loc[df.Pips.abs() <= 60, "Target"]=0
    #df.loc[df.Pips < -70, "Target"]=-1
    #df.loc[df.Pips > 70, "Target"]=1
    #df.loc[df.Pips < -70, "Target"]=-1
    #df.loc[(df.Pips > -70) & (df.Pips < 70), "Target"]=0
    df["rsi1"]=df.rsi.shift(1)
    df["rsi2"]=df.rsi.shift(2)
    df["rsi3"]=df.rsi.shift(3)
    df["macd1"]=df.macd.shift(1)
    df["macd2"]=df.macd.shift(2)
    df["macd3"]=df.macd.shift(3)
    df["williamsR1"]=df.williamsR.shift(1)
    df["williamsR2"]=df.williamsR.shift(2)
    df["williamsR3"]=df.williamsR.shift(3)
    df["adx1"]=df.adx.shift(1)
    df["adx2"]=df.adx.shift(2)
    df["adx3"]=df.adx.shift(3)
    df["priceMAdiff1"]=df.priceMAdiff.shift(1)
    df["priceMAdiff2"]=df.priceMAdiff.shift(2)
    df["priceMAdiff3"]=df.priceMAdiff.shift(3)
    df=df.dropna()    
    return df

Above we have defined the technical analysis function. We will be using several technical indicators: RSI, MACD, Williams %R and ADX, along with the distance of price from a 25-period moving average. I want to check whether we can use these technical indicators in developing an algorithmic trading strategy. These indicators are usually interpreted visually; the question is whether a machine learning algorithm can pick up the same patterns from the raw indicator values and their recent lags.
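As a quick sanity check (my own hypothetical usage, not part of the original post), we can apply TA1 to the 40-hour candles built earlier and look at the class balance of the 60-pip target. The name df3 is mine:

df3 = TA1(newOHLCV(df, 40))
#inspect the engineered features and the 60-pip target
print(df3[["rsi", "macd", "williamsR", "adx",
           "priceMAdiff", "Pips", "Target"]].tail())
#how many >60-pip moves versus quiet candles
print(df3.Target.value_counts())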

#define the Support Vector Machine classifier
svmCLF2 = SVC(C=1, class_weight='balanced', gamma='auto')
print(svmCLF2)

SVC(C=1, cache_size=200, class_weight='balanced', coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

#define the min-max scaler that rescales each feature to the range 0-1
minmaxScale=preprocessing.MinMaxScaler(feature_range=(0,1))

In the code above we defined the Support Vector Machine classifier and the min-max scaler that we will use to scale the features. Take a look at this High Frequency Trading Algorithmic Trading System.
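As a side note, the scaler and the SVM could also be chained together with the Pipeline class imported above, so that scaling is always fitted on the training rows only. This is just a sketch of an alternative setup (svmPipe is my own name), not what the backtest below uses:

svmPipe = Pipeline([
    ("scale", preprocessing.MinMaxScaler(feature_range=(0, 1))),
    ("svm", SVC(C=1, class_weight='balanced', gamma='auto')),
])
#svmPipe.fit(X_train, y_train) would scale and then fit the SVM in one step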

#select the features and run the walk-forward backtest
df2=pd.DataFrame()
ndf=len(df)
n=30
t=time.time()
for k in range(n*300, ndf-2):
    #rebuild the 30-hour candles from the trailing window of hourly bars
    df1=newOHLCV(df.iloc[k-n*300:k], n)
    df1=TA1(df1)
    #current and lagged indicator readings are the features
    x=df1[["rsi", "rsi1", "rsi2", "rsi3",
           "macd", "macd1", "macd2", "macd3",
           "williamsR", "williamsR1", "williamsR2", "williamsR3",
           "adx", "adx1", "adx2", "adx3",
           "priceMAdiff", "priceMAdiff1", "priceMAdiff2", "priceMAdiff3"]].values
    x=minmaxScale.fit_transform(x)
    y=df1[["Target"]].values
    #train on all but the last few candles, then predict the next move
    svmCLF2.fit(x[:-3], y[:-3].ravel())
    pred=svmCLF2.predict(x[-2].reshape(1,-1))
    df2=df2.append(df1.iloc[-2], ignore_index=True)
    df2.loc[len(df2)-1,'Pred']=pred[0]
    print(k)

time.time()-t

We run the algorithmic trading strategy. The backtest took 184 seconds. Python gives us the power to develop algorithmic trading strategies and backtest them before we deploy them in live trading. If the backtesting results are not good, the strategy will not work in live trading either; you need to look deeply into your trading strategy and find out why it is not working.

>>> #calculate the confusion matrix
... cnfMatrix=confusion_matrix(df2.Target, df2.Pred)
>>>
>>> print("the recall for this model is :",\
... cnfMatrix[1,1]/(cnfMatrix[1,1]+cnfMatrix[1,0]))
the recall for this model is : 0.0
>>> # no of True Positive
... print("TP",cnfMatrix[1,1,])
TP 0
>>> # no. of True Negative
... print("TN",cnfMatrix[0,0])
TN 3063
>>> # no of False Positive
... print("FP",cnfMatrix[0,1])
FP 0
>>> # no of False Negative
... print("FN",cnfMatrix[1,0])
FN 888
>>> #print accuracy of the model
... print(accuracy_score(df2.Target, df2.Pred))
0.7752467729688687
>>> sum(df2.Pred==1)
0
>>> sum(df2.Target==1)
888

If we look above, we got an accuracy of 77%. But if you look more closely, all the predictions are zero: the Support Vector Machine classifier failed to predict even one move above 60 pips. There were a total of 888 such moves, yet the SVM classifier missed them all. It is simply predicting the majority class, which is what gives it the 77% accuracy. We now need to look deeply into why this is happening and whether we can improve the classifier in any way. Learn how to predict gold prices with kernel ridge regression.
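One place to start (this is my own suggestion, not something covered in the original post) is the class imbalance and the evaluation metric. Only about 22% of the candles move more than 60 pips, so plain accuracy rewards predicting the majority class. A probability-enabled SVC gives a score we can feed to roc_auc_score (imported above), a ranking metric that is not fooled by majority-class predictions. The sketch below (svmProb is my own name) refits on the last training window only, just as an illustration:

#class imbalance: roughly 78% quiet candles versus 22% big moves
print(df2.Target.value_counts(normalize=True))

#illustrative in-sample check on the final window from the loop above
svmProb = SVC(C=1, class_weight='balanced', gamma='auto', probability=True)
svmProb.fit(x[:-3], y[:-3].ravel())
scores = svmProb.predict_proba(x[:-3])[:, 1]
print("in-sample ROC AUC:", roc_auc_score(y[:-3].ravel(), scores))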