In this post we will use Support Vector Machines (SVMs) to develop an algorithmic trading strategy. If you have not read it yet, see my previous post on how to use indicators in expert advisors. We develop two SVMs: one predicts volatility and the second predicts market direction. We will use a trick to transform the data before making the predictions, and the same trick lets me predict price 10, 20 or 30 hours ahead. If you want to know the trick, continue reading.
More and more traders are getting interested in algorithmic trading. If you want to become an algorithmic trader, you should learn Python. Python is a modern object-oriented language that can be used for many things, and in recent years it has become one of the most popular languages for artificial intelligence, machine learning, deep learning and reinforcement learning. You can take a look at the courses that I have developed for teaching Python to traders.
The first course is Python for Traders. It teaches you how to start programming your algorithmic trading strategies in Python. No previous experience or knowledge of Python programming is required. The second course is Python Machine Learning for Traders. Once you master machine learning with Python, you can take the next course, Python Deep Learning for Trader. In the last course, Algorithmic Trading with Python, I teach you how to connect Python with MT4 so that you can use its powerful machine learning and deep learning libraries in developing your EA.
I want to introduce Python to you in this post. We will start with the basics and then do some more advanced work using Pandas. Pandas is a very powerful Python library that is used daily on Wall Street and at quant funds.
>>> import datetime
>>> import math
>>> import time
>>> import numpy as np
>>> import pandas as pd
>>>
>>> t=time.time()
>>> df = pd.read_csv(\
...     'D:/Shared/MarketData/EURUSD60.csv', header=None)
>>> df.columns=['Date', 'Time', 'Open', 'High', 'Low',\
...     'Close', 'Volume']
>>> df.tail()
            Date   Time     Open     High      Low    Close  Volume
7753  2018.11.09  17:00  1.13240  1.13374  1.13168  1.13317    2800
7754  2018.11.09  18:00  1.13312  1.13385  1.13265  1.13307    1718
7755  2018.11.09  19:00  1.13305  1.13388  1.13296  1.13352    1588
7756  2018.11.09  20:00  1.13353  1.13374  1.13294  1.13369    1274
7757  2018.11.09  21:00  1.13359  1.13392  1.13324  1.13350     660
>>> #reverse the dataframe
... #df2=df[::-1]
...
>>> #create an empty dataframe
... df1=pd.DataFrame()
>>> df1.tail()
Empty DataFrame
Columns: []
Index: []
>>> #define the candle timeframe
... n =40
>>> k=len(df)-2-math.floor((len(df)-2)/n)*n
>>> df1["Date"]=df.Date[k:len(df)-2:n].values
>>> df1["Time"]=df.Time[k:len(df)-2:n].values
>>> df1["Open"]=df.Open[k:len(df)-2:n].values
>>> df1["High"] =\
...     df.High.rolling(n).max()[k+n:len(df)-1:n].values
>>> df1["Low"] =\
...     df.Low.rolling(n).min()[k+n:len(df)-1:n].values
>>> df1["Close"]=df.Close[k+n:len(df)-1:n].values
>>> df1["Volume"] =\
...     df.Volume.rolling(n).sum()[k+n:len(df)-1:n].values
>>> df1.tail()
           Date   Time     Open     High      Low    Close   Volume
188  2018.10.30  11:00  1.13570  1.13810  1.13016  1.13458  66344.0
189  2018.11.01  03:00  1.13404  1.14552  1.13371  1.13946  78165.0
190  2018.11.02  19:00  1.13848  1.14250  1.13528  1.14176  58387.0
191  2018.11.06  12:00  1.14065  1.14991  1.13942  1.14243  81837.0
192  2018.11.08  04:00  1.14284  1.14459  1.13168  1.13369  86844.0
>>> df1=df1.append(df.iloc[len(df)-1], ignore_index=True)
>>> df1.tail()
           Date   Time     Open     High      Low    Close   Volume
189  2018.11.01  03:00  1.13404  1.14552  1.13371  1.13946  78165.0
190  2018.11.02  19:00  1.13848  1.14250  1.13528  1.14176  58387.0
191  2018.11.06  12:00  1.14065  1.14991  1.13942  1.14243  81837.0
192  2018.11.08  04:00  1.14284  1.14459  1.13168  1.13369  86844.0
193  2018.11.09  21:00  1.13359  1.13392  1.13324  1.13350    660.0
>>> time.time()-t
0.12879610061645508
As you can see, Pandas is pretty fast. I timed the code and Pandas did the whole job in about 0.13 seconds. Pandas is a highly optimized Python library. It is built on NumPy, the Python library for scientific computing, whose core is written in C, a very fast language. That is why the whole conversion of the dataframe took little more than a tenth of a second. Sometimes I also get 0.07 seconds. Now let’s convert the above code into a function that we can use whenever we want.
>>> def newOHLCV(df, n):
...     '''
...     This function creates a new OHLCV dataframe with
...     a different time interval for the purpose of
...     drawing new candlesticks.
...     Params:
...     df is the original dataframe
...     n is the new time interval
...     '''
...     #create an empty dataframe
...     df1=pd.DataFrame()
...     #define the candle timeframe
...     k=len(df)-2-math.floor((len(df)-2)/n)*n
...     df1["Date"]=df.Date[k:len(df)-2:n].values
...     df1["Time"]=df.Time[k:len(df)-2:n].values
...     df1["Open"]=df.Open[k:len(df)-2:n].values
...     df1["High"] =\
...         df.High.rolling(n).max()[k+n:len(df)-1:n].values
...     df1["Low"] =\
...         df.Low.rolling(n).min()[k+n:len(df)-1:n].values
...     df1["Close"]=df.Close[k+n:len(df)-1:n].values
...     df1["Volume"] =\
...         df.Volume.rolling(n).sum()[k+n:len(df)-1:n].values
...     #add the latest bar as the last row; pd.concat replaces
...     #DataFrame.append, which pandas 2.0 removed
...     df1=pd.concat([df1, df.iloc[[len(df)-1]]], ignore_index=True)
...     return df1
...
>>> df2=newOHLCV(df,40)
>>> df2.tail()
           Date   Time     Open     High      Low    Close   Volume
189  2018.11.01  03:00  1.13404  1.14552  1.13371  1.13946  78165.0
190  2018.11.02  19:00  1.13848  1.14250  1.13528  1.14176  58387.0
191  2018.11.06  12:00  1.14065  1.14991  1.13942  1.14243  81837.0
192  2018.11.08  04:00  1.14284  1.14459  1.13168  1.13369  86844.0
193  2018.11.09  21:00  1.13359  1.13392  1.13324  1.13350    660.0
I have been developing algorithmic trading strategies for some time and I find the above code highly useful. It allows me to change the timeframe of the candlestick chart at any point in time. As you can see above, I didn’t use a for loop. Python for loops are slow and take more time to execute than the vectorized operations above. You should have the ability to look at the candlestick charts with a different zoom lens, and the above code provides you with that zoom lens. Take a look at this Neural Network Forex Trading System.
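The same zoom-lens idea can also be sketched with a grouped aggregation. The helper below is an illustration only: the name aggregate_bars and the synthetic data are assumptions, and it groups bars from the newest row backwards, so its windows will not line up exactly with the ones newOHLCV produces.

```python
import numpy as np
import pandas as pd

def aggregate_bars(df, n):
    """Combine every n consecutive rows into one OHLCV bar (hypothetical helper)."""
    # Drop the oldest rows so the remaining length is a multiple of n.
    start = len(df) % n
    trimmed = df.iloc[start:].reset_index(drop=True)
    # Rows 0..n-1 get label 0, rows n..2n-1 get label 1, and so on.
    groups = np.arange(len(trimmed)) // n
    return trimmed.groupby(groups).agg(
        Date=("Date", "first"),
        Time=("Time", "first"),
        Open=("Open", "first"),
        High=("High", "max"),
        Low=("Low", "min"),
        Close=("Close", "last"),
        Volume=("Volume", "sum"),
    )

# Synthetic hourly bars, made up for the demonstration.
base = np.arange(10, dtype=float)
demo = pd.DataFrame({
    "Date": ["2018.11.09"] * 10,
    "Time": [f"{h:02d}:00" for h in range(10)],
    "Open": base,
    "High": base + 0.5,
    "Low": base - 0.5,
    "Close": base + 0.25,
    "Volume": [100] * 10,
})
bars = aggregate_bars(demo, 4)
print(bars)
```

With 10 rows and n=4 the two oldest rows are dropped and two 4-hour bars remain. No explicit Python loop is needed here either.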
Let’s start.
import datetime
import time
import math
import numpy as np
import pandas as pd
import talib as ta
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier as make_forest
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import normalize
import xgboost as xgb
After importing the libraries, we define two functions that are crucial to our algorithmic trading strategy. The first function, newOHLCV, we have already defined above. The second function we define below:
#define the features
def TA1(df):
    #calculate RSI 14
    df["rsi"]=ta.RSI(df.Close, 14)
    #calculate MACD
    df["macd"], df["macdSignal"], df["macdHist"] = ta.MACD(
        df.Close, fastperiod=12, slowperiod=26, signalperiod=9)
    #calculate Williams %R
    df["williamsR"] = ta.WILLR(df.High, df.Low, df.Close, timeperiod=14)
    #calculate Wilder's DMI better known as ADX
    df["adx"] = ta.ADX(df.High, df.Low, df.Close, timeperiod=14)
    #calculate the difference from MA
    df["priceMAdiff"] = df.Close-ta.MA(df.Close, timeperiod=25)
    #df['EMA'] = ta.EMA(df.Close, timeperiod=21)
    #move of the next candle in pips
    df["Pips"]=10000*(df.Close.shift(-1)-df.Close)
    #Target is 1 for a move larger than 60 pips in either direction,
    #0 otherwise (the original repeated "> 60" on both lines, which
    #would have overwritten every 1 with 0)
    df.loc[df.Pips.abs() > 60, "Target"]=1
    df.loc[df.Pips.abs() <= 60, "Target"]=0
    #df.loc[df.Pips < -70, "Target"]=-1
    #df.loc[df.Pips > 70, "Target"]=1
    #df.loc[(df.Pips > -70) & (df.Pips < 70), "Target"]=0
    #add the three previous values of each indicator as features
    df["rsi1"]=df.rsi.shift(1)
    df["rsi2"]=df.rsi.shift(2)
    df["rsi3"]=df.rsi.shift(3)
    df["macd1"]=df.macd.shift(1)
    df["macd2"]=df.macd.shift(2)
    df["macd3"]=df.macd.shift(3)
    df["williamsR1"]=df.williamsR.shift(1)
    df["williamsR2"]=df.williamsR.shift(2)
    df["williamsR3"]=df.williamsR.shift(3)
    df["adx1"]=df.adx.shift(1)
    df["adx2"]=df.adx.shift(2)
    df["adx3"]=df.adx.shift(3)
    df["priceMAdiff1"]=df.priceMAdiff.shift(1)
    df["priceMAdiff2"]=df.priceMAdiff.shift(2)
    df["priceMAdiff3"]=df.priceMAdiff.shift(3)
    #drop the rows where an indicator, a lag or the target is missing
    df=df.dropna()
    return df
Above we have defined the technical analysis function. It uses the technical indicators RSI, MACD, Williams %R and ADX, plus the distance of the close from a 25-period moving average. I want to check whether we can use these technical indicators in developing an algorithmic trading strategy. These indicators are usually interpreted visually on a chart. The hope is that a machine learning algorithm can pick up the same patterns from the current indicator values and their three most recent lags.
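The repeated shift calls for the lagged features can be written more compactly. The following is a minimal sketch, assuming a hypothetical helper called add_lags and made-up RSI values. It follows the same naming pattern as the columns above (rsi1, rsi2, rsi3).

```python
import pandas as pd

def add_lags(df, columns, lags=(1, 2, 3)):
    """Return a copy of df with lagged copies of the given columns (hypothetical helper)."""
    out = df.copy()
    for col in columns:
        for lag in lags:
            # rsi1 holds the previous value of rsi, rsi2 the one before, ...
            out[f"{col}{lag}"] = out[col].shift(lag)
    # The first max(lags) rows have no complete history, so drop them.
    return out.dropna()

# Made-up indicator values, for illustration only.
demo = pd.DataFrame({"rsi": [50.0, 55.0, 60.0, 65.0, 70.0]})
lagged = add_lags(demo, ["rsi"])
print(lagged)
```

Each remaining row then carries the current value and the three previous values side by side, which is the form the classifier sees.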
#define the Support Vector Machine classifier
svmCLF2 = SVC(gamma='auto')
#NOTE: the expression below builds a second classifier that is never
#assigned, so svmCLF2 keeps the default class_weight=None, not 'balanced'
SVC(C=1, cache_size=200, class_weight='balanced', coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
#scale every feature to the range [0, 1]
minmaxScale=preprocessing.MinMaxScaler(feature_range=(0,1))
Above we defined the Support Vector Machine classifier and a min-max scaler that maps every feature to the range 0 to 1. Take a look at this High Frequency Trading Algorithmic Trading System.
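Large moves are much rarer than small ones, so the two target classes are imbalanced. One option worth trying is scikit-learn's class_weight='balanced' setting, which penalizes mistakes on the rare class more heavily. The sketch below is not the setup used in this post. It runs on synthetic data invented for the example, and whether the setting helps on real EURUSD features is an open question.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic, imbalanced data (an assumption for illustration only):
# 180 samples of class 0 and 20 samples of class 1.
rng = np.random.default_rng(0)
x_major = rng.normal(0.0, 1.0, size=(180, 2))
x_minor = rng.normal(3.0, 1.0, size=(20, 2))
x = np.vstack([x_major, x_minor])
y = np.array([0] * 180 + [1] * 20)

def make_svm(class_weight=None):
    """Scaler and SVC chained so both are always fitted on the same rows."""
    return Pipeline([
        ("scale", MinMaxScaler(feature_range=(0, 1))),
        ("svm", SVC(gamma="auto", class_weight=class_weight)),
    ])

pred_default = make_svm().fit(x, y).predict(x)
pred_balanced = make_svm("balanced").fit(x, y).predict(x)

print("positives predicted, default :", int(pred_default.sum()))
print("positives predicted, balanced:", int(pred_balanced.sum()))
```

Comparing the two counts on your own data shows quickly whether the weighting changes how often the rare class is predicted.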
#walk forward through the data, refitting the SVM at every step
features=["rsi", "rsi1", "rsi2", "rsi3",
    "macd", "macd1", "macd2", "macd3",
    "williamsR", "williamsR1", "williamsR2", "williamsR3",
    "adx", "adx1", "adx2", "adx3",
    "priceMAdiff", "priceMAdiff1", "priceMAdiff2", "priceMAdiff3"]
#create an empty dataframe for the results
df2=pd.DataFrame()
ndf=len(df)
n=30
t=time.time()
for k in range(n*300, ndf-2):
    #build n-hour candles from the most recent n*300 hourly bars
    df1=newOHLCV(df.iloc[k-n*300:k], n)
    df1=TA1(df1)
    #.values replaces as_matrix(), which newer pandas no longer provides
    x=df1[features].values
    x=minmaxScale.fit_transform(x)
    y=df1[["Target"]].values
    #train on all rows but the last three, predict the second-to-last row
    svmCLF2.fit(x[:-3], y[:-3].ravel())
    pred=svmCLF2.predict(x[-2].reshape(1,-1))
    #pd.concat and .loc replace the removed DataFrame.append and .ix
    df2=pd.concat([df2, df1.iloc[[-2]]], ignore_index=True)
    df2.loc[len(df2)-1,'Pred']=pred[0]
    print(k)
time.time()-t
We now run the algorithmic trading strategy. The backtest took 184 seconds. Python gives us the power to develop algorithmic trading strategies and backtest them before we deploy them in live trading. If the backtesting results of a strategy are not good, it is very unlikely to work in live trading. You then need to look deeply into the strategy and check why it is not working.
>>> #calculate the confusion matrix
... cnfMatrix=confusion_matrix(df2.Target, df2.Pred)
>>>
>>> print("the recall for this model is :",\
...     cnfMatrix[1,1]/(cnfMatrix[1,1]+cnfMatrix[1,0]))
the recall for this model is : 0.0
>>> # no of True Positive
... print("TP",cnfMatrix[1,1,])
TP 0
>>> # no. of True Negative
... print("TN",cnfMatrix[0,0])
TN 3063
>>> # no of False Positive
... print("FP",cnfMatrix[0,1])
FP 0
>>> # no of False Negative
... print("FN",cnfMatrix[1,0])
FN 888
>>> #print accuracy of the model
... print(accuracy_score(df2.Target, df2.Pred))
0.7752467729688687
>>> sum(df2.Pred==1)
0
>>> sum(df2.Target==1)
888
If we look above, we got an accuracy of about 77.5%. But if you look more closely, all the predictions are zero. The Support Vector Machine classifier failed to predict even one move above 60 pips. There were 888 such moves in total and the classifier missed every one of them. It is just predicting the majority class, and the 3063 small moves out of 3951 samples are what give it the 77.5% accuracy. We now need to look deeply into why this is happening and whether we can improve our classifier in any way. Learn how to predict gold prices with kernel ridge regression.
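To see why accuracy is misleading here, it helps to recompute a few metrics from the four counts printed above. This is a small sketch in plain Python. The counts come from the confusion matrix in this post, while the choice of extra metrics (balanced accuracy and a majority-class baseline) is mine.

```python
# Counts taken from the confusion matrix printed above.
tp, tn, fp, fn = 0, 3063, 0, 888

total = tp + tn + fp + fn
accuracy = (tp + tn) / total

# Recall: share of the real large moves that the model caught.
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Specificity: share of the small moves that the model labeled correctly.
specificity = tn / (tn + fp) if (tn + fp) else 0.0
# Balanced accuracy averages the two, so each class counts equally.
balanced_accuracy = (recall + specificity) / 2

# Accuracy of a model that always predicts the more frequent class.
majority_baseline = max(tp + fn, tn + fp) / total

print("accuracy          :", round(accuracy, 4))
print("majority baseline :", round(majority_baseline, 4))
print("recall            :", recall)
print("balanced accuracy :", balanced_accuracy)
```

The accuracy equals the majority-class baseline and the balanced accuracy is 0.5, which is what a classifier with no predictive power for large moves would produce.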