Traditional quant is essentially taking the investment logic in your head, turning it into rules and a strategy, and expressing it in quantitative terms.
The traditional quant workflow:
- Data preparation
- Compute the technical indicators the strategy needs
- Develop the strategy logic
- Backtest
AI quant, by contrast, supplies the machine with inputs and outputs and lets it learn the "rules" from the data. The advantage is that no manual modeling is required: we only need to do the feature engineering and mine factors. The drawback is that interpretability may suffer.
The AI quant workflow:
- Data preparation
- Feature engineering (these can also be technical indicators, but usually far more of them)
- Data splitting (training set, test set)
- Build the model
- Fit the model on the training set, then generate predictions for both the training and test sets
- Evaluate model performance and iterate
- Backtest
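Before turning to the deep-learning version, the workflow above can be sketched end to end on synthetic data. This is a toy illustration only: the random-walk prices, the five lagged-return features, and scikit-learn's LogisticRegression are stand-ins (assumptions), not the data or model used below.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data preparation: a synthetic random-walk price series
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

# 2. Feature engineering: five lagged log returns as features
r = np.log(close / close.shift(1))
X = pd.concat([r.shift(lag) for lag in range(1, 6)], axis=1).dropna()
y = (r.loc[X.index] > 0).astype(int)   # label: current-period direction

# 3. Data splitting: time-ordered, no shuffling
X_tr, X_te = X.iloc[:350], X.iloc[350:]
y_tr, y_te = y.iloc[:350], y.iloc[350:]

# 4-5. Build the model and fit it on the training set only
model = LogisticRegression().fit(X_tr, y_tr)

# 6. Evaluate: in-sample vs out-of-sample accuracy
acc_in = accuracy_score(y_tr, model.predict(X_tr))
acc_out = accuracy_score(y_te, model.predict(X_te))
print(acc_in, acc_out)
```

On pure noise like this, out-of-sample accuracy should hover around 50%, which is exactly the baseline a real strategy has to beat.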
Today we use a deep learning model to "predict" an index and run a quantitative backtest.
01 Data Preparation
import os
import math
import numpy as np
import pandas as pd
from pylab import plt, mpl
```python
plt.style.use('seaborn')
mpl.rcParams['savefig.dpi'] = 300
mpl.rcParams['font.family'] = 'SimHei'
pd.set_option('mode.chained_assignment', None)
pd.set_option('display.float_format', '{:.4f}'.format)
np.set_printoptions(suppress=True, precision=4)
os.environ['PYTHONHASHSEED'] = '0'
```
Load the index data:
```python
def read_data(symbol):
    data = pd.DataFrame(pd.read_csv('data/{}.csv'.format(symbol)).dropna())
    data['date'] = data['date'].apply(lambda x: str(x))
    data.set_index('date', inplace=True)
    data.sort_index(ascending=True, inplace=True)
    data = data[['close']]
    return data

data = read_data('SPX')
data.head()

# Alternative data source (end-of-day data from Hilpisch's AIIF book):
url = 'http://hilpisch.com/aiif_eikon_eod_data.csv'
symbol = 'EUR='
# data = pd.DataFrame(pd.read_csv(url, index_col=0,
#                                 parse_dates=True).dropna()[symbol])
# data.rename(columns={'EUR=': 'close'}, inplace=True)
data
```
02 Feature Engineering
Add a moving average, rolling minimum, rolling maximum, return momentum, and return standard deviation, plus 5 lagged periods for each:
```python
def add_lags(data, symbol, lags, window=20):
    cols = []
    df = data.copy()
    df.dropna(inplace=True)
    df['r'] = np.log(df['close'] / df['close'].shift(1))
    df['r_'] = df['close'].pct_change()
    df['sma'] = df[symbol].rolling(window).mean()
    df['min'] = df[symbol].rolling(window).min()
    df['max'] = df[symbol].rolling(window).max()
    df['mom'] = df['r'].rolling(window).mean()
    df['vol'] = df['r'].rolling(window).std()
    df.dropna(inplace=True)
    df['d'] = np.where(df['r'] > 0, 1, 0)
    features = [symbol, 'r', 'd', 'sma', 'min', 'max', 'mom', 'vol']
    for f in features:
        for lag in range(1, lags + 1):
            col = f'{f}_lag_{lag}'
            df[col] = df[f].shift(lag)
            cols.append(col)
    df.dropna(inplace=True)
    return df, cols

data, cols = add_lags(data, 'close', lags=5, window=20)
data
```
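The lag construction inside `add_lags` is just repeated `shift`; a tiny hand-checkable example (the prices here are made up):

```python
import pandas as pd

df = pd.DataFrame({'close': [10.0, 11.0, 12.0, 13.0]})
for lag in (1, 2):
    # shift(lag) moves each value down `lag` rows,
    # exposing past values as features for the current row
    df[f'close_lag_{lag}'] = df['close'].shift(lag)
print(df)
# row 3 sees close=13.0, close_lag_1=12.0, close_lag_2=11.0
df = df.dropna()  # the first `lags` rows have no complete history
```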
Split the data into training and test sets by date:
```python
split = '2018-01-01'
train = data.loc[:split].copy()
test = data.loc[split:].copy()

# Scale both sets with statistics estimated on the training set only;
# normalizing the test set with its own mean/std would leak information
# from the evaluation period into the features.
mu, std = train.mean(), train.std()
train_ = (train - mu) / std
test_ = (test - mu) / std
print(train)
train_
```
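One detail worth stressing: out-of-sample data must be scaled with statistics estimated on the training window, never with its own. A minimal illustration of the pattern (the numbers are arbitrary):

```python
import pandas as pd

train = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})
test = pd.DataFrame({'x': [10.0, 12.0]})

# estimate location/scale on the training window only
mu, std = train.mean(), train.std()

train_ = (train - mu) / std
test_ = (test - mu) / std  # reuse train statistics: no lookahead into the test period

print(test_['x'].tolist())
```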
03 The Model
```python
import random

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l1
from tensorflow.keras.optimizers import Adam
from sklearn.metrics import accuracy_score

def set_seeds(seed=100):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

set_seeds()
optimizer = Adam(learning_rate=0.0001)

def create_model(hl=2, hu=128, dropout=False, rate=0.3,
                 regularize=False, reg=l1(0.0005),
                 optimizer=optimizer, input_dim=len(cols)):
    if not regularize:
        reg = None
    model = Sequential()
    model.add(Dense(hu, input_dim=input_dim,
                    activity_regularizer=reg, activation='relu'))
    if dropout:
        model.add(Dropout(rate, seed=100))
    for _ in range(hl):
        model.add(Dense(hu, activation='relu',
                        activity_regularizer=reg))
        if dropout:
            model.add(Dropout(rate, seed=100))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
```
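The final `sigmoid` layer plus `binary_crossentropy` loss reduces to simple arithmetic; a numpy sketch of roughly what Keras computes per batch (the clipping constant mirrors Keras's default epsilon; the logits are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p):
    # mean over samples of -[y*log(p) + (1-y)*log(1-p)]
    p = np.clip(p, 1e-7, 1 - 1e-7)  # guard against log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

logits = np.array([2.0, -1.0, 0.0])
y = np.array([1.0, 0.0, 1.0])
p = sigmoid(logits)          # probabilities of the "up" class
loss = binary_crossentropy(y, p)
print(p, loss)
```

A logit of 0 maps to probability 0.5, i.e. total indecision, and contributes exactly log 2 ≈ 0.693 to the loss regardless of the label.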
Training the model takes around 3 seconds; in-sample accuracy is 57%, out-of-sample 51.9%.
```python
# The training step was elided in the original; an illustrative call
# (the epoch count and batch size here are assumptions):
model = create_model()
model.fit(train_[cols], train['d'], epochs=50, batch_size=32, verbose=False)

train['p'] = np.where(model.predict(train_[cols]) > 0.5, 1, 0)
print(train['p'])

def backtest(data, data_norm):
    data['pos'] = np.where(model.predict(data_norm[cols]) > 0.5, 1, 0)
    data['pos'] = np.where(data['pos'] == 1, 1, -1)
    data['收益率_对数'] = data['pos'] * data['r']   # strategy log return
    data['收益率'] = data['pos'] * data['r_']       # strategy simple return
    data['equity_基准'] = data['r'].cumsum().apply(np.exp)
    data['equity_策略_对数'] = data['收益率_对数'].cumsum().apply(np.exp)
    data['equity_策略'] = (data['收益率'] + 1).cumprod()
    data[['equity_基准', 'equity_策略_对数', 'equity_策略']].plot(figsize=(10, 6))

backtest(train, train_)
```
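The backtest keeps two strategy equity curves because it mixes two compounding conventions: `exp` of cumulated log returns versus the cumulative product of (1 + simple returns). For buy-and-hold the two coincide exactly; with ±1 positions they can diverge slightly, since negating a log return is not the same as negating a simple return. A numpy check of the buy-and-hold case on toy prices:

```python
import numpy as np

close = np.array([100.0, 102.0, 101.0, 105.0])

log_r = np.log(close[1:] / close[:-1])      # log returns, like df['r']
simple_r = close[1:] / close[:-1] - 1.0     # simple returns, like df['r_']

equity_log = np.exp(np.cumsum(log_r))       # cumsum then exp
equity_simple = np.cumprod(1.0 + simple_r)  # cumprod of (1 + r)

print(equity_log[-1], equity_simple[-1])    # both ≈ 105/100 = 1.05
```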
Backtest results on the training set:

Backtest on the test set data:
Summary
Using only a few simple factor "features" and a simple DNN model to predict the next period's return direction, we can obtain excess returns in this backtest.
Room for future improvement
- Better factors and better models.
The data and code involved can be downloaded in the Knowledge Planet (星球) group.
Published by 股市刺客. When reposting, please credit the source: https://www.95sca.cn/archives/104215