When Secure Multi-Party Computation Meets Quantitative Investment Research, Can It Accurately Predict the Market?
算力智库 (Computing Power Think Tank)
2020-08-14 04:23
This article is about 3,017 characters long and takes about 12 minutes to read.
Quantitative investment relies on quantitative models and data to find "high-probability" strategies that can generate excess returns.

When secure multi-party computation meets quantitative investment research, how can research institutions fully tap the value of data? For the 算力智库 privacy data security column, we invited Guo Jia to explore quantitative modeling with big data and how secure multi-party computation can be combined with quantitative investment research.


Quantitative Logic of Big Data

We can roughly divide investment research information into three categories according to how openly available the data is: public data, semi-public data, and non-public data.

  • Public data is easy to understand: data that can be viewed at any time, such as stock prices and candlestick (K-line) charts;

  • Semi-public data is data we can obtain, but not in full. For example, real-time capital flows can be viewed at any time, but websites do not disclose the historical data;

  • Non-public data is internal data held by other companies and by stock exchanges that relates to stocks in the market and cannot be provided externally.

First, a concept: quantitative investment. Put simply, quantitative investment means finding patterns in data, and big data has opened a new door for it. Introducing big data technology into quantitative trading makes it possible to mine the information hidden in massive datasets to predict financial and economic activity, and to update trading strategies dynamically for the best forecasting performance.

All traditional quantitative indicators, whether based on prices or on financial statements, lag to some degree, so they cannot give an early read on industries and the market. Judging industries and individual stocks with big data technology can partly remedy this: search factors capture investor sentiment, e-commerce data reveals the fundamental trends of each industry in real time, and data from "big V" accounts (influential verified users on Chinese social media) gathers collective wisdom. In theory, these kinds of big data can be used to predict future market conditions. Bringing internet-finance big data into models as stock-selection factors marks asset management institutions rebuilding the stock-selection logic of index investing.


Quantitative Advantages of Secure Multi-Party Computation

In practice, valuable data often sits in someone else's hands. How to achieve "sharing in spirit" without "physical contact", that is, using the value of data without handing over the data itself, is the current compliance requirement for secure data use, and privacy computing technology addresses this problem well. Multiple participants, each holding its own private data, jointly execute a computation (for example, computing a maximum) and obtain the result, yet no party can infer another party's private data from the messages exchanged. Under this technology every participant has equal identity and status, so a shared data strategy can be established. Because the raw data never leaves its owner, user privacy is not leaked and data-handling rules are not violated, which protects data privacy and meets legal and compliance requirements. The technical term for this is secure multi-party computation (MPC).
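The paragraph above uses a maximum as its example. Secure comparison protocols are too long to show here, so the minimal sketch below swaps in the simplest MPC building block, additive secret sharing, to compute a joint sum instead. Each party splits its private value into random shares; a single share reveals nothing on its own, yet all shares add up to the true total. The modulus `PRIME` and the function names are illustrative assumptions, the parties are simulated in one process, and a real deployment would also need secure channels between them.

```python
import secrets

# Illustrative modulus (a Mersenne prime); all share arithmetic is done mod PRIME.
PRIME = 2**61 - 1


def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n random additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def joint_sum(private_values: list[int]) -> int:
    """Simulate n parties computing the sum of their private values."""
    n = len(private_values)
    # Party i splits its value; share j would be sent to party j.
    dealt = [share(v, n) for v in private_values]
    # Party j adds up the shares it received; each looks random on its own.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; together they reveal just the total.
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Three institutions with hypothetical private counts.
    print(joint_sum([120, 45, 300]))  # 465
```

Only the final total becomes public; with three or more honest parties, no single party learns another party's input from the shares it receives.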


Case Study

"The purpose of this case is not to prove how good third-party data is, but to demonstrate a secure data-modeling scheme for investment research."

  • Research target: all stocks on the GEM (ChiNext) from August 5, 2019 to August 4, 2020

  • Research objective: use historical data to predict whether each stock gains more than 8% on a given day. In the sample set, y = 1 if the stock rose more than 8% that day, and y = 0 otherwise.

  • Research variables:

  • Node A data in federated learning: 17 indicators were constructed from historical stock data (public data): the day of the week; the average return over the last three and last seven days; the absolute return over the last three and last seven days; the standard deviation of returns over the last three and last seven days; the average turnover rate over the last three and last seven days; the average trading volume over the last three and last seven days; the number of rising days in the past three and past seven days; the number of times the stock rose more than 5% in the past three and past seven days; and the number of times the stock fell more than 5% in the past three and past seven days.

  • Node B data in federated learning: 7 indicators were constructed from the number of Baidu searches for the keyword "GEM": the GEM search index of the day, the GEM search index of the previous day, the GEM search index over the last three days, the GEM search index over the last seven days, the number of days the GEM search index rose in the past three days, the number of days it rose in the past seven days, and the growth of the GEM search index. These simulate an external, non-public data source.
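A few of Node A's price-based indicators and the label y can be built with plain Python, as in the minimal sketch below. It assumes that daily return means the close-to-close change and that each window uses only the days before the prediction day (to avoid look-ahead bias); the source states neither. Turnover, volume, weekday and "absolute return" features are omitted because they need extra columns or an interpretation the source does not pin down, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def daily_returns(closes: list[float]) -> list[float]:
    """Close-to-close daily returns computed from a list of closing prices."""
    return [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]


def label(ret: float, threshold: float = 0.08) -> int:
    """y = 1 if the day's gain exceeds 8%, otherwise 0."""
    return 1 if ret > threshold else 0


def window_features(returns: list[float], t: int, window: int) -> dict[str, float]:
    """Indicators for day t computed from the `window` returns before t.

    Assumption: day t itself is excluded so the features do not leak the label.
    """
    if t < window:
        raise ValueError("not enough history for this window")
    past = returns[t - window:t]
    return {
        f"mean_ret_{window}d": mean(past),
        f"std_ret_{window}d": pstdev(past),
        f"up_days_{window}d": sum(r > 0 for r in past),
        f"big_up_{window}d": sum(r > 0.05 for r in past),     # gains above 5%
        f"big_down_{window}d": sum(r < -0.05 for r in past),  # drops beyond -5%
    }


if __name__ == "__main__":
    closes = [10.0, 10.2, 10.1, 10.9, 11.0, 10.8, 11.5, 12.5]  # made-up prices
    rets = daily_returns(closes)
    t = len(rets) - 1  # predict the last day
    feats = window_features(rets, t, 3)
    print(label(rets[t]), feats["up_days_3d"], feats["big_up_3d"])  # 1 2 1
```

Node B's search-index indicators could be built the same way by feeding daily Baidu search counts into the same window logic.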

In summary, the strategy is built on the 24 indicators above and combines the principles of momentum and reversal strategies, using the number of Baidu searches as external data that reflects market sentiment. Input variables are then screened using IV (information value) and other metrics, and a logistic regression model is built to predict whether a stock rose more than 8% on the day. To verify the contribution of the Baidu Index, four models were built for comparison:
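Information value (IV), used above to screen input variables, measures how differently a binned feature is distributed among positive samples (y = 1) and negative samples (y = 0): IV = Σ (p_event − p_nonevent) · ln(p_event / p_nonevent), summed over the bins. The minimal sketch below assumes the feature has already been binned and adds a small smoothing constant `eps` (our assumption, not from the source) so that empty bins do not produce log(0).

```python
import math
from collections import Counter


def information_value(bins: list[str], labels: list[int], eps: float = 0.5) -> float:
    """IV of a pre-binned feature against binary labels (1 = rose more than 8%)."""
    events = Counter(b for b, y in zip(bins, labels) if y == 1)
    non_events = Counter(b for b, y in zip(bins, labels) if y == 0)
    categories = set(bins)
    k = len(categories)
    # Smoothed totals so the per-bin shares below still sum to 1.
    total_e = sum(events.values()) + eps * k
    total_n = sum(non_events.values()) + eps * k
    iv = 0.0
    for b in categories:
        p_e = (events[b] + eps) / total_e      # share of all events in bin b
        p_n = (non_events[b] + eps) / total_n  # share of all non-events in bin b
        iv += (p_e - p_n) * math.log(p_e / p_n)  # (p_e - p_n) times the bin's WoE
    return iv


if __name__ == "__main__":
    # Hypothetical binned "3-day search growth" feature for eight stock-days.
    bins = ["low"] * 4 + ["high"] * 4
    labels = [0, 0, 0, 1, 1, 1, 0, 1]
    print(round(information_value(bins, labels), 3))  # 0.678
```

A feature whose bins hold the same mix of positives and negatives scores 0, so the higher the IV, the more the feature separates the two classes.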

  • The sample set is all stocks, and the model is built without the Baidu Index;

  • The sample set is all stocks, and the model is built with the Baidu Index (all other input variables are the same as in control group 1);

  • The sample set is Huaxing Yuanchuang, and the model is built without the Baidu Index;

  • The sample set is Huaxing Yuanchuang, and the model is built with the Baidu Index.
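The four models differ in whether Node B's Baidu Index features enter the logistic regression. In a vertically partitioned (federated) setup, each node can compute the linear part of the score on the columns it owns, and only those partial scores need to be combined. The minimal sketch below shows that scoring step for one stock-day under stated assumptions: the weights and feature names are hypothetical rather than trained values, each comparison model would in practice be trained separately, and the partial scores are added in the clear here, whereas a real deployment would combine them under secure computation (for example a secret-shared sum).

```python
import math
from typing import Optional


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def partial_score(weights: dict[str, float], features: dict[str, float]) -> float:
    """Linear term a node computes locally on the columns it owns."""
    return sum(w * features[name] for name, w in weights.items())


def predict(intercept: float, node_a: tuple, node_b: Optional[tuple] = None) -> float:
    """P(y = 1) from Node A's partial score plus, optionally, Node B's.

    Assumption: partial scores are summed in the clear in this sketch.
    """
    z = intercept + partial_score(*node_a)
    if node_b is not None:  # variant that uses the Baidu Index features
        z += partial_score(*node_b)
    return sigmoid(z)


if __name__ == "__main__":
    # Hypothetical (weights, features) pairs held by each node.
    node_a = ({"mean_ret_3d": 4.0, "up_days_3d": 0.2},
              {"mean_ret_3d": 0.02, "up_days_3d": 2})
    node_b = ({"search_growth": 1.5}, {"search_growth": 0.4})
    print(predict(-3.0, node_a), predict(-3.0, node_a, node_b))
```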

(Note: because Baidu's search data is protected against crawlers, not all of it can be scraped. So when modeling the full set of stocks, only the number of searches for "GEM" was used, not the number of searches for each individual stock's name; only when the sample set is Huaxing Yuanchuang was the number of searches for the keyword "Huaxing Yuanchuang" used.)

  • Model conclusions

1) The IV values suggest that the Baidu Index data plays an important role in predicting y. Among these features, the rise or fall in ChiNext searches and the average ChiNext search value over the past three days are the most significant. The model coefficients show that both are positively correlated with y: the higher their values, the more likely a stock is to rise more than 8%. (See the figure below for details.)

2) With all stocks as the sample set, the model built with the Baidu Index achieves an AUC of 0.76, versus 0.72 for the model built without it (all other input variables the same as in control group 1), indicating that the Baidu Index noticeably improves predictive performance. (See the figure below for details.)

3) With Huaxing Yuanchuang as the sample set, the model built with the Baidu Index achieves an AUC of 0.74, versus 0.73 for the model built without it (all other input variables held the same), indicating that the Baidu Index slightly improves predictive performance. (See the figure below for details.)
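AUC, the metric behind the 0.76 versus 0.72 comparison, is the probability that a randomly chosen positive sample (a day with a gain above 8%) receives a higher predicted score than a randomly chosen negative one. The minimal sketch below computes it directly from that definition, counting ties as half a win. It is quadratic in the number of samples, so for a full year of GEM stock-days a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model outputs
    print(auc(y, scores))  # 0.75
```

An AUC of 0.5 means the model ranks no better than chance, and 1.0 means every positive day outranks every negative one.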

Based on the case above, we found that adding external non-public information can indeed improve stock forecasting ability.

In traditional quantitative investment, most of the time is spent on data cleaning and organization, and data obtained from external sources, whose provenance is often unclear, carries major hidden risks in data quality and data security. Data quality problems (data not updated in time, data acquired illegally) can harm a quantitative strategy (violating personal privacy, or making the strategy non-robust because of missing data).

Outlook

Author

Currently a senior director at Fushu Technology, responsible for privacy computing solutions and their business implementation.

A self-described "beginner" proud of having moved from technology to business. With nearly ten years of experience in the internet big data industry, he has worked at Shanghai Dazhizhi, Ping An, and Micai as a big data architect and senior analyst, and has researched financial technology in depth.

