The next-generation trend in federated learning: no third party, and componentization
算力智库
2020-09-23 03:05
Federated learning is an emerging foundational technology in artificial intelligence, first proposed by Google in 2016. It trains algorithms across multiple decentralized edge devices or servers without exchanging data samples.

Professor Yang Qiang recently pointed out at the SDBD2020 Computing Power Online Forum that federated learning is not only a technology but also the building of an open-source ecosystem. As a bridge across data silos, federated learning lets artificial intelligence systems use their own data more efficiently and accurately while meeting data privacy, security, and regulatory requirements.

For this issue of the computing power column on privacy and data security, we invited Guo Jia to get straight to the point: he poses three big questions and boldly envisions a future for federated learning that has no third party and is built from components.

Three questions

1) Hasn't the third-party coordinator in federated learning drawn criticism?

2) When you sell into an institution like a bank, how do you convince the leadership if the product is command-line driven?

3) If an organization buys a modeling platform, or even a complete modeling solution, does that make its own engineers useless?

No third party is "neutral"

Most privacy computing products currently on the market rely on a third party, or have one of the participants double as the third party.

At the outset, third party C distributes the same public key to the computing (data) parties A and B and keeps the corresponding private key. A and B each compute on their own data, encrypt the resulting intermediate result with the public key, and send the ciphertext to C. C uses the private key to decrypt the ciphertexts from all parties, obtains each party's intermediate result in plaintext, and aggregates them into the complete intermediate result. This process iterates until the termination condition is met.
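One round of this coordinator-based flow can be sketched in a few lines. This is only an illustration under stated assumptions: the article does not name the encryption scheme, so a toy Paillier cryptosystem with tiny hard-coded primes stands in for it, the intermediate results are small non-negative integers, and all function names are hypothetical. It is not secure and is not how any particular product works.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Third party C creates a toy Paillier key pair (tiny primes, illustration only)."""
    n = p * q                                          # public key, given to A and B
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (lam, mu)                                # (public, private)

def encrypt(n, m):
    """A data party encrypts an integer 0 <= m < n with the public key."""
    n2 = n * n
    r = 0
    while math.gcd(r, n) != 1:                         # pick a random r coprime to n
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    """Only C, who holds the private key, can do this."""
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n

def partial_inner_product(weights, features):
    """The intermediate result each party computes on its own data."""
    return sum(w * x for w, x in zip(weights, features))

def coordinator_round(party_inputs):
    n, private_key = keygen()                          # C keeps private_key
    ciphertexts = [encrypt(n, partial_inner_product(w, x)) for w, x in party_inputs]
    seen_by_c = [decrypt(n, private_key, c) for c in ciphertexts]
    return seen_by_c, sum(seen_by_c)                   # C sees every plaintext

seen, total = coordinator_round([([2, 3], [4, 5]), ([1], [7])])
print(seen, total)  # [23, 7] 30
```

Note that `seen_by_c` holds each party's intermediate result in plaintext: the encryption protects the values in transit, but not from the coordinator.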

Looking at the outcome, third party C learns information during the protocol that it should not have. Whether that information exposes the other participants' private data depends on the structure of the algorithm being run. In logistic regression, for example, each participant computes an inner product over its own data, encrypts it, and sends it to the coordinator. The coordinator sees the inner product but cannot recover the original data from it, because the inner product masks the original data well.
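A small numeric illustration of why a single inner product hides the underlying features: several different feature vectors produce exactly the same scalar. The weights and vectors below are made-up values chosen for the example, not data from the article.

```python
def inner_product(weights, features):
    return sum(w * x for w, x in zip(weights, features))

weights = [2, 3]

# Three different (hypothetical) feature vectors held by a participant.
candidates = [[4, 5], [1, 7], [10, 1]]

# Each one yields the same value, so seeing the value alone
# does not tell the coordinator which vector produced it.
results = [inner_product(weights, x) for x in candidates]
print(results)  # [23, 23, 23]
```

This only shows the single-value case; how much is revealed across many iterations depends on the algorithm, which is the point the surrounding text makes.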

However, not every algorithm has intermediate results that behave like those of logistic regression, which do not expose the participants' original data.

Third-party coordination does make protocols much easier to design and implement. But setting up the third party is itself fairly costly. As for schemes in which one participant doubles as the third party, only two parties take part in the whole process, yet the problem of the third-party scheme remains unsolved: one party's information is still at risk of exposure.

In contrast, schemes based on secure multi-party computation (MPC) without a third party eliminate the coordinator from the very start of protocol design.

In theory, an MPC protocol requires that no participant learn anything beyond its own input and its own output. Information is exchanged only among the participants, and the interaction during protocol execution does not leak their private data; this is guaranteed by the underlying protocol design. In a garbled circuit, for example, participants map their inputs to random keys and use a garbled truth table to maintain the relationship between the participants' inputs and outputs. The parties exchange only the ciphertexts corresponding to the truth table, never any plaintext input or output data.
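The garbled truth table idea can be sketched for a single AND gate. This is a minimal teaching sketch, not a secure or complete protocol: the hash-based row encryption, the zero tag used to recognize the correct row, and all names are assumptions made for illustration. A real protocol would also deliver the evaluator's input key via oblivious transfer, which is omitted here.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = bytes(16)  # zero padding that marks a correctly decrypted row

def _pad(key_a, key_b):
    # One-time pad derived from the two input keys (32 bytes).
    return hashlib.sha256(key_a + key_b).digest()

def _xor(left, right):
    return bytes(x ^ y for x, y in zip(left, right))

def garble_and_gate():
    """Garbler: pick a random key per wire value and build the garbled table."""
    wires = {
        name: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for name in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_key = wires["out"][bit_a & bit_b]
            pad = _pad(wires["a"][bit_a], wires["b"][bit_b])
            table.append(_xor(pad, out_key + TAG))
    secrets.SystemRandom().shuffle(table)  # hide which row is which
    return wires, table

def evaluate(table, key_a, key_b):
    """Evaluator: holds one key per input wire and sees only ciphertext rows."""
    pad = _pad(key_a, key_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted with these keys")

def and_via_garbled_gate(bit_a, bit_b):
    wires, table = garble_and_gate()
    out_key = evaluate(table, wires["a"][bit_a], wires["b"][bit_b])
    return wires["out"].index(out_key)  # garbler decodes the output key

print([and_via_garbled_gate(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```

The evaluator works entirely with random keys and encrypted rows; the bits those keys stand for are known only to the garbler.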

In general, MPC-based schemes without a third party protect participants' private data more thoroughly, whereas schemes with a third party must rely on the third party's trustworthiness or on the structure of the intermediate results to ensure data privacy. In other words, when the two parties cannot establish a trusted third party, an MPC-based scheme without one is the first choice for security.

Fushu's upcoming major release, Avatar 2.0, builds on the core idea of MPC: it starts from the computation of low-level basic operators and simple functions and ultimately delivers a product upgrade that needs no third party.

The DOS-to-Windows revolution: IME

I have been following competing domestic products. One major reason many vendors get tangled up with customers is the product's interaction experience. I believe any bank-level customer choosing between DOS and Windows will pick Windows, because leaders definitely prefer things they can see clearly. Many of the federated learning platforms available today are still at the "DOS" level, and the slightly better ones are roughly "Windows 95".

For the integrated environment, we favor a more encapsulated, more componentized IDE in which the process is the business. I want to redefine the concept of a federated learning modeling platform as an integrated modeling environment (IME).

Componentization is the defining feature of IME. I like to introduce it to customers with a few phrases: transparent processes, low coupling between layers, and extensible algorithms. A federated learning platform has several key components, such as encryption, decryption, communication, and algorithm components. The goal is to let platform customers focus on the algorithms, which are extensible and compilable.

Independence: Our projects, applications, and products can each be split into multiple components; each component is responsible for one function and exists in independent form.

Completeness: Each component is functionally complete on its own, and the components are independent of one another.

Free combination: Components can be combined freely to form complete products, applications, and projects.
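The three properties above can be illustrated with a small sketch. Everything here is hypothetical: the `Component` and `Pipeline` classes, the payload keys, and the two example steps are invented for this illustration and do not describe the actual IME or Avatar interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Payload = Dict[str, object]

@dataclass
class Component:
    """An independent unit: one name, one function, no knowledge of its neighbors."""
    name: str
    run: Callable[[Payload], Payload]

@dataclass
class Pipeline:
    """A free combination of components, executed in order."""
    steps: List[Component] = field(default_factory=list)

    def then(self, component: Component) -> "Pipeline":
        # Return a new pipeline so existing combinations stay reusable.
        return Pipeline(self.steps + [component])

    def execute(self, payload: Payload) -> Payload:
        for step in self.steps:
            payload = step.run(dict(payload))  # copy so the caller's dict is untouched
        return payload

def score_rows(payload: Payload) -> Payload:
    payload["scores"] = [sum(row) for row in payload["rows"]]
    return payload

def make_threshold(cutoff: int) -> Component:
    """A business-specific algorithm component that an engineer could swap in."""
    def apply(payload: Payload) -> Payload:
        payload["labels"] = [int(s > cutoff) for s in payload["scores"]]
        return payload
    return Component(f"threshold>{cutoff}", apply)

base = Pipeline().then(Component("score_rows", score_rows))
strict = base.then(make_threshold(5))
lenient = base.then(make_threshold(2))

data = {"rows": [[1, 2], [3, 4]]}
print(strict.execute(data)["labels"])   # [0, 1]
print(lenient.execute(data)["labels"])  # [1, 1]
```

Swapping `make_threshold(5)` for `make_threshold(2)` changes the business logic without touching the scoring step, which is the low coupling the text describes.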

In the face of staff turnover, business expansion, and changing user needs, componentization can not only meet heavy development demands but also enable interoperability among developers, users, application components, and information resources. Componentization is perhaps the only road to production.

To some extent, componentization lowers the barrier for business modeling engineers who lack programming skills. With it, users can simply drag and drop components on a workbench and flexibly combine and connect them, which makes a personalized modeling process possible.

Faced with varied user needs, a single component or a partially componentized architecture is often of little use and does not get to the heart of the problem. The crux of component development is therefore how to build the whole product on a combination of standards and specifications, and ultimately offer users a product with a large set of basic modules, each independent, complete, and freely combinable.

Your answers

To close, here are answers to the three questions posed at the beginning of the article:

1) Hasn't the third-party coordinator in federated learning drawn criticism?

Answer: If a customer such as a bank becomes aware of the third party, it will certainly ask whether the third party can be avoided. Fushu Avatar therefore spent several months tackling the problem and achieved federated learning without a third party.

2) When you sell into an institution like a bank, how do you convince the leadership if the product is command-line driven?

Answer: It is hard. Federated learning is inherently difficult to explain clearly; with a graphical interface, things become much easier to explain.

3) If an organization buys a modeling platform, or even a complete modeling solution, does that make its own engineers useless?

Answer: Let partners genuinely take part in building the federated learning platform. It is more appropriate to open up the algorithm components that sit closest to the business, so that your engineers can write algorithms suited to their own business according to the protocol standard. Does that make sense?

A self-described "elementary school student" who takes pride in having moved from technology to business. With nearly ten years of experience in the internet big data industry, he has worked at Shanghai Dazhizhi, Ping An, and Micai as a big data architect and senior analyst, and has studied financial technology in depth.

He is currently a senior director at Fushu Technology, responsible for privacy computing solutions and business implementation.

Guo Jia

If you have distinctive views on data privacy, secure data sharing technology, or related fields, you are welcome to contribute!

Submission email: hehaohua3h@163.com
