
Editor's Note: This article comes fromAmbi Labs (ID: secbitlabs), reprinted by Odaily with authorization.
Editor's Note: This article comes from
Ambi Labs (ID: secbitlabs)
, reprinted by Odaily with authorization.
A while ago, I took the CS355 (advanced cryptography) course at Stanford. We were taught by three of Dan's PhD students, Dima Kogan, Florian Tramer, and Saba Eskandarian. The three PhD lecturers have their own characteristics, and their research directions are also very different. Dima focuses on privacy protection technology PIR (Private Information Retrieval), Florian focuses on ML and blockchain research, and Saba focuses on zero-knowledge proof.
CS355 This course covers almost everything in the field of cryptography from ancient times to the present. From the earliest one-way function (One-way Function), to elliptic equation (ECC) and Pairing, and finally to some popular MPC, zero-knowledge, private information retrieval (PIR), fully homomorphic algorithm and so on in recent years. Since the class just ended two days ago, I organized a wave of notes while the knowledge content was still in the shallow memory. The content of the course is very interesting, and the rest of the content will be shared with you when I have time~
In this issue, we will talk about the recently popular Fully Homomorphic Encryption (FHE) and the accompanying Lattice-based Encryption technology.
secondary title
What is fully homomorphic encryption
I believe that many friends have heard of the name of Fully Homomorphic Encryption (FHE). In recent years, there have been more and more topics about personal privacy protection, and a series of cryptographic application technologies including homomorphic encryption have also been widely popularized.
In order to better understand this topic, I would like to briefly introduce the definition of fully homomorphic encryption.
Encryption System Review
Before we start, let's review the most traditional encryption systems.
We all know that if we want to build an encryption system, we often need a key (Key). Through this key, we can encrypt plaintext information into ciphertext at one end, and then change the ciphertext back to its original form through the key at the other end. Without this Key, it is difficult for other people to know what information we have passed.
The illustration above basically shows the whole picture of all common encryption systems. All encryption systems, if described in a more formal way, undoubtedly do three things:
First, a pair of keys for encryption and decryption is randomly generated by a generation algorithm.
The encryption party encrypts the original text through the encryption key and encryption algorithm, and finally obtains the ciphertext.
Then, when decrypting, the decrypting party can decrypt the ciphertext through the decryption key and decryption algorithm, and finally restore the original original text.
Friends who know about encryption algorithms will definitely know some common encryption algorithms, such as AES, RSA and so on. Everyone will also know that encryption systems are generally divided into two types: symmetric encryption systems and asymmetric encryption systems. The three steps we describe here are applicable to any encryption system. If it is symmetric, then the key used for encryption and decryption is the same. If it is an asymmetric system, the encryption uses the public key (Public Key), and then the decryption uses a different private key (Private Key).
In cryptography research, whenever we see the definition of a new system, we often have to state the properties (Properties) that this system should have.
First, our first property to implement is Correctness. Correctness means that if I have a correct key, then I can restore the ciphertext to the original text through the decryption algorithm. We often use probability methods to express the success rate of decryption:
The above equation represents that if we have the correct key, the probability that the decryption algorithm can restore the ciphertext generated by the encryption algorithm is 100%.
The second property we want to achieve is Semantic Security.
For the definition of semantic security, we will not explain it here. But we can understand that if we have any two ciphertexts corresponding to different original texts, then we cannot distinguish which ciphertext corresponds to which original text:
The main significance of semantic security is that bystanders cannot distinguish between two encrypted messages. So if an intruder eavesdrops on the network and sees the encrypted information I sent, as long as the encryption system I use is semantically secure, then I can be sure that the intruder cannot get any information about the encrypted content from the ciphertext.
After satisfying the above two properties, an encryption system becomes sound.
Homomorphic encryption: an accidental special property
After understanding what the encryption system is all about, we can pay attention to this ciphertext that seems to be randomly generated garbled characters. We all know that we won't get any information just by looking at the ciphertext itself. But does this mean that if there is no key and only ciphertext, we can do nothing?
We all know the answer: not really.
But among many encryption algorithms, the ciphertext generated by a class of algorithms has a special homomorphic property: if we use the encryption algorithm to get the ciphertext of the number 1, then we get the ciphertext of the number 2. At this time, if we add up the ciphertext, it is exactly the ciphertext!
For this property, we call additive homomorphisms. That is to say, the encrypted ciphertext maintains a subtle algebraic relationship with the previous original text. If we add up the ciphertexts, we can get the new ciphertext after adding up the original texts and encrypting them. In the same way, besides additive homomorphism, there is also an encryption algorithm with multiplicative homomorphism. We can multiply the ciphertext generated by a multiplicative homomorphic algorithm, and then obtain the ciphertext corresponding to the result of the multiplication between the original texts. In the whole process, we don't need to know any information related to the key, we just combine the ciphertext that looks like random garbled codes.
Example: Anonymous Voting System
Next, let's give an example to vividly describe the practicality of homomorphic encryption.
We all know what online voting is like-if the boss of a company wants to initiate a wave of voting, choose whether the company should adopt the 996 system. Then the boss can ask IT to set up a server and let employees submit their own choices: vote 0 for no, vote 1 for yes. After the final voting period, the boss can add all the votes together. If the sum of all votes in the end exceeds half of the number of employees, the company will start 996.
This voting mechanism seems very simple, but there is a big privacy problem: If the boss wants all employees to be 996 in his heart, but he just deliberately initiated this voting to phishing law enforcement to see which employees are unwilling to work overtime, then the boss can quietly I entrust my younger brother to eavesdrop on the Internet, record the choices submitted by each employee, and finally see who voted 0. As a result, employees are very afraid and dare not confide in their hearts.
If we can use an additive homomorphic encryption algorithm, then this problem can be easily solved.
First, IT can use the KeyGen algorithm to generate a pair of encryption and decryption keys pk, sk, and then distribute the public key to each employee.
Subsequently, each employee encrypts his choice (1 or 0) into ciphertext through an encryption algorithm, and then sends the ciphertext to the IT server:
Finally, the IT department can add up everyone's choices, get a combined ciphertext, and send this ciphertext to the boss. Finally, the boss uses the decrypted key to open the ciphertext and get the final voting result:
In this way, we have successfully realized the vote counting system with the feature of additive homomorphism, and even if the boss sends someone to monitor the network, he can only see the encrypted ciphertext cti, and there is no way to know what vote each person voted for. .
Of course, this system is still very incomplete. For example, the IT department can directly help the boss decrypt everyone's vote and record it into a document. There are other cryptographic tools that can help us solve this problem. Due to space reasons, I will not elaborate here.
But here, we should be able to feel the power of the homomorphic encryption algorithm. We can understand it this way: through the homomorphic encryption algorithm, users can perform some kind of Secure Delegated Computation (Secure Delegated Computation) with an untrusted remote server (cloud). Users can entrust their sensitive privacy input to the cloud through homomorphic encryption technology, and then the cloud can perform a certain degree of calculation on the encrypted data, and finally get the encrypted result that the user wants, and return it to user. Finally, the user can use the decryption key to open the obtained result. Throughout the protocol, the principal (cloud) cannot see any useful information related to private input.
Classification of homomorphic encryption systems
After a general understanding of the two most basic homomorphic properties, other concepts become very easy to understand. From a systematic point of view, homomorphic encryption systems are roughly divided into four categories: partial homomorphism, approximate homomorphism, finite series full homomorphism, and complete homomorphism.
Next, let's take a look at the specific definitions of each category in turn.
Partially Homomorphic Encryption
The initial stage of homomorphic encryption is called partial homomorphic encryption. The definition is that the ciphertext has only one homomorphic characteristic. This stage includes the two modes of additive homomorphism and multiplicative homomorphism we described above.
If we put it into the scenario of secure proxy computing mentioned earlier, if we have private input, and then we hope that the cloud can help us calculate (x1, x2,...), then we can use the cloud to calculate these inputs. function to represent.
If we can calculate it through an additive homomorphic encryption algorithm, it means that this function must only contain any linear combination of private inputs (addition operations). A working example would be to multiply the private inputs by a constant and add them together:
But if we want to multiply two inputs together, then the above-mentioned additive homomorphic algorithm is helpless. can only be a linear combination of all private inputs.
The common additive homomorphic encryption algorithm is the ElGamal encryption algorithm based on cyclic group.
ElGamal is a very convenient public key encryption algorithm based on the Diffie-Hellman key exchange protocol, which uses the characteristics of cyclic groups. Due to space reasons, we will not explain the definition of cyclic groups in detail here, we only need to know that each group can find a generator
This generator can then be exponentiated. The implementation method of ElGamal encryption is roughly as follows:
At first, we
We will not prove the correctness and safety of ElGamal. But after seeing the encryption method of this encryption system, we can find the potential additive homomorphic characteristics of ElGamal because they are all power operations.
The same principle can also be applied to the algorithm of multiplicative homomorphic encryption - we can only multiply all the private inputs together, but there is no way to do any linear combination (addition). Examples of multiplicative homomorphism are actually very common. The RSA encryption that we are all familiar with is a multiplicative homomorphic system.
The implementation method of RSA encryption is roughly as follows:
We get the ciphertext corresponding to the multiplication of these two messages! This is the homomorphic nature of multiplication. We can continue to superimpose new ciphertexts on top of this ciphertext, so that we can get any multiplication between ciphertexts.
Somewhat Homomorphic Encryption
As we said in the previous paragraph, if we want to multiply the private inputs and obtain a linear combination between them, a simple partial homomorphic encryption algorithm (RSA, ElGamal) cannot be completed. So we need to go to the next stage.
The next stage of partially homomorphic encryption is approximately homomorphic encryption, which is one step closer to the fully homomorphic we want to achieve. If we have an approximate homomorphic encryption algorithm, then we can simultaneously calculate addition and multiplication on the ciphertext. But it should be noted that because this stage is approximately homomorphic (Somewhat Homomorphic), the number of additions and multiplications that can be done is very limited, and the functions that can be calculated are also within a limited range.
There are not many common examples at this stage of approximate homomorphic encryption. If we talk about the best-understood example, it should be based on pairing (Pairing) cyclic group encryption algorithm.
Above we briefly discussed the ElGamal encryption algorithm based on ordinary cyclic groups. We all know that this algorithm is additively homomorphic, which means that any linear combination between ciphertexts can be obtained. In fact, in some special cyclic groups, we can find some weak multiplicative homomorphic properties.
In this way, we get the product combination between the powers of the first two values in disguise! Combined with the property that the power of two values can be linearly combined in ElGamal encryption, then we get a fully homomorphic system.
In fact, the reality is not so beautiful, because the special property of Pairing does not appear in all cyclic groups. So if we multiply the values in two groups that can be paired by pairing and map them to a new group, then the new group may not be able to find the corresponding pairing attribute!
That is to say, with cyclic groups that have the Pairing property, we can only do very limited multiplications. If our current group supports pairing, but the new mapping group GT does not support any pairing, it means that if we want to perform homomorphic encryption operations based on the current system, although the functions that can be operated can include any linear combination, But it can only contain up to one layer of multiplication.
This limitation proves that the system is approximately homomorphic, since we cannot compute a function F of arbitrary logic and depth.
Fully Leveled Homomorphic Encryption
After reaching the next stage, we are one step closer to the goal of full homomorphism.
This stage is called finite series fully homomorphic encryption. At this stage, we can already perform any combination of addition and multiplication on the ciphertext, without any limitation on the number of times.
But the reason why it is called finite series homomorphism is that the algorithm at this stage will introduce a new concept of complexity upper limit L, which constrains the complexity of function F. If we can express it in binary circuit C, then the depth and size must be within the range of L, namely:
In other words, if L is relatively large, then we can perform various simpler (low-complexity) homomorphic operations. The algorithm of finite series homomorphic encryption is also the cornerstone of the next stage of fully homomorphic encryption. When we realize the fully homomorphic within the complexity, the realization of fully homomorphic is not far away.
We can understand how the upper limit L of this complexity comes from. First of all, we can imagine that if there is a fully homomorphic encryption algorithm, any addition and multiplication operation can be performed on the ciphertext. However, this algorithm will add a certain amount of random noise to the ciphertext when encrypting.
When the noise is within the controllable range, the decryption algorithm can easily restore the original text from the cipher text. But when we stack ciphertexts together for homomorphic calculations, the noise in each ciphertext will be superimposed and enlarged. If we just perform relatively simple logic on the ciphertext, then the superimposed noise is still within an acceptable range. But if we stack the ciphertexts together too complexly, once the range of noise exceeds the critical value, the original text will be completely overwritten, resulting in failure of decryption.
Fully Homomorphic Encryption (FHE)
After a lot of calls, we finally come to the fully homomorphic encryption (FHE) we are waiting for.
When we reach this stage, the aforementioned secure proxy computation becomes feasible. If we can find a highly efficient fully homomorphic encryption system, we can safely proxy all local computing to the cloud without leaking any data!
secondary title
Definition of Fully Homomorphic Encryption System
Before starting the discussion of the history of fully homomorphic systems below, we can systematically define the formal definition of fully homomorphic systems. A fully homomorphic encryption system has four algorithms in total:
secondary title
Properties of Fully Homomorphic Encryption Systems
Now let's look at the properties of this system (Properties). First, the system must be correct.
That is, if we arbitrarily choose a circuit F, and arbitrarily choose a set of original text messages m1,m2,.... If we have the key generated by the KeyGen algorithm at the beginning, then:
Second, the system needs to be semantically safe. We have already explained this definition above.
Finally, in order to make the fully homomorphic encryption system practical, we must add an additional requirement: compactness.
Why do we need to add this brief feature? Because without this requirement, we can easily make a meaningless (cheating) fully homomorphic system:
If there is no restriction on the size of the output of Eval, then after we repeatedly superimpose Eval for many times, the size of the ciphertext obtained will become larger and larger. When finally decrypting, you only need to decrypt all the original ciphertext, and then calculate it. This is like a user who encrypts his health information and asks the hospital to judge whether he is sick. The hospital sends back the ciphertext intact, and then sends back his entire set of data models plus medical textbooks. It means the same thing as telling users that you should do the math yourself.
This type of fully homomorphic system also has a bigger disadvantage, that is, the final ciphertext cannot completely cover up the specific details of the operation circuit F. In this hospital usage scenario, it is possible that the hospital's most valuable asset is this data analysis system. If you send your F back to the user in vain, then the fruits of your hard work will be easily stolen by others.
To sum up, as long as the three elements of correctness, semantic security, and brevity are met, we will have a non-trivial fully homomorphic encryption system.
secondary title
Historical Review of Fully Homomorphic Encryption
Before starting to learn how the fully homomorphic encryption algorithm is implemented, we might as well take a look at how the concept of fully homomorphic encryption came about.
The concept of FHE (full homomorphism) was actually proposed in the late 1970s. In 1978, in their paper On Data Banks and Privacy Homomorphisms, Rivest, Adleman and Dertouzos, several big names in the field of cryptography, mentioned for the first time the concept of a system that can perform certain calculations on ciphertext and indirectly operate on the original text. Later, this idea was resummarized and named fully homomorphic encryption.
It can be seen that the concept of fully homomorphic encryption has been proposed for a long time. Surprisingly, in 1976, two years before the paper was published, the Diffle-Hellman key exchange protocol had only just been proposed! It can be seen that the imagination of the cipher field is still very rich.
When the concept of FHE was proposed, the entire academic community was moved by it and began a decades-long search, trying to find a perfect algorithm with fully homomorphic properties. But over the past few decades, people have tried all conceivable options, but they can't find an option that can satisfy all the conditions of full homomorphism and whose security can be easily verified.
Until 2009, PhD Craig Gentry, who was studying at Stanford, suddenly had a flash of inspiration and broke through the difficulties of the FHE algorithm. In his doctoral dissertation, he gave a reasonable and secure fully homomorphic encryption system for the first time! This system is based on the assumption of an ideal lattice. The fully homomorphic system proposed by Gentry09 is often called the first generation fully homomorphic encryption system.
In Gentry's paper, he also mentioned a crucial concept called Bootstrapping. Bootstrapping is a special processing technique for ciphertext. After processing, it can actually "refresh" a ciphertext with noise close to the critical value into a new ciphertext with very low noise. Through Bootstrapping, the noise of a finite series system can never exceed the critical value, thus becoming a fully homomorphic system. In this way, we can homomorphically calculate any size.
After Gentry's major breakthrough, the entire cryptosphere fell into madness again, and everyone began to scramble to find a more efficient and versatile fully homomorphic system based on Gentry's ideas.
In 2011, two bigwigs Brakerski and Vaikuntanathan proposed a new fully homomorphic encryption system, which is based on Learning With Errors (LWE), another assumption of lattice encryption. In the same year, Brakerski, Gentry and Vaikuntanathan completed the system together and officially published it. The fully homomorphic system they invented is called the BGV system for short. The BGV system is a homomorphic encryption system with limited series, but it can be turned into a fully homomorphic system through Bootstrapping. The BGV system uses a more realistic LWE assumption than the system proposed by Gentry09. Generally speaking, we call the BGV system the second-generation fully homomorphic encryption system.
After 2013, the cryptosphere basically blossomed. Based on the original three-generation fully homomorphic system, various new designs have emerged to optimize and accelerate the operating efficiency of the BGV and GSW systems. IBM has developed an open source fully homomorphic computing library HElib based on the BGV system, and it has been successfully ported to major mobile platforms. At the same time, there is another open source project TFHE is also very noteworthy. TFHE is based on the GSW system, and various optimizations and accelerations have been added, and it is now very famous.
In addition to developing traditional fully homomorphic libraries, many teams are also researching how to better accelerate the calculation of fully homomorphic encryption algorithms through heterogeneous hardware such as GPUs, FPGAs, and ASICs. For example, cuFHE is a well-known CUDA-based GPU-accelerated fully homomorphic encryption system.
From today's point of view, it seems that 11 years have passed since Gentry knocked on the door of the fully homomorphic system. Now the industry is full of research on FHE, and many people are studying fully homomorphic systems from different angles and application requirements. Until today, we already have a variety of feasible FHE implementation methods, but what everyone is still pursuing is the efficiency of FHE system operation. Taking the current cutting-edge FHE library as an example, it may take tens of milliseconds or tens of seconds to homomorphically calculate some relatively simple logic on the mobile platform. These time units are extremely slow for computer systems.
How to make the FHE system run more efficiently on heterogeneous platforms is still an unsolved mystery. If this problem is solved, it will be a future within reach to turn all computer calculations into full homomorphism, and the agent performs calculations on the third-party cloud.
secondary title
Relationship between FHE and MPC
MPC is actually a very well-known field that has been studied for a long time. Since cryptographer Yao Zhizhi launched his Garbled Circuits in the last century, the field of MPC has been widely recognized and there have been many breakthroughs. Now we have a lot of MPC libraries that can be used, and they also have a certain operating efficiency.
If you know MPC, you may have many questions after seeing the difficult history of the fully homomorphic encryption system: Why can't you directly replace fully homomorphic encryption with an MPC protocol?
Indeed, an MPC protocol can completely replace an FHE protocol. We only need to use the user and private input as a party in a protocol, and then use the remote proxy computing server as another party to meet the conditions for the execution of the MPC protocol. Only through certain interactions can proxy computing be realized. And the server does not see the private input either.
But there is a very important point that we ignore: because MPC is interactive, it needs the user and the server to calculate and communicate together to complete the agreement. This also conflicts with the most fundamental requirement of FHE proxy computing. If the user needs to stay online to complete the agreement all the time, and also pay a part of the computing power, then the calculation is not "proxyed" at all, and the two parties are only doing more calculations for the security of information. This also explains why the field of MPC has made a major breakthrough, but the field of FHE is still unknown, because the two systems solve completely different problems.
secondary title