First understanding of "zero knowledge" and "proof"
安比(SECBIT)实验室
2019-09-04 03:42
本文约10223字,阅读全文需要约41分钟
Exploring zero-knowledge proof series (1)

Author: Guo Yu

introduction:https://github.com/

introduction:

I think blockchain is hardly a "technology". It's more like a field, all-encompassing. Or metaphysically speaking, the blockchain is more like an organism, integrating various theoretical technologies.

Zero-knowledge proof is an important technology for building trust and an indispensable part of the blockchain organism.

Zero-knowledge proof is a key technology to connect on-chain data and off-chain computing, and it is also an important way to realize on-chain data privacy protection

prove

"prove"past and present

What is proof? Many people may be like me. When they see these two words, they can’t help but think of various geometric figures similar to triangles in the middle school examination paper. When the teacher magically draws an auxiliary line, the proof process is suddenly obvious, and then regrets why did not expect.

Ancient Greek: "proof" == "insight"

Mathematical proofs originated in ancient Greece. They invented (discovered) axioms and logic, and they persuaded each other by proof, not authority. This is "decentralization" through and through. Since ancient Greece, this methodology has influenced the entire course of human civilization.

The picture above is a clever proof of the "Pythagorean Theorem". There have been many ingenious proofs, magical ideas, and genius inspirations in history. Once a proposition is proven, there is nothing God can do. Well, by the way, there is also the "God is not omnipotent" proof: God cannot create a stone that he cannot lift.

A mathematical proof often hides extremely profound "insights". I believe that many people have read the story of "Fermat's Last Theorem" [1]. I can't write it down," and it took generations of ingenuity to get Wiles to the top. Recently, such as the "Poincaré Conjecture", "Goldbach's Conjecture" with a bit of a sense of age, and the Chinese scientist Zhang Yitang, who I admire very much, has worked hard for ten years, after carefully studying "Goldston-Pintz-Yıldırım" and After the proof "insight" of "Bombieri-Friedlander-Iwaniec.", it proved "bounded intervals between prime numbers" [2].

Since Leibniz in the seventeenth century, people have dreamed of finding a mechanical means that can automatically complete proofs without relying on a flash of genius.

Early 20th Century: "Proof" == "Symbolic Reasoning"

At the end of the nineteenth century, Cantor, Boole, Frege, Hilbert, Russell, Browe, Gödel and others defined the symbol system of formal logic. And "proof" is the reasoning process written in the symbolic language of formal logic. Is the logic itself plausible? Is logic itself "self-consistent"? Is the logical reasoning itself correct? Can it be proved? This led mathematicians/logicists/computer scientists to invent (discover) symbol systems, syntax vs. semantics, reliable vs. complete, recursive vs. infinite. (For this part of the wonderful story, please refer to the book "Logic Engine" [3]).

In 1910, Russell published Hong (zhuan) Huang (tou) masterpiece "Principles of Mathematics". In the book, Russell and Whitehead tried to "formalize" mathematics completely. If such a goal can be achieved, all mathematical results will be built on a solid foundation in a proven way. The picture below is a page in "Principles of Mathematics (Volume 2)":


Among them, 110.643 is a proposition: "1+1=2", and then the proof of this theorem follows. You may be wondering, does 1+1 still need proof? Yes, in the book Principles of Mathematics, the numbers 0, 1, 2, ... are strictly defined, and "addition", "multiplication", and "equal to" must be strictly defined, and then every step of reasoning needs to be pointed out. What does proof mean? The proof may be extremely cumbersome, but every step of reasoning is rigorous and correct. A large number of proofs in the book are mechanical. According to axioms and inference rules, a kind of proof construction is carried out. Looking for proofs seems to be handed over to a person, and then he mechanically searches in the set of axioms and inference rules without thinking.

It seems that people are not far away from "automatic proof of theorem".

Unfortunately, Gödel proved the "Gödel Incompleteness Theorem" [4] in 1931, and Turing proved the undecidability of the Turing machine halting problem in 1936 [5]. These results put an end to this centuries-old fantasy. No matter how ingeniously designed an axiomatic system is, it cannot capture all truth.

Proof is not just a rigorous reasoning, but condenses creative thinking that seems difficult to mechanize. The proof contains a lot of "knowledge", and every breakthrough raises our cognition to a new level. Whether it is an "insight" or an "algorithm" constructed during the reasoning process, the connotation of a theorem's proof often goes far beyond the conclusion of the theorem itself.

1960s: "Proof" == "Procedure"

Half a century later, in the 1960s, logicians Haskell Curry and William Howard successively discovered many "magic correspondences" in "logic systems" and "computing systems - Lambda calculus", which was later called Named "Curry-Howard Correspondence". This discovery made everyone suddenly realize that "writing a program" and "writing a proof" are actually completely unified in concept. In the following 50 years, the development of relevant theories and technologies has made the proof no longer stay on the draft paper, but can be expressed by programs. This isomorphic mapping is very interesting: the type of program corresponds to the theorem of the proof; the cycle corresponds to the induction; ... (a book is recommended here: "Software Foundation" (Software Foundations Chinese translation) [6]). In the intuitionist framework, proving means constructing an algorithm, and constructing an algorithm is actually writing code. (The reverse is also true, um, what is coded by a coder is not a code, but a mathematical proof, :P)

At present, in the field of computer science, the proofs of many theories have changed from sketches on paper to the form of codes. The more popular "proof programming languages" include Coq, Isabelle, Agda and so on. Using programming to construct the proof, the correctness check of the proof can be completed mechanically by the program, and many repetitive labors can be assisted by the program. Like computer software, the edifice of proofs of mathematical theories is being built step by step. In December 1996, W. McCune used the automatic theorem proving tool EQP to prove a 63-year-old mathematical conjecture "Ronbins conjecture". The New York Times then published an article entitled "Computer Math Proof Shows Reasoning Power" [7], once again discussing the possibility of whether machines can replace human creative thinking.

The use of machine assistance can indeed effectively help mathematicians' thinking reach more unknown spaces, but "looking for proofs" is still the most challenging job. "Proof of verification" must be a simple, mechanical, and limited task. This is a natural "asymmetry".

1980s: "proof" == "interaction"

In 1985, Jobs had just left Apple, and Dr. S. Goldwasser came to MIT after graduation, and co-wrote a classic with S. Micali and Rackoff that can be recorded in the annals of computer science: "The knowledge in the interactive proof system is complex. sex” [8].

They reinterpreted the word "proof" and proposed the concept of an interactive proof system: by constructing two Turing machines to "interact" rather than "reason", to prove whether a proposition is true in probability. The concept of "proof" has been expanded again.

The form of interactive proof is two (or more Turing machines) "dialogue scripts", or Transcript. And in this dialogue process, there is an explicit "certifier" role and an explicit "verifier". Among them, the prover proves to the verifier that a proposition is established, and at the same time "does not disclose any other knowledge". This is called "zero-knowledge proof".

secondary title

interactive proof

Alice: I want to prove to you that I have a solution to the equation, w^3 - (w+1)^2 + 7 = 0 (solution of the equation: w=3)

Bob: ok, I'm listening

Alice: But I won't tell you how much x is unless you're willing to pay for it.

Bob: Yes, but you have to prove that you have the solution of the equation first, and then I will pay you.

Alice: @#$%^& (black technology)

Bob: ?????? (black technology)

Alice: &*#$@! (black technology)

Bob: ?????? (black technology)

...... (Continue to black technology)

Alice: Okay, it's over

Bob: Well, you do have the solution to the equation, but will you tell me the answer if I pay you?

Alice: Stop talking nonsense and pay!

The above example is an "interactive proof". Suppose Alice knows the solution of the equation, f(w) = 0, how can Alice convince Bob that she knows w? Alice told Bob a lot of information in the "black technology stage". Well, the key question is, can Bob guess what w is from a lot of information that Alice said, or can he analyze the clues about w? If Bob has this ability, Bob may not need to pay, because he has already obtained this valuable information.

Please note that if the dialogue between Alice and Bob is "zero-knowledge", then Bob cannot obtain any other information about w except that w is the solution of f(w)=0. This is very important, it's in Alice's interest.

Now review the term "Zero-Knowledge Proof", which is called "Zero-Knowledge Proof" in English. This word has three key parts:

  • zero

  • prove

  • prove

You may already have a little feeling, let's try to interpret it:

  • Zero: Alice has leaked "zero" knowledge about w, that is, no knowledge has been leaked.

  • Knowledge: Here it refers to w.

  • Proof: It is the "black technology part" in the conversation between Alice and Bob.

secondary title

What are zero-knowledge proofs good for?

When mentioning zero-knowledge proof technology, many people think of anonymous coins, such as Monero, such as ZCash. Indeed, these coins have popularized zero-knowledge proofs very well. I myself heard the term zero-knowledge proofs for the first time through ZCash. But after a deeper understanding of this technology, I deeply feel that the power of this technology is far more than this.

Zero-knowledge proof technology can solve the trust problem of data and computing!

Zhang San said that he had 100 yuan, Li Si said that he graduated from Peking University, and Wang Wu said that he would have lunch with Ba Feite. Empty talk without evidence, Show me the proof.

So how can "zero-knowledge proof" solve data trust? In the previous article "zkPoD: Blockchain, Zero-Knowledge Proof and Formal Verification, Realizing Fair Transactions Without Intermediaries and Zero Trust" [9], I mentioned a concept "simulation":

Zero-knowledge proof technology can "simulate" a third party to ensure that a certain assertion is credible

In other words, when we receive an encrypted data, then there is a zero-knowledge proof. This zero-knowledge proof says "the X assertion about the data is true", then this is equivalent to an angel whispering in our ears, "the X assertion about the data is true"!

For this X assertion, it can be very flexible, it can be an NP complexity algorithm. In the vernacular, as long as we can write a program (a polynomial time algorithm) to judge whether a data satisfies the X assertion, then this assertion can be expressed in a zero-knowledge proof. In layman's terms, as long as the data judgment is objective, then zero-knowledge proof is applicable.

Some uses of zero-knowledge proofs:

  • Data privacy protection: In a data form, there are more or less some information that I don’t want to be exposed. For example, my report card that year, I just want to prove to others that my grades passed, but I don’t want others to know what I did. It will be embarrassing to get 61 points or 62 points. I don't have heart disease, but the insurance company needs to know this, but I don't want the insurance company to know my private information. Then I can prove to the insurance company that I don't have a heart attack, but all the medical records don't need to be exposed. I am a business, I want to get a loan from the bank, I just want to prove to the bank that I have a healthy business and repayment ability, but I don't want the bank to know some of our business secrets.

  • Computing compression and blockchain expansion: Among many blockchain expansion technologies, Vitalik's use of zkSNARK technology can bring dozens of times of performance improvement to the existing Ethereum framework. Because of the proof of calculation, there is no need to repeat the same calculation many times. In the traditional blockchain architecture, the same calculation is repeated many times, such as signature verification, transaction legality verification, and smart contract verification. execute etc. These calculation processes can be compressed by zero-knowledge proof technology.

  • End-to-end communication encryption: users can send messages to each other, but there is no need to worry about the server getting all the message records. At the same time, the messages can also produce corresponding zero-knowledge proofs according to the requirements of the server, such as the source of the message and the purpose of sending it land.

  • Identity authentication: The user can prove to the website that he has a private key, or knows a secret answer that only the user knows, but the website does not need to know, but the website can confirm the identity of the user by verifying this zero-knowledge proof

  • Decentralized storage: The server can prove to the user that their data is properly kept and does not reveal any content of the data.

  • Credit record: Credit record is another field that can give full play to the advantages of zero-knowledge proof. Users can selectively show their own credit records to the other party. Authenticity of records.

  • Construct a completely fair transaction protocol for online digital goods [9].

  • secondary title

Example: Map triple coloring problem

Let's talk about a classic problem, the three-coloring problem of the map. How to color a map with three colors, so that any two adjacent regions are of different colors. We transform this "map three-coloring problem" into a "connected graph vertex three-coloring problem". Suppose each region has a capital (node), and then connect adjacent nodes, so that the map coloring problem can be turned into a connected graph vertex coloring problem.

Let's design an interaction protocol:

  • "Certifier" Alice

  • "Verifier" Bob

Alice has an answer to the three colorings of the map in her hand, see the picture below. This graph has a total of 6 vertices and 9 edges.

Now Alice wants to prove to Bob that she has the answer, but she doesn't want Bob to know the answer. What is Alice going to do?

Alice first needs to perform some "transformation" on the dyed picture, and make a big shift in color, such as turning all green into orange, turning all blue into green, and turning all green into orange. Then Alice got a new coloring answer. At this time, she covered each vertex of the new graph with a piece of paper, and then showed it to Bob.

Look at the picture below. At this time, Bob is about to make a move (see the picture below). He wants to choose a "side" randomly. Note that it is random, and Alice will not let the random number predicted in advance.

Suppose Bob picks the bottom edge and tells Alice.

At this time, Alice uncovers the pieces of paper at both ends of this edge and asks Bob to check. Bob finds that the colors of the two vertices are different, so Bob thinks that this test is isomorphic. At this time, Bob only sees a part of the graph, can he be convinced that the coloring of the vertices of the rest of the graph is fine? You must feel that this is not enough, maybe Alice just got it right? Other vertices that are not exposed may be randomly colored.

It doesn't matter, Bob can ask Alice to do it again, see the picture below

Alice changes the color again, changing blue to purple, green to brown, orange to gray, and then covers all the vertices with paper. Then Bob picks another edge, for example, as in the picture above, he chooses a vertical edge, and then asks Alice to uncover the paper to have a look. If Bob finds that the vertices at both ends of this edge have different colors, then Bob will Time has faltered a bit, maybe Alice really has the answer to this coloring. However, two times are still not enough, and Bob wants to do it a few more times.

Then after repeating these three steps many times, the probability that Alice can cheat and successfully fool Bob will decrease exponentially. Suppose after n rounds, the probability of Alice cheating is

Here |E| is the number of all edges in the graph. If n is large enough, this probability Pr will become very, very small and become "insignificant".

secondary title

Information vs. Knowledge

  • Information「Information」

  • Knowledge "Knowledge"

In the interactive proof of the map three-coloring problem, after repeated interactions many times, Bob gets a lot of information, but this is like Alice sending Bob a bunch of random numbers, Bob does not "know" more things. For example, if Alice tells Bob "1+1=2", Bob gets this information, but Bob does not get more "knowledge" because everyone knows this fact.

If Alice tells Bob that the number 2^2^41-1 is a prime number, obviously this is "knowledge", because it takes a lot of computing power to figure out whether the number is a prime number.

If Alice tells Bob that there are a total of two vertices with green color, then Bob has gained valuable "knowledge", because based on the information he just obtained, Bob can use a Turing machine to solve the three vertices in a shorter time. Staining problem. If Alice reveals to Bob that the color of the leftmost vertex is orange, then obviously, this "information" does not substantially help Bob to solve the problem.

We can try to define that if the "information" obtained by Bob during the interaction process can help improve Bob's ability to directly crack Alice's secret, then we say that Bob "acquired knowledge". It can be seen that the definition of the word knowledge is related to Bob's computing power. If the information does not increase Bob's computing power, then the information cannot be called "knowledge". For example, during the interaction between Alice and Bob, Alice flips a coin every time, and then tells Bob the result. From the perspective of information, the information Bob gets is just an "event", but Bob does not get any "knowledge" because Bob completely You can flip a coin yourself.

The following quotes the summary in the book "Foundations of Cryptography - Basic Tools" [10]

  • "Knowledge" is related to "computational difficulty", but "information" is not

  • "Knowledge" is concerned with what is publicly known, while "Information" is primarily concerned with what is partially public

secondary title

Verifiable computation and circuit satisfiability problems

Looking at the three-colored problem on the map above, do you feel that this is just an academic problem, how to relate it to real problems? The map three coloring problem is an NP-Complete problem, which is a term in "computation theory". Another one called "Circuit Satisfiable Problem" is also an NP-Complete problem. NP-Complete is a type of problem. Its solution process is difficult to complete in polynomial time, that is, "difficult to solve", but the process of verifying the solution can be completed in polynomial time, that is, "simple to verify".

So what is a circuit? Here are three different "arithmetic circuits":

It can be seen that a circuit consists of many gates, including addition gates and multiplication gates. Each gate has several input pins and several output pins. Each gate performs an addition operation or a multiplication operation. Don't look so simple, the code we usually run (without infinite loop) can be represented by arithmetic circuits.

What does this mean? Let's try to solve the data privacy protection problem by combining "zero-knowledge proof" and "circuit satisfiability problem".

Next, please think about a scenario: Bob gives Alice a piece of code P and an input x, asks Alice to run it once, and then tells Bob the result. Maybe this calculation needs to consume resources, and Bob outsources the calculation process to Alice. Then Alice ran it again and got the result y. Then tell Bob y. Here comes the question:

How to make Bob believe that the result of running code P must be y without running the code?

Here is thinking time, everyone can think for five minutes...

(five minutes later……)

One of Alice's methods is to take a picture of the entire calculation process with a mobile phone. This video includes the computer CPU, memory, and the state of each transistor during the entire operation process. It is obviously unrealistic to do so. So is there a more feasible solution?

The answer is that Bob converts the program P into a completely equivalent arithmetic circuit, and then gives the circuit to Alice. Alice only needs to calculate the circuit, and then the process can be photographed with a mobile phone, or written down with paper, if the circuit scale is not so large. Alice only needs to input parameter 6 into the circuit, and then record the values ​​of all the pins connected to the gate during the operation of the circuit. And the value of the final circuit output pin is equal to y, then Bob can be sure that Alice did perform the calculation. Alice needs to write the input and output of all the gates of the circuit on a piece of paper and hand it to Bob. This piece of paper is a proof of calculation.

In this way, Bob can verify whether the proof on this piece of paper is correct without recalculating the circuit. The verification process is very simple:

Bob checks in turn whether the input and output of each gate can satisfy an addition equation or a multiplication equation.

For example, gate No. 1 is an addition gate, its two inputs are 3, 4, and its output is 7, so it is easy to know that the calculation of this gate is correct. When Bob has checked all the doors, he can be sure that:

Alice actually did the calculation, she didn't cheat.

The content on this paper is a solution "Solution" that "satisfies" the arithmetic circuit P.

The so-called circuit satisfiability means that there is a solution that satisfies the circuit. If the output value of this solution is equal to a certain value, then this solution can "represent" the calculation process of the circuit.

For Alice, if Bob verifies in this way, she has absolutely no room for cheating. But this method obviously has a drawback:

  • Disadvantage 1: If the circuit is relatively large, the proof will be very large, and the workload for Bob to check the proof is also very large.

  • Disadvantage 2: During the verification process, Bob knows all the details of the circuit operation, including the input.

  • black technology

Let's make some changes to the Alice and Bob scene just now. Suppose, Alice herself has a secret, such as her online banking password. And Bob wants to know whether the length of Alice's online banking password is 20 characters long. But Alice thought for a while and told him that the length of the password should not be a big problem. At this time, Bob converts a code for calculating the length of a string into a circuit Q, and sends it to Alice. Alice calculated her own password with the circuit Q, and then sent Bob the pins of all the gates of the circuit, and brought the calculation result 20.

Wai...t, this is problematic, after Bob gets all the internal details of the circuit operation, doesn't he know the password? Yes, Alice obviously can't do that. So what should Alice do?

The answer is that there are many ways. Readers who love blockchain technology are most familiar with zkSNARK[11], zkSTARK[12], BulletProof[13], and some relatively small technologies, all of which can help Alice do it. arrive:

In a zero-knowledge fashion, Alice proves to Bob that she calculated the circuit and used her secret input.

In other words, these "zero-knowledge circuit satisfiability proof protocols" provide Alice with a powerful weapon to prove to Bob that her online banking password is of length 20, and besides that, Bob can no longer get any other useful Information. In addition to the online banking password, Alice can theoretically prove to Bob some characteristics of any of her private data, but does not reveal any other information.

"Zero-knowledge circuit satisfiability proof protocol" provides the most direct technology to protect privacy/sensitive data

write at the end

write at the end

Whether it is a subtle number theory theorem, a three-colored map problem, or a circuit satisfiability problem. What is the point of proving existence? All proofs embody the "asymmetry" between "proof" and "verification". Proof may be a very computationally or mentally consuming activity, whether it is the "Fermat's Last Theorem" that took hundreds of years, or the POW proof in Bitcoin, these proofs condense the time spent in the process of finding proofs. Energy, the proof process can be unrealistically complex, occasionally requiring a genius to turn out. And the verification process must (or should) be a very simple, mechanical, and terminateable activity in (polynomial) valid time. In a sense, this asymmetry really embodies the significance of the proof and demonstrates the value of zero-knowledge proof.

Roughly speaking, "proof" is a product of "logic", but "logic" and "calculation" are inextricably linked. You may vaguely feel some connection between "proof" and "calculation". They run throughout: e.g. mechanical reasoning, proof representation, interactive computation. This is an interesting but larger philosophical question.

references

references

  • [1] Simon, Singer, Xue Mi. Fermat's Last Theorem: A Mystery That Puzzled the World's Wise Men for 358 Years [M]. Shanghai Translation Publishing House, 1998.

  • [2]  Alec Wilkinson. The Pursuit of Beauty: Yitang Zhang solves a pure-math mystery. The New Yorker. Feb. 2015.

  • [3] Martin, Davis, Zhang Butian. The Engine of Logic [M]. Hunan Science and Technology Press, 2012.

  • [4] Raymond Smullyan. Gödel's Incompleteness Theorems, Oxford Univ.Press. 1991.

  • [5] Turing, Alan. "On computable numbers, with an application to the Entscheidungsproblem." Proceedings of the London mathematical society 2.1 (1937): 230-265.

  • [6] Pierce, Benjamin C., et al. "Software foundations."Chinese translation:<https://github.com/Coq-zh/SF-zh

  • [7] Kolata, Gina. "Computer math proof shows reasoning power." Math Horizons 4.3 (1997): 22-25.

  • [8] Goldwasser, Shafi, Silvio Micali, and Charles Rackoff. "The knowledge complexity of interactive proof systems." SIAM Journal on computing 18.1 (1989): 186-208.

  • [9] zkPoD: Blockchain, zero-knowledge proof and formal verification, realizing fair transactions without intermediaries and zero trust. Ambi Lab. 2019.

  • [10] Oded, Goldreich. "Foundations of cryptography basic tools." (2001).

  • [11] Gennaro, Rosario, et al. "Quadratic span programs and succinct NIZKs without PCPs." AnnualInternational Conference on the Theory and Applications of Cryptographic Techniques. Springer Berlin, Heidelberg, 2013.

  • [12] Ben-Sasson, Eli, et al. "Scalable, transparent, and post-quantum secure computational integrity." IACR Cryptology ePrint Archive 2018 (2018): 46.

  • [13] Bünz, Benedikt, et al. "Bulletproofs: Short proofs for confidential transactions and more." 2018IEEE Symposium on Security and Privacy (SP). IEEE, 2018.

安比(SECBIT)实验室
作者文库