SDBD2020 Successfully Held: Hashpower Think Tank Joins Global Industry Leaders to Interpret Smart Data Empowerment
Hashpower Think Tank
2020-08-27 03:02
On August 25, the 2nd International Symposium on Smart Data and Blockchain Application, SIGKDD·SDBD2020, was held online to discuss approaches to data governance and sharing in the data economy.

ACM SIGKDD2020 (the Conference on Knowledge Discovery and Data Mining) is regarded as the world's top and most influential annual data science event. The SDBD international symposium is a special workshop of the KDD conference that focuses on emerging technologies such as smart data and blockchain. It is co-organized by Singapore Management University and Hashpower Think Tank.

Yan Li, founder of Hashpower Think Tank, said: "This year, the country elevated data to the level of a production factor, marking the arrival of the industry. Hashpower Think Tank has been tracking outstanding companies and applications in the smart data, privacy computing, and blockchain sectors, hoping to promote the development of the industry and its ecosystem through first-hand, original in-depth reports and research reports."

Data Governance Is the Premise of the Digital Economy

Technologies such as artificial intelligence and blockchain turn data into smart data, but the data economy goes beyond smart data. Zhu Feida, a professor at Singapore Management University, believes that before discussing how technology can make data "computable", one must first recognize a premise: data is an asset, and data needs to be governed.

Even this basic premise took a long time to be recognized. Professor Zhu Feida said that the data economy has gone through three stages. In the first stage, data was only a by-product of commercial activities, and people used it mostly to understand the past. The second stage was the rise of big data and the data economy, in which data was monopolized by a small number of companies for their own benefit. The third stage is the data-driven economy, where data becomes a shared asset that drives all organizations.

Luo Zhen, CEO of BaseBit.ai, believes that in the shift from the information age to the smart age, data is increasingly used by machines and computing programs, which makes machines more intelligent. As a legacy of the information age, data has unique economic characteristics, such as virtual reusability, high fixed costs, and low variable costs. It also shows distinctive features in non-economic dimensions such as privacy, compliance, confidentiality, and security.

In fact, people have paid a price in coming to understand these characteristics: breaches of ungoverned data have caused huge losses.

He Bingsheng, an associate professor at the National University of Singapore, said that data leakage is no longer an isolated phenomenon and that incidents have occurred across industries. In fields such as education, medicine, energy, and health, the average loss caused by each data breach is at least $5 million.

These cases not only cause huge economic losses but also erode society's willingness and confidence to share data, making data sharing, which already rests on a weak foundation, even more difficult.

Luo Zhen pointed out that data by its nature cannot be shared safely, so the value of data should be shared without sharing the data itself.

Of course, this is even more difficult. The best-known obstacle is the data island. Professor He Bingsheng took hospital data as an example: different hospitals have different patients, but their cases have many similarities. Likewise, the data held by hospitals, banks, and e-commerce companies capture different aspects of the same population.

Professor Zhu Feida's research found two major bottlenecks in the data ecosystem. The first exists among different roles such as individuals, enterprises, and governments. The second exists between data, models, and applications.

Between individuals and enterprises, users, as data contributors, are not only excluded from the value distribution, but also have very limited awareness and control of their own data, and are subject to the risk of privacy leakage. There are privacy and data leakage concerns between individuals and governments. There are "data high walls" among government departments due to institutional reasons, data islands are common among enterprises, and there is a lack of incentives for data sharing between enterprises and the government.

In addition, on the data side, the source of data is unknown and the quality is poor; on the model side, it is difficult to obtain real user data for model design and training; on the application side, the lack of advanced models results in low-level data intelligence.

Data Sharing: Walking on Two Legs, Technology and System

The experts broadly agreed that the safe flow and sharing of data requires progress on two levels at once: technology and system.

At the technology level, a wide range of cutting-edge techniques is flourishing, and privacy and security computing is the overall approach to enabling secure data sharing.

Luo Zhen divides the currently popular privacy and security computing technologies into three major paths: secure multi-party computation (MPC)/homomorphic encryption, federated learning, and secure sandbox computing/TEE. He further summed up the difference between the three: from the first path to the third, the trust assumptions grow stronger while the computational complexity decreases. Privacy and security computing also draws on technologies such as differential privacy, cloud computing, blockchain, and adversarial neural networks.
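To make the first path more concrete, here is a minimal sketch of additive secret sharing, one of the basic building blocks of MPC. It is an illustration only and not the method of any speaker or company mentioned here: the function names, the modulus, and the three-party setup are assumptions chosen for the example. Each party splits its private number into random shares, so no single party sees another's input, yet the shares still add up to the correct total.

```python
import secrets

# Modulus for share arithmetic (an illustrative choice, not a standard).
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_values):
    """Sum the parties' inputs without any party revealing its raw input."""
    n = len(private_values)
    # Each party splits its input into n shares, one destined for each party.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received (column j of the table).
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Only the partial sums are published and combined into the total.
    return sum(partial_sums) % PRIME

# Hypothetical inputs: three institutions each hold one private count.
total = secure_sum([120, 340, 95])
print(total)
```

The extra rounds of share exchange hint at why this path needs the weakest trust assumptions but carries more overhead than running the same sum inside a trusted environment.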

Yang Qiang, chief artificial intelligence officer of WeBank and chair professor of the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, explained federated learning with a sheep-grazing analogy. The traditional method is to collect grass from various places and bring it to the sheep, but this is no longer compliant: privacy and data security requirements have made collecting data an obstacle. Federated learning offers a new idea: let the sheep move from place to place to graze, so that the grass never leaves its local area and the sheep's owner cannot know which grass it has eaten.
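The grazing analogy can be sketched as federated averaging on a toy problem. This is a simplified illustration under stated assumptions, not WeBank's implementation: the one-parameter model y = w * x, the learning rate, and the two hypothetical clients are all invented for the example. The model (the sheep) visits each client, trains on local data (the grass), and only the updated weight travels back to the server.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one client's private (x, y) pairs."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(clients, rounds=50, w=0.0):
    """Train a shared weight without moving any client's raw data."""
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client trains locally; only the updated weight leaves the site.
        local_ws = [local_update(w, d) for d in clients]
        # The server averages the weights, weighted by local sample count.
        w = sum(lw * len(d) for lw, d in zip(local_ws, clients)) / total
    return w

# Hypothetical private datasets; both follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]
w = federated_average(clients)
print(round(w, 4))
```

In this bare form the server still sees each client's weight update. Hiding those updates as well usually means combining federated learning with techniques such as secure aggregation or differential privacy.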

Li Xiaolin, partner of Tongdun Technology and president of its Artificial Intelligence Research Institute, introduced Tongdun Technology's knowledge federation. It uses a hierarchical framework to support various secure multi-party applications and relies on secure data exchange protocols to make effective use of multiple participants' data for knowledge co-creation, sharing, and reasoning, making data available but invisible.

The participating experts also focused on the value of blockchain technology in data security sharing.

Zhang Shuai, vice president of Qulian Technology, said that blockchain is a guarantee technology for confirming data rights and transferring value, and that its greatest value lies in recording the data registration and exchange processes so that data can be shared safely. Access rights, access logic, and the price to be paid can be registered on the blockchain, while smart contracts make the process fully automated.

Regarding integration with other technologies, Zhang Shuai believes that although blockchain by itself does not solve data sharing, it can become the glue between multiple technologies. Vertical technologies such as the Internet of Things all relate to data, and blockchain can bond them horizontally.
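The registration-and-exchange idea can be sketched with a toy hash-chained log and a contract-like class. This is a hedged illustration, not Qulian Technology's platform or a real smart-contract API: `Ledger`, `DataAccessContract`, the dataset name, and the prices are hypothetical. It shows access terms recorded on an append-only log, requests settled automatically against those terms, and tampering detected by re-checking the hash chain.

```python
import hashlib
import json

class Ledger:
    """Append-only, hash-chained log standing in for a blockchain."""

    def __init__(self):
        self.blocks = []

    def append(self, record):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        block_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.blocks.append({"prev": prev_hash, "record": record, "hash": block_hash})

    def verify(self):
        """Recompute every hash; any edited record breaks the chain."""
        prev_hash = "0" * 64
        for block in self.blocks:
            payload = json.dumps(block["record"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True

class DataAccessContract:
    """Toy stand-in for a smart contract that enforces registered terms."""

    def __init__(self, ledger):
        self.ledger = ledger
        self.terms = {}

    def register(self, dataset_id, owner, allowed_ops, price):
        self.terms[dataset_id] = {"owner": owner, "allowed_ops": allowed_ops, "price": price}
        self.ledger.append({"event": "register", "dataset": dataset_id,
                            "owner": owner, "allowed_ops": allowed_ops, "price": price})

    def request(self, dataset_id, buyer, op, payment):
        terms = self.terms[dataset_id]
        # Access is granted automatically when the op is allowed and the price is met.
        granted = op in terms["allowed_ops"] and payment >= terms["price"]
        self.ledger.append({"event": "request", "dataset": dataset_id, "buyer": buyer,
                            "op": op, "payment": payment, "granted": granted})
        return granted

# Hypothetical usage: a hospital registers a dataset for aggregate queries only.
ledger = Ledger()
contract = DataAccessContract(ledger)
contract.register("stroke-records", "hospital-a", ["aggregate_query"], price=10)
ok = contract.request("stroke-records", "lab-b", "aggregate_query", payment=10)
denied = contract.request("stroke-records", "lab-b", "raw_export", payment=50)
print(ok, denied, ledger.verify())
```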

Du Yu, deputy general manager of Shanghai Wanxiang Blockchain Co., Ltd. and head of Wanxiang Blockchain Laboratory, also expressed similar views. He believes that in reality, companies and financial institutions will not expose transaction records and business secrets, but every company has a lot of data, and there are data islands. Faced with many collaborative requirements, the blockchain can assist in horizontal data sharing and connect isolated islands.

Wu Ming, co-founder of the Tree-Graph blockchain Conflux, believes that blockchain, as a distributed ledger, is a reliable data carrier that can carry the most valuable data, such as financial credit data. Blockchain and data are therefore inseparable, and the data it carries can also be certified.

Tong Lin, CEO of Phala Network, pointed out that the types and volume of data currently on-chain are far smaller than those of off-chain data. On-chain data is typically available only when it is visible and unavailable when it is invisible (encrypted), but privacy and security computing can make on-chain data available but invisible.

At present, various technologies have seen initial real-world deployment.

WeBank, for example, joined hands with Tencent Tianyan Lab to establish the Tencent Medical Health-WeBank Joint Laboratory. The "stroke risk prediction model" it developed based on federated learning has an accuracy rate of over 80%, and the prediction metrics of the small-hospital model increased by 10-20%.

Yifang Jianshu (BaseBit.ai) built a big data application and open platform based on privacy and security computing technology for Xiamen, one of the first pilot cities for China's health and medical big data, creating a model for medical big data governance and hierarchical diagnosis and treatment.

At the system level, experts from all walks of life unanimously emphasized the importance of laws, regulations and standards.

Wang Shuang, founder of Nuowei Technology, said that in addition to technology, data sharing needs the push of laws and standards, so that together they promote the circulation of data as a production factor.

Tan Chang, executive director of the iFLYTEK Big Data Research Institute, said frankly that large-scale data sharing between enterprises is actually rare in practice. Part of the reason is that laws and regulations still need to be improved, issues such as the pricing of data as a production factor have not yet been resolved, and a trading mechanism for the data market is lacking. In other words, the relevant infrastructure urgently needs improvement.

In the roundtable discussion on "Release the value of data, what are the wonderful uses of blockchain", Ingo Rübe, the founder of KILT Protocol, considered the data security of blockchain from a legal point of view: what counts as safe? Blockchain was not taken into account when the GDPR was first proposed in 2012. Data encryption on the blockchain, however, is ultimately a mathematical problem that can be solved, so the focus should first be on the legislative issues.
