What "algorithm details" Beijing asked for from Chinese tech giants
Not as sexy as the headlines sound
This news has been widely reported this week:
BBC: Chinese internet giants hand algorithm data to government
Chinese internet giants including Alibaba, TikTok-owner ByteDance and Tencent have shared details of their algorithms with China's regulators for the first time.
Algorithms decide what users see and the order they see it in - and are critical to driving the growth of social media platforms.
They are closely guarded by companies.
In the US Meta and Alphabet have successfully argued they are trade secrets amid calls for more disclosure.
The Cyberspace Administration of China (CAC) has published a list with the descriptions of 30 algorithms.
Chinese technology giants shared details of their prized algorithms with the country’s regulators in an unprecedented move, as Beijing looks for more oversight over its domestic internet sector.
The Cyberspace Administration of China, one of the country’s most powerful regulators, released a list on Friday of 30 algorithms alongside a brief description of their purpose from companies including e-commerce firm Alibaba and gaming giant Tencent.
It comes after China brought in a law in March governing the way tech firms use recommendation algorithms. The rules include allowing users to opt out of recommendation algorithms, as well as requiring companies to obtain a license to provide news services.
Algorithms are the secret sauce behind the success of many of China’s technology companies. They can be used to target users with products or videos based on information about that customer.
Bloomberg: Alibaba, ByteDance Share Details of Prized Algorithms With Beijing for First Time
China’s internet giants from Tencent Holdings Ltd. to ByteDance Ltd. have shared details of their prized algorithms with Beijing for the first time, an unprecedented move aimed at curbing data abuse that may end up compromising closely guarded corporate secrets.
The internet watchdog on Friday published a list describing 30 algorithms that firms including Alibaba Group Holding Ltd. and Meituan employ to gather data on users, tailor personal recommendations and serve up content. While the public list stopped short of revealing the actual code, it wasn’t clear the extent to which internet firms may have revealed their underlying software to regulators in private.
So, what "algorithm details" has Beijing asked for?
As correctly reported by Reuters in China regulator says Alibaba, Tencent have submitted app algorithm details:
China in March passed new regulations for algorithm recommendation services and launched a filing system requiring companies to disclose the algorithms they used in their apps.
The exact regulation that created the filing system is 《互联网信息服务算法推荐管理规定》Internet Information Service Algorithmic Recommendation Management Provisions (thanks to Stanford University’s DigiChina, an English translation is available).
And this is the filing system: https://beian.cac.gov.cn/#/index
The notice in the red box is 互联网信息服务算法备案系统使用手册, the user manual for the filing system. Published in February as a downloadable PDF (https://beian.cac.gov.cn/api/file/fileDownLoad?noticeId=notice_fad4fc04-fd2e-4db2-9a85-1a2fd669554a), it sheds light on what the Cyberspace Administration of China asks for.
The user manual begins:
依据《互联网信息服务算法推荐管理规定》(以下简称《管理规定》)的相关要求,备案主体通过备案系统履行备案手续
In accordance with the relevant requirements of the Internet Information Service Algorithmic Recommendation Management Provisions (hereinafter the “Provisions”), entities obligated to file (their algorithms) complete the filing procedures through the filing system.
That means this system is the only channel to meet the filing requirement.
According to the user manual, once a company logs into the system, the first section asks for the company’s information: its name, registration number, type, location, address, web link, number of monthly active users, corporate license, and where that license is registered.
After that, it asks about the internet product: the product’s name, the service it offers, how users access that service, the product’s status (active or not), the type of customers/clients/users it serves, whether using it requires real-name registration, and whether it has obtained prior government approval.
Then it’s about how to navigate the functions of the Internet product.
And the functions of the Internet product: each function’s name and a description of fewer than 500 Chinese characters.
From here on, as the manual shows, the form asks for information about the algorithm itself.
基本信息 Basic information: the algorithm’s type, name, the date it went online, version number, and field of application, plus a self-assessment report on the algorithm’s security (up to 20MB in file size) and the company’s planned disclosure to the public (also up to 20MB).
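For readers who think in code, the company, product and basic-information steps amount to a form with a few hard limits. The sketch below is purely illustrative: the class names, field names and `validate_*` helpers are hypothetical, and reading “20MB” as 20 * 1024 * 1024 bytes is my assumption, not something the manual specifies.

```python
from dataclasses import dataclass

# Limits described in the manual. "Fewer than 500 characters" is read as a
# strict < 500 check; 20MB as 20 * 1024 * 1024 bytes (an assumption).
MAX_FUNCTION_DESC_CHARS = 500
MAX_FILE_BYTES = 20 * 1024 * 1024


@dataclass
class ProductFunction:
    """One function of the Internet product (hypothetical field names)."""
    name: str
    description: str  # must be fewer than 500 Chinese characters


@dataclass
class AlgorithmBasicInfo:
    """The 基本信息 (basic information) step, hypothetical field names."""
    algorithm_type: str
    name: str
    online_date: str               # e.g. "2022-03-01"
    version: str
    application_field: str
    self_assessment_report: bytes  # uploaded file, up to 20MB
    public_disclosure: bytes       # uploaded file, up to 20MB


def validate_function(f: ProductFunction) -> list[str]:
    """Return a list of problems with a function entry (empty if none)."""
    errors = []
    if len(f.description) >= MAX_FUNCTION_DESC_CHARS:
        errors.append(f"function '{f.name}': description must be fewer than 500 characters")
    return errors


def validate_basic_info(info: AlgorithmBasicInfo) -> list[str]:
    """Check both uploaded files against the 20MB cap."""
    errors = []
    for label, blob in (("self-assessment report", info.self_assessment_report),
                        ("public disclosure", info.public_disclosure)):
        if len(blob) > MAX_FILE_BYTES:
            errors.append(f"{label} exceeds 20MB")
    return errors
```

Nothing here tells us what regulators do with the uploaded files; the point is only that this part of the filing is descriptive paperwork, not source code.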
Not sexy enough so far? Now on to what is supposedly the saucy part.
To make it as clear as possible, I’ve put translations IN the picture (allow your email client to load images!):
Basically, the 详细属性 detailed attributes section includes four parts, in addition to a description of the algorithm (up to 200 characters) and the application scenario, which can’t be typed in but only selected from a scroll-down menu (the manual doesn’t show the options). The four parts are:
algorithm data [UPDATE AFTER PUBLICATION: I’ve been advised by a friend that algorithm settings would be more accurate than algorithm data]: the model and status of the data (selected from options, not typed in), plus yes-or-no answers to two questions: whether the data input includes biometric information and whether it includes ID information;
algorithm model: the sources of the data used to train the algorithm, namely the name and a brief description of any open-source dataset and of any proprietary dataset. This part is optional, meaning the company can leave it entirely blank;
algorithm tactic【Not shown in the manual】;
algorithm risks and prevention mechanism【Not shown in the manual】.
Because the user manual doesn’t show screenshots of the 算法策略 algorithm tactic and the algorithm risks and prevention mechanism sections of the filing system, we can’t see what tech companies are asked to provide there.
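Put together, the detailed-attributes section might be modelled roughly as below. Again this is a hypothetical sketch: the field names are invented, the menu choices are plain strings because the manual doesn’t list the options, and the two sections the manual doesn’t show are left as opaque optional text since their actual contents are unknown.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_ALGO_DESC_CHARS = 200  # "up to 200 characters", read as <= 200


@dataclass
class Dataset:
    """A training dataset entry: name plus brief description."""
    name: str
    description: str


@dataclass
class DetailedAttributes:
    """The 详细属性 (detailed attributes) step, hypothetical field names."""
    description: str             # free text, up to 200 characters
    application_scenario: str    # picked from a menu (options not shown in the manual)
    data_model_and_status: str   # picked from options, not typed in
    uses_biometrics: bool        # yes/no question
    uses_identity_info: bool     # yes/no question
    # The "algorithm model" part is optional: both lists may stay empty.
    open_source_datasets: list[Dataset] = field(default_factory=list)
    proprietary_datasets: list[Dataset] = field(default_factory=list)
    # Not shown in the manual; modelled as optional free text by assumption.
    tactic: Optional[str] = None
    risk_prevention: Optional[str] = None


def validate_detailed(attrs: DetailedAttributes) -> list[str]:
    """Return a list of problems with the entry (empty if none)."""
    errors = []
    if len(attrs.description) > MAX_ALGO_DESC_CHARS:
        errors.append("description exceeds 200 characters")
    return errors
```

Note how little of this would expose code or model weights: the only free-text fields with known limits are short descriptions.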
Is it possible to gauge what the Cyberspace Administration of China means by algorithm tactic?
The 《互联网信息服务算法推荐管理规定》Internet Information Service Algorithmic Recommendation Management Provisions, which created this filing system, does talk about tactics:
Article 12: Algorithmic recommendation service providers are encouraged to comprehensively use tactics such as content de-weighting, scattering interventions, etc., and optimize the transparency and understandability of search, ranking, selection, push notification, display, and other such norms, to avoid creating harmful influence on users, and prevent or reduce controversies or disputes.
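Article 12 doesn’t define “content de-weighting” or “scattering interventions”. One common reading in recommender-system practice, assumed here rather than taken from the Provisions, is that de-weighting lowers the scores of content in categories a user has already seen a lot of, and scattering reorders a ranked list so that similar items don’t sit next to each other. A minimal sketch, with hypothetical function names and a made-up decay factor:

```python
# Each item is (item_id, category, score); higher score = ranked higher.
Item = tuple[str, str, float]


def de_weight(items: list[Item], seen_counts: dict[str, int], decay: float = 0.5) -> list[Item]:
    """Content de-weighting (assumed interpretation): multiply each item's score
    by decay ** (number of times the user has already seen its category)."""
    return [(i, c, s * decay ** seen_counts.get(c, 0)) for i, c, s in items]


def scatter(items: list[Item]) -> list[Item]:
    """Scattering (assumed interpretation): walk down a ranked list and, at each
    step, take the highest-ranked remaining item whose category differs from
    the previous pick; fall back to the top remaining item if none does."""
    remaining = list(items)
    result: list[Item] = []
    while remaining:
        last_cat = result[-1][1] if result else None
        pick = next((x for x in remaining if x[1] != last_cat), remaining[0])
        remaining.remove(pick)
        result.append(pick)
    return result
```

For example, a feed ranked food, food, pets, food would be scattered into food, pets, food, food: the top item stays on top, and same-category runs are broken up only where an alternative exists.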
How about the algorithm risks and prevention mechanism?
In issuing the 《互联网信息服务算法推荐管理规定》Internet Information Service Algorithmic Recommendation Management Provisions, the Cyberspace Administration of China published an official Q&A:
算法应用日益普及深化,在给经济社会发展等方面注入新动能的同时,算法歧视、“大数据杀熟”、诱导沉迷等算法不合理应用导致的问题也深刻影响着正常的传播秩序、市场秩序和社会秩序,给维护意识形态安全、社会公平公正和网民合法权益带来挑战,迫切需要对算法推荐服务建章立制、加强规范,着力提升防范化解算法推荐安全风险的能力,促进算法相关行业健康有序发展。
Algorithm applications are becoming ever more widespread and deeply embedded. While they inject new momentum into economic and social development, problems caused by the unreasonable application of algorithms, such as algorithmic discrimination, big data-enabled price discrimination against existing customers, and inducing addiction, have also profoundly affected the proper order of communications, the market, and society, posing challenges to safeguarding ideological security, social fairness and justice, and the legitimate rights and interests of netizens. It is urgent to establish rules and regulations for algorithmic recommendation services, strengthen standards, strive to improve the ability to prevent and resolve the security risks of algorithmic recommendation, and promote the healthy and orderly development of algorithm-related industries.
Whether you find it interesting, alarming, or boring, that is the sort of “algorithm details” Chinese tech companies have been asked for.
Finally, take a look at the public disclosure of the personalized recommendation algorithm of Douyin, TikTok’s Chinese elder sister. I’m sure the two short-video platforms differ a lot by now, but the basic logic underpinning them may be similar:
The Douyin personalized recommendation algorithm is based on user device information, location information, and behavior information generated when using the product (behavior information includes a user's clicks, follows, favorites/saves, searches, queries, browsing, downloads, shares, and transactions when accessing or using the product).
Through automatic analysis and calculation of the above information, content that the user may be more interested in is screened from the candidate pool and pushed to the user. The Douyin personalized recommendation algorithm feeds users' browsing behavior back to the recommendation model in real time as they use the product, continuously adjusting and optimizing the recommendation results to better provide users with high-quality content.
The Douyin personalized recommendation algorithm is mainly based on the user's historical clicks, viewing duration, likes, comments, shares, forwards, dislikes, and other behavioral data. It builds a model within a deep learning framework to estimate the probability that the user will interact with a given piece of content. It then applies sorting, scattering, intervention, and other mechanisms and tactics to the scored content before recommending it to the user.
User behavior, across the three dimensions of "user, content, and interaction", is sampled and fed into the machine learning model for training, and the training results are used to update the user portrait and recommend new content.
To avoid the "information cocoon" problem, the Douyin personalized recommendation algorithm includes an "interest exploration" mechanism. On the one hand, a certain proportion of each batch of recommendations comes from content categories the user has not often watched in the past. On the other hand, a random piece of content is deliberately added each time recommended content is fetched, to ensure the diversity of content the user sees.
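The disclosure describes this pipeline only in words. A toy version might look like the sketch below, under loudly stated assumptions: the interaction weights, the 20% exploration share, and the rule that “not often watched” means “never watched” are all invented for illustration, and the function names are hypothetical. This is not Douyin’s method, just a way to picture “estimate interaction probabilities, sort, then mix in exploration and one random item”.

```python
import random
from typing import Optional

# Hypothetical weights: the disclosure says the model estimates interaction
# probabilities but gives no formula for combining them into a ranking.
WEIGHTS = {"click": 1.0, "like": 2.0, "comment": 3.0, "share": 4.0, "dislike": -5.0}


def score(predicted: dict[str, float]) -> float:
    """Combine predicted interaction probabilities into one ranking score."""
    return sum(WEIGHTS[k] * p for k, p in predicted.items() if k in WEIGHTS)


def rank(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Sort candidate content IDs by descending combined score."""
    return sorted(candidates, key=lambda cid: score(candidates[cid]), reverse=True)


def explore_mix(ranked: list[tuple[str, str]], watched: dict[str, int], k: int = 10,
                explore_ratio: float = 0.2,
                rng: Optional[random.Random] = None) -> list[tuple[str, str]]:
    """Return up to k (id, category) items: mostly top-ranked ones, a share from
    categories the user has never watched (an assumed rule), plus one random item."""
    if k <= 0:
        return []
    rng = rng if rng is not None else random.Random()
    n_explore = min(int(k * explore_ratio), k - 1)  # leave a slot for the random item
    explore = [x for x in ranked if watched.get(x[1], 0) == 0][:n_explore]
    exploit = [x for x in ranked if x not in explore][: max(k - 1 - len(explore), 0)]
    chosen = exploit + explore
    pool = [x for x in ranked if x not in chosen]
    if pool:  # the one random item, added for diversity
        chosen.append(rng.choice(pool))
    return chosen
```

In this sketch a user who only ever watches food videos would still see two travel videos and one randomly drawn item in every batch of ten, which is the “anti-cocoon” behavior the disclosure claims.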