Zhiyuan Index CUGE released: a new evaluation benchmark for large AI models

Large AI models are on the rise, and evaluation benchmarks have become a weathervane for their development. At a recent open day on frontier technologies in the Natural Language Processing (NLP) major research direction, held by the Beijing Zhiyuan Artificial Intelligence Research Institute (hereinafter referred to as “Zhiyuan Research Institute”), a new benchmark for evaluating Chinese language understanding and generation, the Zhiyuan Index (CUGE), was released.

In recent years, evaluation benchmarks such as the English-language GLUE have become important standards for measuring progress in the language intelligence of large models, and have received wide attention from academia and industry. However, GLUE evaluates only language understanding and ignores other important language capabilities such as language generation, multilingualism, and mathematical reasoning. It also reports only per-dataset scores and an overall score, and the overall score is easily dominated by a few datasets.

Moving from a flat list to a comprehensive system, and from a single dimension to multiple dimensions, CUGE aims to design a new “test paper” that evaluates the abilities of large models comprehensively.

In its benchmark framework, the Zhiyuan Index departs from the traditional flat organization of commonly used datasets. Drawing on human language examination syllabi and the current state of NLP research, it selects and organizes datasets in a hierarchical framework of language ability, task, and dataset. It covers 7 important language abilities, 17 mainstream NLP tasks, and 19 representative datasets, keeping the selection balanced and avoiding a bias toward particular subjects.

In terms of scoring strategy, the Zhiyuan Index better shows how models differ in language intelligence along different dimensions. Relying on the hierarchical benchmark framework, it provides model performance scores at several levels, including dataset, task, and language ability, which makes the evaluation considerably more systematic.
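To make the idea of multi-level scores concrete, the following is a minimal sketch in Python. It assumes that each level is a plain average of the level below it; this is an illustrative assumption, not the scoring formula actually used by the Zhiyuan Index, and every ability, task, and dataset name and every number in it is hypothetical.

```python
from statistics import mean

# Hypothetical hierarchy: ability -> task -> dataset -> score (0-100).
# Names and numbers are invented for illustration only.
SCORES = {
    "understanding": {
        "reading_comprehension": {"dataset_a": 80.0, "dataset_b": 70.0},
        "word_segmentation": {"dataset_c": 90.0},
    },
    "generation": {
        "summarization": {"dataset_d": 40.0},
    },
}


def task_scores(hierarchy):
    """Average the dataset scores inside each task (assumed rule)."""
    return {
        ability: {task: mean(datasets.values()) for task, datasets in tasks.items()}
        for ability, tasks in hierarchy.items()
    }


def ability_scores(hierarchy):
    """Average the task scores inside each ability (assumed rule)."""
    return {
        ability: mean(tasks.values())
        for ability, tasks in task_scores(hierarchy).items()
    }


def overall_score(hierarchy):
    """Average over abilities, so each ability carries equal weight."""
    return mean(ability_scores(hierarchy).values())


def flat_score(hierarchy):
    """Flat baseline: average over all datasets, ignoring the hierarchy."""
    return mean(
        score
        for tasks in hierarchy.values()
        for datasets in tasks.values()
        for score in datasets.values()
    )


if __name__ == "__main__":
    print("per ability:", ability_scores(SCORES))
    print("hierarchical overall:", overall_score(SCORES))
    print("flat overall:", flat_score(SCORES))
```

In this toy example the three understanding datasets pull the flat average up to 70.0, while the hierarchical average is 61.25 because the weaker generation ability counts as much as understanding. This is one way a hierarchical score can keep a few datasets from dominating the overall result.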

To promote the joint construction and sharing of the Zhiyuan Index and make it easier to use, the event also released an online evaluation platform and a public leaderboard. The leaderboard supports multiple display modes, including a comprehensive list, a condensed list, and single-dataset lists, so that users can quickly understand the characteristics and latest developments of models and datasets from multiple angles.

The release is only the starting point, and further development requires building an ecosystem together. Liu Zhiyuan, associate professor at Tsinghua University, Zhiyuan young scientist, and a core member of the team building the Zhiyuan Index, said: “Building on the single-dataset lists, the Zhiyuan Index will regularly take in the latest outstanding datasets. At the same time, we will rely on the strength of Zhiyuan Research Institute and the Zhiyuan Community to establish user feedback and discussion mechanisms for datasets and evaluation results, build a community for high-quality Chinese datasets, and promote the development of Chinese natural language processing.”

With the support of Zhiyuan Research Institute, the team of scholars in the natural language processing major research direction is actively exploring a new paradigm for the field. Driven by both big data and rich knowledge, and interacting with cross-modal information, the team aims to significantly improve Chinese semantic understanding and generation with natural language at the core.

In terms of practical applications, the “Multimodal Beijing Tourism Knowledge Graph” built by the team of Professor Li Juanzi of Tsinghua University can provide data support for functions such as route planning and scenic spot information queries, and can plan travel itineraries for tourists.

It is reported that the Zhiyuan Index is supported by the Beijing Zhiyuan Artificial Intelligence Research Institute. Its working committee is composed of Tsinghua University, Peking University, Renmin University, the Chinese Academy of Sciences, Beijing Language and Culture University, Fudan University, Harbin Institute of Technology, Shanghai Jiao Tong University, Soochow University, Dalian University of Technology, Shanxi University, and Jingdong Research Institute.
