[특집 기사] 카글, ‘Kaggle Benchmarks’ 출시…AI 평가의 새로운 패러다임 연다


AI 평가를 모두에게 – 신뢰할 수 있는 기준, 누구나 활용 가능한 도구로 확장

[한국유통신문= 김도형 기자] 2025년 7월 22일, 세계 최대의 데이터 사이언스 커뮤니티인 카글(Kaggle)이 혁신적인 AI 평가 플랫폼 "Kaggle Benchmarks"를 공식 출시하며 생성형 AI(GenAI) 시대의 새로운 평가 기준을 제시했다. 해당 플랫폼은 개인 연구자부터 대형 AI 연구소까지 누구나 신뢰 가능한 벤치마크(Benchmark)를 손쉽게 만들고 사용할 수 있도록 설계되어, AI 연구의 민주화를 촉진한다.

15년간 축적된 신뢰의 평가 기준을 모두에게

카글은 그동안의 AI 경진 대회를 통해 ‘신뢰할 수 있는 AI 평가’의 표준을 구축해왔다. 이제 그 풍부한 경험과 인프라를 바탕으로, 누구나 공정하고 체계적인 AI 모델 평가를 수행할 수 있도록 Kaggle Benchmarks를 공개한 것이다.

초기 플랫폼에는 이미 70개 이상의 벤치마크 리더보드가 탑재돼 있으며, MMLU, MathVista, SciCode, LiveCodeBench, DeepMind의 FACTS Grounding, 메타의 Multiloko Benchmark 등 세계 정상급 연구기관의 벤치마크를 포함하고 있다.

"모델을 공정하게 비교하고 표준화된 방식으로 평가하는 건 쉬운 일이 아닙니다. Kaggle의 전문성과 새로운 사설 테스트셋 기능은 이 분야의 판을 바꿀 수 있습니다."

— 디유크 후프케스 (Meta AI 연구원, Multiloko 벤치마크 제작자)

복잡한 인프라는 카글이, 연구자는 벤치마크에 집중

Kaggle Benchmarks는 단지 리더보드를 호스팅하는 데 그치지 않는다. 사용자는 직접 맞춤형 태스크를 설계해 세계 유수의 AI 모델(Large Language Models, LLMs)들과 비교·평가할 수 있다. 이러한 과정은 별도의 세팅이나 비용 없이 클라우드 기반 인프라에서 자동으로 처리된다.

카글은 컴퓨팅 자원, 하드웨어, 평가 스크립트 실행 등 복잡한 기술적 요소를 완전히 지원하여, 사용자들은 오직 **‘좋은 평가 기준을 설계하는 데 집중’**할 수 있게 한다.

"좋은 사전학습 모델의 개발 비용이 높아지면서 현장의 목소리는 점차 줄어들고 있습니다. Kaggle Benchmarks는 인프라 제공과 커뮤니티 중심 평가를 통해 이 흐름을 되돌리고 있습니다."

— 루카스 하스 (DeepMind 제품 관리자, FACTS 공동 제작자)

ICML 2025에서 선보인 'AI 전문가 벤치마크'

카글은 지난 ICML 2025 학술대회에서 참가자들과 함께 흥미로운 도전을 진행했다. “당신이 가장 힘들다고 생각하는 AI 태스크는 무엇인가요?”라는 질문 아래, 글로벌 AI 전문가들이 직접 태스크를 제안하고 이를 최신 모델들—Gemini 2.5 Pro, Claude 4 Sonnet, DeepSeek-R1 등—에 자동 평가하도록 하였다.

그 결과로 탄생한 ‘ICML AI Experts Benchmark’는 30여 개의 실제적이고 창의적인 평가 태스크가 수록된 리더보드로, 현재 kaggle.com/benchmarks에서 누구나 확인할 수 있다.

프리뷰 기능 + 맞춤형 벤치마크 생성, 누구나 가능

Kaggle Benchmarks는 현재 프리뷰 형태로 맞춤형 벤치마크 작성 기능을 개방 중이다. 앞으로는 누구나 자신만의 다중모달 평가 태스크를 만들고 커뮤니티에 공유하거나, 공개된 라이브러리에서 태스크를 불러와 쉽게 테스트할 수 있게 된다.

또한, 벤치마크 제작자와 평가 전문가가 모델을 ‘극한 상황’에서 테스트하면서 명성을 확보할 수 있는 새로운 형태의 경쟁 환경도 준비 중이라고 한다.

참여 방법 및 향후 계획

베타 프로그램 참여: http://goo.gle/kaggle-benchmarks-waitlist를 통해 얼리 액세스 신청 가능

벤치마크 협력 제안: kaggle-benchmarks@google.com으로 문의

플랫폼 탐색: kaggle.com/benchmarks

향후 AI 평가의 방향을 함께 만들다

카글의 제품 책임자 Meg Risdal은 이를 다음과 같이 요약했다.

“우리는 커뮤니티와 함께 AI의 미래 평가 방식을 정의하려 합니다. Kaggle Benchmarks는 시작일 뿐, 앞으로도 더 많은 기능과 벤치마크가 커뮤니티의 손에 의해 만들어질 것입니다.”

전 세계 AI 연구자가 공통의 기준으로 협업하고 공유하는 시대, Kaggle Benchmarks는 그 중심축으로서 AI 신뢰성과 공동 발전을 이끄는 핵심 허브로 자리매김할 전망이다.

AI evaluation for everyone: trusted standards, extended into tools anyone can use

[Korea Distribution Newspaper = Reporter Kim Do-hyung] On July 22, 2025, Kaggle, the world's largest data science community, officially launched "Kaggle Benchmarks," an innovative AI evaluation platform that sets a new standard for evaluation in the era of generative AI (GenAI). The platform is designed so that anyone, from individual researchers to large AI labs, can easily create and use trustworthy benchmarks, which promotes the democratization of AI research.

Fifteen years of trusted evaluation standards, now open to everyone

Through its AI competitions, Kaggle has built a standard for 'trustworthy AI evaluation'. Drawing on that experience and infrastructure, it has now released Kaggle Benchmarks so that anyone can run fair, systematic evaluations of AI models.

The platform launches with more than 70 benchmark leaderboards, including benchmarks from world-class research organizations: MMLU, MathVista, SciCode, LiveCodeBench, DeepMind's FACTS Grounding, and Meta's Multiloko Benchmark.

"Comparing models fairly and evaluating them in a standardized way is not easy. Kaggle's expertise and its new private test set feature can change the game in this field."

- Dieuwke Hupkes (Meta AI researcher, creator of the Multiloko benchmark)

Kaggle handles the complex infrastructure; researchers focus on benchmarks

Kaggle Benchmarks does more than host leaderboards. Users can design their own custom tasks and run them against the world's leading AI models (large language models, LLMs) to compare and evaluate those models. All of this is handled automatically on cloud-based infrastructure, with no separate setup or cost.

Kaggle takes care of the complex technical pieces, such as compute resources, hardware, and running evaluation scripts, so that users can **'focus solely on designing good evaluation criteria'**.
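The workflow described above (define custom tasks, run several models on them, rank the results) can be illustrated with a minimal Python sketch. It is not the Kaggle Benchmarks API: the names `Task`, `score`, and `leaderboard` are hypothetical, the "models" are toy functions rather than real LLMs, and exact-match scoring is an assumption made only to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A "model" here is any callable mapping a prompt to an answer string.
# These are hypothetical stand-ins, not real LLM clients and not the Kaggle API.
Model = Callable[[str], str]


@dataclass(frozen=True)
class Task:
    """One benchmark item: a prompt plus the answer treated as correct."""
    prompt: str
    expected: str


def score(model: Model, tasks: List[Task]) -> float:
    """Return the fraction of tasks answered with a case-insensitive exact match."""
    if not tasks:
        return 0.0
    correct = sum(
        model(t.prompt).strip().lower() == t.expected.strip().lower() for t in tasks
    )
    return correct / len(tasks)


def leaderboard(models: Dict[str, Model], tasks: List[Task]) -> List[Tuple[str, float]]:
    """Score every model on the same tasks and rank them, best first."""
    rows = [(name, score(fn, tasks)) for name, fn in models.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy tasks and toy "models" so the sketch runs without any network access.
TASKS = [Task("2 + 2 = ?", "4"), Task("Capital of France?", "Paris")]
MODELS: Dict[str, Model] = {
    "always-four": lambda prompt: "4",
    "lookup": lambda prompt: {"2 + 2 = ?": "4", "Capital of France?": "paris"}.get(prompt, ""),
}

if __name__ == "__main__":
    for name, accuracy in leaderboard(MODELS, TASKS):
        print(f"{name}: {accuracy:.2f}")
```

In a hosted setting, the toy callables would be replaced by calls to real models and the scoring rule by whatever criterion the benchmark author designs; the point is only that every model is scored on the same tasks with the same rule.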

"As the cost of developing good pretrained models rises, the voices of practitioners are gradually fading. Kaggle Benchmarks is reversing this trend by providing infrastructure and community-driven evaluation."

- Lucas Haas (DeepMind product manager, co-creator of FACTS)

'AI Experts Benchmark' unveiled at ICML 2025

Kaggle ran an interesting challenge with attendees at the recent ICML 2025 conference. Under the question "What AI task do you think is the hardest?", AI experts from around the world proposed their own tasks, which were then automatically evaluated against the latest models, including Gemini 2.5 Pro, Claude 4 Sonnet, and DeepSeek-R1.

The resulting ICML AI Experts Benchmark is a leaderboard of some 30 practical and creative evaluation tasks, and anyone can view it at kaggle.com/benchmarks.

Preview feature plus custom benchmark creation, open to anyone

Kaggle Benchmarks currently offers custom benchmark creation as a preview feature. Going forward, anyone will be able to create their own multimodal evaluation tasks and share them with the community, or pull tasks from a public library and test them easily.

Kaggle is also said to be preparing a new kind of competitive environment in which benchmark creators and evaluation experts can build a reputation by testing models under "extreme conditions."

How to participate and what comes next

Join the beta program: apply for early access at http://goo.gle/kaggle-benchmarks-waitlist

Benchmark collaboration proposals: contact kaggle-benchmarks@google.com

Explore the platform: kaggle.com/benchmarks

Shaping the future direction of AI evaluation together

Meg Risdal, head of product at Kaggle, summed it up as follows.

"We want to work with the community to define how AI will be evaluated in the future. Kaggle Benchmarks is only the beginning, and many more features and benchmarks will be built by the community."

In an era when AI researchers around the world collaborate and share results against common standards, Kaggle Benchmarks is expected to establish itself as a central hub driving AI trustworthiness and shared progress.


#Kaggle #KaggleBenchmarks #AI평가 #AI벤치마크 #생성형AI #LLM #머신러닝 #인공지능연구 #AI리더보드 #ICML2025 #AI모델비교 #오픈AI벤치마크 #메타AI #데이터사이언스 #AI개발자 #카글 #GenAI #AI윤리 #AI기준 #딥러닝
