Generative AI is rapidly reshaping our lives—expanding human capabilities and transforming how we work. But amidst discussions centered on growth and performance, we risk overlooking a critical question: who is AI actually reaching, and how?
One notable trend is inclusive AI: extending AI’s benefits to those excluded under previous technological paradigms, including older adults, children, people with disabilities, and patients.
But we can push this further. Has inclusive AI been approaching these users’ lives from diverse enough perspectives? Or has it stopped at addressing only fragments of their experiences, framing them as passive recipients of assistance? What would it mean to move beyond daily support toward enabling real, meaningful change—and what should AI look like then?
We started from these questions to redefine inclusive AI and propose a benchmark for evaluating it.
What is an inclusive AI agent?
AI can do many things for the groups typically associated with inclusive AI. For lonely seniors, it becomes a companion; for children with autism, a tool for communication. But AI can do much more: it can provide real, on-site support in the places where these individuals contribute as members of society—the foundation of their working lives.
In reality, many people in these groups actively participate in social and economic activities. If we truly hope to make a difference for them, shouldn’t we expand the problem beyond daily conveniences to enabling them to realize their full capabilities at work?
Some may already be working in industrial settings but have limited opportunities to work with AI or face barriers in leveraging it effectively. Our goal is to include these people and help them grow their capabilities. We call the AI that plays this role an inclusive AI agent. This agent must do more than provide simple information—it must help solve the problems people actually face, opening new paths for AI to be leveraged more meaningfully across broader industrial fields.
Upending the way we look at industry
Defining an inclusive AI agent this way requires rethinking how we view industry itself—interpreting it from an inclusive perspective. This is where our approach diverges from traditional generative AI evaluation.
In September 2025, OpenAI introduced GDPval, a benchmark for evaluating generative AI on economically valuable, real-world tasks. They selected nine industries contributing over 5% to U.S. GDP and identified representative occupations to assess whether generative AI could perform those tasks.
[Figure 1]: 44 occupations across 9 industries included in GDPval
(Source: GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks)
This approach frames industry in terms of economic contribution and growth potential. But what if we view it through the lens of inclusion?
We shifted our focus from “economically valuable industries” to “the people working in them,” asking where the workers who could actually benefit from an inclusive AI agent are found. We then narrowed the scope to older workers (aged 50 and above) currently employed in these occupations, examining Korea’s industrial landscape from this perspective.
Inclusive AI agent benchmark for older workers
Figure 2 ranks industries by the percentage of workers aged 50 or older, based on Q2 2024 data—revealing the significant share older workers represent across Korea’s industries. Notably, aside from household production activities (which cannot be classified as a specific industry), many top-ranking fields are ones GDPval did not adequately address.
[Figure 2]: Industry ranking by percentage of workers aged 50+ (Q2 2024)
(Source: Ministry of Data and Statistics, Economically Active Population Survey)
Industries such as finance and insurance; professional, scientific, and technical services; and publishing, broadcasting, telecommunications, and information services don’t rank near the top. Instead, the top ranks go to agriculture and fishery, mining, construction, transportation, sewage and waste treatment, recycling, and environmental restoration: sectors not typically seen as the growth engines of an advanced economy.
This distribution is telling. Older workers are actively employed across industries, but they are disproportionately concentrated in sectors where technological benefits are harder to access. The implication is clear: we need to identify which fields an inclusive AI agent should serve, what tasks it should perform, and how benchmarks for evaluating model performance on these tasks should be designed.
The role of inclusive AI agents
Building on our analysis of industry from an inclusive perspective, we set out to define the specific role an inclusive AI agent should play. We selected industries first, then identified real-world problems older workers face in each.
We didn’t simply list existing tasks. Instead, we examined the systems older workers interact with and pinpointed where in the process they encounter difficulty.
One key insight emerged: inclusive AI agents—and the benchmarks used to evaluate them—must go beyond generative AI that merely provides information. The essential capability is agentic tool use, where the model integrates with external systems to call tools and execute tasks.
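To make the distinction concrete, here is a minimal sketch of agentic tool use in Python. Everything in it is hypothetical: the tool name `check_inventory`, its parameter, the stock data, and the `ToolCall` structure are illustrative assumptions, not part of our benchmark or of any real system. The point is only that the model emits a structured call and a runtime executes it against an external system, rather than returning text alone.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stand-in for an external store-management system.
_STOCK = {"bottled water": 12, "instant noodles": 0}


def check_inventory(product: str) -> dict[str, Any]:
    """Hypothetical tool: look up how many units of a product are in stock."""
    return {"product": product, "in_stock": _STOCK.get(product, 0)}


# Registry mapping tool names to the callables the agent may invoke.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "check_inventory": check_inventory,
}


@dataclass
class ToolCall:
    """A structured request that a model would emit instead of free text."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def execute(call: ToolCall) -> dict[str, Any]:
    """Run a tool call, returning an error payload for unknown tools."""
    tool = TOOLS.get(call.name)
    if tool is None:
        return {"error": f"unknown tool: {call.name}"}
    return tool(**call.arguments)


result = execute(ToolCall("check_inventory", {"product": "instant noodles"}))
print(result)
```

An information-only model could tell a store owner how to check stock; an agent in this sense performs the lookup itself and reports the answer.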
Our guiding question was: “If an AI agent handled interactions with existing systems on the worker’s behalf, would it genuinely help older workers?” With this criterion, we selected the following industries and tasks.
We chose four industries that either employ over 1 million workers aged 50 and above or draw nearly half their workforce from this age group (Table 1), then identified a representative task older workers face in each (Table 2).
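The selection rule can be read as a simple either-or filter. The sketch below is an illustration under stated assumptions: the 45% cutoff is our stand-in for “nearly half,” not a threshold taken from the analysis, and the rows are made-up placeholders rather than figures from the survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndustryStat:
    name: str
    workers_50_plus: int  # number of workers aged 50 and above
    total_workers: int

    @property
    def share_50_plus(self) -> float:
        return self.workers_50_plus / self.total_workers


def is_candidate(stat: IndustryStat,
                 min_workers: int = 1_000_000,
                 min_share: float = 0.45) -> bool:
    """Either criterion is enough: a large absolute count or a high share.

    min_share=0.45 is an assumed stand-in for "nearly half".
    """
    return stat.workers_50_plus > min_workers or stat.share_50_plus >= min_share


# Placeholder rows for illustration only; NOT real survey data.
stats = [
    IndustryStat("industry_a", 1_200_000, 4_000_000),  # large count, 30% share
    IndustryStat("industry_b", 300_000, 600_000),      # small count, 50% share
    IndustryStat("industry_c", 200_000, 1_000_000),    # meets neither criterion
]

selected = [s.name for s in stats if is_candidate(s)]
print(selected)
```

Using both an absolute count and a share keeps large industries with a moderate proportion of older workers in scope alongside smaller industries where older workers are the majority.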
- Agricultural machinery troubleshooting
Equipment breakdowns are among the most common challenges in agriculture. While the fixes themselves are often straightforward, they can pose a significant burden for older workers. Local governments have even established telephone consultation services to assist with machinery issues—a sign of just how widespread the problem is.
- Care worker task management
This task reflects a pressing reality for care workers in Korea: over 80% are aged 50 or older, and their work requires entering various records and reports into digital systems. Many struggle with these systems, prompting us to explore whether an inclusive AI agent could offer genuine support.
- Convenience store product management
Many convenience store owners are retirees who open stores later in life. Core tasks like product ordering, inventory checks, and returns processing are all managed through digital systems, a comprehensive management load that can quickly become overwhelming for older store owners.
- Freight truck dispatch management
Over 70% of freight truck drivers in Korea are aged 50 or older, and many work as independent operators. Dispatch and route management are directly tied to revenue—an area where inclusive AI could make a real difference.
With industries and tasks identified, the next challenge was capturing the real-world conditions an inclusive AI agent would face when performing these tasks—and reflecting that in evaluation.
Reflecting user characteristics in the benchmark
Inclusive AI agents interact directly with older workers on-site to perform tasks. This means we need to design an evaluation environment that reflects actual conversational conditions. To do this, we analyzed conversational data involving older adults along with relevant research, identifying the following characteristic patterns:
- Failing to provide necessary information at the right time
- Forgetting terms and expressions mid-conversation
- Going off-topic due to unnecessary context
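One way to reproduce these patterns in an evaluation environment is a scripted user simulator. The sketch below is a hypothetical illustration, not our benchmark's implementation: the class `SimulatedUser`, the slot names, and the sample utterances are all assumptions. It models a user who withholds a needed detail until asked, substitutes a vague phrase for a forgotten term, and adds an off-topic remark.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedUser:
    """Hypothetical scripted user exhibiting the three patterns above."""
    # Details the agent needs; each is revealed only when asked about by name.
    hidden_slots: dict[str, str]
    # Precise terms the user has "forgotten", mapped to vague substitutes.
    vague_terms: dict[str, str] = field(default_factory=dict)
    # Off-topic remark appended to the first message.
    digression: str = ""

    def opening(self, request: str) -> str:
        """First message: vague wording, missing details, plus a digression."""
        for term, vague in self.vague_terms.items():
            request = request.replace(term, vague)
        return f"{request} {self.digression}".strip()

    def answer(self, agent_question: str) -> str:
        """Reveal a hidden slot only if the agent's question mentions it."""
        for slot, value in self.hidden_slots.items():
            if slot in agent_question.lower():
                return value
        return "I'm not sure what you mean."


user = SimulatedUser(
    hidden_slots={"quantity": "Two boxes."},
    vague_terms={"the return request": "that sending-back thing"},
    digression="My grandson set this machine up for me.",
)

print(user.opening("I need to do the return request."))
print(user.answer("What quantity would you like to return?"))
```

Matching on slot names is deliberately crude. It is enough to show the mechanism: the agent only gets the missing detail if it asks for it.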
These characteristics aren’t limitations to work around—they represent the environment inclusive AI agents will actually face. To account for this when designing an agent, we need to redefine how models are evaluated: not just on whether they complete tasks, but on how well they respond to these real-world conditions.
This led us to a key conclusion: to properly evaluate inclusive AI agents, we shouldn’t only assess final results—we must also evaluate the process of reaching them. For example, the following factors become important evaluation criteria:
- Extracting information naturally from conversation and prompting the user when necessary details are missing
- Inferring user intent and solving problems even when expressions are ambiguous
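As a rough illustration of scoring the process as well as the outcome, the sketch below checks a conversation trace against both. The trace format, the field names, and the equal weighting of outcome and process are assumptions made for illustration; they are not the scoring scheme of our benchmark. It covers only the first criterion above (prompting for missing details) and leaves intent inference aside.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str      # "user", "agent", or "tool"
    content: str


@dataclass
class Episode:
    turns: list[Turn]
    required_slots: list[str]  # details the user did not volunteer up front
    task_completed: bool       # whether the final system state is correct


def asked_for(slot: str, turns: list[Turn]) -> bool:
    """True if any agent turn is a question that mentions the slot."""
    return any(
        t.role == "agent" and "?" in t.content and slot in t.content.lower()
        for t in turns
    )


def score(episode: Episode, outcome_weight: float = 0.5) -> float:
    """Blend outcome and process into one score in [0, 1].

    outcome_weight=0.5 is an arbitrary illustrative choice.
    """
    outcome = 1.0 if episode.task_completed else 0.0
    if episode.required_slots:
        asked = sum(asked_for(s, episode.turns) for s in episode.required_slots)
        process = asked / len(episode.required_slots)
    else:
        process = 1.0
    return outcome_weight * outcome + (1 - outcome_weight) * process


episode = Episode(
    turns=[
        Turn("user", "I need to send some of it back."),
        Turn("agent", "Which product would you like to return?"),
        Turn("user", "The noodles."),
        Turn("tool", "return_filed"),
    ],
    required_slots=["product", "quantity"],
    task_completed=True,
)

print(score(episode))
```

Here the agent completes the task but asks about only one of the two missing details, so it does not earn full marks. Keyword matching is a placeholder; a practical evaluation would likely need a rubric or a more robust judge of whether the agent's questions were appropriate.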
With this in mind, we created a benchmark that reflects these characteristics—capturing the problems and conversations that inclusive AI agents will actually encounter.
From design to validation: The next step toward a benchmark for inclusive AI agents
In this first part, we reinterpreted industry from an inclusive perspective and explored the role of an inclusive AI agent focused on older workers—and how to design a benchmark for evaluating it.
In the next part, we’ll examine the work environments and limitations older workers face on-site, configure a benchmark based on these realities, and share our findings on how effectively different AI models respond.
We hope this effort serves as one small piece in the puzzle of making AI a more natural and meaningful part of our lives.



