Sep 9, 2024

Browser automation: Going beyond the limits of LLMs

Humans have evolved to seek efficiency, not just for convenience but also for survival and prosperity, and this innate drive has been the force behind our inventions and advancements. We want to spend our limited resources on more valuable work, such as solving challenging problems and engaging in creative activities, instead of doing rote tasks.


When generative AI first emerged, there was much hype about how it could automate complex and creative tasks and even revolutionize work. But it turned out to be too early for genuine transformation: LLMs still lack access to the most current data and can produce hallucinations, so users must verify their output for accuracy.


According to the Gartner Hype Cycle for 2024, generative AI has “passed the peak of inflated expectations” and is entering the real-world application phase. As the initial excitement wanes, generative AI’s integration with existing technologies is expanding its capabilities and business applicability.

In this post, we’ll explore how combining the increasingly popular browser automation with AI can bring positive changes across businesses.


Generative AI: A breakthrough or not?

Companies have yet to realize generative AI’s full potential or meet market expectations. Many factors hinder AI’s growth, including the technological gap between nations and corporations that we discussed in our earlier posts on sovereign AI.


The most important reason, however, is that we cannot wholly trust AI-generated information. AI sometimes produces factual inaccuracies or hallucinations because it lacks domain-specific training data and contextual understanding. The likelihood of wrong answers increases when LLMs are asked about relatively obscure scientific theories or complex questions. Because LLMs rely on pre-trained data, they cannot answer questions about topics absent from that data or provide up-to-date information.

Users are encouraged to check the validity of AI-generated content, and important decisions must still rest on human judgment. To realize AI’s full potential, we must overcome these obstacles.


Browser automation

Browser automation merges web search and AI technologies, allowing AI models to automatically browse the web to collect, process, and provide information to users. Picture an AI searching the web to gather data, just as a person would with a mouse and keyboard. Browser automation differs from web scraping or crawling in that it imitates the behavior of actual users.


The most prominent feature is that it automates browser interactions like mouse clicks, scrolls, and keyword searches. It enables access to web pages that require logins and processes dynamic content, taking web search technology to new territories.
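To make the difference from scraping concrete, here is a minimal sketch of scripted, user-like interaction in Python. The FakeBrowser class, its selectors, and its method names are hypothetical stand-ins for a real browser driver such as Selenium or Playwright, whose actual APIs differ; the point is only that the automation replays what a person would do (log in, type a query, click, scroll) rather than fetching raw HTML.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a browser driver; a real system would
# control an actual browser instead of updating these fields.
@dataclass
class FakeBrowser:
    logged_in: bool = False
    query: str = ""
    scroll_y: int = 0
    log: list = field(default_factory=list)

    def click(self, selector: str) -> None:
        self.log.append(f"click {selector}")
        if selector == "#login-button":
            self.logged_in = True  # pretend the login form was submitted

    def type_text(self, selector: str, text: str) -> None:
        self.log.append(f"type {selector}")
        if selector == "#search":
            self.query = text

    def scroll(self, pixels: int) -> None:
        self.log.append(f"scroll {pixels}")
        self.scroll_y += pixels


def run_script(browser: FakeBrowser, steps: list) -> FakeBrowser:
    """Replay user-like actions in order; each step is (method name, *args)."""
    for action, *args in steps:
        getattr(browser, action)(*args)
    return browser


# Hypothetical selectors for a login form and a search box.
script = [
    ("type_text", "#username", "alice"),
    ("click", "#login-button"),
    ("type_text", "#search", "quarterly sales report"),
    ("click", "#search-submit"),
    ("scroll", 800),  # on a real page, scrolling can trigger lazily loaded results
]

b = run_script(FakeBrowser(), script)
print(b.logged_in, b.query, b.scroll_y)  # True quarterly sales report 800
```

Because each step is plain data, the same script can be logged, replayed, or generated by a model, which is one reason this style can pair well with LLMs.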


Advances in AI and natural language processing make browser automation more appealing, and the growing need for real-time data collection and automated data processing only amplifies its importance. Browser automation can improve data-handling efficiency and user experience and is expected to make an impact across many fields.


LLMs and browser automation

If you’ve talked to an AI chatbot, you may have received a response saying it cannot provide information that requires browsing the web. Imagine how convenient it would be if someone searched for the data in real time and delivered the answer in the format you want. Browser automation can ground LLMs in the most up-to-date information, significantly reducing the chances of inaccuracies. LLMs equipped with browser automation can also analyze multiple web pages together, or take in the structure and content of a particular page, to generate more specific and structured answers. The convergence of the two technologies could be the key to overcoming the limitations of AI and increasing its trustworthiness.
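One common way to ground a model is to put freshly fetched page text directly in the prompt, labeled with its source and retrieval time so the answer can cite it. The sketch below assumes the pages have already been fetched and reduced to plain text; FetchedPage, build_grounded_prompt, and the prompt wording are illustrative choices, not a specific product’s format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FetchedPage:
    url: str
    text: str             # visible text already extracted from the page
    fetched_at: datetime  # when the page was retrieved (UTC)


def build_grounded_prompt(question: str, pages: list, max_chars: int = 500) -> str:
    """Place fresh page text before the question, numbered so the model can
    cite sources as [n]. Each page is truncated to max_chars characters."""
    lines = ["Answer using only the sources below and cite them as [n].", ""]
    for i, page in enumerate(pages, start=1):
        stamp = page.fetched_at.strftime("%Y-%m-%d %H:%M UTC")
        lines.append(f"[{i}] {page.url} (fetched {stamp})")
        lines.append(page.text[:max_chars])
        lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hypothetical page content for illustration.
pages = [
    FetchedPage(
        url="https://example.com/news/launch",
        text="Company X announced a new product line today.",
        fetched_at=datetime(2024, 9, 9, 8, 0, tzinfo=timezone.utc),
    ),
]
prompt = build_grounded_prompt("What did Company X announce?", pages)
print(prompt)
```

Recording the fetch time alongside each source lets both the model and the reader judge how current the grounding data actually is.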

 

Why use browser automation?

  • Understand and analyze users’ questions: Browser automation can use chain-of-thought (CoT) prompting, a method that analyzes the intent and content of a question and works toward the answer step by step, explaining each step along the way. It works well for parallel searches and complex reasoning.
  • Perform real-time search and analysis: Browser automation helps LLMs find and fetch the most up-to-date information from the web. This could be useful for monitoring competitors and market trends.
  • Conduct an integrated analysis: With browser automation, LLMs can analyze data from different sources to provide accurate and comprehensive information. For example, they can consolidate data from various travel websites and read user reviews to summarize the pros and cons of tourist attractions. The same approach helps with product improvement and market analysis.
  • Provide structured information: Browser automation enables LLMs to analyze the structure of an entire web page to generate more informative responses. When you ask for the latest information on Apple products, the model can query the official website and extract information from different sections, such as benefits, specifications, and pricing. It can also recognize important headlines, such as upcoming releases.
  • Respond to dynamic websites: Browser automation allows LLMs to fetch content from JavaScript-rendered pages or pages that require a login, connecting the models to more diverse sources.
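As a rough illustration of the structured-information point, the sketch below uses Python’s standard html.parser module to group a page’s visible text under its <h2> headings, skipping script and style content. The sample page and the choice of <h2> as the section boundary are assumptions; real sites vary, and JavaScript-rendered content would first need to be rendered in an actual browser.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Group a page's visible text under the nearest preceding <h2> heading.
    Text before the first <h2> goes under "intro"; <script>/<style> is skipped."""

    def __init__(self):
        super().__init__()
        self.sections = {}          # heading -> list of text chunks
        self._current = "intro"
        self._heading_parts = None  # collects heading text while inside <h2>
        self._skip_depth = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "h2":
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "h2" and self._heading_parts is not None:
            self._current = " ".join(self._heading_parts)
            self.sections.setdefault(self._current, [])
            self._heading_parts = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._heading_parts is not None:
            self._heading_parts.append(text)
        else:
            self.sections.setdefault(self._current, []).append(text)


# Hypothetical product page.
page = """
<html><body>
<h1>Product Z</h1>
<script>trackVisit();</script>
<h2>Specifications</h2><p>6.1-inch display</p><p>128 GB storage</p>
<h2>Pricing</h2><p>From $799</p>
</body></html>
"""

parser = SectionExtractor()
parser.feed(page)
print(parser.sections)
# {'intro': ['Product Z'], 'Specifications': ['6.1-inch display', '128 GB storage'], 'Pricing': ['From $799']}
```

The resulting heading-to-text mapping is the kind of structure a model can then summarize section by section instead of reading one undifferentiated block of text.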


RPA, RAG, and browser automation

We have always strived to automate repetitive tasks. In this section, we’ll take a look at how technology has evolved to reach new heights with browser automation.


  • Robotic process automation (RPA): Robotic process automation predates generative AI. RPA automates repetitive processes with software robots that follow predefined rules. For example, banks use RPA to extract data from online loan application forms, enter it into their internal systems, and perform initial evaluations against their own criteria. However, because RPA follows predefined rules and scenarios, it struggles with exceptions and unstructured data and cannot understand or process natural language.
  • Retrieval-augmented generation (RAG): The advent of generative AI led to a complementary technology, retrieval-augmented generation, developed to counter hallucinations. RAG brings search capabilities to LLMs: when a user asks a question, the system first searches a pre-built database for relevant documents and then uses the retrieved content to generate a response. In law, for example, the AI can reference statutes and precedents to offer factual legal advice. The database is built in advance from sources such as a company’s internal manuals and reports or reliable external sources like academic papers. This keeps the information trustworthy and consistent, but depending on how often the database is updated, answers may be outdated or limited to certain domains. RAG is therefore especially useful for in-depth, domain-specific knowledge, for example in fields like law and medicine, where data precision is paramount, or in services that handle a company’s internal policies or confidential information. Because search is added on top of an existing LLM, you can still leverage the pre-trained model and extend its features.
  • Browser automation: Recently, the trend has moved from RAG to browser automation, which gives LLMs the ability to search for and retrieve data from the live web. It offers everything RAG does, with the added benefit of providing the most current data. LLMs keep their sources up to date by searching the web for data that is not in their training set or a pre-built database, though additional validation may be necessary because not everything on the web can be trusted. Privacy and copyright issues also need to be addressed. Initial design costs may be high, but browser automation lets you tap into more advanced automation and real-time processing features.
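To make the retrieve-then-generate flow concrete, here is a minimal RAG retrieval sketch. It assumes a tiny, hypothetical pre-built index of internal policy snippets and ranks them by bag-of-words cosine similarity; most production RAG systems use embedding-based vector search instead, and the final prompt would be passed to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical pre-built knowledge base (e.g., internal policies). In practice
# it is built in advance and refreshed on a schedule, which is why answers can
# lag behind the real world.
DOCUMENTS = {
    "leave-policy": "Employees accrue 15 days of annual leave per year.",
    "expense-policy": "Travel expenses must be submitted within 30 days.",
    "security-policy": "Passwords must be rotated every 90 days.",
}
INDEX = {doc_id: Counter(tokenize(text)) for doc_id, text in DOCUMENTS.items()}


def retrieve(query: str, k: int = 1) -> list:
    """Return the ids of the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(INDEX, key=lambda doc_id: cosine(q, INDEX[doc_id]), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, k: int = 1) -> str:
    """Augment the question with retrieved passages before calling an LLM."""
    context = "\n".join(f"- {DOCUMENTS[doc_id]}" for doc_id in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_rag_prompt("How many days of annual leave do I get?"))
```

The index only knows what was loaded into DOCUMENTS, which mirrors the trade-off described above: reliable and consistent, but only as current and as broad as the pre-built database.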


Browser automation and RAG

Think of an LLM as an encyclopedia that holds comprehensive but static knowledge, which browser automation can help update and expand. RAG is like a college textbook; it lets you dive deeper into a specific topic and gives authoritative answers.

  • LLMs cover a broad range of knowledge but may be constrained in depth, consistency, and currency.
  • RAG can provide domain-specific and reliable information, but its coverage may be limited.
  • Browser automation allows an LLM to access the most current facts, but you may have to verify the factual accuracy of its answers.
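Verification can be partly automated. One simple heuristic, shown here as an assumption rather than an established standard, is to accept a claim only when it appears on a minimum number of distinct domains. The sketch matches claims by normalized exact text, which is a simplification; comparing claims that are phrased differently would need something smarter, such as a model-based check.

```python
from collections import defaultdict
from urllib.parse import urlparse


def corroborated_claims(observations, min_domains: int = 2) -> set:
    """Keep claims reported by at least `min_domains` distinct domains.

    observations: iterable of (claim, source_url) pairs extracted from pages.
    Claims are compared after lowercasing and trimming, a deliberate
    simplification of real fact-checking.
    """
    domains_by_claim = defaultdict(set)
    for claim, url in observations:
        host = (urlparse(url).hostname or "").removeprefix("www.")
        domains_by_claim[claim.strip().lower()].add(host)
    return {claim for claim, hosts in domains_by_claim.items() if len(hosts) >= min_domains}


# Hypothetical claims gathered from several pages.
observations = [
    ("Event starts on 12 October", "https://www.news-a.example/story"),
    ("Event starts on 12 October", "https://news-b.example/item/7"),
    ("Event starts on 12 October", "https://news-a.example/other"),  # same site, counted once
    ("Event is cancelled", "https://forum.example/thread/3"),
]
print(corroborated_claims(observations))  # {'event starts on 12 october'}
```

Counting domains rather than pages keeps one site that repeats itself from outvoting independent sources.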

Browser automation and RAG seem to have a lot in common, but each has different strengths, and in real-world settings the two often complement each other. For example, you might rely on RAG for domain knowledge and on browser automation for the latest trends and real-time data. Rather than weighing one technology against the other, consider what kind of data you need, how important accuracy is, and how often the data must be updated, and then adopt one or both accordingly. Both can help create a safer and more efficient LLM.
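These three questions can be encoded as a simple decision rule. The DataNeeds fields, the thresholds, and the idea of comparing tolerable staleness against the RAG index’s refresh interval are all illustrative assumptions, meant only to show how the criteria (type of data, importance of accuracy, update frequency) might map onto one or both technologies.

```python
from dataclasses import dataclass


@dataclass
class DataNeeds:
    internal_or_confidential: bool  # e.g. company policies, private documents
    accuracy_critical: bool         # e.g. legal or medical answers
    max_staleness_days: float       # how old an answer can be and still be useful
    index_refresh_days: float       # how often the RAG index is rebuilt (assumed known)


def choose_sources(needs: DataNeeds) -> set:
    """Map the three questions (what data, how accurate, how fresh) to sources."""
    sources = set()
    if needs.internal_or_confidential or needs.accuracy_critical:
        sources.add("rag")  # curated, domain-specific, consistent
    if needs.max_staleness_days < needs.index_refresh_days:
        sources.add("browser")  # the index may be older than the answer can tolerate
    return sources or {"llm_only"}


# Internal HR question: a weekly-rebuilt index is fresh enough.
print(choose_sources(DataNeeds(True, False, 30, 7)))
# Competitor pricing today: needs live data.
print(choose_sources(DataNeeds(False, False, 1, 7)))
```

When both conditions hold, the rule returns both sources, matching the point that the two technologies often work together rather than as alternatives.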


[Case study] NAVER’s Project CONNECT X

Workers spend an average of 61% of their working time on communication, which means more than half our time goes to locating information on the intranet, writing emails, and joining meetings. NAVER recently rolled out Project CONNECT X, a solution built on HyperCLOVA X that increases work productivity. With Project CONNECT X, we aim to streamline and automate work processes by focusing on data.


On their own, LLMs cannot process data such as a company’s policies or posts archived on an internal bulletin board, because that data was never part of their training. You can, however, use RAG to incorporate such company-specific data into your AI system, enabling the LLM to access internal data and produce precise answers relevant to your company.


Our Project CONNECT X team consolidated data scattered across various internal knowledge bases into one place and then built an efficient search system and assistant to help employees. The assistant recommends tasks based on emails and messages, writes email drafts, and summarizes previous conversations. At NAVER, we’ve seen it boost productivity for many of our employees since its release last November.


One drawback was that our RAG pipeline could only handle structured data, which limited the types of data it could draw on and restricted it to answering simple searches. By incorporating browser automation, we broadened the range of data it can process. We’re also considering building a specialized model that converts unstructured data into a format the system can understand, so we can introduce an auto-task feature.

Browser automation can help AI models decide which steps to take to answer a user’s question and browse like an actual person. It can make complex searches, compare search results, and perform inference and analysis, all of which can improve productivity in the workplace.
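The decide-act-observe cycle described above can be sketched as a small agent loop. Here rule_based_decide is a hypothetical stand-in for the LLM that picks the next step, and search and open_page query an in-memory fake web instead of a real browser; none of this reflects how Project CONNECT X is actually implemented. The step limit guards against a model that never settles on an answer.

```python
from typing import Callable

# Hypothetical in-memory "web" standing in for real search and page loads.
SEARCH_INDEX = {"library opening hours": ["https://example.com/library/hours"]}
PAGES = {"https://example.com/library/hours": "The library is open 9:00-18:00 on weekdays."}


def search(query: str) -> list:
    return SEARCH_INDEX.get(query, [])


def open_page(url: str) -> str:
    return PAGES.get(url, "")


def rule_based_decide(question: str, history: list) -> tuple:
    """Stand-in for an LLM choosing the next action from what it has seen so far."""
    if not history:
        return ("search", "library opening hours")
    last_action, last_result = history[-1]
    if last_action == "search" and last_result:
        return ("open", last_result[0])  # follow the top result
    if last_action == "open" and last_result:
        return ("answer", last_result)
    return ("answer", "I could not find this.")


def run_agent(question: str, decide: Callable, max_steps: int = 5) -> str:
    """Decide an action, perform it, record the observation, and repeat."""
    history = []
    for _ in range(max_steps):
        action, arg = decide(question, history)
        if action == "answer":
            return arg
        result = search(arg) if action == "search" else open_page(arg)
        history.append((action, result))
    return "Step limit reached."


print(run_agent("When is the library open?", rule_based_decide))
# The library is open 9:00-18:00 on weekdays.
```

Swapping rule_based_decide for a model call, and the fake web for a real browser driver, is what would turn this skeleton into the kind of browsing assistant the section describes.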


Project CONNECT X has shown that RAG can retrieve data from regularly updated databases, reducing the chance that a model generates wrong or outdated responses. It cannot, however, connect to external sources. To fill this gap, you can use browser automation to expand the pool of data and drive productivity.


Sovereign AI and browser automation

The basic definition of sovereign AI is a country having autonomy and control over developing and operating its AI systems. However, a more comprehensive and realistic way of looking at it is to ask whether AI models reflect local cultural and social norms. Using an LLM developed in another country as the base model for sovereign AI could lead the model to produce answers that do not sufficiently reflect the nation’s language and culture. This problem is most likely to occur in countries that:


  1. are small or were late to digital transformation and, therefore, have a hard time obtaining large amounts of high-quality data;
  2. have closed economies or political systems and lack global capabilities because they must rely on local data only;
  3. lack technological resources and cannot afford the massive computing power and specialized workforce required to train an LLM, which is especially true of poorer economies.

In all of these cases, browser automation may be a practical approach. Combined with sovereign AI, browser automation can help even LLMs without sufficient local knowledge stay up to date. It goes beyond simply accumulating data to generating more precise responses in the regional context. Harnessing browser automation can help bridge the technological gap between nations and preserve their cultural identity while still allowing them to participate in the broader global ecosystem.


Conclusion

New technologies open doors to remarkable possibilities, but real transformations have always taken place through convergence with existing technologies. Here, we introduced browser automation as a complementary technology to LLMs that has the potential to transform our lives.

LLMs are built on data that is being accumulated constantly, even at this very moment, and data is also at the core of RAG and browser automation. In other words, these technologies draw on humanity’s collective knowledge, gathered across time and place, to make our lives freer and more convenient. It is therefore our responsibility to develop technologies that benefit people and not to take that responsibility lightly. Launching Project CONNECT X and leveraging browser automation for sovereign AI are examples of our efforts on this front.

AI can exert more positive influence when infused with different technologies. We believe NAVER and other tech companies should continue to work to create AI that is beneficial to humanity.