
What is the future of video analytics in retail?

To understand the future of video analytics in retail, we recommend first reading the history of how computer vision AI was created.

Chapter 1: The beginning. A competition called ImageNet

To understand the future of computer vision, it is crucial to go back to its beginning: the ImageNet competition.

ImageNet, the dataset of more than 14 million images behind the competition in which researchers competed to identify objects

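The ImageNet challenge was an image classification competition that began in 2010. It was built on the ImageNet dataset, which contains more than 14 million images in more than 20,000 categories; the competition itself used a subset of roughly 1.2 million training images across 1,000 categories. Researchers were trying to create algorithms capable of identifying and classifying the objects in those images, and competitors had to outperform existing solutions to gain recognition in the community.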

Until that time, the capabilities of computer vision were far from equaling human abilities. But there was a discovery that changed everything.

In 2012, a team led by Alex Krizhevsky, along with Geoffrey Hinton and Ilya Sutskever (future co-founder of OpenAI), revolutionized computer vision by introducing the AlexNet model, which won the ImageNet competition by a large margin and marked the beginning of the era of deep neural networks in computer vision.

A group of three researchers won the 2012 ImageNet competition with AlexNet, a deep convolutional neural network trained on GPUs

From that point on, AI began to develop at a dizzying pace. By 2015, AI was already surpassing human performance on ImageNet.

Error rate of ImageNet winners over the years. In 2015, performance surpassed that of a human being

Chapter 2: Development. How did computer vision AI evolve in its first 10 years?

After 2012, computer vision went through multiple revolutions and became an essential component of numerous applications and sectors. From facial recognition systems to AI-assisted medical diagnosis and autonomous driving, computer vision AI has transformed the way we interact with the world and expanded the capabilities of machines to understand and process visual information.

Autonomous driving systems based on computer vision

Chapter 3: The Transformers. A second revolution within AI

In 2017, Google researchers set off another revolution in artificial intelligence, one that became a turning point for computer vision AI: the arrival of the Transformer architecture.

Transformers, originally designed to process natural language, proved to be highly versatile and effective at processing visual data as well. This made it possible to carry their capacity for understanding natural language over to understanding images.

Transformers made it possible to create visual reasoning AIs. Visual reasoning is a type of artificial intelligence similar to ChatGPT, but applied to images.

Example of visual reasoning analysis of a clothing store

Transformers are in many cases replacing convolutional and recurrent neural networks (CNNs and RNNs), which had been the most popular types of deep learning models until then.

Starting in November 2022, with the ChatGPT milestone, language models became markedly more capable, and computer vision was subsequently merged with natural language processing in multimodal applications. With this combination, computer vision can understand and describe scenes in far greater complexity.

Zero-shot multimodal AI models add greater intelligence and understanding of spaces and their context to existing camera infrastructure, so that new use cases can be tested with great flexibility and speed.

Multimodal AI (which understands vision and language in a complex way) enables:

  • Flexible functionality, because the AI is general and does not require specific training for each task

  • New functionality that can be created quickly, because no task-specific training is needed

  • Complex data analysis and drawing of conclusions, because the model has a high capacity for analysis
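To make the "no task-specific training" point concrete, here is a minimal sketch of one common approach to zero-shot classification, the one used by CLIP-style vision-language models: the image and each candidate text label are mapped into a shared embedding space, and the label most similar to the image is chosen. The embedding vectors below are hand-made placeholders rather than outputs of a real model, and the names `cosine_similarity` and `zero_shot_classify` are illustrative assumptions, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, label_embeddings):
    """Return (best_label, scores) by comparing the image to each text label."""
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in label_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Placeholder embeddings (assumption): a real system would obtain these from
# the text and image encoders of a multimodal model.
label_embeddings = {
    "a long queue at the checkout": [0.9, 0.1, 0.0],
    "an empty shelf": [0.1, 0.9, 0.1],
    "a customer browsing clothes": [0.0, 0.2, 0.9],
}
frame_embedding = [0.8, 0.2, 0.1]  # hypothetical embedding of one camera frame

label, scores = zero_shot_classify(frame_embedding, label_embeddings)
print(label)
```

In this scheme, supporting a new task means adding a new text label to the dictionary; no model is retrained, which is where the flexibility and speed described above come from.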

So, now that we know the history of the beginnings of computer vision AI... what is the future of video analytics in retail?

Chapter 5: The Future of Video Analytics in Retail

The future of computer vision applications in the retail sector is promising and constantly evolving. The combination of advances in computer vision, AI and natural language processing has opened up a world of possibilities for the retail industry. Multimodal models enable the creation of AI agents, "virtual managers" for retail: highly sophisticated artificial intelligence systems that can manage and make decisions about various areas of the business, such as:

  1. People counting, customer acquisition and sales conversion: The AI agent can evaluate the store's efficiency in capturing customers and converting them into sales. The retailer can then take action to detect and fix efficiency problems.

  2. Queue Management and Customer Flow: The computer vision AI agent tracks customer flow in the store, identifies congested areas and helps retailers manage staff and resources more efficiently during periods of high demand, preventing customers from leaving without purchasing.

  3. Operational task manager: Virtual agents can understand complex situations and contexts, allowing them to efficiently manage cleaning tasks, opening and closing checkouts, product restocking, and other common operations in retail.

  4. Store analytics and decision making: The AI agent can collect data on customer traffic, product layout, and the effectiveness of in-store promotions. This helps retailers make decisions to improve product placement and customer experience.

  5. Personalized shopping experiences: Computer vision systems can identify the clothing a customer is looking at and suggest combinations of clothing or accessories, improving the shopping experience and increasing sales.

  6. Inventory Management: Automation in inventory management is a crucial area in retail. AI agents can constantly monitor stock levels in stores and automatically notify staff when products need to be replenished. This helps avoid lost sales due to stockouts.

  7. Targeted Advertising and Audience Analysis: AI virtual managers can estimate customer demographics, such as age and gender, and other characteristics to tailor advertising in real time. This helps make products, promotions and advertisements more relevant to the target audience.

  8. Customer Behavior Analysis: Computer vision can track and analyze customers' in-store behavior, such as which products they look at most often or how much time they spend in certain areas. This information is valuable for making marketing and store design decisions.

  9. Display optimization: Agents can evaluate the effectiveness of displays and the arrangement of products in the display case. This allows retailers to adjust their display strategy to attract more customers.
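Several of the functions listed above (items 1, 2 and 6 in particular) reduce to simple calculations once the vision system delivers counts. As a rough sketch, assuming hypothetical hourly counts of passers-by, store entries and transactions, the capture rate, conversion rate, a queue alert and a restock alert could be computed as follows. All names, thresholds and figures here are illustrative assumptions and do not come from a real deployment.

```python
from dataclasses import dataclass


@dataclass
class HourlyCounts:
    passersby: int     # people detected walking past the entrance
    entries: int       # people detected entering the store
    transactions: int  # sales in the same hour (assumed to come from the POS)


def capture_rate(counts: HourlyCounts) -> float:
    """Share of passers-by who entered the store."""
    return counts.entries / counts.passersby if counts.passersby else 0.0


def conversion_rate(counts: HourlyCounts) -> float:
    """Share of visitors who made a purchase."""
    return counts.transactions / counts.entries if counts.entries else 0.0


def queue_alert(people_in_queue: int, open_checkouts: int,
                max_per_checkout: int = 4) -> bool:
    """True when the queue is longer than the open checkouts should handle."""
    return people_in_queue > open_checkouts * max_per_checkout


def items_to_restock(shelf_counts: dict, reorder_levels: dict) -> list:
    """Products whose detected shelf count is at or below the reorder level."""
    return sorted(
        product for product, count in shelf_counts.items()
        if count <= reorder_levels.get(product, 0)
    )


counts = HourlyCounts(passersby=400, entries=100, transactions=25)
print(f"capture rate: {capture_rate(counts):.0%}")        # capture rate: 25%
print(f"conversion rate: {conversion_rate(counts):.0%}")  # conversion rate: 25%
print(queue_alert(people_in_queue=9, open_checkouts=2))   # True, since 9 > 2 * 4
print(items_to_restock({"jeans": 2, "t-shirts": 12},
                       {"jeans": 3, "t-shirts": 5}))      # ['jeans']
```

The difficult part in practice is producing reliable counts from video; the decision logic on top of them can stay this simple, which makes thresholds such as `max_per_checkout` easy to tune per store.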

AI-generated illustration of a virtual AI agent in a clothing store

AI virtual agents, or "virtual managers", in the retail sector are systems that will benefit retailers by automating many activities, drawing on computer vision and artificial intelligence to make decisions based on real-time data. Their impact will be to optimize store management, improve customer experience and increase operational efficiency, ultimately leading to higher sales and customer satisfaction.

Do you want to know how KSI's virtual agent can help you optimize your business by increasing your sales and reducing your losses? Contact us here

