Computer Vision

The field of AI focused on enabling machines to interpret and understand images and video.

Computer vision is the branch of AI that teaches machines to analyze visual data such as images, video, medical scans, and camera feeds. Tasks include object detection, image classification, segmentation, optical character recognition (OCR), tracking, and scene understanding.

It has applications across healthcare, manufacturing, autonomous vehicles, retail, security, robotics, and consumer apps. Most modern computer vision systems are built on deep neural networks, especially convolutional neural networks (CNNs) and vision transformers.
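Convolutional models are built from one simple operation: slide a small grid of weights (a kernel) across the image and take a weighted sum at each position. This is a minimal plain-Python sketch of a single stride-1 convolution with no padding on a tiny grayscale image. The function name `conv2d` and the hand-picked edge-detecting kernel are illustrative assumptions; in a trained CNN the kernel weights are learned from data, and real libraries apply many such filters in parallel on GPUs.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D convolution on nested lists.

    Like most deep-learning libraries, this computes cross-correlation:
    the kernel is not flipped before sliding it over the image.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the image patch under the kernel.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# A 4x4 grayscale image with a vertical edge: dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hypothetical hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

print(conv2d(image, kernel))  # → [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The large values in the middle column mark where brightness jumps from dark to bright; edge-like responses of this kind are typical of what early CNN layers learn to detect.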

Goal: turn raw pixels into useful understanding, whether that means recognizing a face, reading a document, or analyzing a live video stream.

Common Computer Vision Tasks

Classification: assign a single label to an entire image
Detection: find objects in an image and mark each one with a label and a bounding box
Segmentation: assign a label to every pixel, outlining regions or objects precisely
OCR: extract text from images and scans
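The first three tasks differ mainly in the granularity of their output: one label per image, one box per object, or one label per pixel. A minimal Python sketch, assuming a toy grayscale image that contains at most one bright object, makes that difference concrete. The `analyze` function and its fixed brightness `threshold` are hypothetical stand-ins for a trained model; real systems learn these outputs with neural networks rather than hand-set rules.

```python
def analyze(image, threshold=5):
    """Return (label, box, mask) for a grayscale image given as nested lists.

    Hypothetical toy rule: pixels brighter than `threshold` belong to the object.
    Assumes at most one object, so a single bounding box suffices.
    """
    # Segmentation: one label per pixel (1 = object, 0 = background).
    mask = [[1 if v > threshold else 0 for v in row] for row in image]

    # Collect the (row, col) positions of all object pixels.
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        # Classification with no object found; no box to report.
        return "empty", None, mask

    # Detection: bounding box as (top, left, bottom, right), inclusive.
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    box = (min(rows), min(cols), max(rows), max(cols))

    # Classification: one label for the whole image.
    return "contains object", box, mask


image = [
    [0, 0, 0, 0, 0],
    [0, 8, 9, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

label, box, mask = analyze(image)
print(label)      # → contains object
print(box)        # → (1, 1, 2, 2)
for row in mask:  # one 0/1 label per pixel
    print(row)
```

OCR is not shown here; it produces text rather than labels, boxes, or masks, although OCR pipelines commonly combine detection (locating text regions) with recognition (reading the characters).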

Computer vision is increasingly blending with language models to create multimodal AI systems that can both see and explain what they see.
