Core Concepts
Computer Vision
The field of AI focused on enabling machines to interpret and understand images and video.
Computer vision is the branch of AI that enables machines to analyze visual data such as images, video, medical scans, and camera feeds. Core tasks include image classification, object detection, segmentation, optical character recognition (OCR), object tracking, and scene understanding.
Applications span healthcare, manufacturing, autonomous vehicles, retail, security, robotics, and consumer apps. Modern computer vision systems are built on deep neural networks, especially convolutional neural networks (CNNs) and vision transformers.
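Convolutional models are built from one simple operation. A small grid of weights (a kernel) slides across the image, and each output value is a weighted sum of the pixels under it. The sketch below shows that operation in plain Python. It assumes a grayscale image stored as a list of lists, no padding, and stride 1. The function name `conv2d`, the toy image, and the hand-picked edge kernel are illustrative assumptions. In a trained network the kernel weights are learned, and real frameworks run this on tensors far more efficiently.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the feature map.

    Illustrative sketch: "valid" padding (no border), stride 1. Like most
    deep-learning frameworks, this computes cross-correlation, so the kernel
    is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Weighted sum of the kh x kw patch currently under the kernel.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x6 grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-picked vertical-edge kernel; a trained CNN would learn its kernels instead.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d(image, kernel)
print(feature_map)  # [[0.0, 3.0, 3.0, 0.0], [0.0, 3.0, 3.0, 0.0]]
```

The feature map is zero over the flat regions and responds only where dark pixels meet bright ones. This kind of low-level feature, such as an edge, is typical of what early CNN layers tend to pick up.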
Goal: turn raw pixels into useful understanding, whether that means recognizing a face, reading a document, or analyzing a live video stream.
Common Computer Vision Tasks
- Classification — assign a label to an image
- Detection — find and locate objects within an image
- Segmentation — label regions or pixels precisely
- OCR — extract text from images and scans
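These tasks differ mainly in the shape of what a model outputs: one label per image, a set of boxes, or a label per pixel. The sketch below illustrates three of them with plain Python post-processing of made-up model scores. The functions `classify`, `iou`, and `segment`, the label names, and the (x_min, y_min, x_max, y_max) box format are hypothetical choices for illustration, not any particular library's API. OCR is omitted to keep the example short.

```python
import math


def classify(scores, labels):
    """Classification: map raw per-class scores (logits) to one (label, probability)."""
    # Softmax, subtracting the max score first for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda k: probs[k])
    return labels[best], probs[best]


def iou(box_a, box_b):
    """Detection: intersection-over-union of two boxes (x_min, y_min, x_max, y_max).

    IoU is commonly used to match a predicted box against a ground-truth box.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def segment(score_maps):
    """Segmentation: pick the highest-scoring class at every pixel.

    score_maps[k][r][c] is the (hypothetical) score of class k at pixel (r, c).
    """
    num_classes = len(score_maps)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    return [
        [max(range(num_classes), key=lambda k: score_maps[k][r][c]) for c in range(width)]
        for r in range(height)
    ]


# Made-up scores standing in for model outputs.
label, prob = classify([2.0, 0.5, -1.0], ["cat", "dog", "car"])
print(label, round(prob, 3))  # cat 0.786

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
print(round(overlap, 3))  # 0.143

background = [[0.9, 0.2], [0.8, 0.1]]
foreground = [[0.1, 0.8], [0.2, 0.9]]
mask = segment([background, foreground])
print(mask)  # [[0, 1], [0, 1]]
```

PASCAL VOC evaluation, for example, counts a detection as correct when its IoU with the ground-truth box is at least 0.5. Under that convention, the overlap of about 0.143 here would count as a miss.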
Computer vision is increasingly combined with language models to create multimodal AI systems that can both interpret images and explain what they show in natural language.