What is computer vision? How can you integrate computer vision capabilities into your applications and workflows using Microsoft Azure? In this post, we’ll explain that and more with an approach designed for visual learners.
About visual guides
An often-cited estimate holds that around 65% of us are visual learners, meaning we absorb information faster from images and can retain and recall it longer as a result. Visual guides are high-resolution (poster-sized) images that summarize a topic or content resource using a combination of text and illustrations. You can think of them as sketchnotes (visual notes) that provide a “big picture” view of the topic at the start of a learning journey, helping you make connections and identify patterns that improve your understanding, recall, and retention of what you learn.
Want to discover other visual guides or be notified of new ones? Follow @SketchTheDocs on Twitter.
What is Computer Vision in Microsoft Azure?
The visual guide to Computer Vision in Azure draws on two main resources: the Microsoft Learn module of the same name, and the Microsoft Docs page for Computer Vision under Cognitive Services. You can download a high-resolution (poster-sized) version here.
This guide is best used to bookend your learning journey. Use it as a pre-read (to prime your mind with relevant terms and workflow) before diving into hands-on exercises that reinforce concepts with code. Then, use it as a post-review resource to test your recall and identify gaps in coverage or understanding. Or just print it and hang it on the wall — or use it as desktop wallpaper. Treat it as a handy, glanceable reference that can supplement your learnings from other sources. Now let’s dive into technology!
Computer Vision and Azure Cognitive Services
Computer vision is an area of artificial intelligence where software systems are designed to perceive the world visually using cameras, images, and video.
The challenge here is that humans and computers see different things when they look at the same object. Where a human sees an apple (object), a machine sees an array of pixel values (image color data). To give machines a higher-level understanding of what the image data represents, we use pixel values as numeric features to train a machine learning model.
This model behaves like a pattern-detection function, mapping computer-friendly features (pixel values) into human-friendly labels (objects, attributes) in a probabilistic manner. When we feed an input image to this model, it can now predict a relevant label with an associated confidence value. In some sense, we have taught the computer to “see” the image the way humans would.
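To make the pixels-to-labels idea concrete, here is a tiny self-contained sketch (plain Python, no Azure service involved) where flattened pixel values are the numeric features and a simple nearest-centroid rule plays the role of the trained model. The class names and "images" are invented for illustration:

```python
import math

# Toy illustration (not an Azure API): an image becomes a flat list of
# pixel values, and a "model" maps those numeric features to a label
# with a confidence score. Here the model is a nearest-centroid rule.

# Two tiny 2x2 grayscale "images" per class, flattened to pixel lists.
training = {
    "bright": [[0.9, 0.8, 0.95, 0.85], [1.0, 0.9, 0.8, 0.9]],
    "dark":   [[0.1, 0.2, 0.05, 0.15], [0.0, 0.1, 0.2, 0.1]],
}

def centroid(vectors):
    """Average feature vector for a class."""
    return [sum(col) / len(col) for col in zip(*vectors)]

centroids = {label: centroid(vecs) for label, vecs in training.items()}

def predict(pixels):
    """Return (label, confidence) for a flattened input image."""
    dists = {label: math.dist(pixels, c) for label, c in centroids.items()}
    label = min(dists, key=dists.get)
    # Crude confidence: how much closer this class is than the total.
    confidence = 1 - dists[label] / sum(dists.values())
    return label, confidence

print(predict([0.85, 0.9, 0.8, 0.95]))  # a bright-looking image
```

A real service trains far richer models on large labeled datasets, but the mapping it learns is the same in spirit: numeric features in, probabilistic labels out.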
Azure Cognitive Services offers three vision-related services:

- Azure Computer Vision – to use pre-existing advanced image analysis algorithms.
- Azure Custom Vision – to build, improve, and deploy your own image classifiers.
- Face – to use pre-existing advanced face algorithms to detect and recognize human faces.
Azure Computer Vision is a cloud-scale service that provides access to a set of advanced algorithms for image processing. Given an input image, the service can return information related to various visual features of interest. Based on your primary goal, you can explore this service through these capabilities:
- Optical Character Recognition to extract information from printed or handwritten text in an image.
- Image Analysis to extract visual features like tags, colors, faces, objects, and logos from an image.
- Spatial Analysis to understand people’s presence or movements in a space from video.
The visual guide (and associated learning path) explores the following subset of exercises:
| Exercise | Azure service |
| --- | --- |
| Analyze Images and Read Text | Azure Computer Vision |
| Classify Images and Detect Objects in Images | Azure Custom Vision |
| Detect and Analyze Faces | Face |
Azure Applied AI Services
A quick glance at the visual guide shows there’s a sixth application scenario — Analyze Receipts with Form Recognizer. But no such service is listed under Azure Cognitive Services. So where does it fit into the Azure machine learning services ecosystem?
The answer lies in a new product category unveiled in May 2021 at Microsoft Build: Applied AI Services. The objective is to accelerate time-to-value for AI adoption by building on Azure Cognitive Services and combining those technologies with task-specific AI or business logic tailored to a specific use case. The result is an out-of-the-box AI solution that addresses common business challenges without requiring the developer to programmatically wire these up every time. And since they build on Azure Cognitive Services, developers always have the option to create comparable custom solutions from scratch.
At present, there are six Applied AI Services:
- Form Recognizer – automate structured data extraction and entry from images and documents.
- Metrics Advisor – perform data automation and anomaly detection in time-series data.
- Cognitive Search – cloud-scale search with built-in AI capabilities to search all types of content.
- Immersive Reader – inclusively designed tool to improve reading comprehension for all learners.
- Bot Service – rapidly create customizable conversational experiences with prebuilt components.
- Video Analyzer – build automated apps powered by video intelligence.
Using Computer Vision in Microsoft Azure
The visual guide is structured to match the six examples (modules) provided by the learning path. In this section, we’ll explore each application briefly, and set the stage for diving into the hands-on exercises.
1. Analyze Images with Computer Vision Service
This module focuses on the core value proposition of the Computer Vision service — image analysis. With this service endpoint, your application (client) submits an image and gets back detailed information on the various visual features (and attributes) found in it. Clients can also perform a range of tasks related to image processing. Examples of things you can do include:
- Generate captions: get a human-friendly description of your image (useful for alt-text)
- Tag visual features: get attributes that can serve as metadata for the image
- Detect objects: think tags, but with location of identified objects (bounding box coordinates)
- Detect brands: think specialized object detection for commercial logos (reference database)
- Detect faces: think specialized object detection for human faces (predict age, identify celebrity)
- Categorize image: classify image using parent-child hierarchy (limited set of category options)
- Detect domain-specific content: supported domain models include landmarks and celebrities
- Optical Character Recognition: read text from printed or handwritten content areas in image
Note that Computer Vision provides only basic image analysis for human faces. For advanced face algorithms, you can directly use the Face service endpoint of Azure Cognitive Services and perform more complex tasks such as detecting emotions, head poses, or the presence of face masks.
As a developer, start by creating the relevant resource — you have two choices. Use the Azure Computer Vision resource (targeted) if you plan to use only image analysis capabilities or want to track the costs and utilization of each cognitive service individually. Use the Azure Cognitive Services resource (broad) if you plan to use many cognitive service capabilities and want the convenience of managing them together. For hands-on code tutorials on image analysis, start here.
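As a rough sketch of what the client side looks like, the snippet below posts an image URL to the image analysis REST operation and picks out the best caption. The endpoint and key are placeholders for your own resource, the `v3.2` API version is an assumption to verify against the current docs, and `best_caption` is an illustrative helper, not part of any SDK:

```python
import json
import urllib.request

# Placeholders: substitute values from your Computer Vision (or
# Cognitive Services) resource in the Azure portal.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

def analyze_image(image_url, features="Description,Tags,Objects"):
    """POST an image URL to the analyze operation; return parsed JSON."""
    req = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.2/analyze?visualFeatures={features}",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def best_caption(analysis):
    """Pick the highest-confidence caption from an analysis result."""
    captions = analysis.get("description", {}).get("captions", [])
    if not captions:
        return None
    top = max(captions, key=lambda c: c["confidence"])
    return top["text"], top["confidence"]

# Demo on a sample payload shaped like the service response:
sample = {"description": {"captions": [
    {"text": "an apple on a table", "confidence": 0.92},
    {"text": "a piece of fruit", "confidence": 0.71},
]}}
print(best_caption(sample))
```

In practice you would call `best_caption(analyze_image("https://…/photo.jpg"))`; the same response also carries the tags, objects, and other visual features requested.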
2. Classify Images with the Custom Vision Service
This module focuses on the core value proposition of the Custom Vision Service — image classification. This is a machine learning technique where you provide the machine with training data (images and their associated classes) and train it to detect and uncover patterns that link the numeric features (pixel data) to human concepts (class labels). The trained model can be published to expose a service endpoint to clients. Using this service, clients can post an image and get back a predicted class (with an associated confidence score).
With the Custom Vision Service, you can train your image classifiers by uploading training data in one of two ways: using the portal (code-free, UI-based workflows) or using the SDK or REST API (a code-first approach). Usage involves two steps: training (create the model) and prediction (publish the model). As before, you can use either the dedicated Custom Vision Service resource or a general-purpose Azure Cognitive Services resource for either — or both — phases; you can even mix and match them as desired. For hands-on code tutorials on image classification, start here.
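The prediction response is essentially a list of classes with probabilities. Here is a small illustrative helper that picks the top class; the `tagName` and `probability` field names follow the Custom Vision prediction response as I understand it, so treat them as assumptions to check against the API reference:

```python
def top_prediction(response):
    """Return the most probable (class, score) from a prediction payload."""
    preds = response.get("predictions", [])
    if not preds:
        return None
    best = max(preds, key=lambda p: p["probability"])
    return best["tagName"], best["probability"]

# Sample payload shaped like a published classifier's response:
sample = {"predictions": [
    {"tagName": "apple", "probability": 0.97},
    {"tagName": "banana", "probability": 0.02},
    {"tagName": "orange", "probability": 0.01},
]}
print(top_prediction(sample))
```

A client would obtain such a payload by posting an image (or image URL) to the published iteration's prediction endpoint with its prediction key.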
3. Detect Objects with Custom Vision Service
This module focuses on creating custom models for object detection. Typically, this would require advanced knowledge of deep learning techniques and a large training dataset — but using Custom Vision Service lets us achieve this with fewer images without data science expertise. Similar to the steps taken in the previous Custom Vision Service example above, this involves preparing your training image set, uploading the data to Azure (via the portal or using the SDK), training and validating the model — then publishing it to a service endpoint for client usage.
The key difference is that object detection identifies the location of objects in the image along with their classification. This means the training images need to be prepped with the bounding boxes (coordinates) of the objects they contain, which can be time-consuming. With Custom Vision, you can upload the images to the portal and get suggestions for areas where objects are detected; simply drag or adjust the bounding box area to improve accuracy. Once you have an initial set trained, try a smart-tagging approach by letting the service suggest tags and bounding boxes for the rest. For hands-on code tutorials on object detection, start here.
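Custom Vision expresses bounding boxes as normalized (left, top, width, height) values. A handy, service-independent way to measure how closely an adjusted box matches a suggested one is intersection-over-union (IoU); this is a general technique, not an Azure API:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes, each given as
    (left, top, width, height) in the same coordinate system."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width/height of the overlapping region (zero if no overlap).
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# A suggested box vs. a hand-adjusted box:
print(iou((0.1, 0.1, 0.4, 0.4), (0.12, 0.1, 0.4, 0.42)))
```

IoU of 1.0 means identical boxes; values near 0 mean the suggestion needs substantial adjustment, which is why reviewing the auto-suggested regions still matters.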
4. Detect and Analyze Faces with Face Service
This module focuses on using advanced algorithms for facial analysis that go beyond the basic attributes obtained from Azure Computer Vision. It’s like a special case of object detection where the object of interest is a human face. With Face algorithms, you can perform face detection (return the regions of an image containing human faces), face analysis (return facial landmarks like the locations of the nose, eyes, eyebrows, and lips), and face recognition (for face-based authentication applications, as an example).
Azure Cognitive Services provides different options for detecting and analyzing faces — Computer Vision for basic analysis (e.g., age), Video Indexer for face analysis in video content, and Face for the widest range of face analysis capabilities.
The Face service can detect, identify, and verify human faces. It can also find similar faces or group faces by similarity. Face analysis returns attributes including age, emotion, facial hair, glasses, hair, head pose, makeup, occlusion, and the blur and exposure of the image with respect to detected faces. For hands-on code tutorials on facial analysis, start here.
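Face detection responses pair a face rectangle with an attributes dictionary. The helper below is a hypothetical example that pulls the dominant emotion from a payload shaped like the Face detect response; treat the exact field names (`faceAttributes`, `emotion`, and friends) as assumptions to check against the Face API reference:

```python
def dominant_emotion(face):
    """Return the highest-scoring (emotion, score) from a detected face."""
    emotions = face["faceAttributes"]["emotion"]
    name = max(emotions, key=emotions.get)
    return name, emotions[name]

# Sample payload shaped like one detected face in a Face API response:
sample_face = {
    "faceRectangle": {"top": 40, "left": 20, "width": 100, "height": 100},
    "faceAttributes": {
        "age": 31.0,
        "glasses": "NoGlasses",
        "emotion": {"happiness": 0.86, "neutral": 0.12, "surprise": 0.02},
    },
}
print(dominant_emotion(sample_face))
```

A real call requests these attributes explicitly when submitting the image, then iterates over the list of detected faces applying logic like this.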
5. Read Text with the Computer Vision Service
This module focuses on the Optical Character Recognition capabilities of the Azure Computer Vision service to read printed and handwritten text in images. There are two APIs you can use, based on the volume of text involved: the OCR API (synchronous, suited to small amounts of text) and the Read API (asynchronous, optimized for larger, text-heavy documents).
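The Read API works asynchronously: you submit the image, poll the operation URL the service returns until the status is `succeeded`, and then walk the result's pages of text lines. Here is a sketch of flattening such a result; the JSON shape follows the Read result as I understand it, so verify the field names against the current docs:

```python
def extract_lines(read_result):
    """Flatten a Read operation result into a list of text lines."""
    lines = []
    for page in read_result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines

# Sample payload shaped like a completed Read operation:
sample = {"status": "succeeded", "analyzeResult": {"readResults": [
    {"page": 1, "lines": [{"text": "Hello"}, {"text": "World"}]},
]}}
print(extract_lines(sample))
```

Each line entry also carries bounding-box coordinates, which is how downstream code can map extracted text back to its location in the source image.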
6. Analyze Receipts with the Form Recognizer Service
This module focuses on a more applied AI solution that combines the OCR text-reading capabilities with domain specific predictive models for interpreting form data, enabling intelligent form processing and automation workflows for documents like receipts and invoices.
Form Recognizer provides both a pre-built receipt model and support for custom models. The pre-built model is trained to recognize common English-language receipt formats popular in the U.S. It extracts and returns attributes like the time and date of the transaction, merchant information, taxes, totals paid, and so on. By contrast, a custom model recognizes and extracts the key/value pairs and table data in the analyzed document. It can be trained on your own data, tailoring the returned attributes to match the structure and context of your forms, with basic training requiring at least five form samples. For hands-on code tutorials on receipt analysis, start here.
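Once a receipt is analyzed, the result is a dictionary of named fields with values and confidence scores. The helper below is a hypothetical example; the field names (`MerchantName`, `TransactionDate`, `Total`) follow the prebuilt receipt model's documented output, but the simplified payload shape is an assumption for illustration:

```python
def receipt_summary(fields):
    """Pull a few common attributes from a recognized receipt's fields."""
    wanted = ("MerchantName", "TransactionDate", "Total")
    return {
        name: (f.get("value"), f.get("confidence"))
        for name, f in fields.items()
        if name in wanted
    }

# Sample fields shaped like (a simplified view of) a receipt result:
sample_fields = {
    "MerchantName": {"value": "Contoso Cafe", "confidence": 0.97},
    "TransactionDate": {"value": "2021-06-01", "confidence": 0.99},
    "Total": {"value": 14.50, "confidence": 0.98},
    "Tax": {"value": 1.20, "confidence": 0.95},
}
print(receipt_summary(sample_fields))
```

The confidence scores matter in automation workflows: a pipeline might auto-approve high-confidence extractions and route low-confidence ones to a human reviewer.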
Summary and next steps
This was a quick review of the six-module learning path and the downloadable visual guide that provides a “big picture” quick reference to complement hands-on exercises in your learning journey into Computer Vision in Azure. Want to keep going? Here are resources that can help:
- Microsoft Learn | An evolving collection of relevant docs, learning paths, and modules.
- Microsoft Docs | Cognitive Services, Computer Vision, Custom Vision, Face | Applied AI
- Visual Guides | Visit @SketchTheDocs for news, and Cloud-Skills.dev and SketchTheDocs for content.
Also, check out these ACG courses and Hands-On Labs:
- Course: Azure AI Components and Services
- Course: Getting Started with Azure Machine Learning Studio
- Hands-On Lab: Creating a Cognitive Services Resource Using the Azure Portal
About the author
Nitya Narasimhan holds a PhD in Computer Engineering and has 20+ years of software research and development experience spanning distributed and ubiquitous computing and mobile and web application development. She is currently a Cloud Advocate on the Microsoft Developer Relations team, where she spends her time on mobile and cross-platform development (for Azure and Microsoft Surface Duo), visual storytelling, and supporting our amazing developer communities. She’s one of ACG’s 21 Azure builders to follow.