
Fundamentals of Image Formation

In this lesson, you'll learn the theory behind image formation and digital images. This article is the first in a series called Computer Vision: Algorithms and Applications.

by Carlos Melo
March 22, 2023
in Computer Vision

In a world full of mysteries and wonders, photography stands tall as a phenomenon that captures the ephemeral and the eternal in a single moment. Like a silent dance between light and shadow, it invites our imagination to wander through the corridors of time and space. Through a surprisingly simple process of capturing rays of light through an aperture over an exposure time, we are led to contemplate photographs that we know will remain everlasting.

The philosopher José Ortega y Gasset once reflected on the passion for truth as the noblest and most inexorable pursuit. And undoubtedly, photography is one of the most sublime expressions of this quest for truth, capturing reality in a fragment of time.

Behind this process lies the magic of matrices, projections, coordinate transformations, and mathematical models that, like invisible threads, weave the tapestry between the reality captured by a camera lens and the bright pixels on your screen.

But to understand how it’s possible to mathematically model the visual world, with all its richness of detail, we must first understand why vision is so complex and challenging. In this first article of the series “Computer Vision: Algorithms and Applications,” I want to invite you to discover how machines see an image and how an image is formed.

The Challenges of Computer Vision

Computer vision is a fascinating field that seeks to develop mathematical techniques capable of reproducing the three-dimensional perception of the world around us. Richard Szeliski, in his book “Computer Vision: Algorithms and Applications,” describes how, with apparent ease, we perceive the three-dimensional structure of the world around us and the richness of detail we can extract from a simple image. However, computer vision faces difficulties in reproducing this level of detail and accuracy.

Szeliski points out that, despite advances in computer vision techniques over the past decades, we still can’t make a computer explain an image with the same level of detail as a two-year-old child. Vision is an inverse problem, where we seek to recover unknown information from insufficient data to fully specify the solution. To solve this problem, it is necessary to resort to models based on physics and probability, or machine learning with large sets of examples.

Schematic representing the physical principle of optical remote sensing, through the interaction between the surface, solar energy, and sensor.

Modeling the visual world in all its complexity is a greater challenge than, for example, modeling the vocal tract that produces spoken sounds. Computer vision seeks to describe and reconstruct properties such as shape, lighting, and color distribution from one or more images, something humans and animals do with ease, while computer vision algorithms are prone to errors.

How an Image is Formed

Before analyzing and manipulating images, it’s essential to understand the image formation process. As examples of components in the process of producing a given image, Szeliski (2022) cites:

  1. Perspective projection: The way three-dimensional objects are projected onto a two-dimensional image, taking into account the position and orientation of the objects relative to the camera.
  2. Light scattering after hitting the surface: The way light scatters after interacting with the surface of objects, influencing the appearance of colors and shadows in the image.
  3. Lens optics: The process by which light passes through a lens, affecting image formation due to refraction and other optical phenomena.
  4. Bayer color filter array: A color filter pattern used in most digital cameras to capture colors at each pixel, allowing for the reconstruction of the original colors of the image.

Geometrically, the image formation process is quite simple. An object reflects the light that strikes it, and this light is captured by a sensor, forming an image after a certain exposure time. But if it were really that simple, given the large number of light rays arriving from so many different angles, the sensor would be unable to focus on anything and would record only a luminous blur.

To ensure that each part of the scene strikes only one point of the sensor, it’s possible to introduce an optical barrier with a hole that allows only a portion of the light rays to pass through, reducing blur and providing a sharper image. This hole placed in the barrier is called an aperture or pinhole, and it’s crucial for forming a sharp image, allowing cameras and other image capture devices to function properly.

A photographic camera that has no lens is known as a “pinhole” camera, named after its tiny aperture, literally the size of a hole made by a pin.

This principle of physics, known as the camera obscura, serves as the basis for the construction of any photographic camera. An ideal pinhole camera model has an infinitely small hole to obtain an infinitely sharp image.
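
Under this ideal pinhole model, projection reduces to similar triangles: a 3D point (X, Y, Z) in camera coordinates maps to (f·X/Z, f·Y/Z) on the image plane. A minimal sketch in Python (the focal length and sample point below are illustrative values, not from any real camera):

```python
def project_pinhole(point_3d, f):
    """Project a 3D point (camera coordinates, Z > 0) onto the image
    plane of an ideal pinhole camera with focal length f."""
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Similar triangles: image-plane coordinates scale by f / Z
    return (f * X / Z, f * Y / Z)

# Illustrative: a point 2 m ahead and 0.5 m to the right, f = 50 mm
x, y = project_pinhole((0.5, 0.2, 2.0), f=0.05)  # -> (0.0125, 0.005)
```

Note how depth is lost in the division by Z: very different 3D points can land on the same pixel, which is exactly why vision is an inverse problem.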

However, the problem with pinhole cameras is that there is a trade-off between sharpness and brightness. The smaller the hole, the sharper the image. But since the amount of light passing through is smaller, it’s necessary to increase the exposure time.

Moreover, if the hole is of the same order of magnitude as the wavelength of light, we will have the effect of diffraction, which ends up distorting the image. In practice, a hole smaller than 0.3 mm will cause interference in light waves, making the image blurry.

The solution to this problem is the use of lenses. In this case, a thin converging lens will allow the ray passing through the center of the lens not to be deflected and all rays parallel to the optical axis to intersect at a single point (focal point).

The Magic of Lenses in Image Formation

Lenses are essential optical elements in image formation, as they allow more light to be captured by the sensor while still maintaining the sharpness of the image. Lenses work by refracting the light that passes through them, directing the light rays to the correct points on the sensor.

In the context of camera calibration, the thin converging lens is used as a simplified model to describe the relationship between the three-dimensional world and the two-dimensional image captured by the camera’s sensor. This theoretical model is useful for understanding the basic principles of geometric optics and simplifying the calculations involved in camera calibration, and it should satisfy two properties:

  1. Rays passing through the Optical Center are not deflected; and
  2. All rays parallel to the Optical Axis converge at the Focal Point.

As we’ll see in the next article, camera calibration involves determining the intrinsic and extrinsic parameters that describe the relationship between the real-world coordinates and the image coordinates. The intrinsic parameters include the focal length, the principal point, and lens distortion, while the extrinsic parameters describe the position and orientation of the camera relative to the world.

Although the thin lens model is a simplification of the actual optical system of a camera, it can be used as a starting point for calibration.

Focus and Focal Length

Focus is one of the main aspects of image formation with lenses. The focal length, represented by f, is the distance between the center of the lens and the focal point, where light rays parallel to the optical axis converge after passing through the lens.

Thin Lens Equation. Source: Davide Scaramuzza (2022).

The focal length is directly related to the lens’s ability to concentrate light and, consequently, influences the sharpness of the image. The focus equation is given by:

    \[ \frac{1}{f} = \frac{1}{z} + \frac{1}{e} \]

where z is the distance between the object and the lens, and e is the distance between the formed image and the lens. This equation describes the relationship between the focal length, the object distance, and the formed image distance.
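
The thin lens equation can be rearranged to find where the sharp image forms for a given object distance. A quick sketch using the same symbols (f, z, e), with illustrative values in meters:

```python
def image_distance(f, z):
    """Solve the thin lens equation 1/f = 1/z + 1/e for e, the
    distance between the lens and the plane of the sharp image."""
    if z <= f:
        raise ValueError("an object at or inside f forms no real image")
    return 1.0 / (1.0 / f - 1.0 / z)

# Illustrative: a 50 mm lens focused on an object 2 m away
e = image_distance(f=0.05, z=2.0)  # ~0.0513 m, just beyond the focal point
```

As z grows toward infinity, e approaches f: distant objects come into focus at the focal plane itself.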

Aperture and Depth of Field

Aperture is another essential aspect of image formation with lenses. The aperture, usually represented by an f-number value, controls the amount of light that passes through the lens. A smaller f-number value indicates a larger aperture, allowing more light in and resulting in brighter images.

Aperture also affects the depth of field, which is the range of distance at which objects appear sharp in the image. A larger aperture (smaller f-number value) results in a shallower depth of field, making only objects close to the focal plane appear sharp, while objects farther away or closer become blurred.

This characteristic can be useful for creating artistic effects, such as highlighting a foreground object and blurring the background.
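
The f-number itself is simply the ratio N = f/D between the focal length and the aperture diameter, which is why a smaller number means a wider opening. A minimal illustration:

```python
def f_number(focal_length_mm, aperture_diameter_mm):
    """f-number N = f / D: a smaller N means a larger aperture,
    more light on the sensor, and a shallower depth of field."""
    return focal_length_mm / aperture_diameter_mm

# Illustrative: a 50 mm lens at two aperture diameters
wide_open = f_number(50, 25)       # f/2: bright, shallow depth of field
stopped_down = f_number(50, 6.25)  # f/8: darker, deeper depth of field
```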

Focal Length and Angle of View

The lens’s focal length also affects the angle of view, which is the extent of the scene captured by the camera. Lenses with a shorter focal length have a wider angle of view, while lenses with a longer focal length have a narrower angle of view. Wide-angle lenses, for example, have short focal lengths and are capable of capturing a broad view of the scene. Telephoto lenses, on the other hand, have long focal lengths and are suitable for capturing distant objects with greater detail.

Focal Length & Angle of View guide.
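
For a rectilinear lens focused at infinity, the angle of view follows directly from the sensor dimension d and the focal length f: AOV = 2·arctan(d / 2f). A short sketch, assuming a 36 mm-wide full-frame sensor purely for illustration:

```python
import math

def angle_of_view(sensor_size_mm, focal_length_mm):
    """Angle of view in degrees for a rectilinear lens focused at
    infinity: AOV = 2 * atan(d / (2 f))."""
    return math.degrees(2 * math.atan(sensor_size_mm / (2 * focal_length_mm)))

# Illustrative, for a 36 mm-wide sensor: wide-angle vs. telephoto
wide = angle_of_view(36, 24)   # ~73.7 degrees
tele = angle_of_view(36, 200)  # ~10.3 degrees
```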

By selecting the appropriate lens, it is possible to adjust the composition and framing of the image, as well as control the amount of light entering the sensor and the depth of field. Furthermore, the use of lenses allows for manipulation of perspective and capturing subtle details that would be impossible to record with a pinhole model.

In summary, the lens is a crucial component in image formation, allowing photographers and filmmakers to control and shape light effectively and creatively. With proper knowledge about lens characteristics and their implications in image formation, it is possible to explore the full potential of cameras and other image capturing devices, creating truly stunning and expressive images.

Capture and Representation of Digital Images

Digital cameras use an array of photodiodes (CCD or CMOS) to convert photons (light energy) into electrons, differing from analog cameras that use photographic film to record images. This technology allows capturing and storing images in digital format, simplifying the processing and sharing of photos.

Digital images are organized as a matrix of pixels, where each pixel represents the light intensity at a specific point in the image. A common example of a digital image is an 8-bit image, in which each pixel has an intensity value ranging from 0 to 255. This range of values is a result of using 8 bits to represent intensity, which allows a total of 2^8 = 256 distinct values for each pixel.

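
This matrix representation is easy to inspect in code. A small sketch with NumPy (assuming it is installed), building a toy 3×3 grayscale image of 8-bit intensities:

```python
import numpy as np

# A toy 3x3 grayscale "image": each pixel is an 8-bit intensity (0-255)
image = np.array([[  0,  64, 128],
                  [ 32, 255,  96],
                  [200,  16,  80]], dtype=np.uint8)

levels = 2 ** (8 * image.itemsize)  # 256 distinct values per pixel
darkest = int(image.min())          # 0 (black)
brightest = int(image.max())        # 255 (white)
```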

In the RGB model, each pixel is assigned an intensity value. In 8-bit-per-channel color images, intensity values range from 0 (black) to 255 (white) for each of the red, green, and blue color components.

In the figure above, we see an example of how a machine would “see” a Brazilian Air Force aircraft. In this case, each pixel has a vector of values associated with each of the RGB channels.

Digital cameras typically adopt an RGB color detection system, where each color is represented by a specific channel (red, green, and blue). One of the most common methods for capturing these colors is the Bayer pattern, developed by Bryce Bayer in 1976 while working at Kodak. The Bayer pattern consists of an alternating array of RGB filters placed over the pixel array.

It is interesting to note that the number of green filters is twice that of red and blue filters, as the luminance signal is mainly determined by the green values, and the human visual system is much more sensitive to spatial differences in luminance than chrominance. For each pixel, missing color components can be estimated from neighboring values through interpolation – a process known as demosaicing.

Bayer Filter Pattern Scheme, showing the interaction between visible light, color filters, microlenses, and sensor in capturing vibrant and detailed colors in digital cameras.
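
The 2×2 Bayer tile (in its RGGB variant) can be sketched in plain Python; counting the filters confirms that green appears twice as often as red or blue. The helper below is illustrative, not a real demosaicing routine:

```python
def bayer_pattern(height, width):
    """Filter labels for an RGGB Bayer mosaic: even rows alternate
    R, G, R, G, ... and odd rows alternate G, B, G, B, ..."""
    even_row = ["R", "G"]
    odd_row = ["G", "B"]
    return [[(even_row if r % 2 == 0 else odd_row)[c % 2]
             for c in range(width)] for r in range(height)]

mosaic = bayer_pattern(4, 4)
flat = [f for row in mosaic for f in row]
greens = flat.count("G")  # 8: twice as many greens as reds or blues
reds = flat.count("R")    # 4
blues = flat.count("B")   # 4
```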

However, it is important to emphasize that this is just a common example. In practice, a digital image can have more bits and more channels. Besides the RGB color space, there are several other color spaces, such as YUV, which can also be used in the representation and processing of digital images.

For example, during the period I worked at the Space Operations Center, I received monochromatic images with radiometric resolution of 10 bits per pixel and hyperspectral images with hundreds of channels for analysis.

Summary

This article presented the fundamentals of image formation, exploring the challenges of computer vision, the optical process of capture, the relevance of lenses, and the representation of digital images.

In the second article of this series, I will teach you how to implement a practical example in Python to convert the coordinates of a real 3D object to a 2D image, and how to perform camera calibration (one of the most important areas in Computer Vision).

 

References

  1. Szeliski, R. (2022). Computer Vision: Algorithms and Applications (2nd ed.). Springer.
  2. Gonzalez, R. C., & Woods, R. E. (2018). Digital Image Processing. Pearson Education.
Carlos Melo

Computer Vision Engineer with a degree in Aeronautical Sciences from the Air Force Academy (AFA), a Master's in Aerospace Engineering from the Technological Institute of Aeronautics (ITA), and founder of Sigmoidal.
