Sigmoidal

Building Rome in a Day: 3D Reconstruction with Computer Vision

by Carlos Melo
September 15, 2023
in Computer Vision, Data Science

Did you know that it’s possible to build a 3D reconstruction of an entire city from photos found online, using Computer Vision techniques? I mean photos that other people have posted on platforms such as Flickr and Instagram.

Experimental results published in the paper “Building Rome in a Day” revealed that it’s possible to reconstruct entire cities, with up to 150,000 images, in less than a day, using a computing cluster with 500 cores. If you want to learn more about this Computer Vision technique, follow my analysis in this post!

3D City Reconstruction with Computer Vision

In 2009, the paper “Building Rome in a Day” was presented at the International Conference on Computer Vision (ICCV), hosted by IEEE (Institute of Electrical and Electronics Engineers). It introduced an innovative method that uses computer vision to create detailed 3D models of entire cities from 2D images sourced online.

The authors showcased the system’s effectiveness across various cities, including Rome, hence the paper’s title. Using a vast amount of online-sourced images, the proposed method was able to automatically reconstruct detailed 3D models of the cities.

Compared to other 3D modeling methods, the authors’ proposed method has numerous advantages. Firstly, it’s fully automated and can efficiently handle large datasets of high-resolution images, which means it can be used to model entire cities with relatively little human effort. Furthermore, the automated photogrammetry approach is less intrusive than other 3D data collection techniques, such as laser scanning, which can be crucial in sensitive or historical areas.

The system employs a series of parallel and distributed matching and reconstruction algorithms, designed to maximize parallelism at every step of the process and minimize serialization bottlenecks. This approach allows for efficient scalability both in terms of the problem size and available computing resources.
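To get a feel for why parallelism matters here, the toy sketch below counts how many image pairs an exhaustive matcher would have to compare and spreads a tiny matching job across a pool of workers. It is only an illustration, not the paper's system: the file names, the feature-ID sets, and the `match_pair` scoring rule are all hypothetical, and a thread pool stands in for a real computing cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(n_images: int) -> int:
    """Number of unordered image pairs an exhaustive matcher would compare."""
    return n_images * (n_images - 1) // 2


def match_pair(pair):
    """Toy matcher: score a pair by how many feature IDs the two images share."""
    (name_a, feats_a), (name_b, feats_b) = pair
    return name_a, name_b, len(feats_a & feats_b)


def match_all(images: dict, workers: int = 4, min_shared: int = 2):
    """Score every image pair concurrently and keep pairs that overlap enough."""
    pairs = combinations(sorted(images.items()), 2)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(match_pair, pairs))
    return [(a, b, s) for a, b, s in scored if s >= min_shared]


# Hypothetical photos, each reduced to a set of made-up feature IDs.
photos = {
    "colosseum_1.jpg": {1, 2, 3, 4},
    "colosseum_2.jpg": {2, 3, 4, 5},
    "pantheon_1.jpg": {10, 11, 12},
    "pantheon_2.jpg": {11, 12, 13},
}

edges = match_all(photos)
print(edges)

# Exhaustive matching grows quadratically with the number of images.
print(count_pairs(150_000))  # 11249925000
```

Because each pair can be scored independently, the work splits cleanly across workers. The pair count also shows why brute force does not scale: a collection of this size needs some cheaper way of proposing which pairs are worth comparing at all.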

Dense Reconstruction of St. Mark’s Square (Piazza San Marco in Italian).

City-scale 3D reconstruction has been previously explored in computer vision literature and is widely used in platforms such as Google Earth and Microsoft’s Virtual Earth. However, existing large-scale Structure from Motion (SfM) systems operate with data from structured sources, like aerial photos taken by survey aircraft or images captured by moving vehicles. These systems rely on photos taken with calibrated cameras and often utilize other sensors, such as GPS and inertial navigation units, significantly simplifying the involved calculations.

On the other hand, images sourced from the web don’t have these simplifying features. They are captured with various cameras, under fluctuating lighting conditions, and often lack geographical or camera calibration information. This variability makes photo collections harder to work with for SfM purposes but also makes them an extremely rich source of information about the world. These images capture noteworthy aspects worth photographing, including interiors and artifacts, in addition to exteriors.

Given these characteristics, the paper represents a significant advancement in constructing 3D city models and has important implications in certain areas like cartography, virtual tourism, and urban planning.

3D Reconstruction of Notre-Dame Cathedral (Paris) using the SfM method.

Understanding the Methodology for 3D City Reconstruction

There are three stages in 3D city reconstruction, in order: image collection, image matching, and 3D reconstruction. The process begins by gathering a large number of images of the city to be modeled. In this case, the authors tested their method on the city of Rome, using about 150,000 images from Flickr.

These images are then loaded into software that uses feature matching techniques (image matching algorithms) to find points that appear in more than one image and build a network of points of interest. These points are used to calculate the camera’s position and orientation in each image and to create an approximate 3D point cloud of the scene.
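As a rough illustration of the matching step, the sketch below pairs up feature descriptors from two images by nearest-neighbour distance and keeps a match only when it is clearly better than the runner-up (the ratio test commonly used with SIFT descriptors). The descriptors are made-up 4-dimensional vectors and the `ratio` value of 0.8 is an assumption; real descriptors would come from a feature extractor, and this is not the paper's actual matching code.

```python
import numpy as np


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs where desc_a[i] matches desc_b[j].

    A match is kept only if the nearest neighbour is clearly closer than
    the second nearest, which filters out ambiguous correspondences.
    desc_b must contain at least two descriptors.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        if row[nearest] < ratio * row[second]:
            matches.append((i, int(nearest)))
    return matches


# Hypothetical toy descriptors (real SIFT descriptors have 128 dimensions).
image_a = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.5, 0.5, 0.0, 0.0]]   # equally close to two candidates
image_b = [[0.0, 1.0, 0.1, 0.0],
           [1.0, 0.0, 0.0, 0.1],
           [0.0, 0.0, 1.0, 0.0]]

matches = match_descriptors(image_a, image_b)
print(matches)
```

The third descriptor in `image_a` is dropped because its two best candidates are equally distant, which is exactly the kind of ambiguous match that would otherwise corrupt the camera pose estimate.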

Next, the software uses the images and the approximate point cloud to generate a dense 3D mesh of the scene, representing the city’s surface with precise details. To do this, the method proposed by the authors uses a technique called “stereo matching”, which finds matches between images using epipolar geometry and luminance differences between images.
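Epipolar geometry is what keeps this search tractable: once the relative pose of two cameras is known, the match for a point in one image must lie on a single line in the other. The minimal sketch below checks that constraint with an essential matrix. The camera motion and the points are invented for the example, and the convention that the second camera sees `X2 = R @ X1 + t` is an assumption of this sketch.

```python
import numpy as np


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def essential_matrix(R, t):
    """Essential matrix for a second camera where X2 = R @ X1 + t."""
    return skew(t) @ R


def epipolar_residual(E, x1, x2):
    """x2^T E x1 for normalized homogeneous points; zero for a true match."""
    return float(x2 @ E @ x1)


# Hypothetical setup: the second camera sits one unit to the right of the first.
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.0])
E = essential_matrix(R, t)

X1 = np.array([0.5, 0.2, 4.0])   # a 3D point in the first camera's frame
X2 = R @ X1 + t                  # the same point in the second camera's frame
x1 = X1 / X1[2]                  # normalized image coordinates
x2 = X2 / X2[2]

good = epipolar_residual(E, x1, x2)
bad = epipolar_residual(E, x1, x2 + np.array([0.0, 0.25, 0.0]))
print(good, bad)
```

A correct correspondence gives a residual of zero (up to rounding), while a candidate that has drifted off the epipolar line does not, so the stereo step only needs to compare pixels along that line.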

Finally, the authors apply a texturing technique to add color and texture details to the 3D mesh. This is done automatically, using the original images to extract color and texture information and project them onto the 3D mesh. The end result is a highly detailed and realistic 3D model of the city, which can be used for visualization, urban planning, natural disaster simulations, among other purposes.

The process of 3D city reconstruction from images can be divided into several steps, as follows:

  1. Image capture: the first step is to capture images of the city from different angles and positions using handheld cameras or cars equipped with cameras. It is important to capture enough images to cover the entire area of the city you want to reconstruct.
  2. Camera calibration: before processing the images, it is necessary to calibrate the camera to determine its intrinsic and extrinsic parameters. Intrinsic parameters include the camera’s focal length and principal point, while extrinsic parameters include the camera’s position and orientation relative to the scene. Calibration can be done using calibration targets (such as a checkerboard pattern) or self-calibration techniques.
  3. Feature matching: then, the images are processed to find matches between points of interest in image pairs. This can be done using feature matching techniques such as SIFT (Scale-Invariant Feature Transform) or SURF (Speeded Up Robust Features), which identify points of interest in each image and find matches between them in image pairs.
  4. Triangulation: after finding the matches, the corresponding 3D points can be estimated using triangulation. Triangulation is a method to find the 3D position of a point from its projection in two or more images. These 3D points are usually imprecise and noisy but form an approximate point cloud of the scene.
  5. Point cloud optimization: the estimated point cloud can be refined using optimization techniques such as energy minimization or graph optimization to improve the 3D position accuracy of each point. In SfM, the standard example is bundle adjustment, which jointly adjusts the camera parameters and the 3D points to minimize reprojection error.
  6. 3D mesh generation: after obtaining a refined 3D point cloud, it is possible to generate a 3D city mesh using surface triangulation techniques such as Delaunay triangulation, which connect neighboring 3D points to form a continuous 3D surface.
  7. 3D mesh texturing: finally, the 3D mesh can be textured using the input images to create a visually realistic representation of the reconstructed city.
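Steps 2, 4, and 5 above can be sketched in a few lines of NumPy. The example builds two pinhole cameras from intrinsic and extrinsic parameters, triangulates a point from its two projections using the linear (DLT) method, and measures the reprojection error that an optimizer would try to reduce. All numbers (focal length, principal point, camera offset, pixel noise) are invented for illustration, and this is a minimal two-view sketch rather than a full pipeline.

```python
import numpy as np


def projection_matrix(K, R, t):
    """3x4 camera matrix P = K [R | t] from intrinsics K and extrinsics R, t."""
    return K @ np.hstack([R, t.reshape(3, 1)])


def project(P, X):
    """Project a 3D point X to pixel coordinates with camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point seen in two images."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]


def reprojection_error(P, X, uv):
    """Pixel distance between an observation and the reprojected 3D point."""
    return float(np.linalg.norm(project(P, X) - uv))


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-1.0, 0.0, 0.0]))

X_true = np.array([0.5, 0.2, 4.0])
uv1, uv2 = project(P1, X_true), project(P2, X_true)

# With perfect observations the point is recovered almost exactly.
X_est = triangulate(P1, P2, uv1, uv2)

# With a noisy observation the rays no longer meet, leaving a residual.
uv2_noisy = uv2 + np.array([1.0, -0.5])
X_noisy = triangulate(P1, P2, uv1, uv2_noisy)
err = (reprojection_error(P1, X_noisy, uv1)
       + reprojection_error(P2, X_noisy, uv2_noisy))
print(X_est, X_noisy, err)
```

With noisy matches the triangulated point no longer reprojects exactly onto its observations. Minimizing the sum of squared reprojection errors over all cameras and points at once is what the refinement step is for.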

Various Applications of 3D Reconstruction

3D city reconstruction has revolutionized various areas of work, opening doors to new possibilities and practical applications. From urban planning to the tourism industry and archaeology, this technology has provided significant advancements and innovative solutions. In this section, we will discuss some of the main applications of 3D city reconstruction and how they are transforming various sectors.

Using the skeletal remains found in 1923, it was possible to conduct a 3D reconstruction of a 4,000-year-old Stone Age woman.

3D city reconstruction technology has been widely used in various sectors, as illustrated in the list below.

  • Urban planning and architecture: 3D reconstructions can assist architects and urban planners in designing constructions according to area characteristics and existing structures. For instance, municipal authorities can use this technology to visualize how new constructions will affect the city’s look and traffic, as well as to identify flood or natural disaster risk areas.
  • Geospatial data analysis: three-dimensional models can be integrated with other data sources, such as maps and satellite data, to provide insights into traffic, land use, and population density.
  • Archaeology and cultural heritage: 3D city reconstruction can be used for historical and cultural preservation purposes, making it possible to create accurate three-dimensional models of ancient buildings or archaeological sites. This allows them to be studied and virtually preserved, even if they have been demolished or damaged.
  • Simulation and modeling: 3D models can be used to simulate and model urban scenarios and help researchers understand how cities evolve over time.
  • Machine learning and computer vision: 3D models can be used to train machine learning and computer vision algorithms to recognize objects and patterns in urban environments.
  • Tourism: this technology can be used in the tourism industry to create immersive and educational experiences. For example, a museum can create a virtual tour of a historic city, allowing visitors to see what the city looked like in the past. Similarly, tourism agencies can create interactive tour guides that assist visitors in exploring the city and its attractions.
  • Games: 3D city reconstruction is often used in the gaming industry to create realistic environments. For instance, a game might use a 3D reconstruction of an existing city as the basis for creating a virtual world for players to explore. Similarly, a historical game might use a 3D reconstruction of an ancient city to create an authentic environment.
  • Security: 3D reconstructions can be used in emergency planning and disaster response, allowing rescue teams to view affected sites and plan rescue operations more efficiently. Three-dimensional models can be used to predict how disasters would affect different city areas and what the safest escape routes would be. Likewise, they can enhance public safety, allowing authorities to monitor and analyze traffic patterns, pedestrian flow, and criminal activities in urban areas.
  • Public Budget: 3D models can support planning the use of public resources, such as public lighting, traffic monitoring, and waste management.

In short, 3D city reconstruction technology has a profound impact on various areas, enabling more detailed analyses, efficient planning, and immersive experiences. As technology advances and becomes more accessible, the use of 3D city reconstruction is expected to further expand, covering new sectors and applications. Without a doubt, 3D city reconstruction is reshaping the world around us, creating new opportunities and paving the way for future innovations.

Challenges and Limitations

Every technology has its limitations and challenges to overcome. The limitations of 3D city reconstruction are closely linked to the images themselves, in both quantity and quality.

To model an entire city in detail, it’s necessary to capture a large number of images from different angles and positions, which can be challenging both logistically and financially, especially in remote areas. Good results also depend on high-quality photos taken under good lighting and weather conditions. Moreover, the quality of the final model can be affected by camera calibration: if it is incorrect, the camera’s position and orientation in each image will be estimated with errors, which directly affects the accuracy of the final model.

Privacy is a major concern when collecting aerial or ground images of urban areas. Images may contain personal information, such as car license plates, faces, and even the internal layout of buildings, which could compromise citizens’ security and privacy. Therefore, it’s important for companies and institutions involved in creating 3D city reconstructions to establish clear guidelines for ethical data use and to protect people’s privacy.

Perspectives and Opportunities

3D city reconstruction is a rapidly evolving area with a promising future, packed with opportunities for technological advances and new applications. Some of the main perspectives and opportunities are listed below.

  1. Increased Resolution and Accuracy: With advancements in imaging technology, cameras and sensors are expected to become more powerful, allowing for the collection of high-resolution and more accurate data. This will allow for even more detailed and accurate 3D reconstructions.
  2. Use of Drones and Robots: Using drones and robots for data collection can make the 3D reconstruction process more efficient and accurate, especially in hard-to-reach or dangerous areas.
  3. Machine Learning and Artificial Intelligence: Using machine learning and artificial intelligence techniques can help automate the 3D reconstruction process, allowing for the collection and processing of vast amounts of data more quickly and efficiently.
  4. Integration with Other Technologies: 3D city reconstruction can be integrated with other technologies, such as augmented and virtual reality, allowing users to interact with cities in an even more immersive and engaging manner.
  5. Applications in Emerging Areas: 3D city reconstruction has great potential for use in emerging areas like urban planning, environmental monitoring, natural disaster prevention, and risk assessment.

Conclusion

3D reconstruction and computer vision are becoming increasingly valuable tools in analyzing and preserving the past. As technology continues to progress, we are likely to see even more applications and innovations in this field, helping to unravel the mysteries of ancient civilizations and protect our historical heritage for future generations.

And for those who wish to delve deeper into 3D reconstruction, I definitely recommend reading my complete tutorial on ORB-SLAM, where I walk you through the step-by-step process of transforming images and videos into 3D maps.

Carlos Melo

Computer Vision Engineer with a degree in Aeronautical Sciences from the Air Force Academy (AFA), Master in Aerospace Engineering from the Technological Institute of Aeronautics (ITA), and founder of Sigmoidal.
