Sigmoidal

Depth Estimation on Single Camera with Depth Anything

by Carlos Melo
February 23, 2024
in Blog, Computer Vision, Featured, Posts

Monocular Depth Estimation is a Computer Vision task that involves predicting the depth information of a scene, that is, the relative distance from the camera of each pixel, given a single RGB image. This challenging task is a key prerequisite for scene understanding for applications such as 3D scene reconstruction, robotics, Spatial Computing (Apple Vision Pro and Quest 3), and autonomous navigation.

 

Figure: Example of a depth map I generated using Depth Anything.
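To make the idea of a depth map more concrete, here is a minimal sketch of how a relative depth prediction can be rescaled into an 8-bit grayscale image for display. The function name `depth_to_uint8` and the toy values are my own illustration, not code taken from the Depth Anything repository, and the sketch assumes the prediction is simply a 2D array of relative values.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Rescale a relative depth map to the 0-255 range for display."""
    depth = depth.astype(np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-8:
        # Flat map: nothing to rescale, return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min) * 255.0
    # Round before casting so values are not truncated toward zero.
    return np.rint(scaled).astype(np.uint8)


# Toy 2x3 "prediction" standing in for a model output (made-up values).
toy_depth = np.array([[0.5, 1.0, 1.5],
                      [2.0, 2.5, 3.0]])
gray = depth_to_uint8(toy_depth)
```

Because the values are only relative, the rescaling keeps the ordering of pixels but discards any absolute scale. The resulting array can then be passed to any image library for saving or color mapping.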

While various approaches have been developed for depth estimation, Depth Anything currently represents a significant advancement in monocular depth perception. In this article, we will explore some of the theoretical foundations of monocular depth perception, then clone the Depth Anything repository and run our own tests in a local development environment.

Monocular Depth Perception

Depth perception is what allows us to interpret the three-dimensional world from two-dimensional images projected on our retinas. This ability evolved as a crucial aspect for survival, enabling humans to navigate the environment, avoid predators, and locate resources.

The human brain accomplishes this feat through a series of interpretations of visual information, where the overlap of the binocular visual field provides a rich perception of depth.

In addition to binocular vision, this perception is enriched by various monocular cues (depth cues), elements in the environment that allow a single observer to infer depth even with one eye closed. Among these cues are occlusion, relative size, cast shadows, and linear perspective.

These same principles and mechanisms of perception find a parallel in Computer Vision, where the essence of depth estimation also lies in capturing the spatial structure of a scene to accurately represent its three-dimensional aspects.
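As a rough illustration of why a cue such as relative size carries depth information, consider an idealized pinhole camera. The symbols below are assumptions introduced only for this sketch: f is the focal length, H the real height of an object, Z its distance from the camera, and h its height in the image.

```latex
h = \frac{f\,H}{Z}
\qquad\Longrightarrow\qquad
\frac{h_1}{h_2} = \frac{Z_2}{Z_1}
\quad \text{(two objects with the same } H \text{, seen by the same camera)}
```

In other words, if two objects are known to have the same real size, the one that appears half as tall in the image is about twice as far away. Note that this gives only a ratio of distances, which is one reason monocular methods naturally produce relative rather than absolute depth.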

Depth Anything for Depth Estimation

The Depth Anything model, introduced in the paper “Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data”, represents a significant advancement in monocular depth estimation. Based on the DPT (Dense Prediction Transformer) architecture, it was trained on roughly 1.5 million labeled images together with a vast dataset of over 62 million unlabeled images.

YANG, Lihe et al. Depth anything: Unleashing the power of large-scale unlabeled data. 2024.

The authors attribute the success of this approach to two main strategies:

  1. Using data augmentation tools to establish a more challenging optimization target.
  2. Using auxiliary supervision so that the model inherits semantic priors from pre-trained encoders.
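To give a feel for the first strategy, the sketch below shows one kind of strong perturbation, a CutMix-style mix of two images, written with NumPy. It is only an illustration under my own assumptions: the function name `cutmix`, the box-size range, and the toy images are hypothetical and are not taken from the authors' training code.

```python
import numpy as np


def cutmix(img_a, img_b, rng, min_frac=0.25, max_frac=0.75):
    """Paste a random rectangle of img_b into img_a; return the mix and the mask."""
    h, w = img_a.shape[:2]
    # Random box size, as a fraction of the image size (assumed range).
    box_h = int(h * rng.uniform(min_frac, max_frac))
    box_w = int(w * rng.uniform(min_frac, max_frac))
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    # True -> pixel comes from img_a, False -> pixel comes from img_b.
    mask = np.ones((h, w), dtype=bool)
    mask[top:top + box_h, left:left + box_w] = False
    mixed = np.where(mask[..., None], img_a, img_b)
    return mixed, mask


# Toy images: one all black, one all white, so the pasted box is easy to see.
rng = np.random.default_rng(0)
img_a = np.zeros((8, 8, 3), dtype=np.uint8)
img_b = np.full((8, 8, 3), 255, dtype=np.uint8)
mixed, mask = cutmix(img_a, img_b, rng)
```

The mask is returned alongside the mixed image so that a training loop could compare each region of the prediction with the pseudo-label of the image it came from. Perturbed inputs like this are harder to predict than clean ones, which is the sense in which the optimization target becomes more challenging.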

Tested on six public datasets and on randomly captured photographs, Depth Anything showed strong generalization, outperforming existing models such as MiDaS v3.1 and ZoeDepth on several metrics.

If you want to delve deeper into the materials and methods used in the research, see the original paper cited above.

Figure: Depth Anything framework, where a standard pipeline was adopted to increase the model’s power over unlabeled images.

Setting Up the Environment for “Depth Anything”

To start using Depth Anything for monocular depth estimation, it’s necessary to prepare your development environment by following some simple steps. Make sure you have Poetry installed.

To clone the repository and install dependencies, follow the steps described below:

1. Clone the Repository: First, clone the project repository using the command in the terminal:

git clone https://github.com/LiheYoung/Depth-Anything.git

2. Access the Project Directory: Next, access the project directory:

cd Depth-Anything

3. Initialize the Environment with Poetry: If it’s your first time using Poetry on this project, initialize the environment:

poetry init

4. Activate the Virtual Environment: Activate the virtual environment created by Poetry:

poetry shell

5. Install Dependencies: Install the necessary dependencies, including Gradio, PyTorch, torchvision, opencv-python, and huggingface_hub:

poetry add gradio==4.14.0 torch torchvision opencv-python huggingface_hub

6. Run the Application: Start the Gradio app with the command:

python app.py

With the Gradio app running, you can upload your photos directly through the UI. If you have any difficulties installing the dependencies on your computer, you can also test Depth Anything in this official demo.
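If the app fails to start, a quick way to narrow down the problem is to check whether the packages from step 5 can be found inside the active virtual environment. The script below is a small sketch of that check. The helper name `missing_modules` is my own, and the list assumes the usual import names for these packages (for example, opencv-python is imported as `cv2`).

```python
from importlib.util import find_spec

# Import names for the packages installed in step 5.
# Note that opencv-python is imported as "cv2".
REQUIRED = ["gradio", "torch", "torchvision", "cv2", "huggingface_hub"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it from the same shell where you activated the Poetry environment. A module reported as missing usually means the install step ran in a different environment.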

The app works only with static images. To generate depth maps from videos, run the command below in your terminal. Since this process is computationally expensive, I recommend starting your tests with short videos, between 3 and 10 seconds long.

python run_video.py --encoder vitl --video-path /path/to/your/video.mov --outdir /path/to/save
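If you want to process several short clips in a row, the command above can be wrapped in a small Python helper. This is a sketch under my own assumptions: the function names `build_command` and `process_folder`, the `*.mov` pattern, and the `dry_run` switch are hypothetical, and only the flags already shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(video, outdir, encoder="vitl"):
    """Assemble the run_video.py call shown above as an argument list."""
    return [
        sys.executable, "run_video.py",
        "--encoder", encoder,
        "--video-path", str(video),
        "--outdir", str(outdir),
    ]


def process_folder(folder, outdir, pattern="*.mov", dry_run=True):
    """Run (or just print) one command per matching video in `folder`."""
    videos = sorted(Path(folder).glob(pattern))
    commands = [build_command(video, outdir) for video in videos]
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Must be run from the root of the cloned repository.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` the helper only prints the commands, which is a cheap way to confirm the paths before committing to a long processing run.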

Takeaways

  • Essence of Monocular Depth Perception: Monocular depth estimation is crucial for understanding the spatial structure of a scene from a single image, enabling applications such as 3D scene reconstruction.
  • Advancements with Depth Anything: Representing a significant leap in monocular depth perception, the Depth Anything model utilizes the DPT architecture and was trained on an extensive dataset, showing excellent generalization capability.
  • Environment Setup: A step-by-step guide to setting up the development environment to use Depth Anything, including installing dependencies and running applications for practical tests.
  • Practical Application: The article provides detailed instructions for testing depth estimation with images and videos, facilitating practical experimentation and visualization of the Depth Anything model’s results.

Carlos Melo

Computer Vision Engineer with a degree in Aeronautical Sciences from the Air Force Academy (AFA), Master in Aerospace Engineering from the Technological Institute of Aeronautics (ITA), and founder of Sigmoidal.
