
Real-time Human Pose Estimation using MediaPipe

In this tutorial, you will get to know the MediaPipe library and develop a Python code capable of estimating human poses from images.

by Carlos Melo
September 11, 2023
in Computer Vision

Have you ever noticed the precision and rhythmic control in Tiger Woods’ swing in golf? Or the meticulous preparation of Roberto Carlos, adjusting the ball before firing a left-footed cannon in soccer?

Training for any sport requires discipline, dedication, and a lot of repetition. It’s not just about learning the rules or techniques, but refining skills to execute perfect movements.

My old coach at the Air Force Academy used to say, “only repetition, to exhaustion, leads to perfection”.

But to correct a wrong move, I first need to know where I’m going wrong. That’s where high-performance coaches come in, using videos to fine-tune their athletes’ performances.

Example of Pose Estimation during Theo’s training.

Inspired by Theo’s judo training, and as a Computer Vision Engineer, I thought: why not create a visual application for high-performance athlete pose estimation?

Welcome to the world of human pose estimation! In this tutorial, you will get to know the MediaPipe library and learn how to detect body joints in images.

Click here to download the source code to this post

What is Pose Estimation?

Imagine teaching a computer to recognize (and understand) human poses, just as a human would. With Human Pose Estimation (HPE), we can achieve just that.

Leveraging the power of machine learning (ML) and computer vision, we can accurately estimate the positions of body joints, such as shoulders, elbows, wrists, hips, knees, and ankles.

In the sports realm, this technology is very relevant in overseeing and improving posture during physical activities, helping both in injury prevention and in enhancing athletic performance.

But that’s just one application possibility. Here are some others, directly sourced from Google Developers’ website.

Applications of Pose Estimation

In terms of physiotherapy, for example, accurate identification of movements and postures plays a crucial role, facilitating the monitoring of patients’ progress and providing instant feedback during the rehabilitation process.

In the entertainment sector, augmented and virtual reality applications benefit greatly from this technology, providing a more natural and intuitive user interaction. Just look at the hype created by the Pokemon Go game, or more recently, by Apple Vision Pro.

Pose estimation techniques also have promising applications in behavioral analysis, surveillance systems, gesture recognition, and human-computer interaction, creating a field of possibilities yet to be explored.

What is MediaPipe?

MediaPipe is an open-source platform maintained by Google, offering a comprehensive set of tools, APIs, and pre-trained models that make building applications for tasks like pose estimation, object detection, facial recognition, and more, easier than ever.

Being cross-platform, you can build pipelines on desktops, servers, iOS, Android, and embed them on devices like Raspberry Pi and Jetson Nano.

MediaPipe Framework

The MediaPipe Framework is a low-level component used for constructing efficient, on-device Machine Learning (ML) pipelines. It is built on foundational concepts such as packets, graphs, and calculators, which carry, route, and process data, respectively. Written in C++, Java, and Objective-C, the framework consists of the following APIs:

  1. Calculator API (C++)
  2. Graph construction API (Protobuf)
  3. Graph Execution API (C++, Java, Obj-C)

MediaPipe Solutions

MediaPipe Solutions, on the other hand, is a set of ready-made, open-source solution examples built on pre-trained TensorFlow models.

These solutions are perfect for developers who want to quickly add ML capabilities to their apps without having to build a pipeline from scratch. Ready-made solutions are available for C++, Python, JavaScript, Android, iOS, and Coral. Check out the gallery below for the available solutions.

 


  1. Object Detection
  2. Face Detection
  3. Image Classification
  4. Hand Landmark Detection
  5. Face Landmark Detection
  6. Image Segmentation
  7. Image Embedding
  8. Interactive Segmentation
  9. Pose Landmark Detection

MediaPipe Practical Project

Human pose estimation models generally fall into three main categories: skeleton-based, contour-based, and volume-based.

  1. Skeleton-based Model
  2. Contour-based Model
  3. Volume-based Model

MediaPipe Pose adopts the skeleton-based approach, using a topology of 33 key points, known as landmarks, derived from the BlazePose model.

It is also worth noting that, unlike models such as YOLOv8 or YOLO-NAS, which were designed to detect multiple people, MediaPipe Pose is a framework aimed at pose estimation for a single person.
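For reference, the lower-body landmarks used later in this tutorial occupy fixed positions in the 33-point BlazePose topology. The excerpt below hardcodes those indices as documented for BlazePose; the dictionary name is just for illustration:

```python
# Excerpt of the 33-point BlazePose topology: index -> landmark name.
# Only the lower-body points used later in this tutorial are listed.
BLAZEPOSE_LOWER_BODY = {
    25: "LEFT_KNEE",
    26: "RIGHT_KNEE",
    27: "LEFT_ANKLE",
    28: "RIGHT_ANKLE",
    29: "LEFT_HEEL",
    30: "RIGHT_HEEL",
    31: "LEFT_FOOT_INDEX",
    32: "RIGHT_FOOT_INDEX",
}

for index, name in sorted(BLAZEPOSE_LOWER_BODY.items()):
    print(index, name)
```

In practice you rarely need the raw numbers: the mp_pose.PoseLandmark enum (e.g. mp_pose.PoseLandmark.LEFT_KNEE) resolves to these same indices.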

Prerequisites

Now, let’s learn in practice how to estimate poses in images using a ready-made MediaPipe model. All materials and code used in this tutorial can be downloaded for free.

The libraries that will be used are OpenCV and MediaPipe. Use the requirements.txt file (provided in the code folder) to proceed.

pip install -r requirements.txt

After executing the code block above, all the necessary dependencies for the project will be conveniently installed.

# required libraries
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

First, we import the required libraries. The cv2 library is used for image processing and manipulation, while mediapipe is the Python package that provides the MediaPipe framework for pose estimation. Additionally, we import matplotlib.pyplot to visualize the results later on.

Now that we have imported the required libraries, we can move on to the next steps in our pose estimation pipeline. Python solutions are very straightforward. For the Pose Estimation task, we will follow these steps:

  1. Detect and draw pose landmarks
  2. Draw landmark connections
  3. Get the pixel coordinates of the landmarks

Step 1: Detect and draw pose landmarks

In this step, we will identify the key body points, or landmarks, and draw them on our images and videos for a more intuitive visualization.

# Loading the image using OpenCV.
img = cv2.imread("roberto-carlos-the-best-in-the-world.jpg")

# Getting the image's width and height.
img_width = img.shape[1]
img_height = img.shape[0]

# Creating a figure and a set of axes.
fig, ax = plt.subplots(figsize=(10, 10))
ax.axis('off')
ax.imshow(img[...,::-1])
plt.show()

First, the code loads an image using OpenCV’s imread function. The image file, named “roberto-carlos-the-best-in-the-world.jpg” and located in the project folder, is read and stored in the img variable.

Next, the code retrieves the image’s width and height using the shape attribute of the img array. The width is assigned to the img_width variable and the height to the img_height variable.

After that, a figure and a set of axes are created using plt.subplots. The figsize parameter sets the figure size to 10×10 inches. The line ax.axis('off') removes axis labels and ticks from the plot. Finally, the image is displayed on the axes using ax.imshow; the slice img[..., ::-1] reverses the channel order from OpenCV’s BGR to the RGB order that matplotlib expects, and plt.show() renders the figure.

# Initializing the Pose and Drawing modules of MediaPipe.
mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils

The code initializes two modules from the MediaPipe framework: mp_pose and mp_drawing.

The mp_pose module provides functionality for pose estimation. It contains pretrained models and algorithms that can detect and track human body poses in images or videos. This module is crucial for performing pose estimation tasks using MediaPipe.

The mp_drawing module, on the other hand, provides utilities for drawing the detected poses on images or videos. It offers functions to overlay pose landmarks and connections on visual media, making it easier to view and interpret the pose estimation results.

with mp_pose.Pose(static_image_mode=True) as pose:
    # static_image_mode=True tells MediaPipe the input is a single static
    # image rather than a sequence of video frames.

    # Make a copy of the original image.
    annotated_img = img.copy()

    # Process the image. MediaPipe expects RGB input, while OpenCV loads
    # images in BGR order, so the channels are converted first.
    results = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    # Set the circle radius for drawing the landmarks.
    # The radius is scaled as a percentage of the image's height.
    circle_radius = int(.007 * img_height)

    # Specify the drawing style for the landmarks.
    point_spec = mp_drawing.DrawingSpec(color=(220, 100, 0), thickness=-1, circle_radius=circle_radius)

    # Draw the landmarks on the image.
    mp_drawing.draw_landmarks(annotated_img,
                              landmark_list=results.pose_landmarks,
                              landmark_drawing_spec=point_spec)

Let’s break down this code block and understand how it functions.

The code block begins with a with statement that initializes the mp_pose.Pose class from MediaPipe, which detects pose landmarks (key points) in an image. The static_image_mode argument is set to True to indicate that the input is a single static image rather than a video stream.

Next, a copy of the original image is made using img.copy(). This ensures that the original image remains unmodified during the annotation process.

The pose.process() function is then called to process the image and obtain the pose estimation results. The results contain the detected pose landmarks.

The code sets the circle’s radius used to draw the landmarks based on a percentage of the image’s height. This ensures the circle size is proportionate to the image size.

A point_spec object is created using mp_drawing.DrawingSpec to specify the drawing style of the landmarks. It defines the color, thickness, and circle’s radius of the drawn landmarks.

Lastly, mp_drawing.draw_landmarks() is invoked to draw the landmarks on annotated_img using the specified style and the landmarks obtained from the results.

This code block showcases how to use MediaPipe to detect and draw landmarks on an image. It provides a visual representation of the detected pose, enabling us to analyze and interpret the pose estimation results.

Step 2: Draw landmark connections

In addition to detecting individual landmarks, MediaPipe also allows drawing connections between them, enhancing the understanding of the overall posture.

# Make a copy of the original image.
annotated_img = img.copy()

# Specify the drawing style for landmark connections.
line_spec = mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2)

# Draw both the landmark points and connections.
mp_drawing.draw_landmarks(
    annotated_img,
    landmark_list=results.pose_landmarks,
    connections=mp_pose.POSE_CONNECTIONS,
    landmark_drawing_spec=point_spec,
    connection_drawing_spec=line_spec
)

First, a copy of the original image is created. Then, the style for drawing connections between landmarks is specified using the mp_drawing.DrawingSpec class, with the color green and a thickness of 2. Afterwards, the function mp_drawing.draw_landmarks is called to draw both the landmark points and the connections on the annotated image. It receives the annotated image, the pose landmark list from the results object, the predefined connections of mp_pose.POSE_CONNECTIONS, and the drawing specifications for the landmarks and connections.

Step 3: Get the pixel coordinates of the landmarks

Now we can extract the pixel coordinates corresponding to each landmark, allowing for more detailed analysis. These pixel coordinates, when used in conjunction with the connections between landmarks, can be extremely helpful in understanding the position and orientation of various body parts in an image. Moreover, these coordinates can be used to calculate more complex metrics such as the ratio between different body parts, which can be helpful in many applications, such as biomechanical analysis, virtual avatar creation, animation, and more.

# Select the coordinates of the points of interest.
l_knee_x = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_KNEE].x * img_width)
l_knee_y = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_KNEE].y * img_height)

l_ankle_x = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_ANKLE].x * img_width)
l_ankle_y = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_ANKLE].y * img_height)

l_heel_x = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_HEEL].x * img_width)
l_heel_y = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_HEEL].y * img_height)

l_foot_index_x = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_FOOT_INDEX].x * img_width)
l_foot_index_y = int(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_FOOT_INDEX].y * img_height)

# Print the coordinates on the screen.
print('Left knee coordinates: (', l_knee_x,',',l_knee_y,')' )
print('Left ankle coordinates: (', l_ankle_x,',',l_ankle_y,')' )
print('Left heel coordinates: (', l_heel_x,',',l_heel_y,')' )
print('Left foot index coordinates: (', l_foot_index_x,',',l_foot_index_y,')' )

The above code block highlights the extraction and printing of specific landmark coordinates from the pose estimation results.

Firstly, the code calculates the x and y coordinates of four points of interest (left knee, left ankle, left heel, and left foot index). It should be noted that each landmark has a specific number according to the BlazePose model.

Left knee coordinates: ( 554 , 747 )
Left ankle coordinates: ( 661 , 980 )
Left heel coordinates: ( 671 , 1011 )
Left foot index coordinates: ( 657 , 1054 )

These pixel values are obtained by multiplying each landmark’s normalized coordinates (which lie in the [0, 1] range) by the image’s width and height; the results are then printed to the screen using print statements.
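The repeated arithmetic above can be factored into a small helper. The to_pixel function below is hypothetical (it is not part of MediaPipe), but it captures the normalized-to-pixel conversion in one place:

```python
def to_pixel(x_norm, y_norm, img_width, img_height):
    """Convert MediaPipe's normalized [0, 1] coordinates to integer pixels."""
    return int(x_norm * img_width), int(y_norm * img_height)

# Illustrative values: a landmark at the horizontal center, three quarters
# of the way down a 1280x720 image.
print(to_pixel(0.5, 0.75, 1280, 720))  # -> (640, 540)
```

With a detection in hand, the same call becomes to_pixel(landmark.x, landmark.y, img_width, img_height) for each landmark of interest.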

# Displaying a graph with the selected points.
fig, ax = plt.subplots()
ax.imshow(img[:, :, ::-1])
ax.plot([l_knee_x, l_ankle_x, l_heel_x, l_foot_index_x], [l_knee_y, l_ankle_y, l_heel_y, l_foot_index_y], 'ro')
plt.show()

Lastly, the four points of interest are plotted on the graph as red points using ax.plot and the graph is displayed using plt.show().
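As a preview of the metrics mentioned in the conclusion, the three points printed above are already enough to estimate the angle at the left ankle. The sketch below uses plain Python and the pixel coordinates reported earlier; the angle_at helper is ours, not a MediaPipe function:

```python
import math

def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex`, between the segments vertex->p1 and vertex->p2."""
    v1 = (p1[0] - vertex[0], p1[1] - vertex[1])
    v2 = (p2[0] - vertex[0], p2[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))

# Pixel coordinates printed in the previous step.
l_knee = (554, 747)
l_ankle = (661, 980)
l_foot_index = (657, 1054)

ankle_angle = angle_at(l_ankle, l_knee, l_foot_index)
print(f"Left ankle angle: {ankle_angle:.1f} degrees")
```

The same function works for any joint, e.g. the knee angle from hip, knee, and ankle coordinates.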

Conclusion

In this tutorial, we explored the concept of pose estimation with MediaPipe. We learned about MediaPipe, a powerful framework for building multimodal perceptual pipelines, and how it can be used for human pose estimation.
We covered the basic concepts of pose estimation in images and discussed how to interpret the output. Specifically, we saw that:

  • MediaPipe is a versatile framework for building perceptual pipelines.
  • Pose estimation allows us to track and analyze a person’s pose in an image or video.
  • MediaPipe’s pose estimation models provide coordinates for various body parts, enabling applications in fields like fitness tracking, augmented reality, and more.

In a future tutorial, I’ll teach you how to apply these concepts to videos to calculate trajectories and angles among certain body parts.


Carlos Melo

Computer Vision Engineer with a degree in Aeronautical Sciences from the Air Force Academy (AFA), Master in Aerospace Engineering from the Technological Institute of Aeronautics (ITA), and founder of Sigmoidal.
