Loc Mai #
Senior Site Reliability Engineer #
Contacts #
-
Ho Chi Minh City, Viet Nam
-
Mobile: (+84) 90-858-4595
-
Email: [email protected]
-
Website: https://maibaloc.com
-
GitHub: https://github.com/locmai
Experience #
-
Senior Site Reliability Engineer, Observability and Identity / Cloud Networking (January 2023 - Present) – Axon
- Lead the Cloud Networking Service Mesh team to build and operate the multi-cluster Istio service mesh across the networks on Azure and AWS.
- Architected and built new development tools environments based on the Azure global transit network architecture with Azure VPN and Azure AD Domain Services. Led the build-out of the infrastructure layers and the monitoring platform.
- Centralized the identity management for the developer tools environment.
- Supported the migration from GitHub Enterprise to GitHub Enterprise Cloud Managed Users.
- Collaborated and coordinated work across corporate departments, including the platform engineering team and the company-wide infrastructure team.
- Was responsible for on-call rotations and handled production incidents.
- Conducted technical interviews to hire Site Reliability Engineers and Database Reliability Engineers from Junior to Senior level.
- Designed a multi-cluster Istio service mesh solution with a self-managed PKI platform.
- Worked on Nginx Ingress Controller and Envoy Proxy.
-
Site Reliability Engineer, Observability and Identity (April 2021 - December 2022) – Axon
- Saved approximately $300K per year in monitoring infrastructure costs and helped the company win more contracts at the FedRAMP High level by migrating from Datadog to a self-built Unified Observability Platform based on Prometheus, Grafana, and Cortex.
- Operated highly available Prometheus instances at an extremely large scale.
- Experienced with the OpenTelemetry observability framework.
- Unified Azure PaaS metrics with Promitor to stream them from Azure to the on-premises monitoring platform.
- Operated and managed Splunk for logging.
- Built and managed identity and access management for Azure, AWS, and the internal infrastructure.
-
Site Reliability Engineer, CloudOps (May 2020 - April 2021) – Axon
- Operated the cloud infrastructure layer on the Kubernetes platform.
- Used Puppet, Terraform, Ansible, and Salt for daily tasks.
- Monitored and troubleshot CentOS and Windows servers.
- Eliminated manual toil by engineering CloudOps tools, including onboarding scripts, auto-unlock, metric exporters, and various utilities.
- Operated infrastructure on the AWS and Azure cloud providers.
-
Site Reliability Engineer (August 2018 - May 2020) – Grove HR Solution
- Independently led the site reliability engineering work.
- Fully built and managed production-grade cloud infrastructure on Amazon Web Services (AWS) and Google Cloud Platform.
- Built continuous integration and delivery systems, including the Argo stack (Argo CD, Argo Workflows, Argo Events) and Jenkins.
- Managed cloud-native Kubernetes clusters and Istio Service Mesh.
- Implemented microservices with TypeScript, NestJS, and Lerna.
- Developed a Slack bot for continuous integration in Python and internal services in Golang.
- Built CI/CD systems for the React Native team using Fastlane and App Center.
- Administered MongoDB databases and provisioned infrastructure services.
- Implemented the end-to-end data migration for MongoDB using Argo Workflows. Provided a fully automated, self-tested migration process before rolling out to production.
-
Performance Test Engineer (October 2016 – July 2018) – Athenka
- Developed and maintained automated and performance tests for a chatbot project, automatically validating the accuracy of the answers from the different Natural Language Processing models of the bot's NLP engine.
-
R&D Intern (September 2015 – September 2016) – KMS Technology
- Studied solutions for performance, security and automation testing.
- Conducted various training programs for junior QA engineers on building automation testing frameworks and operating Linux systems.
Certificates #
-
Microsoft Certified: DevOps Engineer Expert (Oct 2020)
- Certified to demonstrate the ability to combine people, process, and technologies to continuously deliver valuable products and services that meet end user needs and business objectives. DevOps professionals streamline delivery by optimizing practices, improving communications and collaboration, and creating automation.
-
Azure Administrator Associate (Sep 2020)
- Certified for the skills and knowledge to implement, manage, and monitor an organization’s Microsoft Azure environment. Have a deep understanding of implementing, managing, and monitoring identity, governance, storage, compute, and virtual networks in a cloud environment, as well as provisioning, sizing, monitoring, and adjusting resources when needed.
-
Certified Kubernetes Administrator (August 2020)
- Certified for the skills, knowledge, and competencies to perform the responsibilities of a Kubernetes Administrator. Demonstrated proficiency in Application Lifecycle Management, Installation, Configuration & Validation, Core Concepts, Networking, Scheduling, Security, Cluster Maintenance, Logging / Monitoring, Storage, and Troubleshooting.
Open-source Projects #
-
Humble - Self-hosted home server built with infrastructure as code
- Repository: https://github.com/locmai/humble
- Project site: https://humble.maibaloc.com/
- The project aims to set up a ready-to-go environment with operating services, using modern infrastructure as code, driven by GitOps and managed by Kubernetes.
-
Yuta - A modern ChatOps framework
- Repository: https://github.com/locmai/yuta
- A well-designed and extensible ChatOps framework for building system operation bots.
- Integrated with Matrix, Slack, and Dialogflow.
-
Promitor - Azure Monitor scraper
- Repository: https://github.com/tomkerkhove/promitor
- Contributed to Promitor, an Azure Monitor scraper that makes the metrics available through a scraping endpoint for Prometheus.
-
Cortex
- Project site: https://cortexmetrics.io
- Contributed to building the Helm chart and fixing bugs.
-
OpenTelemetry - Set of Telemetry tools, integrations, SDKs
- Project site: https://opentelemetry.io/
- Contributed to various OpenTelemetry projects, including the OpenTelemetry Collector, Helm charts, and the Golang SDK.
-
Jaeger - End-to-end distributed tracing
- Project site: https://www.jaegertracing.io
- Contributed to writing the Helm chart and fixing bugs.
Hackathons #
-
KMS Hackathon 2018 – Runner-up team
- Built an app that optimizes job search for workers earning below a mid-level wage, using a chatbot interface.
- Main backend developer
-
Zalo Hackathon 2017 – Top 10 team
- Developed a music recommendation service that learns from users’ listening history.
- Main software developer
-
Facebook Hackathon 2016 – Honorable Mention
- Built a chatbot application that generates restaurant recommendations based on users’ location and preferences.
- Semi-backend developer
Education #
- High School for the Gifted - Vietnam National University (September 2011 – April 2014)
- Studied in the specialized Information Technology program.