Loc Mai
Senior Site Reliability Engineer
- Location: Seattle, WA, USA
- Mobile: (+1) 206-234-8089
- Email: [email protected]
- Website: https://maibaloc.com
- GitHub: github.com/locmai
- LinkedIn: linkedin.com/in/locmai0201/
Experience
Senior Site Reliability Engineer, Observability and Identity / Cloud Networking @ Axon
January 2023 - Present
- Lead the Cloud Networking Service Mesh team to build and operate the multi-cluster Istio service mesh across the networks on Azure and AWS.
- Architected and built new developer tools environments based on the Azure global transit network architecture with Azure VPN and Azure AD Domain Services. Led the build-out of the infrastructure layers and monitoring platform.
- Centralized the identity management for the developer tools environment.
- Supported the migration from GitHub Enterprise to GitHub Enterprise Cloud Managed Users.
- Coordinated work across corporate departments, including the platform engineering team and the company-wide infrastructure team.
- Was responsible for on-call rotations and handled production incidents.
- Conducted technical interviews to hire Site Reliability Engineers and Database Reliability Engineers from Junior to Senior level.
- Designed a multi-cluster Istio service mesh solution with a self-managed PKI platform.
- Worked on Nginx Ingress Controller and Envoy Proxy.
Site Reliability Engineer, Observability and Identity @ Axon
April 2021 - December 2022
- Saved approximately $300K per year on monitoring infrastructure cost and helped the company win more contracts at the FedRAMP High level by migrating from Datadog to a self-built Unified Observability Platform with Prometheus, Grafana, and Cortex.
- Operated highly available Prometheus instances at very large scale.
- Worked with the OpenTelemetry observability framework.
- Unified Azure PaaS metrics with Promitor to stream them from Azure to the on-premises monitoring platform.
- Operated and managed Splunk for logging.
- Built and managed identity and access management (IAM) for Azure, AWS, and the internal infrastructure.
Site Reliability Engineer, CloudOps @ Axon
May 2020 - April 2021
- Operated the cloud infrastructure layer on the Kubernetes platform.
- Used Puppet, Terraform, Ansible, and Salt for daily tasks.
- Monitored and troubleshot CentOS and Windows servers.
- Eliminated manual toil by engineering CloudOps tools, including onboarding scripts, auto-unlock, metric exporters, and various utilities.
- Operated AWS and Azure cloud providers.
Site Reliability Engineer @ Grove HR Solution
August 2018 - May 2020
- Independently led Site Reliability Engineering work.
- Built and managed production-grade cloud infrastructure on Amazon Web Services (AWS) and Google Cloud Platform.
- Built continuous integration and delivery systems, including the Argo stack (Argo CD, Argo Workflows, Argo Events) and Jenkins.
- Managed cloud-native Kubernetes clusters and Istio Service Mesh.
- Implemented microservices with TypeScript, NestJS, and Lerna.
- Developed a Slack bot for continuous integration in Python and internal services in Golang.
- Built CI/CD systems for the React Native team using Fastlane and App Center.
- Administered MongoDB databases and provisioned infrastructure services.
- Implemented the end-to-end data migration for MongoDB using Argo Workflows. Provided a fully automated, self-tested migration process before rolling out to production.
Performance Test Engineer @ Athenka
October 2016 - July 2018
- Developed and maintained automation and performance testing for a chatbot project, automatically validating the accuracy of answers from the different Natural Language Processing models of the Bot NLP Engine.
R&D Intern @ KMS Technology
September 2015 - September 2016
- Studied solutions for performance, security, and automation testing.
- Ran training programs for junior QA engineers on building automation testing frameworks and operating Linux systems.
Certificates
Kubestronaut
Issued Jun 2024
The Linux Foundation
CKS: Certified Kubernetes Security Specialist
Issued Jun 2024 - Expires Jun 2026
The Linux Foundation - Credential ID LF-ugsf6uvgk0
CKA: Certified Kubernetes Administrator
Issued May 2024 - Expires May 2026
The Linux Foundation - Credential ID LF-a3n68gkl69
CKAD: Certified Kubernetes Application Developer
Issued May 2024 - Expires May 2026
The Linux Foundation - Credential ID LF-f3s1l6nmkc
Open Source
Humble - Self-hosted home server built with infrastructure as code
- A ready-to-go environment with running services, built with modern infrastructure as code, driven by GitOps, and managed by Kubernetes.
Yuta - A modern ChatOps framework
- Extensible ChatOps framework for building system operation bots.
- Integrates with Matrix, Slack, and Dialogflow.
Promitor - Azure Monitor scraper
- Contributed to Promitor: an Azure Monitor scraper that exposes the metrics through a scraping endpoint for Prometheus.
OpenTelemetry
- Contributed to various OpenTelemetry projects, including the OpenTelemetry Collector, Helm charts, and the Golang SDK.
Jaeger - End-to-end distributed tracing
- Contributed to the Helm chart and fixed bugs.
Hackathons
KMS Hackathon 2018 - Runner-up team
Main backend developer
- Built an app with a chatbot interface that optimizes job search for workers earning below mid-level wages.
Zalo Hackathon 2017 - Top 10 team
Main software developer
- Developed a music recommendation service that learns from users' listening history.
Facebook Hackathon 2016 - Honorable Mention
Semi-backend developer
- Built a chatbot application that generates restaurant recommendations based on users' location and preferences.
Education
High School for the Gifted - Vietnam National University
September 2011 - April 2014
Studied the Information Technology specialist program in High School.