Goal-driven Cloud Lab Site Reliability Engineer skilled at analyzing and solving routine and complex problems. Detail-oriented and systematic professional successful at developing innovative solutions to diverse issues.
-Routing & Switching
-4G, LTE, and 5G architectures
-Maintain, configure, and troubleshoot Juniper SRX 5800 firewalls, Juniper MX-960 and MX-320 routers, Cisco CRS, Cisco 7609, and Cisco Nexus 7k routers, OpenStack, and Contrail SDN.
-Responsible for the availability, performance, monitoring, and incident response of the Contrail and OpenStack platforms in AT&T Labs under AT&T Integrated Cloud.
-Responsible for ensuring that every VNF going to production complies with a set of general requirements, such as diagrams, dependencies on other services, monitoring and logging plans, backups, and possible high-availability setups.
-Designing and implementing networks as part of tenant onboarding.
-Supporting advanced Network Cloud design end to end (E2E).
-Being ready to act at any time on uncaught exceptions, hardware degradation, networking problems, high resource usage, or slow responses to keep Contrail and OpenStack services uninterrupted.
-Measuring recovery efforts for Contrail and OpenStack services in terms of mean time to recover (MTTR) and mean time to failure (MTTF).
-Documenting incident response and postmortem reports that contain a timeline of everything done to fix the problem, a root cause analysis, corrective and preventative measures, and a section describing the resolution and recovery of Contrail, OpenStack, and other services.
-Monitor specific metrics related to OpenStack, Contrail, and other Juniper products, set thresholds, and trigger alerts based on those thresholds.
-Create scripts to automate required daily activities and facilitate the deployment and rollback of new services or changes to existing ones.
-Supporting development and testing of several 4G-related applications and other AT&T proprietary VNFs on network, security, Contrail, and OpenStack issues.
-Documenting issues and bugs on 50+ lab sites (18 large and 35+ medium).
-Tracking down bugs with the corresponding vendors and providing updates to production teams.
-Reporting design issues found during troubleshooting to the corresponding teams to achieve VNF goals.
-Responsible for monitoring Nagios alarms and resolving issues.
-Providing support to Infrastructure Deployment teams before/during/after uplift.
-Performing Lab Readiness Tests (LRT) after every uplift.
-Writing test cases for Lab Readiness Test for enhanced/new features after uplift.
-Validating bug fixes provided by the vendors.
-Automating workarounds for the most frequently recurring issues until a fix is released.
-Linux/Network Operations.
-Contributed to deploying a multi-node OpenStack lab environment.
-Setting up Ansible to push config changes on particular nodes.
-Automating basic Linux admin tasks using Python scripts and playbooks.
-Monitoring activity of users.
-Firewall configurations on virtual SRX.
-Virtual Juniper router configurations from scratch.
-Supporting issues related to OpenStack (Nova, Neutron, Cinder, Keystone, RabbitMQ, Swift, Glance).
-Creating Heat stack templates in YAML to build a full setup of clustered VMs.
-Provided status updates to those affected when issues came up.
Certificate ID: LF-7ejj2wcv4t
Certificate ID: 2017a482735342d49eb24b97ca6ecb12
Certificate ID: LFCS-1700-001315-0100
Certificate ID: COA-1600-0104-0100
Certificate ID: CSCO13123458
Certificate ID: 4BF7UARGAXLW
Certificate ID: 11827705