About me

Infrastructure Engineer/Data Engineer and graduate of the Computational Statistics Program under Statistics Indonesia. I have worked on several Data Engineering/Data Science projects with Statistics Indonesia and other government institutions, resulting in publications in national and international media.

What I'm doing

  • data icon

    Data Science/Data Analyst

    Data analysis, machine learning, dashboard visualization, large language model.

  • server icon

    Infrastructure Engineer

    Google Cloud infrastructure, server virtualization, backup solutions, remote computer laboratory.

  • database icon

    Data Engineering

    Web scraping, cloud data warehouse, ETL, on-premise campus infrastructure for student thesis projects focused on big data and machine learning.

  • web icon

    System Administrator

    Administrator for dozens of hosted websites and databases on Linux and Windows servers.

  • automation icon

    IT Automation

    Automating repeatable processes and reducing manual labor for various IT tasks.

Résumé

Education

  1. STIS Polytechnic of Statistics   3.58 GPA 

    2014 — 2018

    Bachelor of Applied Science, Computational Statistics Program

Experience

  1. Infrastructure Engineer/Data Engineer   Statistics Indonesia 

    2019 — now

    • Led the design, implementation, and maintenance of a highly available on-premise distributed server virtualization infrastructure hosting dozens of VMs, ensuring optimal performance and resource utilization.
    • Successfully deployed and managed on-premise Docker Swarm and Kubernetes clusters, incorporating a monitoring dashboard for real-time performance tracking and proactive issue resolution.
    • Orchestrated the provisioning of Google Cloud architecture for the campus admission system, improving scalability and performance, resulting in enhanced user experiences for thousands of applicants.
    • Spearheaded the creation and management of the campus infrastructure and environment for student thesis projects, specifically focusing on Machine Learning and Big Data applications, facilitating cutting-edge research and development.
    • Implemented and maintained a Remote Computer Laboratory system, allowing students to access computing resources remotely, enhancing accessibility and collaboration.
    • Administered the domain hosting manager for hundreds of websites under the campus domain, ensuring seamless and secure web hosting services for various academic and administrative needs.
    • Successfully deployed a cloud data warehouse with Google BigQuery, supporting multiple data projects with millions of data points, enabling data-driven decision-making and insights.

  2. Data Analyst/Data Scientist   Statistics Indonesia 

    2019 — now

    • Developed a Python-based web scraper to collect and extract data from multiple online marketplaces, enabling comprehensive data analysis and providing valuable insights into market trends and customer behavior.
    • Created a Python package for data imputation with a specific focus on sample weight requirements, enhancing data accuracy and completeness in various analytical projects.
    • Leveraged Python and data visualization tools to design and build various BI dashboards, facilitating real-time data tracking and providing actionable insights for on-demand projects.
    • Conducted in-depth analysis on millions of E-commerce data points for National E-commerce research, identifying key performance indicators and trends to guide strategic decision-making.
    • Collaborated closely with cross-functional teams to ensure data quality, integrity, and consistency in data analysis projects.

  3. Web Developer   International Coconut Community 

    2017 — 2018

    • Developed a comprehensive web application for data entry and dynamic visualization/tabulation, ensuring seamless data processing and intuitive user interfaces.
    • Developed and implemented a dynamic form generator with robust validation capabilities, utilizing database integration to streamline data input and validation processes.
    • Pioneered the creation of a dynamic query builder for data tabulation, allowing for flexible and efficient data analysis and visualization.

My skills

  • Data Science
    80%
    Python, R, Tableau, Google Data Studio
  • Infrastructure Engineer
    80%
    Kubernetes, Docker, Linux, Server Virtualization, Web Server, Database (MySQL, PostgreSQL), Google Cloud Platform, Amazon Web Services, Replication, Load Balancing
  • Data Engineer
    90%
    Web Crawling/Scraping, ETL, SQL, BigQuery, Cloud Dataprep, Airflow
  • Web Developer
    60%
    PHP, JavaScript

Portfolio

Click to view portfolio description and live demo URL

  • vision

    TimetableGPT

    Data Engineer

    Description

    Open-source GPT-based timetable assistant

    TimetableGPT uses large language models to let you ask questions about your timetable and get accurate answers, helping you manage it so that no two schedule entries overlap. It is built with LangChain and the OpenAI GPT-3.5 model, with custom functions, few-shot examples, and chat memory to strengthen its reasoning, as GPT-4 was not yet publicly available when the project was built.
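
    The overlap check mentioned above can be sketched in plain Python. This is only an illustration of the kind of helper that could be exposed to the model as a custom function; the `Event` type, the `find_overlaps` name, and the sample timetable are all hypothetical and are not taken from the TimetableGPT code.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    """A hypothetical timetable entry (not the project's data model)."""
    name: str
    start: datetime
    end: datetime


def find_overlaps(events: list[Event]) -> list[tuple[str, str]]:
    """Return the name pairs of events whose time ranges overlap."""
    ordered = sorted(events, key=lambda e: e.start)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                # Events are sorted by start time, so no later event
                # can overlap `first` either.
                break
            overlaps.append((first.name, second.name))
    return overlaps


# Made-up sample data for illustration only.
timetable = [
    Event("Statistics", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 30)),
    Event("Databases", datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 11, 0)),
    Event("Networks", datetime(2024, 1, 8, 11, 0), datetime(2024, 1, 8, 12, 0)),
]
print(find_overlaps(timetable))
```

    Events that merely touch (one ends exactly when the next starts) are treated as non-overlapping here, which is a design assumption rather than a documented project rule.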

    View
  • vision

    Computer Vision Machine Learning based Proctoring

    Data Engineer

    Description

    Online proctoring via Zoom Meeting using a computer vision machine learning model.

    In this project, I integrated a YOLOv4 object detection model and fed it live Zoom Meeting video as an online proctoring method. The model can detect prohibited objects such as smartphones, or the presence of other people, and capture an image of each detection as a report.

  • ovirt

    Distributed Server Virtualization

    Infrastructure Engineer

    Description

    An on-premise distributed server virtualization solution using the open-source oVirt project. The solution is built on three bare-metal servers (Red Hat Linux) and two network-attached storage devices (NFS/iSCSI), with terabytes of RAM and storage. It includes an automatic backup solution for selected VMs, out of the dozens hosted there, to enable quick disaster recovery.

    Because it is integrated with Active Directory, users can log in automatically to Windows VMs created there.

  • nlp

    Duplicate Questions Detection System with NLP

    Data Science

    Description

    Detecting duplicate questions in a Q&A system is challenging, especially in Bahasa Indonesia. The overall similarity of two questions can be judged from the similarity of the word frequencies used (textual component), the similarity of the word meanings (semantic component), or the similarity of the core issue (topic component).

    In this project, I built a new model powered by Gensim (Python) that combines the similarity scores of the components above. Experimental results show that this model increases the recall rate by up to 12% compared to the baseline.
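
    The idea of combining per-component scores can be sketched as follows. This is a minimal illustration, not the project's model: the textual component is approximated with cosine similarity over raw word counts, the semantic and topic scores are assumed to be supplied by other models (for example, ones built with Gensim), and the weighted average with equal default weights is an assumption, since the source does not say how the scores are combined.

```python
import math
from collections import Counter


def textual_similarity(question_a: str, question_b: str) -> float:
    """Cosine similarity between the word-frequency vectors of two questions."""
    freq_a = Counter(question_a.lower().split())
    freq_b = Counter(question_b.lower().split())
    dot = sum(freq_a[word] * freq_b[word] for word in freq_a)
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty question shares nothing with any other
    return dot / (norm_a * norm_b)


def combined_similarity(textual: float, semantic: float, topic: float,
                        weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted average of the three component scores (weights are assumed)."""
    w_text, w_sem, w_topic = weights
    return w_text * textual + w_sem * semantic + w_topic * topic


# The semantic and topic scores below are placeholder values, not model output.
text_score = textual_similarity("apa itu inflasi", "apa itu inflasi inti")
print(combined_similarity(text_score, semantic=0.8, topic=0.7))
```

    A pair of questions could then be flagged as duplicates when the combined score exceeds a threshold tuned on labeled data.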

    View
  • bigquery

    BigQuery Data Warehouse

    Data Engineer

    Description

    Managing millions of data points in BigQuery, a serverless, highly scalable, and cost-effective cloud data warehouse. The data is collected from multiple sources, partitioned to optimize queries, and visualized using Tableau/Google Data Studio.

    View
  • kepler

    Healthcare Access Visualization in Tableau

    Data Science

    Description

    Healthcare Access Visualization in Tableau

    View
  • kepler

    Map Visualization of BPJS Indonesia

    Data Science

    Description

    Visualizing the distribution of healthcare facilities in Indonesia using tens of thousands of records scraped from the Social Security Agency of Health in Indonesia. The map consists of two layers, which show the distribution of general healthcare facilities (heatmap) and hospitals (bar).

    While the distribution of general healthcare facilities is relatively equal, this map shows that inequality remains: better hospitals with greater infrastructure are still mostly located on Java Island.

    View
  • web scraper

    Web Crawling/Web Scraping

    Data Engineer

    Description

    Developing web scrapers to collect millions of records from various websites, such as travel agencies, online shops, news, and social media, using Scrapy (Python) or Selenium. Depending on the complexity and architecture of the website, the program can collect hundreds or even thousands of records per minute while still respecting the source website's policy.
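
    One common way to respect a site's crawling policy is to consult its robots.txt before fetching. The sketch below uses Python's standard `urllib.robotparser`; the inline robots.txt content, the `example-scraper` user agent, and the `build_policy` helper are hypothetical, and the source does not state that the project uses this exact mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it
# from the target site instead.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)
agent = "example-scraper"  # assumed user-agent name

# Check whether each URL may be fetched, and how long to wait between requests.
print(policy.can_fetch(agent, "https://example.com/products/1"))
print(policy.can_fetch(agent, "https://example.com/private/report"))
print(policy.crawl_delay(agent))
```

    Scrapy can apply the same kind of rule through its `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY` settings.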

    View
  • remote computer laboratory

    Remote Computer Laboratory

    Infrastructure Engineer

    Description

    Deploying a campus remote computer laboratory using Apache Guacamole in response to COVID-19 safety protocols. With it, both students and lecturers can access lab computers remotely from their browsers without installing any plugins or software. The lab consists of ~80 computers with Active Directory integration.

  • e commerce research

    Data Visualization with Tableau

    Data Science

    Description

    Visualizing the growth and distribution of e-commerce in Indonesia from 2015 to 2019, based on millions of records scraped from online shop websites, using Tableau and BigQuery.

    View
  • gcp

    Application Hosting on GCP

    Infrastructure Engineer

    Description

    Deploying and managing the college admission system on Google Cloud Platform using Google Compute Engine and Google Cloud SQL to achieve maximum availability.

  • streamlit

    Streamlit Calendar

    Data Science

    Description

    A Python package that provides a Streamlit component for showing a calendar view using FullCalendar, with support for Streamlit light/dark themes and event callbacks.

    View
  • twitter

    Twitter Campaign Analysis

    Data Science

    Description

    Analysis of Twitter data based on the number of tweets related to a specific hashtag or query over time. The shape of this distribution indicates whether a social media campaign was successful.
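
    Counting tweets over time can be sketched with the standard library alone. The `tweets_per_day` helper and the timestamps below are made up for illustration; the sketch assumes the tweets matching a hashtag or query have already been collected and carry ISO 8601 timestamps.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(timestamps: list[str]) -> dict[str, int]:
    """Count tweets per calendar day from ISO 8601 timestamp strings."""
    counts = Counter(
        datetime.fromisoformat(ts).date().isoformat() for ts in timestamps
    )
    return dict(sorted(counts.items()))  # chronological order


# Made-up timestamps of tweets matching a hypothetical hashtag.
sample = [
    "2023-05-01T09:15:00",
    "2023-05-01T18:40:00",
    "2023-05-02T07:05:00",
]
daily = tweets_per_day(sample)
print(daily)
print(max(daily, key=daily.get))  # the day with the most tweets
```

    Plotting these daily counts shows whether activity rose during the campaign period and how quickly it decayed afterwards.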

  • data studio

    Google Data Studio/Looker Studio Visualization

    Data Science

    Description

    Visualization Dashboard of


    built using Google Data Studio/Looker Studio.

    View
  • dash

    Web Dashboard (Dash Python)

    Data Science

    Description

    Visualization dashboard built from hundreds of thousands of e-commerce data points using Dash (Python).

    View
  • virtualmin

    Domain Hosting Manager

    System Administrator

    Description

    Managing hundreds of student, lecturer, and campus websites and databases using the open-source virtual web hosting control panel Virtualmin. Each domain can choose its own backend language (PHP, Node.js, Python, etc.) and database (MySQL, PostgreSQL, etc.) without interfering with the others. The system also provides daily automatic domain backups and SSL configuration, making web hosting easy for students and lecturers.

    View
  • dash

    Web Testing Automation

    IT Automation

    Description

    Automated testing of the college admission website using Selenium to simulate user actions and capture server performance. The program simulates a user logging in, taking the admission test, and then checking the results. This testing is important because it simulates real-life conditions and load on the server.