LLM AI Research

Lightweight Fine-Tuning for Document Automation in the Cloud
By Gabriele Monti – MSc Data Science, Birkbeck, University of London


What This Is

This project explores how to fine-tune large language models (LLMs) efficiently for document-processing tasks in cloud environments. It focuses on building resume parsers and PDF-to-JSON converters with lightweight methods that reduce training time and hardware requirements.


Key Use Cases

  • Resume to structured JSON
  • Invoice and form parsing
  • Automatic document summarization
  • PDF data extraction
  • AI-powered document classification

Our Approach

We use parameter-efficient fine-tuning (PEFT) methods such as:

  • LoRA (Low-Rank Adaptation)
  • Adapter modules
  • Prefix tuning

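These methods keep the base model's weights frozen and train only a small number of added parameters. That lets us reach high accuracy while keeping training cost-effective and the resulting models cloud-deployable.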
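
To make the cost argument concrete for LoRA, the sketch below counts trainable parameters for a single weight matrix. LoRA leaves the original matrix frozen and learns a low-rank update made of two small matrices, so the trainable count drops from d_out x d_in to r x (d_in + d_out). The layer size (2048 x 2048) and rank (8) are illustrative assumptions, not the configuration used in this project or the actual dimensions of Gemma.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA update B @ A.

    A has shape (rank, d_in) and B has shape (d_out, rank);
    the original weight matrix stays frozen.
    """
    return rank * d_in + d_out * rank


# Illustrative values only (assumed, not taken from the project).
D_IN, D_OUT, RANK = 2048, 2048, 8

full = full_finetune_params(D_IN, D_OUT)
lora = lora_params(D_IN, D_OUT, RANK)

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {RANK}):  {lora:,} trainable parameters")
print(f"fraction trained:  {lora / full:.4%}")
```

With these assumed sizes, LoRA trains 1/128 of the parameters that full fine-tuning would update for the same layer. The real saving depends on which layers are adapted and on the rank chosen.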


What We Built

  • Resume-to-JSON parser using Google Gemma + LoRA
  • Full pipeline: data preparation, training, evaluation, deployment
  • Lightweight deployment on Hugging Face Spaces and Docker
  • Evaluation with WeightWatcher for layer quality and compression metrics
  • API and Web Interface for real-time testing
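
As a rough illustration of the resume-to-JSON step, the sketch below pulls the first JSON object out of a model's raw text output and reports which expected fields are missing. It is a minimal example using only the Python standard library. The function names and the field names (`name`, `email`, `skills`) are hypothetical and are not taken from the project's codebase or schema.

```python
import json

# Hypothetical schema: these field names are illustrative assumptions.
REQUIRED_FIELDS = ("name", "email", "skills")


def extract_json(generated_text: str) -> dict:
    """Return the first JSON object found in the model's raw output.

    Generated text often wraps the JSON in extra words, so we try to
    decode starting at each '{' until one parses successfully.
    """
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(generated_text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this position; try the next brace.
            start = generated_text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def missing_fields(record: dict) -> list:
    """List the required fields that are absent from the parsed record."""
    return [field for field in REQUIRED_FIELDS if field not in record]


# Example with a made-up model response.
raw_output = 'Here is the result: {"name": "Ada Lovelace", "skills": ["Python"]} Done.'
record = extract_json(raw_output)
print(record)
print("missing:", missing_fields(record))
```

A check like this can sit between the model and the API response, so that malformed or incomplete outputs are caught before they reach the client.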

Technologies Used

  • Python + PyTorch
  • Hugging Face Transformers & PEFT
  • Google Colab, Docker, Hugging Face Spaces
  • REST API (FastAPI) and frontend integration

Try It or See the Code

The full codebase is open source:

GitHub Repository:
https://github.com/Birkbeck/msc-projects-2023-4-Gabriele_Monti_PEFT


Work With Us

Interested in deploying AI for document processing in your business?
Get in touch for a tailored solution.
