# PDF Import Setup & Quick Start Guide

## ✅ Installation Complete

The PDF import system has been successfully installed with the following components:

### 📁 Files Created

1. **Migration**: `database/migrations/2025_11_08_000001_create_pdf_import_logs_table.php`
2. **Model**: `app/Models/PdfImportLog.php`
3. **Service**: `app/Services/VoterPdfImportService.php`
4. **Job**: `app/Jobs/ProcessVoterPdfImport.php`
5. **Controller**: `app/Http/Controllers/VoterPdfImportController.php`
6. **Routes**: Added to `routes/api.php`
7. **Documentation**: `PDF_IMPORT_API_DOCUMENTATION.md`
8. **Postman Collection**: `Voter_PDF_Import_API.postman_collection.json`

### 📦 Packages Installed

- `smalot/pdfparser` (v2.12.1) - PDF text extraction

### 🗄️ Database

- Table `pdf_import_logs` created successfully
- Storage directory `storage/app/pdf-imports/` created

---

## 🚀 Quick Start

### Step 1: Start Queue Worker (For Background Processing)

```bash
# In a separate terminal
php artisan queue:work

# In production, keep the worker running under a process monitor such as Supervisor
```

### Step 2: Test with Analyze Endpoint First

```bash
curl -X POST http://localhost:8000/api/pdf-import/analyze \
  -F "pdf_file=@/path/to/your/voter_list.pdf"
```

This will show you:
- Total pages
- Sample lines from PDF
- Detected patterns
- Booth numbers found

### Step 3: Upload PDF for Import

```bash
# Background processing (recommended)
curl -X POST http://localhost:8000/api/pdf-import/upload \
  -F "pdf_file=@/path/to/your/voter_list.pdf" \
  -F "process_immediately=false" \
  -F "uploaded_by=1"

# Immediate processing (for testing)
curl -X POST http://localhost:8000/api/pdf-import/upload \
  -F "pdf_file=@/path/to/your/voter_list.pdf" \
  -F "process_immediately=true" \
  -F "uploaded_by=1"
```

### Step 4: Monitor Status

```bash
# Get status of import ID 1
curl http://localhost:8000/api/pdf-import/status/1

# Get all imports
curl http://localhost:8000/api/pdf-import/all

# Get statistics
curl http://localhost:8000/api/pdf-import/statistics
```
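
Rather than re-running the status command by hand, polling can be scripted. The sketch below assumes the status response is JSON containing a `"status": "..."` string field with the values this guide mentions (`pending`, `processing`, `completed`, `failed`). The field name and the helper names `extract_status` and `wait_for_import` are assumptions, so check them against `PDF_IMPORT_API_DOCUMENTATION.md` before relying on it.

```bash
# Hypothetical helpers; the "status" field name is an assumption about the response JSON.
BASE_URL="${BASE_URL:-http://localhost:8000/api/pdf-import}"

# Read JSON on stdin and print the value of the first "status" string field.
extract_status() {
  tr -d '\n' \
    | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Poll an import until it completes or fails.
# Usage: wait_for_import ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
wait_for_import() {
  id="$1"; interval="${2:-5}"; attempts="${3:-60}"
  while [ "$attempts" -gt 0 ]; do
    status="$(curl -s "$BASE_URL/status/$id" | extract_status)"
    echo "import $id: ${status:-unknown}"
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    attempts=$(( attempts - 1 ))
    sleep "$interval"
  done
  return 2  # gave up waiting
}
```

For example, `wait_for_import 1 10` would check import ID 1 every 10 seconds and stop once it reports `completed` or `failed`.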

---

## 📋 API Endpoints Summary

| Method | Endpoint | Purpose |
|--------|----------|---------|
| POST | `/api/pdf-import/upload` | Upload PDF file |
| POST | `/api/pdf-import/analyze` | Analyze PDF structure (no import) |
| GET | `/api/pdf-import/status/{id}` | Get import status |
| GET | `/api/pdf-import/all` | List all imports |
| GET | `/api/pdf-import/statistics` | Overall statistics |
| POST | `/api/pdf-import/reprocess/{id}` | Retry failed import |
| DELETE | `/api/pdf-import/delete/{id}` | Delete import & file |
| GET | `/api/pdf-import/download/{id}` | Download original PDF |

---

## 🎯 Workflow

```
┌─────────────────────┐
│ 1. Upload PDF       │
│  (max 20MB)         │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 2. Store in         │
│  pdf-imports/       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 3. Create Log Entry │
│  (pending status)   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 4. Queue Job OR     │
│  Process Immediate  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 5. Parse PDF        │
│  Extract Text       │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 6. Extract Voters   │
│  (Pattern Matching) │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 7. Batch Insert     │
│  (100 voters/batch) │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ 8. Update Status    │
│  (completed/failed) │
└─────────────────────┘
```
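
Step 7 inserts voters in batches of 100, so the number of insert batches is the voter count divided by 100, rounded up. A quick sketch of that arithmetic (the helper name `batch_count` is illustrative and not part of the service):

```bash
# Number of insert batches needed for a voter count (batch size defaults to 100).
batch_count() {
  total="$1"; size="${2:-100}"
  echo $(( (total + size - 1) / size ))
}
```

A PDF with 1,250 voters would therefore need `batch_count 1250` = 13 batches: twelve full batches and one of 50.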

---

## 🔧 Configuration

### File Upload Limits

In `php.ini`:
```ini
upload_max_filesize = 20M
post_max_size = 25M
max_execution_time = 300
```
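
`post_max_size` has to be at least as large as `upload_max_filesize`, because the whole request body, file included, must fit inside it. The sketch below converts PHP's shorthand sizes to bytes and compares the two values; the helper names `to_bytes` and `limits_consistent` are illustrative.

```bash
# Convert PHP shorthand sizes (e.g. 20M, 512K, 1G) to bytes.
to_bytes() {
  num="${1%[KkMmGg]}"
  case "$1" in
    *[Kk]) echo $(( num * 1024 )) ;;
    *[Mm]) echo $(( num * 1024 * 1024 )) ;;
    *[Gg]) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *)     echo "$num" ;;
  esac
}

# Usage: limits_consistent UPLOAD_MAX_FILESIZE POST_MAX_SIZE
# Succeeds when post_max_size can hold a maximum-size upload.
limits_consistent() {
  [ "$(to_bytes "$2")" -ge "$(to_bytes "$1")" ]
}
```

The live values can be read with `php -r 'echo ini_get("upload_max_filesize");'`. Note that the CLI may load a different `php.ini` than the web server; `php --ini` shows which file the CLI uses.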

### Queue Configuration

In `.env`:
```env
QUEUE_CONNECTION=database
# Or, for better performance (keep only one QUEUE_CONNECTION line active):
# QUEUE_CONNECTION=redis
```

### Storage Permissions

```bash
chmod -R 775 storage/app/pdf-imports
```

---

## 📊 Supported PDF Formats

The service supports multiple Election Commission PDF formats:

### Format 1: Serial + EPIC + Name + Gender + Year
```
1 ABC1234567 John Doe M 1990
2 XYZ9876543 Jane Smith F 1985
```

### Format 2: EPIC + Name + Age + Gender
```
ABC1234567 John Doe 35 Male
XYZ9876543 Jane Smith 40 Female
```

### Format 3: Name - EPIC - Age - Gender
```
John Doe - ABC1234567 - Age: 35 - Male
Jane Smith - XYZ9876543 - Age: 40 - Female
```
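
To check which of the three layouts a given line follows before uploading, the formats can be approximated with extended regular expressions. These patterns are a sketch inferred from the sample lines above, not the ones used in `extractVotersFromText()`; the helper name `detect_format` and the assumption that EPIC letters are uppercase are mine.

```bash
# Illustrative patterns only; the real ones live in VoterPdfImportService.php.
EPIC='[A-Z]{3}[0-9]{7}'

# Print 1, 2, or 3 for the format a line matches, or 0 if none match.
detect_format() {
  line="$1"
  if   printf '%s\n' "$line" | LC_ALL=C grep -Eq "^[0-9]+ +$EPIC +.+ +[MF] +[0-9]{4}\$"; then echo 1
  elif printf '%s\n' "$line" | LC_ALL=C grep -Eq "^$EPIC +.+ +[0-9]{1,3} +(Male|Female)\$"; then echo 2
  elif printf '%s\n' "$line" | LC_ALL=C grep -Eq "^.+ - $EPIC - Age: [0-9]{1,3} - (Male|Female)\$"; then echo 3
  else echo 0
  fi
}
```

Running `detect_format` over the sample lines returned by the analyze endpoint gives a quick indication of whether a PDF is likely to parse. `LC_ALL=C` keeps `[A-Z]` from matching lowercase letters in some locales.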

### Booth Number Detection
```
Booth No: 123
Part No: 124
Booth Number: 125
```
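
The three header variants above can be matched with a single expression. This is a sketch based only on those examples; the helper name `extract_booth_numbers` is illustrative and the service's actual pattern may differ.

```bash
# Print the booth/part number from each header line on stdin; other lines are ignored.
extract_booth_numbers() {
  sed -nE 's/^[[:space:]]*(Booth No|Part No|Booth Number)[[:space:]]*:[[:space:]]*([0-9]+).*$/\2/p'
}
```

Piping extracted PDF text through `extract_booth_numbers` lists every booth number it contains, which is useful when an import reports that no booth was detected.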

---

## 🔍 Testing Checklist

- [ ] Test analyze endpoint with sample PDF
- [ ] Verify patterns are detected correctly
- [ ] Upload small PDF (immediate processing)
- [ ] Check voter data in database
- [ ] Upload large PDF (background processing)
- [ ] Monitor queue worker logs
- [ ] Test status endpoint
- [ ] Test statistics endpoint
- [ ] Test reprocess for failed imports
- [ ] Test delete functionality

---

## 🐛 Troubleshooting

### Issue: No voters extracted

**Solution:**
1. Use analyze endpoint to check PDF structure
2. Verify the PDF is text-based, not a scanned image (`smalot/pdfparser` reads embedded text and does not perform OCR)
3. Check if booth numbers are detected
4. Review sample lines in analyze response

### Issue: Import stuck in "processing"

**Solution:**
```bash
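# Check whether a queue worker is already running
ps aux | grep "[q]ueue:work"

# If none is running, start one
php artisan queue:work

# Check logs
tail -f storage/logs/laravel.log

# Reprocess the import (replace 1 with the import ID)
curl -X POST http://localhost:8000/api/pdf-import/reprocess/1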
```

### Issue: High failure rate

**Solution:**
1. Check `import_summary.errors` in status response
2. Verify booth exists in database
3. Check voter ID format (should be 3 letters + 7 digits)
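
The voter ID rule from step 3 can be checked from the shell. This sketch assumes the letters are uppercase, as in the sample data; the helper name `is_valid_epic` is illustrative.

```bash
# Succeeds only for IDs made of exactly 3 uppercase letters followed by 7 digits.
is_valid_epic() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z]{3}[0-9]{7}$'
}
```

For example, `is_valid_epic ABC1234567 && echo ok` prints `ok`, while `AB12345678` (two letters, eight digits) is rejected.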

### Issue: PDF format not recognized

**Solution:**
Update regex patterns in `app/Services/VoterPdfImportService.php`:
- Method: `extractVotersFromText()`
- Add new pattern matching logic

---

## 📈 Performance Tips

1. **Use Background Processing** for files with more than 500 voters
2. **Run Queue Worker** as daemon in production
3. **Monitor Disk Space** - uploaded PDFs are kept until explicitly deleted
4. **Clean Old Imports** regularly via delete endpoint
5. **Use Redis Queue** for better performance
6. **Increase PHP Memory** (`memory_limit`) for large PDFs
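
For tip 3, a small check can flag when the import folder grows past a chosen threshold. This is a sketch; the helper names and the threshold are illustrative.

```bash
# Print the size of a directory in kilobytes.
dir_size_kb() {
  du -sk "$1" | awk '{print $1}'
}

# Usage: over_limit DIR LIMIT_KB
# Succeeds when the directory uses more than LIMIT_KB kilobytes.
over_limit() {
  [ "$(dir_size_kb "$1")" -gt "$2" ]
}
```

Something like `over_limit storage/app/pdf-imports 1048576 && echo "pdf-imports exceeds 1 GB"` could run from cron as a reminder to clean up old imports.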

---

## 🔐 Security Notes

1. **File Type Validation**: Only PDF files accepted
2. **File Size Limit**: Maximum 20MB
3. **Storage Path**: Files stored in `storage/app/pdf-imports/`
4. **Unique Filenames**: UUID-based naming prevents conflicts
5. **Database Logging**: All operations tracked in `pdf_import_logs`
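
The first two checks can also be run on the client before uploading, which avoids sending a file the server will reject. The sketch below looks for the `%PDF-` signature at the start of the file and treats the 20MB limit as 20 × 1024 × 1024 bytes, which is an assumption about how the server counts it. The helper names are illustrative, and the server-side validation remains the authority.

```bash
MAX_BYTES=$(( 20 * 1024 * 1024 ))  # 20MB limit from this guide (byte interpretation assumed)

# Succeeds when the file starts with the PDF signature "%PDF-".
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Succeeds when the file exists and is no larger than MAX_BYTES.
within_size_limit() {
  [ -f "$1" ] && [ "$(wc -c < "$1" | tr -d ' ')" -le "$MAX_BYTES" ]
}
```

A pre-upload guard could then read `is_pdf voter_list.pdf && within_size_limit voter_list.pdf || echo "file would be rejected"`.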

---

## 📖 Additional Resources

- **Full API Documentation**: `PDF_IMPORT_API_DOCUMENTATION.md`
- **Postman Collection**: `Voter_PDF_Import_API.postman_collection.json`
- **Laravel Queue Docs**: https://laravel.com/docs/queues

---

## ✨ Features

✅ Upload PDF files (max 20MB)  
✅ Store PDFs in dedicated folder  
✅ Analyze PDF structure and patterns  
✅ Extract voter information  
✅ Batch processing (100 voters/batch)  
✅ Background job processing  
✅ Track import status  
✅ Handle duplicates (update existing)  
✅ Error handling and logging  
✅ Reprocess failed imports  
✅ Download original PDFs  

---

## 🎉 Ready to Use!

Your PDF import system is now fully set up and ready to process Election Commission PDFs.

Start by analyzing a sample PDF:
```bash
curl -X POST http://localhost:8000/api/pdf-import/analyze \
  -F "pdf_file=@/path/to/voter_list.pdf"
```

To support a custom PDF format, update the pattern-matching logic in:
`app/Services/VoterPdfImportService.php`
