# PDF to PNG Extraction API

## Overview

This endpoint accepts an uploaded PDF of voter lists and extracts each page as a PNG image at a configurable resolution (300 DPI by default). The extracted images are saved directly into the constituency folder structure, ready for OCR processing.

## Endpoint

**POST** `/api/pdf-to-png/extract`

## Use Case

Use this endpoint when you receive voter list PDFs and need to:
1. Convert PDF pages to PNG images
2. Store them in the correct constituency/booth folder
3. Prepare them for OCR-based voter extraction

## Request Parameters

### Required Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `pdf_file` | File | PDF file to extract (max 50MB) |
| `constituency` | String | Constituency name (e.g., "Lawspet", "Orleanpet") |
| `booth_number` | String/Number | Booth number (e.g., "1", "2") |

### Optional Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dpi` | Integer | 300 | Image resolution (DPI). Higher = better quality but larger files |
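
The parameter rules above can be checked on the client before uploading, which avoids sending a 50 MB body only to receive a 422. The following is a minimal sketch, not the server's validation: `validateExtractParams` and its argument names are hypothetical, and the rule that `dpi` must be a positive integer is an assumption, since the accepted DPI range is not documented here.

```javascript
// Hypothetical client-side pre-check; the server remains the source of truth.
const MAX_PDF_BYTES = 50 * 1024 * 1024; // 50 MB limit from the parameter table

function validateExtractParams({ fileSize, constituency, boothNumber, dpi = 300 }) {
  const problems = [];

  if (!Number.isFinite(fileSize) || fileSize <= 0) {
    problems.push('pdf_file is missing or empty');
  } else if (fileSize > MAX_PDF_BYTES) {
    problems.push('pdf_file exceeds 50 MB');
  }

  if (typeof constituency !== 'string' || constituency.trim() === '') {
    problems.push('constituency is required');
  }

  // booth_number may be a string or a number, so normalise it to a string first.
  if (boothNumber === undefined || boothNumber === null || String(boothNumber).trim() === '') {
    problems.push('booth_number is required');
  }

  // Assumption: dpi must be a positive integer; the API's real bounds are not documented.
  if (!Number.isInteger(Number(dpi)) || Number(dpi) <= 0) {
    problems.push('dpi must be a positive integer');
  }

  return problems; // an empty array means the request looks well-formed
}

console.log(validateExtractParams({ fileSize: 1024, constituency: 'Lawspet', boothNumber: 3 }));
```

In a browser, `fileSize` would typically come from `fileInput.files[0].size`.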

## Request Example

### Using cURL

```bash
curl -X POST http://localhost:8000/api/pdf-to-png/extract \
  -F "pdf_file=@voters.pdf" \
  -F "constituency=Lawspet" \
  -F "booth_number=3" \
  -F "dpi=300"
```

### Using Postman

1. **Method**: POST
2. **URL**: `http://localhost:8000/api/pdf-to-png/extract`
3. **Body**: Form-data
   - Key: `pdf_file`, Type: File, Value: [Select your PDF]
   - Key: `constituency`, Type: Text, Value: `Lawspet`
   - Key: `booth_number`, Type: Text, Value: `3`
   - Key: `dpi`, Type: Text, Value: `300` (optional)

### Using JavaScript/Fetch

```javascript
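// Assumes the page contains an <input type="file"> with a PDF selected
const fileInput = document.querySelector('input[type="file"]');

const formData = new FormData();
formData.append('pdf_file', fileInput.files[0]);
formData.append('constituency', 'Lawspet');
formData.append('booth_number', '3');
formData.append('dpi', '300'); // optional; defaults to 300

// Do not set Content-Type manually; the browser adds the multipart boundary
fetch('http://localhost:8000/api/pdf-to-png/extract', {
  method: 'POST',
  body: formData
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Upload failed:', error));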
```

## Response

### Success Response (200 OK)

```json
{
  "success": true,
  "message": "PDF pages extracted successfully",
  "constituency": "Lawspet",
  "booth_number": "3",
  "output_directory": "/var/www/Constituency/Lawspet/3",
  "total_pages": 5,
  "extracted_pages": 5,
  "failed_pages": 0,
  "files": [
    {
      "page": 1,
      "filename": "voters 1.png",
      "path": "/var/www/Constituency/Lawspet/3/voters 1.png",
      "size": 1245678
    },
    {
      "page": 2,
      "filename": "voters 2.png",
      "path": "/var/www/Constituency/Lawspet/3/voters 2.png",
      "size": 1234567
    }
  ],
  "errors": [],
  "next_step": "Use /api/image-import/run endpoint to process the extracted images"
}
```

### Validation Error (422)

```json
{
  "success": false,
  "message": "Validation failed",
  "errors": {
    "pdf_file": ["The pdf file field is required."],
    "constituency": ["The constituency field is required."]
  }
}
```
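
The `errors` object in a 422 response maps each parameter name to a list of messages. A small sketch of turning that into flat, displayable lines follows; `flattenValidationErrors` is a hypothetical helper name, not part of the API.

```javascript
// Hypothetical helper: converts the 422 `errors` object into "field: message" lines.
function flattenValidationErrors(body) {
  // Successful responses carry `errors` as an array, so only accept plain objects here.
  const isFieldMap =
    body && body.errors && typeof body.errors === 'object' && !Array.isArray(body.errors);
  const errors = isFieldMap ? body.errors : {};

  return Object.entries(errors).flatMap(([field, messages]) =>
    [].concat(messages).map((message) => `${field}: ${message}`)
  );
}

const validationBody = {
  success: false,
  message: 'Validation failed',
  errors: {
    pdf_file: ['The pdf file field is required.'],
    constituency: ['The constituency field is required.'],
  },
};

flattenValidationErrors(validationBody).forEach((line) => console.log(line));
```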

### ImageMagick Not Found Error (500)

```json
{
  "success": false,
  "message": "ImageMagick not found",
  "error": "ImageMagick (convert command) is required but not installed. Install with: brew install imagemagick"
}
```

### Partial Extraction (200 with errors)

```json
{
  "success": true,
  "message": "PDF pages extracted successfully",
  "constituency": "Lawspet",
  "booth_number": "3",
  "output_directory": "/var/www/Constituency/Lawspet/3",
  "total_pages": 5,
  "extracted_pages": 4,
  "failed_pages": 1,
  "files": [
    /* Successfully extracted pages */
  ],
  "errors": [
    {
      "page": 5,
      "error": "Conversion failed",
      "output": "Error details from ImageMagick"
    }
  ],
  "next_step": "Use /api/image-import/run endpoint to process the extracted images"
}
```
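
Because a partial extraction still returns `200` with `"success": true`, callers need to inspect `failed_pages` and `errors` rather than rely on the status code alone. The sketch below shows one way to do that; `summarizeExtraction` is a hypothetical helper, and treating "complete" as "success with zero failed pages" is an assumption about what callers want.

```javascript
// Hypothetical helper: condenses an extract response into what a caller usually needs.
function summarizeExtraction(response) {
  // On a 422 the `errors` field is an object keyed by parameter, so only read arrays here.
  const pageErrors = Array.isArray(response.errors) ? response.errors : [];
  return {
    complete: response.success === true && response.failed_pages === 0,
    failedPages: pageErrors.map((entry) => entry.page),
    label: `${response.extracted_pages}/${response.total_pages} pages extracted`,
  };
}

const partialResponse = {
  success: true,
  total_pages: 5,
  extracted_pages: 4,
  failed_pages: 1,
  errors: [{ page: 5, error: 'Conversion failed', output: 'Error details from ImageMagick' }],
};

console.log(summarizeExtraction(partialResponse));
```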

## Output File Naming

Extracted PNG files follow the naming convention:
- `voters 1.png` - First page
- `voters 2.png` - Second page
- `voters 3.png` - Third page
- etc.

This naming matches the expected format for the voter image import system.
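
One practical consequence of this convention is that the page numbers are not zero-padded, so a plain alphabetical sort places `voters 10.png` before `voters 2.png`. A minimal sketch of building and ordering these names by page number follows; the helper names are hypothetical.

```javascript
// Builds the filename used for a given 1-based page number.
function pageFilename(page) {
  return `voters ${page}.png`;
}

// Returns the page number encoded in a filename, or null for unrelated files.
function pageNumber(filename) {
  const match = /^voters (\d+)\.png$/.exec(filename);
  return match ? Number(match[1]) : null;
}

// Keeps only page images and orders them numerically (filter returns a new array,
// so the caller's list is not mutated by the sort).
function sortByPage(filenames) {
  return filenames
    .filter((name) => pageNumber(name) !== null)
    .sort((a, b) => pageNumber(a) - pageNumber(b));
}

console.log(sortByPage(['voters 10.png', 'voters 2.png', 'booth_info.png', 'voters 1.png']));
```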

## File Structure After Extraction

```
Constituency/
└── Lawspet/
    └── 3/
        ├── voters 1.png    ← Extracted from PDF page 1
        ├── voters 2.png    ← Extracted from PDF page 2
        ├── voters 3.png    ← Extracted from PDF page 3
        └── voters 4.png    ← Extracted from PDF page 4
```

## Prerequisites

### Install ImageMagick

ImageMagick is required to convert PDF pages to PNG images.

#### macOS
```bash
brew install imagemagick
```

#### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install imagemagick ghostscript
```

#### CentOS/RHEL
```bash
sudo yum install ImageMagick ghostscript
```

### Verify Installation

```bash
convert -version
```

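This should print the ImageMagick version information. ImageMagick 7 installs `magick` as its main command, so try `magick -version` if `convert` is not found.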

## Workflow Integration

### Complete Voter Import Workflow

1. **Upload PDF and Extract Pages**
   ```bash
   curl -X POST http://localhost:8000/api/pdf-to-png/extract \
     -F "pdf_file=@voters.pdf" \
     -F "constituency=Lawspet" \
     -F "booth_number=3"
   ```

2. **Process Extracted Images with OCR**
   ```bash
   curl -X POST http://localhost:8000/api/image-import/run \
     -H "Content-Type: application/json" \
     -d '{"constituency": "Lawspet", "booth_number": "3"}'
   ```

3. **Verify Import Results**
   - Check voters table for imported records
   - Review summary statistics
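
The first two steps of this workflow can be chained from a script. The sketch below is illustrative only: `extractThenImport` is a hypothetical function, sending the import request as JSON is an assumption about `/api/image-import/run`, and skipping the OCR step when any page failed is a design choice rather than documented behaviour. The `fetchImpl` parameter exists so the function can be exercised with a stub.

```javascript
// Hypothetical two-step client: extract PDF pages, then trigger the OCR import.
async function extractThenImport({ baseUrl, pdf, constituency, boothNumber, fetchImpl = fetch }) {
  // Step 1: multipart upload, matching the extract endpoint's form-data parameters.
  const form = new FormData();
  form.append('pdf_file', pdf, 'voters.pdf');
  form.append('constituency', constituency);
  form.append('booth_number', String(boothNumber));

  const extractRes = await fetchImpl(`${baseUrl}/api/pdf-to-png/extract`, {
    method: 'POST',
    body: form,
  });
  const extraction = await extractRes.json();
  if (!extractRes.ok || !extraction.success) {
    throw new Error(`Extraction failed: ${extraction.message}`);
  }

  // Assumption: do not start OCR on an incomplete page set; let the caller decide.
  if (extraction.failed_pages > 0) {
    return { extraction, imported: null };
  }

  // Step 2: assumption - the import endpoint accepts a JSON body with these two keys.
  const importRes = await fetchImpl(`${baseUrl}/api/image-import/run`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Accept: 'application/json' },
    body: JSON.stringify({ constituency, booth_number: String(boothNumber) }),
  });
  return { extraction, imported: await importRes.json() };
}
```

A call would look like `extractThenImport({ baseUrl: 'http://localhost:8000', pdf: fileInput.files[0], constituency: 'Lawspet', boothNumber: 3 })`. Step 3, verifying the imported records, still happens on the server side.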

## DPI Settings Guide

| DPI | Quality | File Size | Use Case |
|-----|---------|-----------|----------|
| 150 | Low | Small | Quick preview, testing |
| 200 | Medium | Medium | Acceptable for most text |
| 300 | High | Large | **Recommended** - Good OCR accuracy |
| 400 | Very High | Very Large | Best quality, slower processing |
| 600 | Maximum | Huge | Archival quality, overkill for OCR |

**Recommendation**: Use 300 DPI (default) for optimal balance between quality and file size.
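
To see why file size grows quickly with DPI, it helps to work out the pixel dimensions of a rendered page: each side scales linearly with DPI, so the pixel count scales with its square. The sketch below assumes A4 pages (210 × 297 mm), which this document does not state, and reports the uncompressed 24-bit RGB size; actual PNG files are compressed and will be smaller.

```javascript
const MM_PER_INCH = 25.4;

// Pixel dimensions and uncompressed RGB size of a page rendered at `dpi`.
// Assumption: A4 page size by default; pass other dimensions for other formats.
function renderedSize(dpi, widthMm = 210, heightMm = 297) {
  const width = Math.round((widthMm / MM_PER_INCH) * dpi);
  const height = Math.round((heightMm / MM_PER_INCH) * dpi);
  return { width, height, rawBytes: width * height * 3 }; // 3 bytes per RGB pixel
}

for (const dpi of [150, 200, 300, 400, 600]) {
  const { width, height, rawBytes } = renderedSize(dpi);
  const rawMb = (rawBytes / (1024 * 1024)).toFixed(1);
  console.log(`${dpi} DPI: ${width} x ${height} px, ~${rawMb} MB uncompressed`);
}
```

Doubling the DPI from 300 to 600 roughly quadruples the pixel count, which is why 600 DPI is listed as overkill for OCR.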

## Error Handling

### Common Errors

**1. ImageMagick Not Installed**
```
Error: ImageMagick (convert command) is required but not installed
Solution: Install ImageMagick using your package manager
```

**2. PDF File Too Large**
```
Error: The pdf file must not be greater than 51200 kilobytes
Solution: Split the PDF into smaller files. The 51200 KB (50 MB) cap comes from request validation, so raising php.ini limits alone does not lift it
```

**3. Invalid PDF File**
```
Error: The pdf file must be a file of type: pdf
Solution: Ensure the file is a valid PDF and not corrupted
```

**4. Directory Creation Failed**
```
Error: Failed to create directory
Solution: Check filesystem permissions for Constituency folder
```

**5. Ghostscript Not Found (Linux)**
```
Error: Conversion failed
Solution: Install ghostscript: sudo apt-get install ghostscript
```

## PHP Configuration

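For large PDF uploads, update `php.ini`. Set `post_max_size` higher than `upload_max_filesize`, because the POST body also carries the other form fields and multipart overhead:

```ini
upload_max_filesize = 50M
post_max_size = 55M
max_execution_time = 300
memory_limit = 256M
```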

Restart PHP-FPM/Apache after changes.
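
PHP's shorthand sizes (`K`, `M`, `G`) are powers of 1024, so `50M` equals the 51200 KB limit quoted in the validation message. The PHP manual also recommends that `post_max_size` be larger than `upload_max_filesize`. The following sketch checks both points for a pair of values; the function names are hypothetical and the values passed in are illustrative.

```javascript
// Converts PHP ini shorthand such as "50M" into bytes (K, M, G are powers of 1024).
function phpSizeToBytes(value) {
  const match = /^(\d+)\s*([KMG])?$/i.exec(String(value).trim());
  if (!match) throw new Error(`Unrecognised size: ${value}`);
  const factor = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  const unit = match[2] ? match[2].toUpperCase() : null;
  return Number(match[1]) * (unit ? factor[unit] : 1);
}

// Returns warnings for settings that would reject an upload of `requiredBytes`.
function checkUploadLimits(ini, requiredBytes) {
  const upload = phpSizeToBytes(ini.upload_max_filesize);
  const post = phpSizeToBytes(ini.post_max_size);
  const warnings = [];
  if (upload < requiredBytes) {
    warnings.push('upload_max_filesize is below the required upload size');
  }
  if (post <= upload) {
    warnings.push('post_max_size should be larger than upload_max_filesize');
  }
  return warnings;
}

const REQUIRED_BYTES = 51200 * 1024; // the 51200 KB limit from the validation message
console.log(phpSizeToBytes('50M') === REQUIRED_BYTES);
console.log(checkUploadLimits({ upload_max_filesize: '50M', post_max_size: '64M' }, REQUIRED_BYTES));
```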

## Security Considerations

- Only PDF files are accepted (MIME type validation)
- Files are validated before processing
- Temporary files are cleaned up after extraction
- Output is restricted to constituency folders
- Maximum file size: 50MB
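
Restricting output to constituency folders usually means rejecting `constituency` or `booth_number` values that would resolve outside the base directory (for example `../../etc`). The real check lives in the Laravel controller, which is not shown here; the sketch below only illustrates the technique in JavaScript, and `resolveBoothDir` is a hypothetical name.

```javascript
// POSIX path semantics are used because the documented server paths are POSIX-style.
const posixPath = require('node:path').posix;

// Resolves <base>/<constituency>/<booth> and rejects anything outside <base>.
function resolveBoothDir(baseDir, constituency, boothNumber) {
  const base = posixPath.resolve(baseDir);
  const target = posixPath.resolve(base, String(constituency), String(boothNumber));
  const relative = posixPath.relative(base, target);

  const escapesBase =
    relative === '' || relative === '..' || relative.startsWith('../') || posixPath.isAbsolute(relative);
  if (escapesBase) {
    throw new Error('Output path must stay inside the constituency base directory');
  }
  return target;
}

console.log(resolveBoothDir('/var/www/Constituency', 'Lawspet', 3));

try {
  resolveBoothDir('/var/www/Constituency', '../../etc', 'passwd');
} catch (error) {
  console.log(`Rejected: ${error.message}`);
}
```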

## Performance Notes

- **Small PDFs (1-10 pages)**: ~2-5 seconds
- **Medium PDFs (11-50 pages)**: ~10-30 seconds
- **Large PDFs (more than 50 pages)**: 1 minute or more

Processing time depends on:
- PDF page count
- DPI setting
- Server resources
- ImageMagick version

## Testing

### Test with Sample PDF

```bash
# Create a test directory and copy a sample PDF into it
mkdir -p /tmp/test-pdf
cp voters.pdf /tmp/test-pdf/

# Extract the test PDF
curl -X POST http://localhost:8000/api/pdf-to-png/extract \
  -F "pdf_file=@/tmp/test-pdf/voters.pdf" \
  -F "constituency=TestConstituency" \
  -F "booth_number=999" \
  -F "dpi=200"

# Verify extracted files
ls -lh /var/www/Constituency/TestConstituency/999/
```

### Check ImageMagick Availability

```bash
php artisan tinker --execute="
\$ctrl = new \App\Http\Controllers\PdfToPngController();
\$method = new \ReflectionMethod(\$ctrl, 'findConvertCommand');
\$method->setAccessible(true);
\$path = \$method->invoke(\$ctrl);
echo 'ImageMagick convert: ' . (\$path ?: 'NOT FOUND') . PHP_EOL;
"
```

## Troubleshooting

### Images Not Created

1. Check ImageMagick installation: `convert -version`
2. Check Ghostscript installation: `gs -version`
3. Verify file permissions on Constituency folder
4. Check Laravel logs: `storage/logs/laravel.log`

### Poor OCR Quality

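- Increase DPI to 400; going to 600 rarely improves OCR further and produces much larger files
- Ensure the source PDF contains good-quality scans
- Check whether the PDF is text-based or made of scanned images; for scans, a DPI above the original scan resolution only upsamples the image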

### Slow Conversion

- Reduce DPI to 200
- Process PDFs in smaller batches
- Upgrade server resources
- Use queue workers for async processing
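
Processing in smaller batches can also be done on the client by limiting how many PDFs are uploaded at once. The sketch below splits a list into fixed-size batches and uploads one batch at a time; `uploadOne` is a hypothetical caller-supplied function that uploads a single PDF, and the batch size is left to the caller.

```javascript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Uploads PDFs batch by batch: uploads within a batch run concurrently,
// and the next batch starts only after the previous one has finished.
async function uploadInBatches(pdfs, batchSize, uploadOne) {
  const results = [];
  for (const batch of chunk(pdfs, batchSize)) {
    const batchResults = await Promise.all(batch.map((pdf) => uploadOne(pdf)));
    results.push(...batchResults);
  }
  return results;
}

console.log(chunk(['booth1.pdf', 'booth2.pdf', 'booth3.pdf', 'booth4.pdf', 'booth5.pdf'], 2));
```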

## API Response Fields

| Field | Type | Description |
|-------|------|-------------|
| `success` | Boolean | Whether the request was processed; can be `true` even when some pages failed, so also check `failed_pages` |
| `message` | String | Human-readable message |
| `constituency` | String | Constituency name provided |
| `booth_number` | String | Booth number provided |
| `output_directory` | String | Absolute path where files were saved |
| `total_pages` | Integer | Total pages in PDF |
| `extracted_pages` | Integer | Successfully extracted pages |
| `failed_pages` | Integer | Failed page count |
| `files` | Array | Details of extracted files |
| `errors` | Array | Details of any failures |
| `next_step` | String | Guidance for next action |
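
The table above can double as a lightweight runtime check on the client. The sketch below encodes the listed fields and types and reports any that are missing or mistyped; `findFieldProblems` is a hypothetical helper, and it assumes all fields are present in every 200 response, which the examples suggest but the API does not guarantee.

```javascript
// Field names and types taken from the response fields table.
const EXTRACT_RESPONSE_FIELDS = {
  success: 'boolean',
  message: 'string',
  constituency: 'string',
  booth_number: 'string',
  output_directory: 'string',
  total_pages: 'integer',
  extracted_pages: 'integer',
  failed_pages: 'integer',
  files: 'array',
  errors: 'array',
  next_step: 'string',
};

function matchesType(value, type) {
  if (type === 'integer') return Number.isInteger(value);
  if (type === 'array') return Array.isArray(value);
  return typeof value === type;
}

// Returns one line per field that is missing or has an unexpected type.
function findFieldProblems(response) {
  return Object.entries(EXTRACT_RESPONSE_FIELDS)
    .filter(([field, type]) => !matchesType(response[field], type))
    .map(([field, type]) => `${field}: expected ${type}`);
}

console.log(findFieldProblems({ success: false, message: 'Validation failed', errors: {} }));
```

A 422 body fails this check by design, since it omits most fields and carries `errors` as an object rather than an array.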

## Next Steps After Extraction

After successfully extracting PDF pages to PNG:

1. **Add booth_info.png** (if available)
   - Manually place booth information image in the same folder

2. **Process with OCR**
   ```bash
   curl -X POST http://localhost:8000/api/image-import/run \
     -H "Content-Type: application/json" \
     -d '{"constituency": "Lawspet", "booth_number": "3"}'
   ```

3. **Verify Results**
   - Check booth information extracted
   - Verify voter records created
   - Review any skipped or failed voters

## Support

For issues or questions:
- Check logs: `storage/logs/laravel.log`
- Verify ImageMagick: `convert -version`
- Test with small PDFs first
- Review error messages in API response
