# PDF Import API Documentation

## Overview
This API supports bulk import of voter data from PDF files downloaded from the Election Commission website. It accepts files up to 20MB and has been tested with more than 2,000 voters per file.

## Features
- ✅ Upload PDF files (max 20MB)
- ✅ Store PDFs in dedicated folder (`storage/app/pdf-imports/`)
- ✅ Analyze PDF structure and patterns
- ✅ Extract voter information from multiple PDF formats
- ✅ Batch processing for large files (2000+ voters)
- ✅ Background job processing (asynchronous)
- ✅ Track import status and statistics
- ✅ Handle duplicate voters (update existing records)
- ✅ Error handling and logging
- ✅ Reprocess failed imports

## API Endpoints

### 1. Upload PDF for Import

**POST** `/api/pdf-import/upload`

Upload a PDF file and either process it immediately or queue it for background processing (the default).

**Request:**
```bash
POST /api/pdf-import/upload
Content-Type: multipart/form-data

Fields:
- pdf_file: (file, required) - PDF file (max 20MB)
- process_immediately: (boolean, optional) - Process now or queue (default: false)
- uploaded_by: (integer, optional) - ID of the admin who uploaded the file
```

**cURL Example:**
```bash
curl -X POST http://your-domain.com/api/pdf-import/upload \
  -F "pdf_file=@/path/to/voter_list.pdf" \
  -F "process_immediately=false" \
  -F "uploaded_by=1"
```

**Response (Success - Queued):**
```json
{
  "status": "success",
  "message": "PDF upload successful",
  "data": {
    "import_log": {
      "id": 1,
      "original_filename": "voter_list.pdf",
      "stored_filename": "uuid_timestamp.pdf",
      "file_path": "pdf-imports/uuid_timestamp.pdf",
      "file_size": 5242880,
      "status": "pending",
      "total_voters": 0,
      "imported_voters": 0,
      "failed_voters": 0,
      "created_at": "2025-11-08T10:00:00.000000Z"
    },
    "message": "PDF uploaded and queued for processing"
  }
}
```

**Response (Success - Immediate):**
```json
{
  "status": "success",
  "message": "PDF processed successfully",
  "data": {
    "import_log": {
      "id": 1,
      "status": "completed",
      "total_voters": 1850,
      "imported_voters": 1845,
      "failed_voters": 5,
      "import_summary": {
        "total_pages": 50,
        "duplicate_voters": 12,
        "errors": []
      }
    }
  }
}
```
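
A client usually needs to branch on whether the upload was queued or processed immediately. The following is a minimal Python sketch of that branch, assuming only the response shapes shown above; the function name `summarize_upload` is hypothetical and not part of the API.

```python
def summarize_upload(response: dict) -> str:
    """Turn an upload response into a one-line summary (illustrative helper)."""
    if response.get("status") != "success":
        return f"upload failed: {response.get('message', 'unknown error')}"

    log = response["data"]["import_log"]
    if log["status"] == "completed":
        # Immediate processing: the counters are already final.
        return (
            f"import {log['id']} completed: "
            f"{log['imported_voters']}/{log['total_voters']} imported, "
            f"{log['failed_voters']} failed"
        )

    # Queued upload: counters stay at 0 until a queue worker picks the job up.
    return (
        f"import {log['id']} is {log['status']}; "
        f"poll /api/pdf-import/status/{log['id']}"
    )
```

For a queued upload, the returned string points the caller at the status endpoint, which reports progress once a worker has started the job.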

---

### 2. Analyze PDF Structure (Testing Only)

**POST** `/api/pdf-import/analyze`

Analyze a PDF's structure without importing any data. Useful for identifying line patterns and for testing a new file before a real import.

**Request:**
```bash
POST /api/pdf-import/analyze
Content-Type: multipart/form-data

Fields:
- pdf_file: (file, required) - PDF file to analyze
```

**cURL Example:**
```bash
curl -X POST http://your-domain.com/api/pdf-import/analyze \
  -F "pdf_file=@/path/to/voter_list.pdf"
```

**Response:**
```json
{
  "status": "success",
  "message": "PDF analysis completed",
  "data": {
    "total_pages": 50,
    "total_lines": 5420,
    "sample_lines": [
      "Page 1",
      "Booth No: 123",
      "1 ABC1234567 John Doe M 1990",
      "..."
    ],
    "patterns_detected": [
      "Pattern 1: Serial + EPIC",
      "Pattern 2: EPIC + Name + Age + Gender"
    ],
    "booth_numbers_found": ["123", "124", "125"]
  }
}
```
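
The analysis result can be checked automatically before committing to an import. This is a rough Python sketch that inspects the `data` object from the response above; the helper name `analysis_warnings` and the wording of the warnings are assumptions made for illustration.

```python
from typing import List


def analysis_warnings(data: dict) -> List[str]:
    """Return human-readable warnings for an analyze response's `data` object."""
    warnings = []
    if data.get("total_lines", 0) == 0:
        # No extractable text usually means the PDF is a scanned image.
        warnings.append("no text lines extracted; the PDF may be a scanned image")
    if not data.get("patterns_detected"):
        warnings.append("no known voter pattern detected")
    if not data.get("booth_numbers_found"):
        warnings.append("no booth numbers found in headers")
    return warnings
```

An empty list suggests the file is a reasonable candidate for upload; any warning is a cue to inspect `sample_lines` by hand first.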

---

### 3. Get Import Status

**GET** `/api/pdf-import/status/{id}`

Get the status of a specific import by ID.

**Request:**
```bash
GET /api/pdf-import/status/1
```

**cURL Example:**
```bash
curl -X GET http://your-domain.com/api/pdf-import/status/1
```

**Response:**
```json
{
  "status": "success",
  "message": "Import status retrieved successfully",
  "data": {
    "id": 1,
    "original_filename": "voter_list.pdf",
    "stored_filename": "uuid_timestamp.pdf",
    "file_path": "pdf-imports/uuid_timestamp.pdf",
    "file_size": 5242880,
    "status": "completed",
    "total_voters": 1850,
    "imported_voters": 1845,
    "failed_voters": 5,
    "error_message": null,
    "import_summary": {
      "total_pages": 50,
      "duplicate_voters": 12,
      "errors": [
        {
          "voter_id": "ABC1234568",
          "name": "Jane Doe",
          "error": "Invalid year of birth"
        }
      ]
    },
    "started_at": "2025-11-08T10:00:05.000000Z",
    "completed_at": "2025-11-08T10:02:30.000000Z",
    "uploaded_by": {
      "id": 1,
      "username": "admin"
    }
  }
}
```
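
Queued imports have to be polled until they reach `completed` or `failed`. Below is a minimal polling sketch in Python. The `fetch_status` callable is a stand-in for whatever HTTP client you use to call this endpoint, and the interval and attempt limit are arbitrary assumptions, not values defined by the API.

```python
import time
from typing import Callable

# Statuses after which the import will not change without a reprocess.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_import(
    fetch_status: Callable[[int], dict],
    import_id: int,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> dict:
    """Poll the status endpoint until the import finishes or attempts run out.

    `fetch_status(import_id)` must return the decoded JSON body of
    GET /api/pdf-import/status/{id}.
    """
    for attempt in range(max_attempts):
        data = fetch_status(import_id)["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"import {import_id} still not finished after {max_attempts} polls")
```

Passing the HTTP call in as a function keeps the loop testable without a running server; in practice `fetch_status` would wrap a request to `/api/pdf-import/status/{id}` and decode the JSON body.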

---

### 4. Get All Imports

**GET** `/api/pdf-import/all`

Get all import logs with pagination and optional filtering.

**Query Parameters:**
- `per_page` (optional, default: 20) - Number of records per page
- `status` (optional) - Filter by status: pending, processing, completed, failed

**Request:**
```bash
GET /api/pdf-import/all?per_page=10&status=completed
```

**cURL Example:**
```bash
curl -X GET "http://your-domain.com/api/pdf-import/all?per_page=10&status=completed"
```

**Response:**
```json
{
  "status": "success",
  "message": "Import logs retrieved successfully",
  "data": {
    "current_page": 1,
    "data": [
      {
        "id": 1,
        "original_filename": "voter_list.pdf",
        "status": "completed",
        "total_voters": 1850,
        "imported_voters": 1845,
        "created_at": "2025-11-08T10:00:00.000000Z"
      }
    ],
    "per_page": 10,
    "total": 5
  }
}
```
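
When building the list URL programmatically, it helps to validate the `status` filter against the four documented values before sending the request. A small Python sketch follows; `build_list_url` is a hypothetical helper and the base URL is a placeholder.

```python
from typing import Optional
from urllib.parse import urlencode

VALID_STATUSES = ("pending", "processing", "completed", "failed")


def build_list_url(base_url: str, per_page: int = 20, status: Optional[str] = None) -> str:
    """Build the URL for GET /api/pdf-import/all with validated query parameters."""
    if per_page < 1:
        raise ValueError("per_page must be a positive integer")
    params = {"per_page": per_page}
    if status is not None:
        if status not in VALID_STATUSES:
            raise ValueError(f"status must be one of {VALID_STATUSES}")
        params["status"] = status
    # urlencode handles escaping; dict order is preserved.
    return f"{base_url.rstrip('/')}/api/pdf-import/all?{urlencode(params)}"
```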

---

### 5. Get Import Statistics

**GET** `/api/pdf-import/statistics`

Get overall import statistics and recent imports.

**Request:**
```bash
GET /api/pdf-import/statistics
```

**cURL Example:**
```bash
curl -X GET http://your-domain.com/api/pdf-import/statistics
```

**Response:**
```json
{
  "status": "success",
  "message": "Statistics retrieved successfully",
  "data": {
    "total_imports": 25,
    "pending_imports": 2,
    "processing_imports": 1,
    "completed_imports": 20,
    "failed_imports": 2,
    "total_voters_imported": 38450,
    "total_voters_failed": 125,
    "recent_imports": [
      {
        "id": 25,
        "original_filename": "latest_voter_list.pdf",
        "status": "completed",
        "imported_voters": 1920,
        "created_at": "2025-11-08T10:00:00.000000Z"
      }
    ]
  }
}
```
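
The voter counters can be combined into an overall success rate. The sketch below assumes that every attempted voter is counted in exactly one of `total_voters_imported` or `total_voters_failed`; that is an interpretation of the response, not something the API states.

```python
def voter_success_rate(stats: dict) -> float:
    """Fraction of attempted voters that were imported (0.0 when nothing was attempted)."""
    imported = stats["total_voters_imported"]
    failed = stats["total_voters_failed"]
    attempted = imported + failed  # assumption: no voter is counted in both totals
    return imported / attempted if attempted else 0.0
```

With the sample response above (38450 imported, 125 failed), this gives about 99.68%.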

---

### 6. Reprocess Failed Import

**POST** `/api/pdf-import/reprocess/{id}`

Reprocess a failed or pending import.

**Request:**
```bash
POST /api/pdf-import/reprocess/1
```

**cURL Example:**
```bash
curl -X POST http://your-domain.com/api/pdf-import/reprocess/1
```

**Response:**
```json
{
  "status": "success",
  "message": "Import requeued for processing",
  "data": {
    "id": 1,
    "status": "pending",
    "error_message": null
  }
}
```

---

### 7. Delete Import

**DELETE** `/api/pdf-import/delete/{id}`

Delete an import log and its associated PDF file.

**Request:**
```bash
DELETE /api/pdf-import/delete/1
```

**cURL Example:**
```bash
curl -X DELETE http://your-domain.com/api/pdf-import/delete/1
```

**Response:**
```json
{
  "status": "success",
  "message": "Import deleted successfully",
  "data": null
}
```

---

### 8. Download Original PDF

**GET** `/api/pdf-import/download/{id}`

Download the original uploaded PDF file.

**Request:**
```bash
GET /api/pdf-import/download/1
```

**cURL Example:**
```bash
curl -X GET http://your-domain.com/api/pdf-import/download/1 -O -J
```

**Response:**
- Returns the PDF file as a download

---

## PDF Format Support

The system supports multiple PDF formats commonly used by the Election Commission:

### Pattern 1: Serial + EPIC + Name + Gender + Year + Address
```
1 ABC1234567 John Doe M 1990 123 Main St
2 XYZ9876543 Jane Smith F 1985 456 Oak Ave
```

### Pattern 2: EPIC + Name + Age + Gender
```
ABC1234567 John Doe 35 Male
XYZ9876543 Jane Smith 40 Female
```

### Pattern 3: Name - EPIC - Age - Gender
```
John Doe - ABC1234567 - Age: 35 - Male
Jane Smith - XYZ9876543 - Age: 40 - Female
```
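
To make the three line formats concrete, here is an illustrative Python sketch that matches them with regular expressions. These are not the expressions used in `VoterPdfImportService.php`; they are written only against the sample lines above, and the EPIC shape (three uppercase letters followed by seven digits) is an assumption taken from those samples.

```python
import re
from typing import Optional

EPIC = r"[A-Z]{3}\d{7}"  # assumed shape, based on the sample lines only

PATTERNS = [
    # Pattern 1: Serial + EPIC + Name + Gender + Year + Address
    re.compile(
        rf"^(?P<serial>\d+)\s+(?P<epic>{EPIC})\s+(?P<name>.+?)\s+"
        rf"(?P<gender>[MFO])\s+(?P<year>\d{{4}})(?:\s+(?P<address>.+))?$"
    ),
    # Pattern 2: EPIC + Name + Age + Gender
    re.compile(
        rf"^(?P<epic>{EPIC})\s+(?P<name>.+?)\s+(?P<age>\d{{1,3}})\s+"
        rf"(?P<gender>Male|Female|Other)$"
    ),
    # Pattern 3: Name - EPIC - Age - Gender
    re.compile(
        rf"^(?P<name>.+?)\s+-\s+(?P<epic>{EPIC})\s+-\s+Age:\s*(?P<age>\d{{1,3}})"
        rf"\s+-\s+(?P<gender>Male|Female|Other)$"
    ),
]


def parse_voter_line(line: str) -> Optional[dict]:
    """Return the captured fields for the first matching pattern, or None."""
    text = line.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            # Drop optional groups that did not participate (e.g. a missing address).
            return {key: value for key, value in match.groupdict().items() if value is not None}
    return None
```

The patterns are tried in order, and each one is anchored at both ends so that headers such as `Page 1` fall through and return `None`.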

### Booth Number Detection
The system automatically detects booth numbers from headers:
```
Booth No: 123
Part No: 124
Booth Number: 125
```
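
Header detection can be sketched as a single regular expression plus a running "current booth" value. The Python example below is an illustration only: it assumes that each voter line belongs to the most recent header above it, which the service may or may not do in the same way.

```python
import re
from typing import List, Optional, Tuple

# Matches the three header spellings shown above, e.g. "Booth No: 123".
BOOTH_HEADER = re.compile(
    r"^(?:Booth\s+(?:No|Number)|Part\s+No)\s*[:.]?\s*(\d+)\s*$",
    re.IGNORECASE,
)


def tag_lines_with_booth(lines: List[str]) -> List[Tuple[Optional[str], str]]:
    """Pair every non-header, non-empty line with the booth number seen most recently."""
    current_booth = None
    tagged = []
    for raw in lines:
        line = raw.strip()
        match = BOOTH_HEADER.match(line)
        if match:
            current_booth = match.group(1)  # header line: update state, emit nothing
            continue
        if line:
            tagged.append((current_booth, line))
    return tagged
```

Lines that appear before any header are tagged with `None`, which makes missing headers easy to spot downstream.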

---

## Voter Data Mapping

PDF data is mapped to the `voters` table as follows:

| PDF Field | Database Column | Notes |
|-----------|----------------|-------|
| EPIC No / Voter ID | `voter_id_number` | Unique identifier |
| Name | `name` | Voter's full name |
| Gender (M/F/O) | `gender` | Normalized to M, F, or O |
| Year of Birth | `year_of_birth` | 4-digit year |
| Age | `year_of_birth` (calculated) | Converted to a year of birth when the PDF lists age instead |
| Booth Number | `booth_number` | Extracted from headers |
| Booth ID | `booth_id` | Linked to booths table |
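
The mapping can be illustrated with a short Python sketch. The input dictionary keys (`epic`, `name`, `gender`, `year`, `age`) are hypothetical names for already-parsed fields, and the age conversion assumes `year_of_birth = reference year - age`, which the source does not spell out. `booth_id` is omitted because it requires a lookup in the booths table.

```python
from datetime import date
from typing import Optional

GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F", "o": "O", "other": "O"}


def normalize_gender(raw: str) -> Optional[str]:
    """Map M/F/O and Male/Female/Other (any case) to M, F or O; None if unrecognized."""
    return GENDER_CODES.get(raw.strip().lower())


def age_to_year_of_birth(age: int, reference_year: Optional[int] = None) -> int:
    """Assumed conversion: subtract the age from the reference (default: current) year."""
    if reference_year is None:
        reference_year = date.today().year
    return reference_year - age


def to_voter_row(parsed: dict, booth_number: Optional[str], reference_year: Optional[int] = None) -> dict:
    """Build a dict keyed by the `voters` columns listed in the table above."""
    year = parsed.get("year")
    if year is None and parsed.get("age") is not None:
        year = age_to_year_of_birth(int(parsed["age"]), reference_year)
    return {
        "voter_id_number": parsed["epic"],
        "name": parsed["name"],
        "gender": normalize_gender(parsed["gender"]),
        "year_of_birth": int(year) if year is not None else None,
        "booth_number": booth_number,
    }
```

Because an age only approximates the birth year, rows derived this way can be off by one year depending on when the list was published.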

---

## Status Values

| Status | Description |
|--------|-------------|
| `pending` | File uploaded, waiting to be processed |
| `processing` | Currently being processed |
| `completed` | Successfully processed |
| `failed` | Processing failed with errors |

---

## Error Handling

### Validation Errors (400)
```json
{
  "status": "error",
  "message": "Validation failed",
  "errors": {
    "pdf_file": ["The pdf file must be a file of type: pdf"]
  }
}
```

### File Size Error (400)
```json
{
  "status": "error",
  "message": "Validation failed",
  "errors": {
    "pdf_file": ["The pdf file must not be greater than 20480 kilobytes"]
  }
}
```

### Processing Error (500)
```json
{
  "status": "error",
  "message": "Failed to upload PDF: Storage error",
  "data": null
}
```
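
Validation errors carry a per-field `errors` object, while processing errors carry only a `message`. A client can flatten both into a list of strings; the Python sketch below assumes just the two shapes shown above, and `describe_error` is a hypothetical helper name.

```python
from typing import List


def describe_error(http_status: int, payload: dict) -> List[str]:
    """Flatten an error response into printable lines; empty list for non-errors."""
    if payload.get("status") != "error":
        return []
    field_errors = payload.get("errors")
    if field_errors:
        # Validation failure: one line per message, prefixed with the field name.
        return [
            f"{field}: {message}"
            for field, messages in field_errors.items()
            for message in messages
        ]
    # Processing failure: only a top-level message is available.
    return [f"HTTP {http_status}: {payload.get('message', 'unknown error')}"]
```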

---

## Background Processing

For production use, imports are queued for background processing using Laravel's queue system.

### Setup Queue Worker

```bash
# Run queue worker
php artisan queue:work

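# Run as daemon (production). On Laravel 5.3 and later, queue:work is
# already long-running and the --daemon flag is deprecated.
php artisan queue:work --daemon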

# Process specific queue
php artisan queue:work --queue=default
```

### Monitor Queue

```bash
# Check failed jobs
php artisan queue:failed

# Retry failed job
php artisan queue:retry {job_id}

# Retry all failed jobs
php artisan queue:retry all
```

---

## Database Schema

### pdf_import_logs Table

```sql
CREATE TABLE pdf_import_logs (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    original_filename VARCHAR(255),
    stored_filename VARCHAR(255),
    file_path VARCHAR(255),
    file_size INT,
    status ENUM('pending', 'processing', 'completed', 'failed'),
    total_voters INT DEFAULT 0,
    imported_voters INT DEFAULT 0,
    failed_voters INT DEFAULT 0,
    error_message TEXT,
    import_summary JSON,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    uploaded_by BIGINT,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
```

---

## Testing Workflow

1. **Analyze PDF First** (Recommended)
   ```bash
   POST /api/pdf-import/analyze
   ```
   - Review detected patterns
   - Verify booth numbers
   - Check sample lines

2. **Upload for Processing**
   ```bash
   POST /api/pdf-import/upload
   # Set process_immediately=true for testing
   ```

3. **Monitor Status**
   ```bash
   GET /api/pdf-import/status/{id}
   ```

4. **Review Statistics**
   ```bash
   GET /api/pdf-import/statistics
   ```

---

## Best Practices

1. **Use Background Processing**: Set `process_immediately=false` for files with 500+ voters
2. **Test with the analyze endpoint**: Analyze the PDF structure before importing
3. **Monitor logs**: Check `storage/logs/laravel.log` for detailed error messages
4. **Handle duplicates**: The system updates existing voters based on `voter_id_number`
5. **Retry failures**: Use the reprocess endpoint for failed imports
6. **Clean up old imports**: Regularly delete old import logs to save storage
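
For the cleanup step, a script can select old import logs from the list endpoint's `data` array and then call the delete endpoint for each one. The Python sketch below covers only the selection. The 90-day retention period and the choice to keep non-completed imports (so they can still be reprocessed) are assumptions, not policies defined by this API.

```python
from datetime import datetime, timedelta, timezone
from typing import List

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # matches created_at in the responses above


def stale_import_ids(logs: List[dict], now: datetime, max_age_days: int = 90) -> List[int]:
    """Return ids of completed imports created more than `max_age_days` before `now`."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for log in logs:
        if log["status"] != "completed":
            continue  # keep pending/processing/failed imports for reprocessing
        created = datetime.strptime(log["created_at"], TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
        if created < cutoff:
            stale.append(log["id"])
    return stale
```

Each returned id can then be passed to `DELETE /api/pdf-import/delete/{id}`, which removes both the log and the stored PDF.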

---

## Troubleshooting

### Import stuck in "processing" status
```bash
# Reprocess the import
POST /api/pdf-import/reprocess/{id}
```

### No voters extracted
- Use the analyze endpoint to check the PDF structure
- Verify that the PDF is text-based (not a scanned image)
- Check whether the PDF format matches one of the supported patterns

### High failure rate
- Review `import_summary.errors` in the status response
- Check that the referenced booths exist in the database
- Verify PDF data quality
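
When many voters fail, grouping `import_summary.errors` by message shows which problem dominates. A short Python sketch, assuming the error entries have the `error` key shown in the status response:

```python
from collections import Counter
from typing import List, Tuple


def count_errors_by_reason(status_data: dict) -> List[Tuple[str, int]]:
    """Count error messages in a status response's `data`, most frequent first."""
    summary = status_data.get("import_summary") or {}  # tolerate a null summary
    errors = summary.get("errors", [])
    return Counter(entry["error"] for entry in errors).most_common()
```

A single dominant reason usually points at a parsing or data-setup problem rather than at individual bad records.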

---

## Performance Notes

- **File Size**: Up to 20MB supported
- **Voters per PDF**: Tested with 2000+ voters
- **Processing Time**: ~2-3 minutes for 2000 voters (background)
- **Batch Size**: 100 voters per transaction
- **Timeout**: 1 hour maximum per job
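
The batch size of 100 voters per transaction can be pictured as simple chunking of the extracted records. The service itself is PHP; the Python sketch below only illustrates the idea and is not taken from its code.

```python
from typing import Iterator, List


def batches(items: List[dict], size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items, in order."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each yielded slice would be written inside its own database transaction, so a failure rolls back at most one batch instead of the whole file.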

---

## Support

For custom PDF formats or pattern detection issues, update the regex patterns in:
- `app/Services/VoterPdfImportService.php`
- Methods: `extractVotersFromText()`, `createVoterArray()`, etc.
