API Reference

Upload Files

Technical reference for the file upload endpoint.

POST /v1/files/upload

Upload and index one or more files into a dataset.

Authentication

  • API Key or Frontend Token

Content-Type

http
Content-Type: multipart/form-data

Form Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| datasetId | string | Yes* | - | Dataset to upload files to (*optional with frontend token) |
| file | file | Yes | - | File(s) to upload (can include multiple) |
| metadata | string | No | {} | JSON string of per-file metadata |
| chunkSize | number | No | 300 | Chunk size in tokens |
| chunkOverlap | number | No | 20 | Overlap between chunks |

Supported File Types

Documents (max 100 MB):

  • .pdf - PDF documents
  • .docx - Word documents
  • .xlsx, .xls - Excel spreadsheets
  • .pptx, .ppt - PowerPoint presentations
  • .csv - CSV files
  • .md - Markdown files
  • .json - JSON files
  • .txt - Plain text files

Media (max 2 GB, auto-transcribed):

  • .mp3 - Audio files
  • .wav - Audio files
  • .mp4 - Video files (audio extracted)

Web Scraper (coming soon)

  • Web - Web URL
  • Sitemap - Sitemap URL

File Size Limits

| Type | Max Size | Error |
|------|----------|-------|
| Documents | 100 MB | FILE_TOO_LARGE |
| Media | 2 GB | FILE_TOO_LARGE |

Chunking Parameters

chunkSize

  • Type: Positive integer
  • Default: 300 tokens
  • Range: 100-1000 recommended
  • Description: Size of text chunks for embedding

chunkOverlap

  • Type: Non-negative integer
  • Default: 20 tokens
  • Range: 0-100 recommended
  • Description: Overlap between adjacent chunks
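
To see how these two parameters interact, here is a rough client-side sketch of the splitting arithmetic. It is illustrative only: the service tokenizes text server-side, so plain word splitting stands in for tokens, and previewChunks is a hypothetical helper, not part of the API.

javascript
// Illustrative only: words stand in for tokens; the service tokenizes internally.
function previewChunks(text, chunkSize = 300, chunkOverlap = 20) {
  const step = chunkSize - chunkOverlap; // each chunk starts this many "tokens" after the previous one
  if (step <= 0) throw new Error('chunkOverlap must be smaller than chunkSize');
  const tokens = text.trim().split(/\s+/);
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}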

Metadata Format

The metadata field must be a JSON string, not a raw object.

Metadata lookup order for each file:

  1. metadata[originalFileName]
  2. metadata[fileId]
  3. metadata[index] (0-based)

json
{
  "report.pdf": {
    "userId": "user_123",
    "department": "finance",
    "year": 2025
  },
  "0": {
    "priority": "high"
  }
}

Request Examples

cURL - Single File

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F "file=@document.pdf"

cURL - With Metadata

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F 'metadata={"document.pdf":{"userId":"user_123","department":"legal"}}' \
  -F "file=@document.pdf"

cURL - Custom Chunking

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F "chunkSize=500" \
  -F "chunkOverlap=50" \
  -F "file=@document.pdf"

cURL - Multiple Files

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F "file=@file1.pdf" \
  -F "file=@file2.pdf" \
  -F "file=@file3.pdf"

JavaScript

javascript
const file = fileInput.files[0];
const formData = new FormData();
formData.append('datasetId', 'my-dataset');
formData.append('file', file);

// Optional metadata
const metadata = {
  [file.name]: {
    userId: 'user_123',
    uploadedAt: new Date().toISOString()
  }
};
formData.append('metadata', JSON.stringify(metadata));

// Optional chunking
formData.append('chunkSize', '400');
formData.append('chunkOverlap', '40');

const response = await fetch('https://api.easyrag.com/v1/files/upload', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${apiKey}` },
  body: formData
});
const result = await response.json();

Python

python
import requests

files = {'file': open('document.pdf', 'rb')}
data = {
    'datasetId': 'my-dataset',
    'metadata': '{"document.pdf":{"userId":"user_123"}}'
}
headers = {'Authorization': f'Bearer {api_key}'}

response = requests.post(
    'https://api.easyrag.com/v1/files/upload',
    headers=headers,
    data=data,
    files=files
)
result = response.json()

Response (200)

json
{ "success": true, "message": "Files processed and indexed successfully!", "files": [ { "customerId": "user_abc123", "datasetId": "my-dataset", "fileId": "f7a3b2c1-4d5e-6f7g-8h9i-0j1k2l3m4n5o", "filePath": "customers/user_abc123/datasets/my-dataset/f7a3b2c1-document.pdf", "originalName": "document.pdf", "mimeType": "application/pdf", "size": 245678, "loaderId": "pdf_loader_xyz", "created": "2024-12-13T10:30:00.000Z", "extension": ".pdf", "transcriptionText": null, "transcriptionSrt": null, "extraMeta": { "userId": "user_123", "department": "legal" } } ], "billed": { "fileCount": 1, "uploadUnits": 1 } }

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Always true on success |
| message | string | Success message |
| files | array | Array of uploaded file objects |
| files[].fileId | string | Unique file identifier |
| files[].originalName | string | Original filename |
| files[].datasetId | string | Dataset containing the file |
| files[].customerId | string | Customer who owns the file |
| files[].filePath | string | Storage path |
| files[].mimeType | string | File MIME type |
| files[].size | number | File size in bytes |
| files[].loaderId | string | EmbedJS loader ID |
| files[].created | string | ISO 8601 timestamp |
| files[].extension | string | File extension |
| files[].transcriptionText | string \| null | Transcription text (media only) |
| files[].transcriptionSrt | array \| null | SRT subtitles (media only) |
| files[].extraMeta | object \| null | Custom metadata |
| billed.fileCount | number | Number of files uploaded |
| billed.uploadUnits | number | Upload units charged (file count × 10) |
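
After a successful upload, the generated fileId values are what you will use with the other file endpoints. A minimal sketch, assuming result is the parsed JSON body from the fetch example above:

javascript
// `result` is the parsed upload response (see the JavaScript example above).
const fileIds = result.files.map(f => f.fileId);
console.log(`Uploaded ${result.billed.fileCount} file(s) for ${result.billed.uploadUnits} upload units`);
// Keep fileIds around if you plan to fetch details or delete files later.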

Transcription Response

For audio/video files, transcription fields are populated:

json
{ "files": [ { "fileId": "a1b2c3d4", "originalName": "podcast.mp3", "extension": ".mp3", "transcriptionText": "Welcome to episode 47...", "transcriptionSrt": [ { "id": "1", "startTime": "00:00:00,000", "endTime": "00:00:03,500", "text": "Welcome to episode 47" } ] } ] }

Error Responses

400 Bad Request

Missing datasetId

json
{ "error": "datasetId is required via token, body or form" }

No files provided

json
{ "error": "At least one file is required" }

Invalid chunk size

json
{ "error": "chunkSize must be a positive integer" }

File too large

json
{ "error": "FILE_TOO_LARGE", "message": "File exceeds maximum size of 100MB" }

Unsupported format

json
{ "error": "Unsupported file format: .txt" }

Invalid metadata JSON

json
{ "error": "metadata must be valid JSON string" }

401 Unauthorized

json
{ "error": "Missing API key or token" }

402 Payment Required

json
{ "error": "INSUFFICIENT_CREDITS", "message": "You are out of credits. Please top up to continue.", "details": { "required": 10, "available": 5 } }

403 Forbidden

Dataset mismatch with token

json
{ "error": "datasetId mismatch between token and request" }

413 Payload Too Large

json
{ "error": "Request entity too large" }

429 Too Many Requests

json
{ "error": "RATE_LIMIT_EXCEEDED", "message": "Too many requests. Please try again later.", "retryAfter": 60 }

Processing Details

Document Processing

  1. File uploaded to Cloud Storage
  2. Content extracted based on type:
    • PDF: Text + tables via pdf-parse
    • DOCX: Text via mammoth
    • XLSX: All sheets + cells via xlsx
    • PPTX: Slides + notes via pptx
    • CSV: All rows via csv-parser
  3. Text split into chunks (configurable size/overlap)
  4. Each chunk embedded using OpenAI text-embedding-3-small
  5. Embeddings stored in Qdrant with metadata
  6. File metadata saved to Realtime Database

Media Processing

  1. File uploaded to Cloud Storage
  2. Sent to AssemblyAI for transcription
  3. Transcription text + SRT generated
  4. Text split into chunks (configurable)
  5. Chunks embedded and indexed
  6. Transcription saved with file metadata

Note: Media processing adds ~1-2 minutes per hour of audio.
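
Because transcription runs in the background, one option is to poll GET /v1/files/:fileId (see Related Endpoints) until the transcription fields are populated. The sketch below assumes that endpoint returns the same file object shape as the upload response; this page does not confirm that, so treat it as an assumption:

javascript
// Poll file details until transcription is available (response shape assumed).
async function waitForTranscription(fileId, { intervalMs = 15000, maxAttempts = 20 } = {}) {
  for (let i = 0; i < maxAttempts; i++) {
    const res = await fetch(`https://api.easyrag.com/v1/files/${fileId}`, {
      headers: { 'Authorization': `Bearer ${apiKey}` }
    });
    const file = await res.json();
    if (file.transcriptionText) return file; // transcription fields populated
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error('Transcription did not complete in time');
}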

Billing

  • Cost: 1 credit per file (10 units)
  • Multiple files: Each file costs 1 credit
  • Failed uploads: Not charged
  • Partial success: Only successful files are charged

Rate Limits

  • Upload limit: 100 files/minute per customer
  • Size limits: Enforced per request
  • Concurrent uploads: No specific limit
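
For large batches, throttling client-side keeps you under the 100 files/minute limit. A minimal sketch; uploadOne is a caller-supplied function (hypothetical), and the pacing number is simply the documented limit:

javascript
// Space out uploads to stay under the 100 files/minute rate limit.
async function uploadBatch(files, uploadOne, perMinute = 100) {
  const gapMs = 60000 / perMinute; // minimum spacing between requests
  const results = [];
  for (const file of files) {
    results.push(await uploadOne(file)); // sequential keeps pacing simple
    await new Promise(r => setTimeout(r, gapMs));
  }
  return results;
}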

Best Practices

1. Validate Files Client-Side

javascript
function validateFile(file) {
  const docTypes = ['pdf', 'docx', 'xlsx', 'xls', 'pptx', 'ppt', 'csv', 'md', 'json', 'txt'];
  const mediaTypes = ['mp3', 'wav', 'mp4'];
  const ext = file.name.split('.').pop().toLowerCase();

  if (docTypes.includes(ext)) {
    if (file.size > 100 * 1024 * 1024) { // 100 MB document limit
      return { valid: false, error: 'File too large' };
    }
  } else if (mediaTypes.includes(ext)) {
    if (file.size > 2 * 1024 * 1024 * 1024) { // 2 GB media limit
      return { valid: false, error: 'File too large' };
    }
  } else {
    return { valid: false, error: 'Unsupported file type' };
  }
  return { valid: true };
}

2. Use Appropriate Chunk Sizes

javascript
// Large chunks for documents with long sections
formData.append('chunkSize', '500');
formData.append('chunkOverlap', '50');

// Small chunks for precise matching
formData.append('chunkSize', '200');
formData.append('chunkOverlap', '20');

3. Add Useful Metadata

javascript
const metadata = {
  [file.name]: {
    userId: currentUserId,
    uploadedAt: new Date().toISOString(),
    department: userDepartment,
    fileType: file.type,
    sizeBytes: file.size
  }
};

4. Handle Progress

javascript
const xhr = new XMLHttpRequest();
xhr.upload.onprogress = (e) => {
  if (e.lengthComputable) {
    const percent = (e.loaded / e.total) * 100;
    updateProgress(percent);
  }
};
xhr.open('POST', 'https://api.easyrag.com/v1/files/upload');
xhr.setRequestHeader('Authorization', `Bearer ${apiKey}`);
xhr.send(formData);

5. Retry Failed Uploads

javascript
async function uploadWithRetry(formData, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('https://api.easyrag.com/v1/files/upload', {
        method: 'POST',
        headers: { 'Authorization': `Bearer ${apiKey}` },
        body: formData
      });
      if (response.ok) {
        return await response.json();
      }
      if (response.status === 500 && i < maxRetries - 1) {
        await new Promise(r => setTimeout(r, 1000 * (i + 1)));
        continue;
      }
      throw new Error(`Upload failed: ${response.status}`);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
    }
  }
}

Notes

  • Files are processed asynchronously after upload
  • Upload endpoint returns immediately after storing file
  • Indexing happens in background (typically < 30 seconds)
  • Multiple files in one request are processed concurrently
  • Metadata is immutable after upload
  • To update metadata, delete and re-upload the file
  • Frontend tokens can only upload to their authorized dataset
  • The dataset is created automatically if it doesn't exist

Related Endpoints

  • GET /v1/files - List uploaded files
  • GET /v1/files/:fileId - Get file details
  • DELETE /v1/files/:fileId - Delete file