API Reference

Upload Files

Technical reference for the file upload endpoint.

POST /v1/files/upload

Upload and index one or more files into a dataset.

Authentication

  • API Key or Frontend Token

Content-Type

http
Content-Type: multipart/form-data

Form Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| datasetId | string | Yes* | - | Dataset to upload files to (*optional with frontend token) |
| file | file | Yes | - | File(s) to upload (can include multiple) |
| metadata | string | No | {} | JSON string of per-file metadata |
| chunkSize | number | No | 300 | Chunk size in tokens |
| chunkOverlap | number | No | 20 | Overlap between chunks |

Supported File Types

Documents (max 100 MB):

  • .pdf - PDF documents
  • .docx - Word documents
  • .xlsx, .xls - Excel spreadsheets
  • .pptx, .ppt - PowerPoint presentations
  • .csv - CSV files
  • .md - Markdown files
  • .json - JSON files
  • .txt - Plain text files

Media (max 2 GB, auto-transcribed):

  • .mp3 - Audio files
  • .wav - Audio files
  • .mp4 - Video files (audio extracted)

Web Scraper (coming soon)

  • Web - Web URL
  • Sitemap - Sitemap URL

File Size Limits

| Type | Max Size | Error |
|------|----------|-------|
| Documents | 100 MB | FILE_TOO_LARGE |
| Media | 2 GB | FILE_TOO_LARGE |

Chunking Parameters

chunkSize

  • Type: Positive integer
  • Default: 300 tokens
  • Range: 100-1000 recommended
  • Description: Size of text chunks for embedding

chunkOverlap

  • Type: Non-negative integer
  • Default: 20 tokens
  • Range: 0-100 recommended
  • Description: Overlap between adjacent chunks
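
To see how these two parameters interact, here is a rough client-side sketch of the splitting arithmetic. It is illustrative only: the service tokenizes text server-side, so plain word splitting stands in for tokens, and previewChunks is a hypothetical helper, not part of the API.

javascript
// Illustrative only: words stand in for tokens; the service tokenizes internally.
function previewChunks(text, chunkSize = 300, chunkOverlap = 20) {
  const step = chunkSize - chunkOverlap; // each chunk starts this many "tokens" after the previous one
  if (step <= 0) throw new Error('chunkOverlap must be smaller than chunkSize');
  const tokens = text.trim().split(/\s+/);
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize).join(' '));
    if (start + chunkSize >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}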

Metadata Format

The metadata field must be a JSON string, not a raw object.

Metadata lookup order for each file:

  1. metadata[originalFileName]
  2. metadata[fileId]
  3. metadata[index] (0-based)

json
{
  "report.pdf": {
    "userId": "user_123",
    "department": "finance",
    "year": 2025
  },
  "0": {
    "priority": "high"
  }
}

Request Examples

cURL - Single File

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F "file=@document.pdf"

cURL - With Metadata

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F 'metadata={"document.pdf":{"userId":"user_123","department":"legal"}}' \
  -F "file=@document.pdf"

cURL - Custom Chunking

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F "chunkSize=500" \
  -F "chunkOverlap=50" \
  -F "file=@document.pdf"

cURL - Multiple Files

bash
curl -X POST https://api.easyrag.com/v1/files/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "datasetId=my-dataset" \
  -F "file=@file1.pdf" \
  -F "file=@file2.pdf" \
  -F "file=@file3.pdf"

JavaScript

javascript
const file = fileInput.files[0];
const formData = new FormData();
formData.append('datasetId', 'my-dataset');
formData.append('file', file);

// Optional metadata
const metadata = {
  [file.name]: {
    userId: 'user_123',
    uploadedAt: new Date().toISOString()
  }
};
formData.append('metadata', JSON.stringify(metadata));

// Optional chunking
formData.append('chunkSize', '400');
formData.append('chunkOverlap', '40');

const response = await fetch('https://api.easyrag.com/v1/files/upload', {
  method: 'POST',
  headers: { 'Authorization': `Bearer ${apiKey}` },
  body: formData
});
const result = await response.json();

Python

python
import requests

files = {'file': open('document.pdf', 'rb')}
data = {
    'datasetId': 'my-dataset',
    'metadata': '{"document.pdf":{"userId":"user_123"}}'
}
headers = {'Authorization': f'Bearer {api_key}'}

response = requests.post(
    'https://api.easyrag.com/v1/files/upload',
    headers=headers,
    data=data,
    files=files
)
result = response.json()

Response (200)

json
{ "success": true, "message": "Files processed and indexed successfully!", "files": [ { "customerId": "user_abc123", "datasetId": "my-dataset", "fileId": "f7a3b2c1-4d5e-6f7g-8h9i-0j1k2l3m4n5o", "filePath": "customers/user_abc123/datasets/my-dataset/f7a3b2c1-document.pdf", "originalName": "document.pdf", "mimeType": "application/pdf", "size": 245678, "loaderId": "pdf_loader_xyz", "created": "2024-12-13T10:30:00.000Z", "extension": ".pdf", "transcriptionText": null, "transcriptionSrt": null, "extraMeta": { "userId": "user_123", "department": "legal" } } ], "billed": { "fileCount": 1, "uploadUnits": 1 } }

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| success | boolean | Always true on success |
| message | string | Success message |
| files | array | Array of uploaded file objects |
| files[].fileId | string | Unique file identifier |
| files[].originalName | string | Original filename |
| files[].datasetId | string | Dataset containing the file |
| files[].customerId | string | Customer who owns the file |
| files[].filePath | string | Storage path |
| files[].mimeType | string | File MIME type |
| files[].size | number | File size in bytes |
| files[].loaderId | string | EmbedJS loader ID |
| files[].created | string | ISO 8601 timestamp |
| files[].extension | string | File extension |
| files[].transcriptionText | string \| null | Transcription text (media only) |
| files[].transcriptionSrt | array \| null | SRT subtitles (media only) |
| files[].extraMeta | object \| null | Custom metadata |
| billed.fileCount | number | Number of files uploaded |
| billed.uploadUnits | number | Upload units charged (file count × 10) |
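
After a successful upload, the generated fileId values are what you will use with the other file endpoints. A minimal sketch, assuming result is the parsed JSON body from the fetch example above:

javascript
// `result` is the parsed upload response (see the JavaScript example above).
const fileIds = result.files.map(f => f.fileId);
console.log(`Uploaded ${result.billed.fileCount} file(s) for ${result.billed.uploadUnits} upload units`);
// Keep fileIds around if you plan to fetch details or delete files later.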

Transcription Response

For audio/video files, transcription fields are populated:

json
{ "files": [ { "fileId": "a1b2c3d4", "originalName": "podcast.mp3", "extension": ".mp3", "transcriptionText": "Welcome to episode 47...", "transcriptionSrt": [ { "id": "1", "startTime": "00:00:00,000", "endTime": "00:00:03,500", "text": "Welcome to episode 47" } ] } ] }

Error Responses

400 Bad Request

Missing datasetId

json
{ "error": "datasetId is required via token, body or form" }

No files provided

json
{ "error": "At least one file is required" }

Invalid chunk size

json
{ "error": "chunkSize must be a positive integer" }

File too large

json
{ "error": "FILE_TOO_LARGE", "message": "File exceeds maximum size of 100MB" }

Unsupported format

json
{ "error": "Unsupported file format: .txt" }

Invalid metadata JSON

json
{ "error": "metadata must be valid JSON string" }

401 Unauthorized

json
{ "error": "Missing API key or token" }

402 Payment Required

json
{ "error": "INSUFFICIENT_CREDITS", "message": "You are out of credits. Please top up to continue.", "details": { "required": 10, "available": 5 } }

403 Forbidden

Dataset mismatch with token

json
{ "error": "datasetId mismatch between token and request" }

413 Payload Too Large

json
{ "error": "Request entity too large" }

429 Too Many Requests

json
{ "error": "RATE_LIMIT_EXCEEDED", "message": "Too many requests. Please try again later.", "retryAfter": 60 }

Processing Details

Document Processing

  1. File uploaded to Cloud Storage
  2. Content extracted based on type:
    • PDF: Text + tables via pdf-parse
    • DOCX: Text via mammoth
    • XLSX: All sheets + cells via xlsx
    • PPTX: Slides + notes via pptx
    • CSV: All rows via csv-parser
  3. Text split into chunks (configurable size/overlap)
  4. Each chunk embedded using OpenAI text-embedding-3-small
  5. Embeddings stored in Qdrant with metadata
  6. File metadata saved to Realtime Database

Media Processing

  1. File uploaded to Cloud Storage
  2. Sent to AssemblyAI for transcription
  3. Transcription text + SRT generated
  4. Text split into chunks (configurable)
  5. Chunks embedded and indexed
  6. Transcription saved with file metadata

Note: Media processing adds ~1-2 minutes per hour of audio.
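
Because transcription runs in the background, one option is to poll GET /v1/files/:fileId (see Related Endpoints) until the transcription fields are populated. The sketch below assumes that endpoint returns the same file object shape as the upload response; this page does not confirm that, so treat it as an assumption:

javascript
// Poll file details until transcription is available (response shape assumed).
async function waitForTranscription(fileId, { intervalMs = 15000, maxAttempts = 20 } = {}) {
  for (let i = 0; i < maxAttempts; i++) {
    const res = await fetch(`https://api.easyrag.com/v1/files/${fileId}`, {
      headers: { 'Authorization': `Bearer ${apiKey}` }
    });
    const file = await res.json();
    if (file.transcriptionText) return file; // transcription fields populated
    await new Promise(r => setTimeout(r, intervalMs));
  }
  throw new Error('Transcription did not complete in time');
}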

Billing

  • Cost: 1 credit per file (10 units)
  • Multiple files: Each file costs 1 credit
  • Failed uploads: Not charged
  • Partial success: Only successful files are charged

Rate Limits

  • Upload limit: 100 files/minute per customer
  • Size limits: Enforced per request
  • Concurrent uploads: No specific limit
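
For large batches, throttling client-side keeps you under the 100 files/minute limit. A minimal sketch; uploadOne is a caller-supplied function (hypothetical), and the pacing number is simply the documented limit:

javascript
// Space out uploads to stay under the 100 files/minute rate limit.
async function uploadBatch(files, uploadOne, perMinute = 100) {
  const gapMs = 60000 / perMinute; // minimum spacing between requests
  const results = [];
  for (const file of files) {
    results.push(await uploadOne(file)); // sequential keeps pacing simple
    await new Promise(r => setTimeout(r, gapMs));
  }
  return results;
}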

Best Practices

1. Validate Files Client-Side

javascript
function validateFile(file) {
  const docTypes = ['pdf', 'docx', 'xlsx', 'xls', 'pptx', 'ppt', 'csv', 'md', 'json', 'txt'];
  const mediaTypes = ['mp3', 'wav', 'mp4'];
  const ext = file.name.split('.').pop().toLowerCase();

  if (docTypes.includes(ext)) {
    if (file.size > 100 * 1024 * 1024) { // 100 MB document limit
      return { valid: false, error: 'File too large' };
    }
  } else if (mediaTypes.includes(ext)) {
    if (file.size > 2 * 1024 * 1024 * 1024) { // 2 GB media limit
      return { valid: false, error: 'File too large' };
    }
  } else {
    return { valid: false, error: 'Unsupported file type' };
  }
  return { valid: true };
}

2. Use Appropriate Chunk Sizes

javascript
// Large chunks for documents with long sections
formData.append('chunkSize', '500');
formData.append('chunkOverlap', '50');

// Small chunks for precise matching
formData.append('chunkSize', '200');
formData.append('chunkOverlap', '20');

3. Add Useful Metadata

javascript
const metadata = {
  [file.name]: {
    userId: currentUserId,
    uploadedAt: new Date().toISOString(),
    department: userDepartment,
    fileType: file.type,
    sizeBytes: file.size
  }
};

4. Handle Progress

javascript
const xhr = new XMLHttpRequest();
xhr.upload.onprogress = (e) => {
  if (e.lengthComputable) {
    const percent = (e.loaded / e.total) * 100;
    updateProgress(percent);
  }
};
xhr.open('POST', 'https://api.easyrag.com/v1/files/upload');
xhr.setRequestHeader('Authorization', `Bearer ${apiKey}`);
xhr.send(formData);

5. Retry Failed Uploads

javascript
async function uploadWithRetry(formData, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      const response = await fetch('https://api.easyrag.com/v1/files/upload', {
        method: 'POST',
        headers: { 'Authorization': `Bearer ${apiKey}` },
        body: formData
      });
      if (response.ok) {
        return await response.json();
      }
      if (response.status === 500 && i < maxRetries - 1) {
        await new Promise(r => setTimeout(r, 1000 * (i + 1)));
        continue;
      }
      throw new Error(`Upload failed: ${response.status}`);
    } catch (error) {
      if (i === maxRetries - 1) throw error;
    }
  }
}

Notes

  • Files are processed asynchronously after upload
  • Upload endpoint returns immediately after storing file
  • Indexing happens in background (typically < 30 seconds)
  • Multiple files in one request are processed concurrently
  • Metadata is immutable after upload
  • To update metadata, delete and re-upload the file
  • Frontend tokens can only upload to their authorized dataset
  • The dataset is created automatically if it doesn't exist

Related Endpoints

  • GET /v1/files - List uploaded files
  • GET /v1/files/:fileId - Get file details
  • DELETE /v1/files/:fileId - Delete file