| build_chat_prompt | Build a chat prompt from conversation history |
| edge_ask | Ask a question using retrieval-augmented generation |
| edge_benchmark | Benchmark model inference performance |
| edge_cache_info | Report cache size information |
| edge_chat_completion | Generate a chat completion using the model's native template |
| edge_chat_stream | Run an interactive chat session with streaming responses |
| edge_classify | Classify text into predefined categories |
| edge_clean_cache | Clean up cache directory and manage storage |
| edge_completion | Generate a text completion using a loaded model |
| edge_cuda_info | Check whether a CUDA backend is installed and active |
| edge_download_model | Download a GGUF model from Hugging Face |
| edge_download_url | Download a model from a direct URL |
| edge_embeddings | Extract text embeddings from a model |
| edge_extract | Extract structured data from text |
| edge_extract_batch | Extract structured data from multiple texts |
| edge_find_gguf_models | Find and prepare GGUF models for use with edgemodelr |
| edge_find_ollama_models | Find and load Ollama models |
| edge_free_model | Free model context and release memory |
| edge_grammar_completion | Generate text constrained by a GBNF grammar |
| edge_index_documents | Build an embedding index from text documents |
| edge_install_cuda | Install the CUDA backend for GPU-accelerated inference |
| edge_install_cuda_toolkit | Install CUDA runtime libraries required for GPU inference |
| edge_json_grammar | Generate a GBNF grammar for JSON output from a schema |
| edge_list_models | List popular pre-configured models |
| edge_load_model | Load a local GGUF model for inference |
| edge_load_ollama_model | Load an Ollama model by partial SHA-256 hash |
| edge_map | Apply a prompt template to a vector of texts |
| edge_model_n_embd | Get the embedding dimension of a loaded model |
| edge_quick_setup | Quick setup for a popular model |
| edge_reload_cuda | Activate an installed CUDA backend without restarting R |
| edge_search | Search an embedding index for relevant chunks |
| edge_serve | Serve a model as a local OpenAI-compatible API |
| edge_set_verbose | Control llama.cpp logging verbosity |
| edge_simd_info | Query SIMD optimization status |
| edge_similarity | Compute cosine similarity between two embedding vectors |
| edge_similarity_matrix | Compute a similarity matrix for a set of embeddings |
| edge_small_model_config | Get optimized configuration for small language models |
| edge_stream_completion | Stream a text completion with real-time token generation |
| is_valid_model | Check whether a model context is valid |
| test_ollama_model_compatibility | Test if an Ollama model blob can be used with edgemodelr |