kiln_ai.adapters.extractors

File extractors for processing different document types.

This package provides a framework for extracting content from files using different extraction methods.

"""
File extractors for processing different document types.

This package provides a framework for extracting content from files
using different extraction methods.
"""

from . import base_extractor, extractor_registry, extractor_runner, litellm_extractor
from .base_extractor import ExtractionInput, ExtractionOutput

__all__ = [
    "ExtractionInput",
    "ExtractionOutput",
    "base_extractor",
    "extractor_registry",
    "extractor_runner",
    "litellm_extractor",
]

class ExtractionInput(pydantic.main.BaseModel):
class ExtractionInput(BaseModel):
    path: Path | str = Field(description="The absolute path to the file to extract.")
    mime_type: str = Field(description="The mime type of the file.")

path: pathlib.Path | str
mime_type: str
model_config: ClassVar[pydantic.config.ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
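
The two ExtractionInput fields (an absolute path and a MIME type) can be prepared with the standard library alone. The sketch below is only an illustration: `build_extraction_kwargs` is a hypothetical helper, not part of kiln_ai, and the `application/octet-stream` fallback is an assumption rather than documented behavior.

```python
import mimetypes
from pathlib import Path


def build_extraction_kwargs(file_path: str) -> dict:
    """Hypothetical helper: prepare the two ExtractionInput fields for a file."""
    # ExtractionInput.path is described as an absolute path, so resolve it first.
    absolute_path = Path(file_path).resolve()

    # Guess the MIME type from the file extension. Falling back to a generic
    # binary type for unknown extensions is an assumption made for this sketch.
    mime_type, _encoding = mimetypes.guess_type(absolute_path.name)
    return {
        "path": absolute_path,
        "mime_type": mime_type or "application/octet-stream",
    }


kwargs = build_extraction_kwargs("docs/report.pdf")
print(kwargs["path"], kwargs["mime_type"])
```

With kiln_ai installed, the resulting dictionary should be usable as `ExtractionInput(**kwargs)`, since its keys match the two declared fields.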

class ExtractionOutput(pydantic.main.BaseModel):
class ExtractionOutput(BaseModel):
    """
    The output of an extraction. This is the data that will be saved to the data store.
    """

    is_passthrough: bool = Field(
        default=False, description="Whether the extractor returned the file as is."
    )
    content_format: OutputFormat = Field(
        description="The format of the extracted data."
    )
    content: str = Field(description="The extracted data.")

The output of an extraction. This is the data that will be saved to the data store.

is_passthrough: bool
content_format: OutputFormat
content: str
model_config: ClassVar[pydantic.config.ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
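
To illustrate how the three ExtractionOutput fields relate, here is a rough sketch using a standard-library stand-in. `ExtractionOutputSketch` and `describe` are hypothetical names, and `content_format` is modeled as a plain string because the members of the real `OutputFormat` type are not shown on this page.

```python
from dataclasses import dataclass


@dataclass
class ExtractionOutputSketch:
    """Stdlib stand-in mirroring the three ExtractionOutput fields."""

    content_format: str  # assumption: the real field is an OutputFormat, not a str
    content: str
    is_passthrough: bool = False  # same default as the real model


def describe(output: ExtractionOutputSketch) -> str:
    """Hypothetical helper: summarize how the content was produced."""
    source = "returned as is" if output.is_passthrough else "produced by an extractor"
    return f"{len(output.content)} characters of {output.content_format}, {source}"


extracted = ExtractionOutputSketch(content_format="text/markdown", content="# Title")
passthrough = ExtractionOutputSketch(
    content_format="text/plain", content="hello", is_passthrough=True
)
print(describe(extracted))
print(describe(passthrough))
```

The stand-in keeps `is_passthrough` optional with a `False` default, as in the source above, so only `content_format` and `content` must be supplied.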