kiln_ai.adapters.extractors
File extractors for processing different document types.
This package provides a framework for extracting content from files using different extraction methods.
"""
File extractors for processing different document types.

This package provides a framework for extracting content from files
using different extraction methods.
"""

from . import base_extractor, extractor_registry, extractor_runner, litellm_extractor
from .base_extractor import ExtractionInput, ExtractionOutput

__all__ = [
    "ExtractionInput",
    "ExtractionOutput",
    "base_extractor",
    "extractor_registry",
    "extractor_runner",
    "litellm_extractor",
]
class ExtractionInput(pydantic.main.BaseModel):
class ExtractionInput(BaseModel):
    path: Path | str = Field(description="The absolute path to the file to extract.")
    mime_type: str = Field(description="The mime type of the file.")
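As a rough illustration of how the two fields might be filled in, the sketch below resolves a path to absolute form and guesses the MIME type with the standard-library `mimetypes` module. The `ExtractionInput` dataclass here is only a stand-in that mirrors the fields listed above so the example runs without `kiln_ai` installed, and `build_extraction_input` is a hypothetical helper, not part of the package.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path
from typing import Union


# Stand-in mirroring the two documented fields. Real code would instead use:
#   from kiln_ai.adapters.extractors import ExtractionInput
@dataclass
class ExtractionInput:
    path: Union[Path, str]
    mime_type: str


def build_extraction_input(file_path: str) -> ExtractionInput:
    """Hypothetical helper: make the path absolute and guess the MIME type."""
    absolute = Path(file_path).resolve()
    guessed, _encoding = mimetypes.guess_type(absolute.name)
    if guessed is None:
        # The field is a required str, so refuse to guess a fallback here.
        raise ValueError(f"Cannot determine MIME type for {absolute.name}")
    return ExtractionInput(path=absolute, mime_type=guessed)


# The file does not need to exist for this sketch; only the name is inspected.
example = build_extraction_input("reports/summary.pdf")
print(example.mime_type)
```

Whether the real extractors validate that the path exists or that the MIME type is supported is not stated here, so callers may want to check both before building the input.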
class ExtractionOutput(pydantic.main.BaseModel):
class ExtractionOutput(BaseModel):
    """
    The output of an extraction. This is the data that will be saved to the data store.
    """

    is_passthrough: bool = Field(
        default=False, description="Whether the extractor returned the file as is."
    )
    content_format: OutputFormat = Field(
        description="The format of the extracted data."
    )
    content: str = Field(description="The extracted data.")
The output of an extraction. This is the data that will be saved to the data store.
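To make the role of `is_passthrough` concrete, here is a minimal sketch contrasting a file returned as is with content produced by an extractor. Both classes below are stand-ins so the example runs without `kiln_ai` installed. The members of `OutputFormat` in particular are assumptions made for illustration; the real enum is defined in `kiln_ai.datamodel.extraction` and may differ.

```python
from dataclasses import dataclass
from enum import Enum


class OutputFormat(str, Enum):
    # Hypothetical members, assumed for this sketch only.
    TEXT = "text/plain"
    MARKDOWN = "text/markdown"


# Stand-in mirroring the three documented fields and the documented default.
@dataclass
class ExtractionOutput:
    content_format: OutputFormat
    content: str
    is_passthrough: bool = False


# A Markdown file handed back unchanged: the content is the file as is.
passthrough = ExtractionOutput(
    content_format=OutputFormat.MARKDOWN,
    content="# Notes\n",
    is_passthrough=True,
)

# Text produced from some other source file: is_passthrough keeps its default.
extracted = ExtractionOutput(
    content_format=OutputFormat.MARKDOWN,
    content="# Summary\n\nExtracted text.",
)

print(passthrough.is_passthrough, extracted.is_passthrough)
```

Because `is_passthrough` defaults to `False`, only the passthrough case needs to set it explicitly.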
content_format: kiln_ai.datamodel.extraction.OutputFormat