In simple words, unstructured data is information which does not fit well in relational model. It is mostly text heavy, but can contain images, videos, emails etc.
File Table, a new feature introduced in SQL 2012, builds on the existing filestream object SQL Server had come up with in 2008 to store unstructured data. Filestream datatype tried to integrate the relational database engine with the windows file system. More information about the filestream object can be found here.
File Table enables non-transactional access to file data. This means the file table data can be accessed through SQL in a transactional way, as well as by the Windows APIs as if it was accessing a file object. File table basically converts SQL tables into folders which can be accessed through Windows explorer. The directory structure and the file attributes are stored into File Table as columns. Files can be bulk loaded, updated as well as managed in T-SQL like any other column. SQL Server also supports backup and restore job for this.
As you can see above, file attributes are stored as columns. stream_id uniquely identifies the file. Data is contained in the filestream column. Most of the columns are read only. The only parameters specified to create an entry in file table is the file name and file directory name.
CREATE TABLE Documentstore AS FileTable
FileTable_Directory = 'Documentstore',
FileTable_Collate_Filename = database_default
Loading files in File Table is as simple as drag and drop in windows explorer.
There are inbuilt SQL triggers which intercept Windows API and maintain data consistency in File Table in case of insertion/updating/deletion.
select * from Documentstore where name ='FileTable.xml'
Working with Directories and files
Here are set of useful queries for working with File Table data.
• To know if a particular row belongs to a file or directory:
o select * from documentstore where is_directory=1
• To get the root level path of the file table:
o select FileTableRootPath('DocumentStore')
• To get the relative path of particular file or directory in a file table:
o select file_stream.GetFileNamespacePath() from documentstore
• To get the specific locator ID of a file by providing the path:
o select GetPathLocator('[fullpath]')
• To understand the full path to a file or directory stored in a FileTable:
o DECLARE @rootpath nvarchar(100);
DECLARE @fullpath nvarchar(1000);
SELECT @rootpath = FileTableRootPath();
SELECT top 1 @fullpath = @rootpath + file_stream.GetFileNamespacePath() FROM documentstore
The resulting hierarchy looks like this:
The directory hierarchy which is created at the time of file table creation is a virtual directory hierarchy. This value is stored in SQL DB with datatype=hierarchyid.
File handles which are not cleaned up can prevent exclusive access required for certain administrative tasks. To identify open files and associated locks, run the below query:
WHERE fcb_id IN
( SELECT request_owner_id FROM sys.dm_tran_locks )
You can also use SQL Profiler to capture Windows File Open and Close operations in Trace Output for files stored in file table.
Once you have identified open file handles in the database or file table, you can clean them up by running the builtin stored proc:
sp_kill_filestream_non_transacted_handles [[ @table_name = ] 'table_name',[[ @handle_id = ] @handle_id]]
In our subsequent blogs, we will see how can we exploit this file table feature of SQL 2012 to do in-depth analysis of unstructured data and gain useful insights.