A data backup solution written in Python that is tailored for general end-users, power users, developers, and clients who have specialized needs or for data that requires low-level security or ultra high-level security.

mark3982/Neophytos

Dependencies

The dependencies are minimal, and this is intentional. For the client I try to use only what is included in the Python standard library, and for the server only what is in the Golang standard library. I try to implement everything in pure Python, but when performance suffers I will include support for native code while keeping the pure Python implementation around if possible.

  • server requires Golang to build or compile with - I am looking at providing prebuilt binaries
  • client requires Python 3.x (prefer latest version especially on Windows)
  • client GUI requires PyQt4 (GUI is work in progress)

I chose Go because it provided the needed performance. I did not know much about Rust at the time, but I could port to Rust in the future as I research the language more. There are a lot of design decisions to make regarding either one.

I prefer to use Python as it provides rapid application development and is easy to use with an assortment of libraries, but when it comes to performance it is not always so great.

I use PyQt4 for the GUI because it is stable, modern, mature, cross-platform, simple, and powerful.

Current State

The [X] means it has been implemented, while [ ] means still in development.

  • secure communication from client to server
  • multiple backup targets per account
  • multiple sub-accounts with different permissions if desired (read/write)
  • delta uploading/downloading (only uploads/downloads changed portion of file)
  • (50% implemented) file stashing (supports versions and recovery of deleted files if desired)
  • open source client and server
  • cross-platform client (Linux, Windows, ARM, or anywhere Python 3.x will run)
  • cross-platform server (Anywhere golang will run, or can target.)
  • files can be encrypted client side (in memory) before being sent (sensitive data protected on server)
  • client command line interface (95% complete; very much usable)
  • filter system (disabled at the moment but 90% of the support is there)
  • broken; GUI frontend (client changes have broken the once partially working GUI)
  • delta uploading (patching) - only changed parts of file are uploaded

Client Tutorial

To run the client use:

python3 backup.py

Some of the options are:

--lpath=<local path>                              local machine absolute path
--rpath=<remote path>                             remote server relative path 
--password=<authorization code>                   specifies the authorization code
--push                                            will push files to the server
--pull                                            will pull files from the server
--sync-rdel                                       will synchronize remote for locally deleted files
--host                                (optional)  the hostname or address (default to kmcg3413.net)
--port                                (optional)  the port (default 4322)
--cipher                              (optional)  SSL cipher string (RC4, RSA, DSA, ..)
--filter-file                         (optional)  file with one filter entry per line
--make-sample-filter-file                         produces example filter file as filter.example
--authcode=<authorization code>                   SAME AS --password
--no-ssl                                          uses non-SSL socket (could be buggy)
--debug                                           enables debug output
--no-sformat                                      disables using stash format on server
--efilter-file                        (optional)  encryption filter file path
--def-crypt                           (optional)  default encryption/decryption to use

For example, to push all files in a directory to the server under the name peach:

python3 backup.py --push --lpath=/mnt/kmcguire --rpath=peach --host=myserver --password=e3xample

To pull files from the server under the name peach:

python3 backup.py --pull --lpath=/temp --rpath=peach --host=myserver --password=e3xample

To pull all targets (including peach and any others):

python3 backup.py --pull --lpath=/mnt/kmcguire/temp --host=myserver --password=e3xample

If you are pushing to a target where the files will be used directly, which is more like rsync, then you should also add the --no-sformat option to every operation. This ensures that the client does not try to interpret the files as if they were in the stash format. I think a --pull operation may work fine without it, but the correct way is to use the --no-sformat option.

python3 backup.py --push --lpath=/mnt/kmcguire --rpath=rsynclike --no-sformat

With the --no-sformat option the files placed on the server are named exactly as they are locally. If you omit it you will end up with a revision (or stash identifier) prefixed to every file and directory. The option is useful when you are just trying to synchronize two paths between two machines. You can actually do all the synchronization from one machine with:

python3 backup.py --pull --lpath=/mnt/kmcguire --rpath=rsynclike --no-sformat
python3 backup.py --push --lpath=/mnt/kmcguire --rpath=rsynclike --no-sformat

The only problem is that this will delete nothing locally or remotely. Deleting may be desired, but due to technical limitations it is not easy to infer whether a file has been deleted from the server, because that would require storing this information. And for how long should it be stored? This type of situation would likely lend itself better to other solutions that I have in mind but need more time to actually put together.

Filters

For those less inclined to delve into the source code and implement your own filter system there is one provided that is fairly powerful in design, and easy to understand. First let us take a look at a common filter.

dir       reject      ^test$
any       accept      .*

This would be located in a file with a name of your choice. For this example let us pretend it was saved as test-filter in the working directory of the client. To load this file you would pass the option --filter-file=test-filter. Notice the filename comes after the equal sign. You can also specify relative and absolute paths such as /home/dave/myfilters/onlysource or ../myfilters/nomedia.

Each line is a filter rule. You can insert tabs or spaces between the elements of a filter rule. The filter rule has three elements. The first is the type of rule, the second is what to do if it matches, and the third is the regular expression (pattern) to do the matching with. The tabs and spaces do not have to be the same on each rule however it may look nicer and be easier to read if you do maintain equal spacing.

That filter will accept all files and directories except any directory named test. To do this the filter is executed from top to bottom for each file and directory. It first evaluates dir reject ^test$. If this rule matches, the entry is rejected. But for the rule to even be evaluated the entry has to be a dir. If it was a file it would just skip down to the next line. So essentially each rule is evaluated in turn, and the first one that matches either accepts or rejects. You can make rules for dir, file, path, or any. When a file or directory is tested only the last component of the path is used. For example, for /home/dave/test/apple.png only apple.png is checked, except if you use the path type. The path type checks the entire path.

What happens when it reaches the end of the filter file and has no matches? Well, by default it rejects. You can have an unlimited number of rules. I hope to one day add more functionality such as forward jumps so you can create individual sections for specific things with some basic flow control.
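The evaluation order described above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this section, not the client's actual code: the function names parse_filter and accepts are hypothetical, and anchoring the pattern with re.match is an assumption.

```python
import re

def parse_filter(text):
    """Parse filter text into (kind, action, compiled_pattern) rules."""
    rules = []
    for line in text.splitlines():
        parts = line.split(None, 2)   # any mix of tabs and spaces
        if len(parts) != 3:
            continue                  # skip blank or malformed lines
        kind, action, pattern = parts
        rules.append((kind, action, re.compile(pattern.strip())))
    return rules

def accepts(rules, path, is_dir):
    """Run the rules top to bottom; the first matching rule decides."""
    name = path.rstrip('/').split('/')[-1]   # last path component
    for kind, action, pattern in rules:
        if kind == 'dir' and not is_dir:
            continue                  # dir rules only apply to directories
        if kind == 'file' and is_dir:
            continue                  # file rules only apply to files
        # Only the path type sees the entire path.
        subject = path if kind == 'path' else name
        if pattern.match(subject):    # assumption: match from the start
            return action == 'accept'
    return False                      # no rule matched: reject by default

rules = parse_filter("dir  reject  ^test$\nany  accept  .*\n")
print(accepts(rules, '/home/dave/test', True))
print(accepts(rules, '/home/dave/apple.png', False))
```

With the example filter, the directory named test is rejected and everything else is accepted. An empty rule list rejects everything, which mirrors the default described above.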

Server Tutorial

I rewrote the server in Go (golang) to have lower CPU usage, but I have not fully added back in support for the quota! And, what code has been added has not been well tested!

First we need to generate a certificate and an RSA private/public key pair. The certificate is used with SSL and the private/public RSA key pair are used by both SSL and other supported communication link encryptions.

openssl req -new -x509 -days 365 -nodes -out cert.pem -keyout cert.pem -newkey rsa:8192

This will generate an 8192-bit key pair, which should be decently secure. You should be able to generate and use any size.

This file should be named cert.pem in the server's working directory. The server can now be started if desired, or you can wait.

The next step is creating an account. There should be a sub-directory created called accounts in the server's working directory. Inside create a file named exactly hdk392Ej. This will be the authorization code used to access this account. The accounts directory should be kept SECRET. Anyone with access to it will know the authorization code for any account. I suspect if they have access to this directory then they likely have access to any data for the accounts. We will discuss data in a moment.

Once the file hdk392Ej is created open it with an editor. Place these contents inside:

SpaceQuota:   100000000
SpaceUsed:    0
SpacePerFile: 4096
SpacePerDir:  4096
DiskPath:     /home/kmcguire/data/ok493L3Dx92Xs029W

SpaceUsed represents the bytes used by the account. SpacePerFile represents the bytes to subtract from the quota for each file created, and serves to roughly account for an account with lots of small or empty files.

The server has now been configured and setup for the account hdk392Ej. The hdk392Ej serves as the username and password combination. I recommend using something much longer. If you are intent on using something resembling more of a username and password combination then you could do something along the lines of username.password and replace each with the respective username and password.
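Since the file name doubles as the authorization code, one way to get a longer code is to generate it randomly. The following is a minimal sketch, assuming the server accepts any file name made of hexadecimal characters; the function name create_account and its parameters are hypothetical, and the field values are simply the ones from the example above.

```python
import secrets
from pathlib import Path

def create_account(accounts_dir, disk_path, quota=100000000):
    """Create an account file named after a random authorization code."""
    # 32 random bytes rendered as 64 hex characters (assumption: the
    # server places no restriction on such names).
    code = secrets.token_hex(32)
    account = Path(accounts_dir) / code
    account.parent.mkdir(parents=True, exist_ok=True)
    account.write_text(
        "SpaceQuota:   %d\n"
        "SpaceUsed:    0\n"
        "SpacePerFile: 4096\n"
        "SpacePerDir:  4096\n"
        "DiskPath:     %s\n" % (quota, disk_path)
    )
    return code
```

The returned code is what you would pass to the client with --password (or --authcode). Keep it as secret as the accounts directory itself.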

I would like to add support in later for a more securish (at least looking) account database. At the moment the server and client are still in development and this is not by any means the final form but simply represents a roughed in system.

To start the server you will need to issue the following commands for Golang.

GOPATH="/home/kmcguire/project/goserver"
export GOPATH
go run ./src/main.go

Here GOPATH specifies the goserver directory that is included when you download the project repository. You can also build an executable using go, but I will leave that as an exercise for you for now!

Technical Limitations Of Client And Server

When I refer to meta-data I am referring to the ability to tag a file with a variable-sized byte header.

The server at this time is expected to support a filename of at least 255 bytes in length. If you use UTF-16 that means a filename of at most 127 characters. If you use UTF-8 it can vary. That 255 byte limit is the maximum supported filename length of the EXT4 file system on Linux. The server does not enforce this limitation; the OS and file system that the server is running on and manipulating files on does. If your OS has no limitation then it depends solely on the file system you are using, such as NTFS, NFS, EXT, or any other. And if they have no limit then the server has no limit.
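Because the limit is in bytes rather than characters, the same name can fit in one encoding and not in another. A small sketch for checking this, where filename_fits is a hypothetical helper and the encoding actually used on the wire is not specified here:

```python
def filename_fits(name, encoding='utf-8', limit=255):
    """Return True if the encoded filename is within the byte limit."""
    return len(name.encode(encoding)) <= limit

# ASCII characters take 1 byte in UTF-8 and 2 bytes in UTF-16.
print(filename_fits('a' * 255))                # 255 bytes
print(filename_fits('a' * 128, 'utf-16-le'))   # 256 bytes
# Non-ASCII characters take 2 to 4 bytes in UTF-8.
print(filename_fits('é' * 128))                # 256 bytes
```

The utf-16-le codec is used so that no byte order mark is counted.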

The server technically places no maximum length on filenames or paths, but it does limit the maximum message size to around 4MB currently. This means if your filename is 3MB in length then you only have 1MB left for data, and that might slightly impact uploading speed. Also, if your filename is 4MB in length then there would be no room for anything else.

The stashing feature is implemented using a specially named sub-directory in your root directory which stores files that have been deleted in case you need to recover them. I abandoned my original method for stashing files, so I do not have any details on the new method at this time.

Let us talk a moment about meta data and how it works. The meta data is really just data, except it describes one or more things about a file. The meta data support is partially implemented by the server, but only for the dir list command. Other than the directory list command the server is completely unaware of meta data even existing. It sees each file as a sequence of bytes that can be read and written to. Support for meta data in the dir list command was added to increase performance. You can implement meta data support without using the dir list command. When the client issues a dir list command it tells the server what directory to list and how many bytes to read from the beginning of each file. This is as far as the server is concerned, and really it can be used for non meta data purposes. How this is used is up to the client software. The standard client uses it as meta data. At this time a directory has no meta data, but in the future it may. This support is entirely up to the client but could be assisted by the server; however, I would like to keep the complexity on the client.
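To make the dir list behaviour concrete, here is a rough local equivalent in Python. It is only a sketch of the idea (list a directory and return the first N bytes of each file); the real server is written in Go, and the function name and return shape below are hypothetical.

```python
import os

def list_with_headers(directory, header_size):
    """List a directory, returning (name, is_dir, leading_bytes) tuples."""
    with os.scandir(directory) as it:
        items = sorted(it, key=lambda entry: entry.name)
    entries = []
    for entry in items:
        if entry.is_dir():
            # Directories carry no meta data at this time.
            entries.append((entry.name, True, b''))
        else:
            with open(entry.path, 'rb') as f:
                entries.append((entry.name, False, f.read(header_size)))
    return entries
```

The server-side equivalent does not interpret the returned bytes at all; deciding that they are meta data is left to the client.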

One example of meta-data use is support for compression and client side encryption. Your communications with the server are protected by SSL/TLS, however that is configured. However, the client may support the ability to encrypt files locally, outside of SSL/TLS, in order to protect their confidentiality on the server. A good example is preventing a system administrator from prying into your files, which might contain sensitive data. If you encrypt the files with the client then they are stored encrypted on the server. So what does this have to do with meta data? Well, meta data can help by allowing the client to tag a file as encrypted, and even record the algorithm. This can make decrypting the files an automated process when they are downloaded/pulled from the server. The client can inspect the meta data bytes, determine if the file is encrypted (or compressed), and reverse this process if supplied with the proper password.

Also worth noting is that it is likely a good idea for your client to by default support at least one byte of meta data in order to signify that further meta data exists. The standard client uses the first byte of the meta data to specify its version or type. This allows backwards compatibility if changes are introduced and a path for upgrading existing data if needed.

  • VERSION1 BYTE 0xAA
  • 0xAD through 0xAF are RESERVED

So meta data is really helpful for the client where client means both software and user.
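A client could inspect that first byte along these lines. The byte values come from the list above; the function name classify_header and the returned labels are hypothetical, and the layout of any bytes after the first is not covered here.

```python
VERSION1 = 0xAA                  # VERSION1 BYTE from the list above
RESERVED = range(0xAD, 0xB0)     # 0xAD through 0xAF inclusive

def classify_header(data):
    """Classify a file by the first byte of its leading bytes."""
    if not data:
        return 'empty'
    first = data[0]
    if first == VERSION1:
        return 'version1'
    if first in RESERVED:
        return 'reserved'
    return 'unknown'             # no recognised meta data marker

print(classify_header(b'\xaa' + b'rest of header'))
print(classify_header(b'\xae'))
```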

Client Side Isolated Encryption

I call it client side isolated because the only place encryption or decryption ever takes place is on the client. I hope to have a decent plugin system in place to provide encryption. The standard client also supports mixing of encryption using filters. You can specify encryption using one of these two ways or both where one will override the other.

This really is a very safe and powerful way to encrypt your data. It removes a compromised server from being a problem, and this is exactly what you want. It also prevents the server administrator from copying your data and making it public, since they do not have the encryption key to recover the actual data.

The first way is using the command line option --def-crypt=<algorithm>,<parameters>. This causes the standard client to look for the encryption plugin specified by algorithm. It then passes the specified parameters to the plugin and encrypts every file with it.
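Splitting such an option value into its parts might look like the sketch below. The function parse_def_crypt is hypothetical, and it assumes the parameters are simply comma separated, as in the encryption filter examples in this document.

```python
def parse_def_crypt(value):
    """Split '<algorithm>,<parameters>' into a name and a parameter list."""
    algorithm, _, parameters = value.partition(',')
    if parameters:
        return algorithm, parameters.split(',')
    return algorithm, []

print(parse_def_crypt('xor,file:/home/dave/xordata'))
```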

The second way can be used in conjunction with the first way, or used alone. This way uses a filter to select the files to apply the specified encryption to. You can use this to apply more expensive encryption to more sensitive data, and at the same time apply lesser encryption or no encryption at all to some files. The filter works the same way as the file inclusion filter. If a file matches the filter, that encryption with its options is applied to the file. An example file looks like this:

#apple,scrypt,0,0.125,5.0,mypassword
file        accept      .*\.doc
file        accept      .*\.jpg
file        accept      .*\.png
file        accept      .*\.bmp
file        accept      .*\.avi
file        accept      .*\.mov
file        accept      .*\.mpg
#grape,scrypt,0,0.5,30.0,file:/home/dave/script.password
path        accept      /home/dave/work
#square,xor,file:/home/dave/xordata
path        accept      /home/dave/pictures

The scrypt plugin handles all the options. In the example above it supports providing the password directly in the option, or you can provide a file that contains the password. The password can be any sequence of bytes except a comma (,) if provided in the filter file, or any sequence of bytes at all if provided as a file.

You should also notice the words apple, grape, and square. These tag the file to help reassociate it with the options needed to decrypt it. The options can be considered part of the password and therefore are not stored with the file unless specified to do so. In that case you would not want to use non-file passwords, because they would be stored along with the file. It is up to you how much information you include in the tag. For example, if you are using scrypt,mypassword,0,0.125,5.0 and you name your tag scrypt-apple, you have given the attacker that information. Indeed it may help you remember or determine how to decrypt the file, but it comes at a potential cost of security. I am not a cryptography expert so it is hard for me to say just how this could turn out, but you should be aware of how the encrypted file is tagged. This system makes your encryption filter very important, as it is the link to decrypting your files once encrypted unless you can rebuild it.

Also important is the order of the filters in the file. If the first filter, tagged with apple, accepts a file then evaluation stops and apple is applied. Neither the grape nor the square tagged encryption filter will run. So the order in which you place the filters is very important in order to ensure that the correct encryption is applied to the file.
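The first-match behaviour can be illustrated with a short Python sketch. It assumes the file layout shown in the example above (a # line of the form tag,algorithm,options followed by filter rules) and is not the client's real parser; parse_efilter and select_encryption are hypothetical names, and treating a matching reject rule as "skip this section" is an assumption.

```python
import re

def parse_efilter(text):
    """Group filter rules under the '#tag,algorithm,options' line above them."""
    sections = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            tag, algorithm, *options = line[1:].split(',')
            sections.append({'tag': tag, 'algorithm': algorithm,
                             'options': options, 'rules': []})
        elif sections:
            kind, action, pattern = line.split(None, 2)
            sections[-1]['rules'].append((kind, action, re.compile(pattern)))
    return sections

def select_encryption(sections, path):
    """Return the tag of the first section that accepts the file, or None."""
    name = path.rsplit('/', 1)[-1]
    for section in sections:
        for kind, action, pattern in section['rules']:
            subject = path if kind == 'path' else name
            if pattern.match(subject):
                if action == 'accept':
                    return section['tag']   # later sections never run
                break                       # assumption: reject skips section
    return None
```

With the example file, /home/dave/work/report.doc is tagged apple even though the grape section would also accept it, because the apple section comes first.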

To review - when using client side encryption the server never sees the encryption key. The entire process happens on the client. The files are stored in their encrypted state. The encryption filter includes one or more filters that select the files to apply the specified encryption on. By being able to apply different encryption you can use slower but highly secure algorithms on sensitive data and faster but less secure algorithms on files that are not very sensitive.

Encryption Plugins

There are some standard plugins packaged by default. Some may contain any needed binaries, and others may require you to build the binary for them to be usable. At the moment all the plugins use a binary backing because Python is just a tad too slow for heavy integer crunching in loops.

Most plugins should be very portable in their binary form. You may find a pre-built binary for each plugin that needs one. Some platforms may not have a binary and in that case you will need to build one. I recommend looking at /lib/pluginman.py to determine the proper name for the library and then building it from the source in each plugin sub-directory.

NULL

This plugin is not generally used directly. It essentially applies no transformation to the data.

XOR

A very simple plugin mainly to be used as a demonstration plugin.

AESCTRMULTI

This plugin uses AES-256-ECB and AES-256-CTR to encrypt individual files. It requires either a master key file that contains at least 32 bytes of data (remaining bytes ignored), or a normal password supplied. If you supply a normal password the SCRYPT KDF is used to generate 32 bytes and that is used as the key for AES.

Each file has a random 256-bit key generated and that key is used in AES-256-CTR mode to encrypt the entire file with a known nonce. Then this random key is encrypted with the master key and placed at the beginning of the file.
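The two ways of obtaining the 32 byte master key can be sketched with the Python standard library alone. This is an illustration under assumptions, not the plugin's code: the function names are hypothetical, and the salt and the scrypt cost parameters (n, r, p) are placeholders, since the values the plugin actually uses are not stated here. The AES steps themselves are left out because the standard library does not provide AES.

```python
import hashlib

def master_key_from_file(path):
    """Use the first 32 bytes of a key file; remaining bytes are ignored."""
    with open(path, 'rb') as f:
        key = f.read(32)
    if len(key) < 32:
        raise ValueError('master key file must contain at least 32 bytes')
    return key

def master_key_from_password(password, salt, n=2**14, r=8, p=1):
    """Derive a 32 byte key from a password with the scrypt KDF."""
    # Assumption: salt, n, r and p are illustrative values only.
    return hashlib.scrypt(password.encode('utf-8'), salt=salt,
                          n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)
```

Note that hashlib.scrypt is only available when Python is built against a sufficiently recent OpenSSL.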

What encryption plugin should I use?

You should either use no encryption or AESCTRMULTI.

I need something that is not likely to be broken in my life time...

It is very hard to estimate what kind of hardware we will have in 20 years, and if a cryptographic weakness in AES will be discovered. However, at this moment the usage of AESCTRMULTI provides very strong encryption and is not likely to be broken by anyone.

Instead, you should be more worried about your local system being compromised by an attacker.
