# Sproxy2

HTTP proxy for authenticating users via OAuth2.


## Motivation

This is an overhaul of the original [Sproxy](https://hackage.haskell.org/package/sproxy).
See [ChangeLog.md](./ChangeLog.md) for the differences.

Why use a proxy for doing OAuth2? Isn't that up to the application?

 * sproxy is secure by default.  No requests make it to
   the web server if they haven't been explicitly whitelisted.
 * sproxy is independent.  Any web application written in
   any language can use it.

## Use cases

 * Existing web applications with a concept of roles. For example,
   [Mediawiki](https://www.mediawiki.org), [Jenkins](https://jenkins.io),
   [Icinga Web 2](https://www.icinga.org/products/icinga-web-2/). In
   this case you configure Sproxy to allow unrestricted access
   to the application for some groups defined by Sproxy. These
   groups are mapped to the application roles.  There is a [plugin for
   Jenkins](https://wiki.jenkins-ci.org/display/JENKINS/Reverse+Proxy+Auth+Plugin)
   which can be used for this. Mediawiki and Icinga Web 2 were also
   successfully deployed in this way, though it required changes to their
   source code.

 * New web applications designed to work specifically behind Sproxy. In this case
   you define Sproxy rules to control access to the
   application's API.  It would likely be [a single-page
   application](https://en.wikipedia.org/wiki/Single-page_application).
   Examples are [MyWatch](https://hackage.haskell.org/package/mywatch) and
   [Juan de la Cosa](https://hackage.haskell.org/package/juandelacosa).

 * Replace HTTP Basic authentication.


How it works
============

When an HTTP client makes a request, Sproxy checks for a *session cookie*.
If it doesn't exist (or is invalid or expired), Sproxy responds with [HTTP
status 511](https://tools.ietf.org/html/rfc6585) and a page where the
user can choose an [OAuth2](https://tools.ietf.org/html/rfc6749) provider to
authenticate with.  Once the user has authenticated, Sproxy stores the email
address in a session cookie: signed with a hash to prevent tampering, marked
HTTP-only (to prevent malicious JavaScript from reading it), and marked secure
(since we don't want it traveling over plaintext HTTP connections).

From that point on, when sproxy detects a valid session cookie it extracts the
email, checks it against the access rules, and relays the request to the
back-end server (if allowed).
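
As an illustration only, the following Python sketch shows how a signed,
expiring cookie value of this kind could be built and verified. It is not
Sproxy's actual cookie format: the HMAC-SHA256 construction, the
`email|expiry|signature` layout, the cookie name `session`, and the names
`SECRET`, `make_cookie`, `read_cookie` and `set_cookie_header` are all
assumptions made for this example.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET = b"change-me"


def _sign(payload: bytes) -> str:
    """Return a URL-safe HMAC-SHA256 signature of the payload."""
    digest = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def make_cookie(email: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Build a cookie value of the assumed form email|expiry|signature."""
    now = time.time() if now is None else now
    payload = f"{email.lower()}|{int(now) + ttl_seconds}"
    return f"{payload}|{_sign(payload.encode('utf-8'))}"


def read_cookie(value: str, now: Optional[float] = None) -> Optional[str]:
    """Return the email if the cookie is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        email, expiry, signature = value.rsplit("|", 2)
    except ValueError:
        return None  # not enough fields
    expected = _sign(f"{email}|{expiry}".encode("utf-8"))
    # Constant-time comparison, so the check does not leak how many bytes matched.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    if not expiry.isdigit() or int(expiry) < now:
        return None
    return email


def set_cookie_header(value: str) -> str:
    """Mark the cookie HTTP-only and secure, as described above."""
    return f"session={value}; Path=/; HttpOnly; Secure"
```

Changing any part of the email or expiry invalidates the signature, so a
tampered cookie is treated the same as a missing one.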


Permissions system
------------------
Permissions are stored in an internal SQLite3 database and imported
from a data source, which can be a PostgreSQL database or a file.  See
[sproxy.sql](./sproxy.sql) and [datafile.example.yml](./datafile.example.yml)
for details.

Note that Sproxy2 fetches only the `group_member`, `group_privilege`
and `privilege_rule` tables, because only these tables are used for
authorization. The other tables in the PostgreSQL schema exist to enforce data
integrity. The integrity of the data file is not verified, though the import
may fail due to primary key constraints.

Only one data source can be used. The data in the internal database, if any,
is fully overwritten by the data from the data source. If no data source is
specified, the data in the internal database remains unchanged, even between
restarts.  A broken data source is _not_ fatal: Sproxy will keep using the
existing internal database, or create a new empty one if none exists. A broken
data source means, for example, an inability to connect to the PostgreSQL
database or a missing data file.

The data from a PostgreSQL database are periodically fetched into the internal
database, while the data file is read once at startup.

Here are the main concepts:

- A `group` is identified by a name. Every group has
  - members (identified by email address, through `group_member`) and
  - associated privileges (through `group_privilege`).
- A `privilege` is identified by a name _and_ a domain. It has associated rules
  (through `privilege_rule`) that define what the privilege gives access to.
- A `rule` is a combination of SQL patterns for a `domain`, a `path` and an
  HTTP `method`. A rule matches an HTTP request if all of these components
  match the respective attributes of the request. However, of all the matching
  rules, only the rule with the longest `path` pattern is used to determine
  whether a user is allowed to perform a request. This is often a bit
  surprising; see the following example.


Privileges example
------------------

Consider these `group_privilege` and `privilege_rule` relations:

group            | privilege | domain
---------------- | --------- | -----------------
`readers`        | `basic`   | `wiki.example.com`
`readers`        | `read`    | `wiki.example.com`
`editors`        | `basic`   | `wiki.example.com`
`editors`        | `read`    | `wiki.example.com`
`editors`        | `edit`    | `wiki.example.com`
`administrators` | `basic`   | `wiki.example.com`
`administrators` | `read`    | `wiki.example.com`
`administrators` | `edit`    | `wiki.example.com`
`administrators` | `admin`   | `wiki.example.com`

privilege   | domain             | path           | method
----------- | ------------------ | -------------- | ------
`basic`     | `wiki.example.com` | `/%`           | `GET`
`read`      | `wiki.example.com` | `/wiki/%`      | `GET`
`edit`      | `wiki.example.com` | `/wiki/edit/%` | `GET`
`edit`      | `wiki.example.com` | `/wiki/edit/%` | `POST`
`admin`     | `wiki.example.com` | `/admin/%`     | `GET`
`admin`     | `wiki.example.com` | `/admin/%`     | `POST`
`admin`     | `wiki.example.com` | `/admin/%`     | `DELETE`

With this setup, everybody (that is, `readers`, `editors` and `administrators`)
will have access to e.g. `/imgs/logo.png` and `/favicon.ico`, but only
`administrators` will have access to `/admin/index.php`, because the longest
matching path pattern is `/admin/%` and only `administrators` have the `admin`
privilege.

Likewise `readers` have no access to e.g. `/wiki/edit/delete_everything.php`.
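
The longest-path rule can be reproduced with a short Python sketch over the two
tables above. This is a simplified model, not Sproxy's implementation: it
assumes case-sensitive pattern matching on the path only, exact comparison of
domain and method, and that rules tied for the longest pattern all count. The
visitor's groups are passed in directly instead of being looked up through
`group_member`, and the names `sql_like` and `allowed` are hypothetical.

```python
import re
from typing import Iterable

DOMAIN = "wiki.example.com"

# (group, privilege, domain), copied from the group_privilege table above.
GROUP_PRIVILEGE = [
    (group, privilege, DOMAIN)
    for group, privileges in {
        "readers": ["basic", "read"],
        "editors": ["basic", "read", "edit"],
        "administrators": ["basic", "read", "edit", "admin"],
    }.items()
    for privilege in privileges
]

# (privilege, domain, path pattern, method), copied from the privilege_rule table above.
PRIVILEGE_RULE = [
    ("basic", DOMAIN, "/%", "GET"),
    ("read", DOMAIN, "/wiki/%", "GET"),
    ("edit", DOMAIN, "/wiki/edit/%", "GET"),
    ("edit", DOMAIN, "/wiki/edit/%", "POST"),
    ("admin", DOMAIN, "/admin/%", "GET"),
    ("admin", DOMAIN, "/admin/%", "POST"),
    ("admin", DOMAIN, "/admin/%", "DELETE"),
]


def sql_like(pattern: str, value: str) -> bool:
    """Match value against an SQL LIKE pattern (% = any run, _ = one character)."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def allowed(groups: Iterable[str], domain: str, path: str, method: str) -> bool:
    """Decide access using only the matching rules with the longest path pattern."""
    member_of = set(groups)
    domain = domain.lower()       # domains are lower-cased
    path = path.split("?", 1)[0]  # query parameters are ignored
    matching = [
        (privilege, rule_path)
        for privilege, rule_domain, rule_path, rule_method in PRIVILEGE_RULE
        if rule_domain == domain and rule_method == method and sql_like(rule_path, path)
    ]
    if not matching:
        return False  # nothing whitelisted for this request
    longest = max(len(rule_path) for _, rule_path in matching)
    required = {priv for priv, rule_path in matching if len(rule_path) == longest}
    granted = {
        priv
        for group, priv, priv_domain in GROUP_PRIVILEGE
        if group in member_of and priv_domain == domain
    }
    return bool(required & granted)
```

With this model, `allowed(["readers"], "wiki.example.com", "/admin/index.php", "GET")`
is false even though `/%` matches, because `/admin/%` is the longer pattern.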


Keep in mind that:

- Domains are converted into lower case (coming from a data source or HTTP requests).
- Emails are converted into lower case (coming from a data source or OAuth2 providers).
- Groups are case-sensitive and treated as is.
- HTTP methods are *case-sensitive*.
- HTTP query parameters are ignored when matching a request against the rules.
- Privileges are case-sensitive and treated as is.
- SQL wildcards (`_` and `%`) are supported for emails and paths (this _will_ change in future versions).


Checking access in a bunch
--------------------------

There is an API end-point for checking access rights in a single POST request:
`/.sproxy/access`.  Users must be authenticated to use this end-point;
otherwise the response will be HTTP 511.

The request body shall be a JSON object like this:

```json
{
  "tag1": {"path": "/foo", "method": "GET"},
  "tag2": {"path": "/bar", "method": "GET"}
}
```

The response contains a JSON array of the tags whose path and method pairs are
allowed to the user.  For example:

```sh
$ curl -d '{"foo": {"path":"/get", "method":"GET"}, "bar": {"path":"/post", "method":"POST"}}' -XPOST -k 'https://example.ru:8443/.sproxy/access' ...
["foo","bar"]

$ curl -d '{"foo": {"path":"/get", "method":"POST"}, "bar": {"path":"/post", "method":"POST"}}' -XPOST -k 'https://example.ru:8443/.sproxy/access' ...
["bar"]

$ curl -d '{"foo": {"path":"/", "method":"POST"}, "bar": {"path":"/post", "method":"GET"}}' -XPOST -k 'https://example.ru:8443/.sproxy/access' ...
[]

```
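
The same call can be made from Python. The sketch below only builds the request
and parses the reply; the helper names `build_access_request` and
`allowed_tags` are hypothetical, and the caller is assumed to supply the
`Cookie` header value of an already authenticated session, since the cookie
name is not documented here.

```python
import json
import urllib.request
from typing import Dict, Set


def build_access_request(
    base_url: str, checks: Dict[str, Dict[str, str]], cookie_header: str
) -> urllib.request.Request:
    """Build a POST to /.sproxy/access carrying the tag -> {path, method} map."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/.sproxy/access",
        data=json.dumps(checks).encode("utf-8"),
        method="POST",
        # No Content-Type is set here, mirroring the curl -d calls above.
        headers={"Cookie": cookie_header},
    )


def allowed_tags(response_body: bytes) -> Set[str]:
    """Parse the JSON array of allowed tags returned by the end-point."""
    return set(json.loads(response_body))


# Sending the request (an unauthenticated call raises urllib.error.HTTPError with code 511):
#
#     with urllib.request.urlopen(request) as response:
#         tags = allowed_tags(response.read())
```

A single-page application can use the returned tags to decide which controls
to show, instead of probing each API path separately.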


Logout
------

Hitting the endpoint `/.sproxy/logout` will invalidate the session cookie.
The user will be redirected to `/` after logout.


Robots
------

Since all sproxied resources are private, it doesn't make sense for web
crawlers to try to index them. In fact, crawlers will index only the login
page. To prevent this, sproxy returns the following for `/robots.txt`:

```
User-agent: *
Disallow: /
```


HTTP headers passed to the back-end server
------------------------------------------

All Sproxy headers are UTF-8 encoded.


header               | value
-------------------- | -----
`From:`              | visitor's email address, lower case
`X-Groups:`          | all groups that granted access to this resource, separated by commas (see the note below)
`X-Given-Name:`      | the visitor's given (first) name
`X-Family-Name:`     | the visitor's family (last) name
`X-Forwarded-Proto:` | the protocol of the visitor's HTTP request, always `https`
`X-Forwarded-For:`   | the visitor's IP address (appended to the list if the header is already present in the client request)


`X-Groups` denotes an intersection of the groups the visitor belongs to and the groups that granted access:

Visitor's groups | Granted groups | `X-Groups`
---------------- | -------------- | ---------
all              | all, devops    | all
all, devops      | all            | all
all, devops      | all, devops    | all,devops
all, devops      | devops         | devops
devops           | all, devops    | devops
devops           | all            | Access denied


Requirements
============
Sproxy2 is written in Haskell with [GHC](http://www.haskell.org/ghc/).
All required Haskell libraries are listed in [sproxy2.cabal](sproxy2.cabal).
Use [cabal-install](http://www.haskell.org/haskellwiki/Cabal-Install)
to fetch and build all prerequisites automatically.


Configuration
=============

By default `sproxy2` will read its configuration from `sproxy.yml`.  There is
an example file with documentation, [sproxy.example.yml](sproxy.example.yml). You
can specify a custom path with:

```
sproxy2 --config /path/to/sproxy.yml
```